Evaluate pex as a deployment option for streamparse #212

amontalenti · 2015-12-17T21:09:40Z

Suggested by one of our users, it might be nice to package Python environments in pex files, which would include code and dependencies. This could be a mechanism of eliminating the need to ever use virtualenv or fabric. I don't know what the other implications are of using pex, but someone could certainly explore. I don't think it would be too hard to get it working even with a current version of streamparse, based on the rough description in the docs.

There seem to be two options.

1. Create a .pex file out of dependencies and include it in the topology JAR with a standard name like topology.pex. Rather than calling /virtualenv/topology-venv/bin/python -m streamparse.run <class_name> to run a component, we actually call topology.pex -m streamparse.run <class_name>. Since pex supports -m similar to a Python interpreter, this should "just work". This seems like the preferred option -- my only concern here is a "platform build mismatch" issue, e.g. if a dependency is a C extension module that needs to be built for the target platform rather than the development platform, building the .pex file locally may not produce the right thing (?). This might not matter as much if a bdist exists for that module and pex's --platform argument is used.
2. We build the .pex environment "dynamically" upon topology startup, using the pex CLI tool. That is, rather than bundling topology.pex inside the JAR, we actually bundle the dependency list in requirements.txt format. We then make the topology entrypoint pex -r requirements.txt -m streamparse.run. This ensures that the environment is built on the remote server upon topology startup; the main downside is that this command will probably take a long time to run the first time (before a pip cache kicks in?) and I'm not entirely sure how friendly Storm will be to that. I wonder if invoking it once upon topology submit via Fabric could be a trick to warm up the pip cache while also catching requirement specification errors at submit-time rather than topology run-time.
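
To make the two options concrete, here is a rough sketch of the command lines each one implies, written as plain Python helpers that only assemble argument lists. Nothing here exists in streamparse: the function names, the default file names, and the example platform string are all assumptions for illustration, and the `-o` output flag is taken from the pex CLI rather than from this issue.

```python
def build_pex_command(requirements="requirements.txt",
                      output="topology.pex",
                      platform=None):
    """Option 1: command that builds a .pex to bundle in the topology JAR."""
    cmd = ["pex", "-r", requirements, "-o", output]
    if platform is not None:
        # Hypothetical value supplied by the caller, e.g. the cluster's
        # platform, to work around the "platform build mismatch" concern.
        cmd.append("--platform={}".format(platform))
    return cmd


def component_command(class_name, pex_path="topology.pex"):
    """Option 1: what would replace the virtualenv python invocation."""
    return [pex_path, "-m", "streamparse.run", class_name]


def dynamic_entrypoint(requirements="requirements.txt"):
    """Option 2: resolve dependencies on the worker at topology startup."""
    return ["pex", "-r", requirements, "-m", "streamparse.run"]


if __name__ == "__main__":
    # "WordCountBolt" is a made-up component class name.
    print(" ".join(build_pex_command(platform="linux-x86_64")))
    print(" ".join(component_command("WordCountBolt")))
    print(" ".join(dynamic_entrypoint()))
```

Passing one of these lists to `subprocess.check_call` at submit time would be one way to catch bad requirement specifiers early, assuming pex is installed wherever the command runs.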

Whichever option we pick, it seems like it could offer some improvements to the virtualenv approach, but I haven't dug into pex and tested it out too much yet.

dan-blanchard · 2015-12-17T21:11:28Z

Completely agree on this one. Oh, and by the way, sparse submit works for the Python DSL on my branch now. I'm just adding more unit tests before I merge.

amontalenti · 2015-12-17T21:19:03Z

@dan-blanchard nice! 💯

amontalenti · 2015-12-26T20:40:52Z

This 20-minute talk on YouTube (done 100% at command-line) gives a nice overview of pex by a software engineer at Twitter. It makes it really clear to me that this is very do-able as a solution to this problem for streamparse.

https://www.youtube.com/watch?v=NmpnGhRwsu0

I added info to my issue body based on this talk.

dan-blanchard · 2015-12-27T03:16:15Z

I haven't had a chance to watch the video yet, but I definitely lean more toward solution one, with the exception that I think there should be one .pex file per Python version so that people could easily have some components use Python 2 and others use 3 (or pypy).
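
As a rough sketch of what "one .pex file per Python version" could look like, the helper below maps interpreter names to an output file and a build command. The `topology-<interpreter>.pex` naming scheme and the function itself are hypothetical; `--python` is the pex CLI option for picking the interpreter, and the interpreter names are only examples.

```python
def pex_builds(interpreters, requirements="requirements.txt"):
    """Map each interpreter name to an (output file, build command) pair."""
    builds = {}
    for interpreter in interpreters:
        # Hypothetical naming convention: one file per interpreter.
        output = "topology-{}.pex".format(interpreter)
        command = [
            "pex", "-r", requirements,
            "--python={}".format(interpreter),
            "-o", output,
        ]
        builds[interpreter] = (output, command)
    return builds


if __name__ == "__main__":
    for name, (output, command) in sorted(
            pex_builds(["python2.7", "python3.5", "pypy"]).items()):
        print(output, "<-", " ".join(command))
```

Each component would then be launched with the file matching its interpreter, which assumes every listed interpreter is available on the build machine.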

msukmanowsky · 2016-03-02T03:56:43Z

Just discovered that pex does not support editable requirements (-e git+ssh...) so this may be a no go for streamparse as it'd probably be unfair to assume that all users can support their own pypi mirror to avoid the need for editable requirements.

dan-blanchard · 2016-03-02T14:21:08Z

This comment has a workaround. Basically we'd need to add a step to our process where we cloned all the projects separately first, because pex is fine with just chucking package directories in there.
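
A minimal sketch of the first half of that extra step, assuming the editable requirements live in an ordinary requirements.txt: separate the `-e` lines, which would have to be cloned by hand, from the ones pex can resolve itself. The function name and the sample URL are made up for illustration, and the cloning and the hand-off of the resulting directories to pex are left out.

```python
def split_requirements(text):
    """Return (editable, regular) requirement lists from requirements text."""
    editable, regular = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-e ") or line.startswith("--editable "):
            # Keep only the VCS URL or path that follows the flag.
            editable.append(line.split(None, 1)[1])
        else:
            regular.append(line)
    return editable, regular


if __name__ == "__main__":
    sample = (
        "requests==2.9.1\n"
        "# internal package, not on PyPI\n"
        "-e git+ssh://git@example.com/org/project.git#egg=project\n"
    )
    to_clone, to_resolve = split_requirements(sample)
    print("clone first:", to_clone)
    print("resolve with pex:", to_resolve)
```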

kwlzn · 2016-03-15T00:17:14Z

it's worth noting that pex supports requirement resolution against loose Apache-style indexes (http:// stores) as well as directories (file:// stores) using --find-links. so in lieu of a full blown pypi index, your users could simply roll development sdist/bdists into an arbitrary directory or HTTP storage and resolve against that to compose a pex for testing.

e.g.:

[illuminati example]$ mkdir test_dists && pip wheel --no-cache-dir --wheel-dir=./test_dists/ requests
Collecting requests
  Downloading requests-2.9.1-py2.py3-none-any.whl (501kB)
    100% |████████████████████████████████| 501kB 12.7MB/s 
  Saved ./test_dists/requests-2.9.1-py2.py3-none-any.whl
Skipping requests, due to already being wheel.

[illuminati example]$ pex --find-links=./test_dists --no-pypi requests -v -v
:: Resolving distributions :: Translating /private/var/folders/3t/xkwqrkld4xxgklk2s4n41jb80000gn/T/tmpqjKDuq/requests-2.9.1-py2.py3-none-any.whl into distribution                                      
pex: Building pex: 64.3ms                                                                                                                                                           
pex:   Resolving interpreter: 0.7ms
pex:     Setting up interpreter /Users/kwilson/Python/CPython-2.7.11/bin/python2.7: 0.7ms
pex:   Resolving distributions: 19.6ms
pex:     Fetching file:///private/tmp/example/test_dists/requests-2.9.1-py2.py3-none-any.whl: 1.6ms
pex:       Fetching file:///private/tmp/example/test_dists/requests-2.9.1-py2.py3-none-any.whl: 1.2ms
pex:     Translating /private/var/folders/3t/xkwqrkld4xxgklk2s4n41jb80000gn/T/tmpqjKDuq/requests-2.9.1-py2.py3-none-any.whl into distribution: 4.0ms
Running PEX file at /var/folders/3t/xkwqrkld4xxgklk2s4n41jb80000gn/T/tmp7pKeHf with args []
pex: PEX.run invoking /Users/kwilson/Python/CPython-2.7.11/bin/python2.7 /var/folders/3t/xkwqrkld4xxgklk2s4n41jb80000gn/T/tmp7pKeHf
Python 2.7.11 (default, Dec 16 2015, 14:09:45) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import requests
>>>

happy to discuss further or help answer any questions you guys might have about pex.

dan-blanchard · 2016-03-15T18:59:36Z

Thanks for the info @kwlzn!

dan-blanchard · 2018-08-16T13:36:41Z

Closing in favor of #445.

amontalenti added enhancement help wanted labels Dec 17, 2015

dan-blanchard mentioned this issue Dec 21, 2015

Support selection of Python implementation on an individual component. #183

Open

dan-blanchard mentioned this issue Aug 9, 2016

Passing custom command to ShellSpoutSpec #299

Closed

dan-blanchard mentioned this issue Jul 10, 2017

JARs should be self-contained and not rely on external virtualenvs [or use Storm hooks and get rid of SSH] #99

Open

dan-blanchard mentioned this issue Aug 16, 2018

Consider using shiv instead of virtualenvs for deployment #445

Open

dan-blanchard closed this as completed Aug 16, 2018
