Evaluate pex as a deployment option for streamparse #212

Closed
amontalenti opened this issue Dec 17, 2015 · 9 comments

Comments

@amontalenti
Contributor

One of our users suggested packaging Python environments as pex files, which bundle code and dependencies together. This could eliminate the need to ever use virtualenv or fabric. I don't know what the other implications of using pex are, but someone could certainly explore them. Based on the rough description in the docs, I don't think it would be too hard to get this working even with the current version of streamparse.

There seem to be two options.

  1. Create a .pex file out of dependencies and include it in the topology JAR with a standard name like topology.pex. Rather than calling /virtualenv/topology-venv/bin/python -m streamparse.run <class_name> to run a component, we actually call topology.pex -m streamparse.run <class_name>. Since pex supports -m similar to a Python interpreter, this should "just work". This seems like the preferred option -- my only concern here is a "platform build mismatch" issue, e.g. if a dependency is a C extension module that needs to be built for the target platform rather than the development platform, building the .pex file locally may not produce the right thing (?). This might not matter as much if a bdist exists for that module and pex's --platform argument is used.
  2. We build the .pex environment "dynamically" upon topology startup, using the pex CLI tool. That is, rather than bundling topology.pex inside the JAR, we actually bundle the dependency list in requirements.txt format. We then make the topology entrypoint pex -r requirements.txt -m streamparse.run. This ensures that the environment is built on the remote server upon topology startup; the main downside is that this command will probably take a long time to run the first time (before a pip cache kicks in?) and I'm not entirely sure how friendly Storm will be to that. I wonder if invoking it once upon topology submit via Fabric could be a trick to warm up the pip cache while also catching requirement specification errors at submit-time rather than topology run-time.
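As a rough sketch of how the two options would change the command used to launch a component, the snippet below builds each argument list without running anything. The function names, the default paths, and the `bolts.WordCount` class name are hypothetical and chosen for illustration; only the command shapes come from the description above.

```python
# Sketch only: builds argument lists, does not execute them.
# Function names, defaults, and the example class name are hypothetical.

def venv_command(class_name, venv="/virtualenv/topology-venv"):
    """Current approach: run the component with the virtualenv's interpreter."""
    return [venv + "/bin/python", "-m", "streamparse.run", class_name]


def bundled_pex_command(class_name, pex_file="topology.pex"):
    """Option 1: topology.pex ships inside the JAR and stands in for python."""
    return [pex_file, "-m", "streamparse.run", class_name]


def dynamic_pex_command(requirements="requirements.txt"):
    """Option 2: resolve the environment with the pex CLI at topology startup."""
    return ["pex", "-r", requirements, "-m", "streamparse.run"]


if __name__ == "__main__":
    for cmd in (
        venv_command("bolts.WordCount"),
        bundled_pex_command("bolts.WordCount"),
        dynamic_pex_command(),
    ):
        print(" ".join(cmd))
```

With option 1 only the first element of the command changes, which is why it should need very little work on the streamparse side.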

Whichever option we pick, it seems like it could offer some improvements to the virtualenv approach, but I haven't dug into pex and tested it out too much yet.

@dan-blanchard
Member

Completely agree on this one. Oh, and by the way, sparse submit works for the Python DSL on my branch now. I'm just adding more unit tests before I merge.

@amontalenti
Contributor Author

@dan-blanchard nice! 💯

@amontalenti
Contributor Author

This 20-minute talk on YouTube (done 100% at command-line) gives a nice overview of pex by a software engineer at Twitter. It makes it really clear to me that this is very do-able as a solution to this problem for streamparse.

https://www.youtube.com/watch?v=NmpnGhRwsu0

I added info to my issue body based on this talk.

@dan-blanchard
Member

I haven't had a chance to watch the video yet, but I definitely lean more toward solution one, with the exception that I think there should be one .pex file per Python version so that people could easily have some components use Python 2 and others use 3 (or pypy).
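One way the "one .pex file per Python version" idea could look is a lookup from a component's interpreter to the matching bundle. This is a minimal sketch under assumptions: the file names (`topology-py2.pex` and so on), the interpreter keys, and the `bolts.WordCount` class name are all made up for illustration and are not an existing streamparse convention.

```python
# Sketch only: the interpreter keys and .pex file names are hypothetical.
PEX_BY_INTERPRETER = {
    "python2": "topology-py2.pex",
    "python3": "topology-py3.pex",
    "pypy": "topology-pypy.pex",
}


def component_command(class_name, interpreter="python2"):
    """Pick the .pex built for this component's interpreter and build its command."""
    try:
        pex_file = PEX_BY_INTERPRETER[interpreter]
    except KeyError:
        raise ValueError("no .pex bundled for interpreter %r" % interpreter)
    return [pex_file, "-m", "streamparse.run", class_name]


if __name__ == "__main__":
    print(" ".join(component_command("bolts.WordCount", "python3")))
```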

@msukmanowsky
Contributor

I just discovered that pex does not support editable requirements (-e git+ssh...), so this may be a no-go for streamparse. It would probably be unfair to assume that every user can run their own PyPI mirror just to avoid editable requirements.

@dan-blanchard
Member

This comment has a workaround. Basically we'd need to add a step to our process where we cloned all the projects separately first, because pex is fine with just chucking package directories in there.
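The extra step could be sketched roughly as follows: separate the editable lines from a requirements file, then turn each one into a `git clone` command so the checked-out directories can be handed to pex afterwards. The function names, the `vendor` destination, and the example repository URL are hypothetical. The sketch assumes `#egg=` fragments and does not handle pinned revisions (`@rev`).

```python
# Sketch only: helper names, the "vendor" directory, and the example URL
# are hypothetical. Pinned revisions ("@rev") are not handled.

def split_requirements(lines):
    """Split requirement lines into (editable_urls, regular_requirements)."""
    editable, regular = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("-e ") or line.startswith("--editable "):
            editable.append(line.split(None, 1)[1].strip())
        else:
            regular.append(line)
    return editable, regular


def clone_command(editable_url, dest_root="vendor"):
    """Build a git clone command for one editable VCS requirement."""
    url, _, fragment = editable_url.partition("#")
    if url.startswith("git+"):
        url = url[len("git+"):]
    if fragment.startswith("egg="):
        name = fragment[len("egg="):]
    else:
        name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return ["git", "clone", url, dest_root + "/" + name]


if __name__ == "__main__":
    reqs = [
        "requests==2.9.1",
        "# internal package",
        "-e git+ssh://git@github.com/example/private-lib.git#egg=private_lib",
    ]
    editable, regular = split_requirements(reqs)
    for url in editable:
        print(" ".join(clone_command(url)))
    print("regular:", regular)
```

The regular requirements would still be resolved by pex as usual; only the editable ones need the clone step first.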

@kwlzn

kwlzn commented Mar 15, 2016

it's worth noting that pex supports requirement resolution against loose Apache-style indexes (http:// stores) as well as directories (file:// stores) using --find-links. so in lieu of a full blown pypi index, your users could simply roll development sdist/bdists into an arbitrary directory or HTTP storage and resolve against that to compose a pex for testing.

e.g.:

[illuminati example]$ mkdir test_dists && pip wheel --no-cache-dir --wheel-dir=./test_dists/ requests
Collecting requests
  Downloading requests-2.9.1-py2.py3-none-any.whl (501kB)
    100% |████████████████████████████████| 501kB 12.7MB/s 
  Saved ./test_dists/requests-2.9.1-py2.py3-none-any.whl
Skipping requests, due to already being wheel.

[illuminati example]$ pex --find-links=./test_dists --no-pypi requests -v -v
:: Resolving distributions :: Translating /private/var/folders/3t/xkwqrkld4xxgklk2s4n41jb80000gn/T/tmpqjKDuq/requests-2.9.1-py2.py3-none-any.whl into distribution                                      
pex: Building pex: 64.3ms                                                                                                                                                           
pex:   Resolving interpreter: 0.7ms
pex:     Setting up interpreter /Users/kwilson/Python/CPython-2.7.11/bin/python2.7: 0.7ms
pex:   Resolving distributions: 19.6ms
pex:     Fetching file:///private/tmp/example/test_dists/requests-2.9.1-py2.py3-none-any.whl: 1.6ms
pex:       Fetching file:///private/tmp/example/test_dists/requests-2.9.1-py2.py3-none-any.whl: 1.2ms
pex:     Translating /private/var/folders/3t/xkwqrkld4xxgklk2s4n41jb80000gn/T/tmpqjKDuq/requests-2.9.1-py2.py3-none-any.whl into distribution: 4.0ms
Running PEX file at /var/folders/3t/xkwqrkld4xxgklk2s4n41jb80000gn/T/tmp7pKeHf with args []
pex: PEX.run invoking /Users/kwilson/Python/CPython-2.7.11/bin/python2.7 /var/folders/3t/xkwqrkld4xxgklk2s4n41jb80000gn/T/tmp7pKeHf
Python 2.7.11 (default, Dec 16 2015, 14:09:45) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import requests
>>>

happy to discuss further or help answer any questions you guys might have about pex.
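For a submit step that wants to script this, the same invocation could be assembled as an argument list. This is only a sketch: the helper name and its parameters are hypothetical, and the flags are the ones used in the session above plus pex's `-o` output option.

```python
# Sketch only: the helper name and parameters are hypothetical.

def pex_find_links_command(requirements, store, use_pypi=False, output=None):
    """Build a pex command that resolves against a local or HTTP dist store."""
    cmd = ["pex", "--find-links=" + store]
    if not use_pypi:
        cmd.append("--no-pypi")  # resolve only against the given store
    cmd.extend(requirements)
    if output:
        cmd.extend(["-o", output])  # write a .pex file instead of opening a REPL
    return cmd


if __name__ == "__main__":
    print(" ".join(pex_find_links_command(["requests"], "./test_dists")))
    print(" ".join(pex_find_links_command(["requests"], "./test_dists", output="topology.pex")))
```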

@dan-blanchard
Member

Thanks for the info @kwlzn!

@dan-blanchard
Member

Closing in favor of #445.
