Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds -m option to saved .pex file interpreter emulation #418

Closed
wants to merge 1 commit into from

Conversation

markyen
Copy link

@markyen markyen commented Sep 27, 2017

This allows you to invoke a module from a saved PEX file with app.pex -m <module>, in addition to invoking a script with app.pex <file> and a REPL with app.pex, as could already be done previously.

This makes a saved PEX file more suitable as a basic replacement for the python binary on platforms that allow you to do that but that don't have good support for Python dependency management themselves, such as PySpark.

Copy link
Contributor

@kwlzn kwlzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR, this looks reasonable to me w/ one comment.

would also like to see an integration test exercising the new feature before landing.

pex/pex.py Outdated
import code
code.interact()
elif sys.argv[1] == '-m':
mod = sys.argv[2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this unchecked access of index position 2 would cause an unhandled IndexError for the case of ./a.pex -m.

@lorencarvalho
Copy link
Contributor

isn't this what PEX_MODULE=foo.bar:main and PEX_INTERPRETER=1 already do?

This allows you to invoke a module from a saved PEX file with
`app.pex -m <module>`, in addition to invoking a script with
`app.pex <file>` and a REPL with `app.pex`, as could already be
done previously.

This makes a saved PEX file more suitable as a basic replacement
for the `python` binary on platforms that allow you to do that but
that don't have good support for Python dependency management
themselves, such as PySpark.
@markyen
Copy link
Author

markyen commented Oct 2, 2017

Added an extra check to avoid possibleIndexErrors and added integration tests.

It's true that this new flag does something you could already accomplish by setting PEX_MODULE: instead of running ./app.pex -m <module>, you could run PEX_MODULE=<module> ./app.pex and get the same result.

But my use case involves trying to use the built PEX file as the Python binary with a framework that doesn't know about PEX_MODULE; this framework just takes a configurable Python binary tries to invoke it with $PYTHON <script> or $PYTHON -m <module>. I'd like to be able to point $PYTHON at a built PEX file.

@lorencarvalho
Copy link
Contributor

it might be worth noting that the .pexrc feature can inject environment variables for you if your environment doesn't support that. Placing a .pexrc file in the same CWD (or in $HOME) the pex file will read it.

Another way of achieving what you want without modifying pex source is to simply have a wrapper script that handles this sys.argv inspection and then runs an os.execve with environment variables set. Or even just a bash script, here's an example of one a tool I use builds:

#!/bin/bash

exec /usr/bin/env PEX_ROOT="$BASEDIR/libexec" PEX_MODULE="apollo:main" $BASEDIR/bin/server.pex "$@"

It should also be noted that you can use pex files as the shebang of a script a la #!/usr/bin/env yourpex.pex.

Not trying to discourage this change, just offering potential solutions that might work w/o cutting a new feature.

@kwlzn
Copy link
Contributor

kwlzn commented Oct 2, 2017

my understanding is that the -m <module> mode is specifically to smooth pex usage in PySpark. it's not the first time I've heard this request and since the change is limited to the default entrypoint handler it seems perfectly fine to me.

@markyen
Copy link
Author

markyen commented Oct 2, 2017

Indeed, in order to get pex working with PySpark now, I'm doing something very similar to what @sixninetynine suggested: with every PySpark job we write, I include a bash script that contains ./app.pex interpreter.py $@, and an interpreter.py that contains a copy of the pex.execute_interpreter function with the added ability of handling the -m flag, just as it appears in this PR.

This has been working fine for us for the last couple months, but as the number of PySpark jobs grows, so too does the number of places where this set of shim scripts needs to be duplicated, and shipped along with every run of every job.

I feel it would be cleaner to move the functionality into the upstream interpreter logic, but of course I'll leave the final design decision to you!

Copy link
Contributor

@kwlzn kwlzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding tests - one last comment and we should be good to merge.

code.interact()
elif len(sys.argv) > 2 and sys.argv[1] == '-m':
mod = sys.argv[2]
sys.argv = sys.argv[2:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys.argv should not be modified here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noting that sys.argv is modified below in the existing code too.. but I don't think that's necessary either. mind yanking it too while you're in here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually.. scratch that.. I see why the sys.argv modification is needed now (sorry about the thrash).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure what the reason for sys.argv modification was, so I was just trying to match the behavior below. If a different modification is needed here, let me know and I can make that change.

@markyen
Copy link
Author

markyen commented Jan 5, 2018

@kwlzn If you let me know what the desired behavior is for modifying sys.argv, I'm happy to make a change to match it.

@markyen
Copy link
Author

markyen commented Feb 12, 2018

If anything else needs to be changed before this can be merged, please let me know and I'll be happy to change it.

@markyen
Copy link
Author

markyen commented Mar 12, 2018

@kwlzn Any thoughts on what further changes are needed?

@markyen
Copy link
Author

markyen commented Nov 9, 2020

Superseded by #563

@markyen markyen closed this Nov 9, 2020
@markyen markyen deleted the module branch November 9, 2020 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants