Add `--scie` option to produce native PEX exes. #2466
You can now specify `--scie {eager,lazy}` when building a PEX file and one or more additional native executable PEX scies will be produced alongside the PEX file. These PEX scies will contain a portable CPython interpreter from [Python Standalone Builds][PBS] in the `--scie eager` case and will instead fetch a portable CPython interpreter just in time on first boot on a given machine if needed in the `--scie lazy` case.

Although Pex will pick the target platforms and target portable CPython interpreter version automatically, if more control is desired over which platforms are targeted and which Python version is used, then `--scie-platform`, `--scie-pbs-release`, and `--scie-python-version` can be specified.

Closes pex-tool#636
Closes pex-tool#1007
Closes pex-tool#2096

[PBS]: https://github.com/indygreg/python-build-standalone
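As a rough illustration of how the options named in this description fit together, the sketch below only assembles a `pex` command line as a Python list; it does not run anything. The helper name `pex_scie_argv` and the example values are hypothetical, and repeating `--scie-platform` once per platform is an assumption rather than something this description states.

```python
from typing import List, Optional, Sequence


def pex_scie_argv(
    requirements: Sequence[str],
    output: str,
    scie_style: str = "lazy",
    scie_platforms: Sequence[str] = (),
    scie_python_version: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) a `pex` command line that also requests scies."""
    if scie_style not in ("eager", "lazy"):
        raise ValueError("--scie accepts 'eager' or 'lazy', got {!r}".format(scie_style))
    argv = ["pex", *requirements, "-o", output, "--scie", scie_style]
    for platform in scie_platforms:
        # Assumption: the option is repeated once per targeted platform.
        argv += ["--scie-platform", platform]
    if scie_python_version:
        argv += ["--scie-python-version", scie_python_version]
    return argv
```

With no `--scie-*` options the list stays short, which matches the intent that Pex picks platforms and the interpreter version automatically unless told otherwise.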
Reviewers - yet another big one. Thanks in advance for any time you can spare. This 1st commit has no tests, those are coming in a bit, but I wanted to get this out in case you wanted to start reading. There has been pretty extensive manual testing, both for perf (see binding command that resulted to bring perf down to
"args": ["{scie.bindings.configure:PEX}"],
}
],
"bindings": [
This is important for perf. It's pretty bad to package up your "native Python executable" and see it take ~70ms (for cowsay) when a plain `--venv --sh-boot` cowsay PEX gets ~20ms. The binding gets any PEX that is scie'd up to `--sh-boot` perf levels.
@attr.s(frozen=True)
class ScieConfiguration(object):
    @classmethod
    def from_tags(
This is unused, but would / will be used by `pex3 scie create pre-existing.pex`. The idea is to get the wheel tags for all the distributions in the pre-existing PEX and use those to determine the platforms to target.
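The idea of deriving target platforms from wheel tags could look roughly like the sketch below. This is not the actual `ScieConfiguration.from_tags` implementation; the function names and the prefix / suffix tables are hypothetical, and only a handful of common wheel platform tags are handled.

```python
from typing import Iterable, Optional, Set

# Hypothetical tables mapping wheel platform tag prefixes / suffixes to scie platform parts.
_OS_PREFIXES = (("manylinux", "linux"), ("musllinux", "linux"), ("linux", "linux"), ("macosx", "macos"))
_ARCH_SUFFIXES = (("x86_64", "x86_64"), ("aarch64", "aarch64"), ("arm64", "aarch64"))


def scie_platform(wheel_platform_tag: str) -> Optional[str]:
    """Map a wheel platform tag such as "macosx_11_0_arm64" to a name like "macos-aarch64"."""
    os_name = next((name for prefix, name in _OS_PREFIXES if wheel_platform_tag.startswith(prefix)), None)
    arch = next((name for suffix, name in _ARCH_SUFFIXES if wheel_platform_tag.endswith(suffix)), None)
    if os_name is None or arch is None:
        return None
    return "{}-{}".format(os_name, arch)


def platforms_from_wheel_tags(wheel_tags: Iterable[str]) -> Set[str]:
    """Collect the platforms implied by platform-specific wheels; "any" wheels add nothing."""
    platforms = set()
    for tag in wheel_tags:
        # A wheel tag is "<python>-<abi>-<platform>"; only the last part matters here.
        platform_tag = tag.rsplit("-", 1)[-1]
        if platform_tag == "any":
            continue
        platform = scie_platform(platform_tag)
        if platform:
            platforms.add(platform)
    return platforms
```

An empty result would mean the PEX contains only universal wheels, so the tags place no restriction on which platforms to target.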
Looking to carve off some time this evening to review this, but before I start, would it be safe to say that this is a (strict?) subset of the equivalent functionality when using |
Yes. You'll find a nod to this and a pointer to science docs in the |
This stresses the full matrix of basic cases (no `--scie-*` options).
"scie making for a larger file, but requiring no internet access to boot. If you have "
"customization needs not addressed by the Pex `--scie*` options, consider using "
"`science` to build your scies (which is what Pex uses behind the scenes); see: "
"https://science.scie.app.".format(lazy=ScieStyle.LAZY, eager=ScieStyle.EAGER)
@sureshjoshi I see that your Pants plugin accepts an optional custom lift manifest, parses it if present, then injects bits into it. I think to support that sort of thing in a principled way, I'd have to parse the user supplied manifest and confirm they do not set the following keys:
- ptex
- scie_jump
- files: with matching names
- interpreters or interpreter_groups: with matching ids
- commands: with a default command (I use this to launch the PEX)
- bindings: with a matching name (needed for the default command to work)
Additionally, I'd have to advertise that I bind ptex to "ptex" for lazy scies, and always bind `configure:PYTHON` and `configure:PEX`.
Without all this I don't see how the user supplied manifest can work with Pex needs fruitfully. Can you think of any other corners? Perhaps I'm overthinking. Do you need this functionality?
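A minimal sketch of the check described above, assuming the user manifest has already been parsed into a dict for its `[lift]` table. The function name, the reserved-name sets passed in, and the rule that a command without a `name` is the default command are all assumptions made for illustration, not Pex or science APIs.

```python
from typing import Any, Dict, List, Set


def find_conflicts(
    user_lift: Dict[str, Any],
    reserved_files: Set[str],
    reserved_interpreter_ids: Set[str],
    reserved_bindings: Set[str],
) -> List[str]:
    """Describe each way the user's [lift] table collides with what Pex needs to own."""
    conflicts = []  # type: List[str]

    # Pex would control these itself, so the user manifest must not set them.
    for key in ("ptex", "scie_jump"):
        if key in user_lift:
            conflicts.append("{}: must not be set".format(key))

    for entry in user_lift.get("files", []):
        if entry.get("name") in reserved_files:
            conflicts.append("files: name {!r} is reserved".format(entry["name"]))

    for table in ("interpreters", "interpreter_groups"):
        for entry in user_lift.get(table, []):
            if entry.get("id") in reserved_interpreter_ids:
                conflicts.append("{}: id {!r} is reserved".format(table, entry["id"]))

    for command in user_lift.get("commands", []):
        # Assumption: a command with no name is the default command.
        if not command.get("name"):
            conflicts.append("commands: the default command is reserved")

    for binding in user_lift.get("bindings", []):
        if binding.get("name") in reserved_bindings:
            conflicts.append("bindings: name {!r} is reserved".format(binding["name"]))

    return conflicts
```

Returning every conflict at once, rather than raising on the first, would let a user fix a manifest in a single pass.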
I guess for `ptex` and `scie_jump` I could allow user-specified versions (but no more) IFF those versions were compatible with a lower bound.
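The lower-bound idea could be sketched as follows. The helper names are hypothetical, the version strings in any usage are made up, and only plain dotted numeric versions are handled.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as "1.2.3" (no pre-release handling)."""
    return tuple(int(part) for part in version.split("."))


def meets_lower_bound(user_version: str, minimum_version: str) -> bool:
    """Accept a user-specified version only if it is at least the minimum Pex requires."""
    return parse_version(user_version) >= parse_version(minimum_version)
```

Comparing integer tuples rather than strings avoids ranking `1.10.0` below `1.9.0`.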
I was hacking around tonight, trying to envision how I'd re-build something like `pantsible` (for example).
One idea was to manipulate the embedded manifest after `pex` generates it (add the custom bindings and whatnot by piping the file to another tool), but then I realized I don't think I'd want to be able to dynamically modify the manifest of what should be a "sealed" binary, as that would be crazy for supply chain purposes - and I don't want to be able to dynamically alter the commands the executable could call.
In the case of the plugin (which, I wouldn't really use as a reference for anything - as I made it a few years ago to solve an immediate deployment problem on a client project), I think we try to use the optional lift.toml where possible and inject the target names under certain conditions.
For this PR, I don't see any problems with deferring all of those concerns, but I'm of two minds.
- `pex` being able to accept a custom manifest template that has to be perfectly structured, with/without certain keys feels a bit hacky
- Using a separate tool (`science`), which overlaps with a lot of what `pex` would provide, feels off too
Would it make sense/be possible for `science` to defer to `pex` in some way, for the embedded Python interpreter? I'm trying to envision some sort of cleaner composition between two tools which have similar base functionality - but `science` allows some added knobs.
[lift]
name = "pantsible"
description = "Ansible with an embedded Python interpreter."
platforms = ... inferred from pex ...
[[lift.interpreters]] -> ... inferred from pex ...
[[lift.files]]
name = "pex"
[[lift.commands]]
name = "ansible"
exe = "{scie.bindings.venv}/venv/bin/ansible"
args = []
...
Although, one immediate problem I see here... I think I'm conflating a `pex` file with the `pex` CLI.
Reading through the PR, another thought that popped into my head is allowing for the `pex` CLI's generated TOML to act as an overlay or merge-manifest with a local one.
Whether that functionality is in `science` or `pex` CLI - overlaying/overwriting the user created manifest seems reasonable.
> I was hacking around tonight, trying to envision how I'd re-build something like pantsible (for example).

Well, `pantsible` uses a feature specific to scies over and above a PEX, namely the BusyBox support. It makes sense to me to just directly support this with `--scie-busybox [list of entry points]`. If you specify that then Pex emits a manifest with no default command and just named commands for each listed entry point.
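To make the proposal concrete, here is a hypothetical sketch of generating one named command per entry point and no default command. The command layout (an `exe` of `{scie.bindings.configure:PYTHON}`, the PEX as the sole argument, and `PEX_SCRIPT` set per command) is an assumption pieced together from placeholders mentioned elsewhere in this thread, not the manifest Pex actually emits.

```python
from typing import Any, Dict, List, Sequence


def busybox_commands(entry_points: Sequence[str]) -> List[Dict[str, Any]]:
    """Build one named command per entry point; no unnamed (default) command is produced."""
    if len(set(entry_points)) != len(entry_points):
        raise ValueError("Entry point names must be unique.")
    commands = []
    for entry_point in entry_points:
        if not entry_point:
            raise ValueError("Entry point names must be non-empty.")
        commands.append(
            {
                "name": entry_point,
                # Assumption: the configure binding exposes the interpreter and the PEX.
                "exe": "{scie.bindings.configure:PYTHON}",
                "args": ["{scie.bindings.configure:PEX}"],
                # Assumption: each command selects its console script via PEX_SCRIPT.
                "env": {"replace": {"PEX_SCRIPT": entry_point}},
            }
        )
    return commands
```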
> Would it make sense/be possible for science to defer to pex in some way, for the embedded Python interpreter? I'm trying to envision some sort of cleaner composition between two tools which have similar base functionality - but science allows some added knobs.

Well, science is general purpose - Any language; so it doesn't really make sense for it to know about Python let alone Pex. It does have a Provider interface to supply interpreters and that has exactly 1 implementation currently, that provides PBS interpreters. A PEX provider might make sense.
That said, Pex creates PEXes - single file executables. These do not have:
- BusyBox support: You need conscript, for example, for that.
- Bindings support: I.E.: Pex offers you no way to do pre-launch setup. You just have to write Python code to do 1 time setup in your main if you want that or provide alternate entry points fired off with `{PEX_MODULE=foo,PEX_SCRIPT=bar} ./my.pex`
As such, I think it makes sense for Pex to offer the ability to take your PEX file and turn it into a scie that behaves exactly the same, with nothing extra except maybe running faster. Everything you'd do in a custom manifest, afaict, would add things the PEX cannot already do. At that point, having to move up a layer and use science yourself with a custom lift manifest to build your app not using Pex directly makes sense. I.E.: what scie-pants has to do. The Pants app is more complex than just what the Pants PEX does / has tight perf overhead concerns; so it makes sense to move up to the higher layer.
> Reading through the PR, another thought that popped into my head is allowing for the pex CLI's generated TOML to act as an overlay or merge-manifest with a local one.

That's exactly what I meant by all this: #2466 (comment) It seems to me you can't just overlay, you must confirm the key mechanisms Pex uses in its lift are not destroyed by the merge before merging.
I'm referring to downstream tools like `science`, not pex, in this case. As in "once you've created a pex, then ..."
Anyways, the things I have in my mind are probably out of scope of this PR, and if they're important enough, or strongly enough use-cased, I can open a new ticket later.
Gotcha. So I think the PEX interpreter Provider would just use the `pex3 scie create ...` logic I referenced here: #2466 (comment)
I.E.: not create the scie, but use the `ScieConfiguration.from_tags` API + a given PEX file to source the tags to implement platform / interpreter selection via the calculated `ScieConfiguration`'s `ScieTarget` targets which include platform, pbs_release and python_version.
That said, the current science Provider interface only allows providing an interpreter and not a set of platforms; so new API work would need to be done in science anyhow it seems to plug all this in.
I guess the current API does allow enough for a PEX interpreter Provider to error when asked to produce an interpreter distribution via `Provider.distribution(platform)` for a platform the PEX does not support. That's probably actually enough:
[lift]
name = "example"
platforms = [
"linux-aarch64",
"linux-x86_64",
"macos-aarch64",
]
[[lift.files]]
name = "pex"
[[lift.interpreters]]
id = "cpython"
provider = "PEX"
pex = "{pex}"
Here if I ran `science lift --file pex=my-py37.pex build ...` the PEX interpreter Provider could fail since CPython 3.7 is not supported and if I ran `science lift --file pex=my-py38.pex build ...` it could fail fast if, for example, there were no 3.8 linux-aarch64 distributions in the latest PBS release.
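The fail-fast behaviour described here could be sketched like this. Only the `distribution(platform)` method name comes from the thread; the class name, the `ProviderError` type, the shape of the index and the returned string are hypothetical, and the real science Provider interface may differ.

```python
from typing import Dict, Set


class ProviderError(Exception):
    """Raised when no interpreter distribution can satisfy the PEX."""


class PexInterpreterProvider:
    """Hypothetical provider that derives the interpreter from the PEX being wrapped."""

    def __init__(self, pex_python_version: str, pbs_index: Dict[str, Set[str]]) -> None:
        # pbs_index maps a Python version to the platforms that have a distribution.
        self._version = pex_python_version
        self._pbs_index = pbs_index

    def distribution(self, platform: str) -> str:
        platforms = self._pbs_index.get(self._version)
        if platforms is None:
            raise ProviderError("CPython {} is not supported.".format(self._version))
        if platform not in platforms:
            raise ProviderError(
                "No CPython {} distribution is available for {}.".format(self._version, platform)
            )
        return "cpython-{}-{}".format(self._version, platform)
```

Both failure modes surface when the lift is built, before any scie is produced, which is the point of pushing the check into the provider.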
Yep, there we go - that's the kinda thing I see value in. One less place where head scratching can take place.
The restricting use case for `--scie-platform` I mentioned now has a test in 61f55a4 as does auto platforms detection.
@@ -39,6 +42,7 @@ def register_options(parser):

    parser.add_argument(
        "--scie",
        "--par",
As the issues attached to this PR prove, people in the world know "PAR"; so it seems to make sense to add a `--par` alias to this one option for discoverability by those people. If they need more than the default `--par` treatment, then they really must learn about scies and `--scie-*` advanced options anyhow.
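For readers unfamiliar with option aliases in `argparse`, a minimal standalone sketch of the pattern in the diff above follows. Only the two option strings come from the diff; the `dest`, `choices`, `default` and `help` values are assumptions for illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog="pex")
parser.add_argument(
    "--scie",
    "--par",  # Alias: both spellings populate the same destination.
    dest="scie_style",
    choices=["eager", "lazy"],
    default=None,
    help="Also produce native executable PEX scies alongside the PEX file.",
)

options = parser.parse_args(["--par", "lazy"])
print(options.scie_style)
```

Because both strings are registered in one `add_argument` call, `--help` lists them together, so someone searching for "par" lands on the `--scie` documentation.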
Will try and make some time to review this in a few hours. But very cool feature!
OK, CI is now down to erroring on the Linux runners having ... and again the face-palm was mine own. This was an issue in the |
Exercise platform / interpreter auto detection as well as explicit restriction via `--scie-platform`. Also open up control via `PEX_*` env vars: there was no need to mask these for proper scie operation and the end result is a scie that works exactly like the PEX it was built from.
Alright reviewers, the tests are now complete. Good for a final review. @sureshjoshi I'm happy to break off a feature request for either or both of the |
Yep, after this lands, I can play around with it a bit more and see where it leads to. In the meantime, I want to confirm that this is the expected behaviour.
# foo.py
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

if __name__ == "__main__":
    uvicorn.run(app, host="localhost", port=8000)
In my example, as the pex was built with python3.12, the pex shebang is Based on the comment in the thread above:
% python3.11 ./foo.pex
Python 3.12.4 (main, Jun 6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
The current behaviour matches what would happen if I just ran the pex with python3.11, so everything seems to line up and I'm just confirming my understanding of the feature.
Looks good to me after the back and forth. Would still be good to get eyeballs from someone more familiar with `pex` itself than me.
My familiarity was more with the `scie` side of things.
AFAICT that is currently basically no one except me.
😆 Good point
I don't have time to review this but i do have a question. If one needs to customize the binaries, they would need to use |
Well ... you did a super weird thing too though. What do you think you meant by the trailing All that weird aside, what actually happened here is this: You build a platform specific PEX for Python 3.12, but instead of letting Pex use that to configure a 3.12 PBS, you overrode that and said 3.11 is fine - which it's not. When the boot binding runs, Pex is smart enough to test the current PBS 3.11 interpreter, find it can't load the PEX, then continue on to try other Pythons on the PATH. It finds a python3.12, which works to load the PEX and then writes out these bindings on my machine:
So, as for I guess I probably should blank out
:; git diff pex/scie/science.py
diff --git a/pex/scie/science.py b/pex/scie/science.py
index 61f9f7ac..50935894 100644
--- a/pex/scie/science.py
+++ b/pex/scie/science.py
@@ -114,6 +114,7 @@ def create_manifests(
{
"env": {
"default": env_default,
+ "remove_exact": ["PATH"],
"remove_re": ["PEX_.*"],
"replace": {
"PEX_INTERPRETER": "1",
:; git diff pex/pex_bootstrapper.py
diff --git a/pex/pex_bootstrapper.py b/pex/pex_bootstrapper.py
index a097736f..e3609efc 100644
--- a/pex/pex_bootstrapper.py
+++ b/pex/pex_bootstrapper.py
@@ -314,7 +314,7 @@ def find_compatible_interpreter(interpreter_test=None):
path=(
os.pathsep.join(ENV.PEX_PYTHON_PATH)
if ENV.PEX_PYTHON_PATH
- else os.getenv("PATH")
+ else os.getenv("PATH", "(The PATH is empty!)")
)
)
)
Gives:
What do you think @sureshjoshi? Keep it behaving just like the PEX and bouncing down the PATH to find an interpreter that works (this means we shipped the wrong Python but the target machine had the right one), or keep things hermetic and fail as my experiment above does? FWIW, I debugged all this with 2 techniques:
@zmanji in short, probably yes. You could use As per my debug session above of @sureshjoshi's test rig case, you can also just use Pex to build your scie, then split it into its components with |
On being hermetic, I will just say that pex's strength is being hermetic out of the box with flags to disable that if needed. I think a pex built with this feature should strip the PATH by default.
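To illustrate what stripping the PATH amounts to, the sketch below applies the three rule kinds that appear in the `env` table of the diff above (`remove_exact`, `remove_re`, `replace`) to a plain dict. The function name is hypothetical, and the semantics assumed here (regexes must match the whole variable name, removals happen before replacements) are a guess rather than a statement of how scie-jump evaluates them.

```python
import re
from typing import Dict, Mapping, Optional, Sequence


def apply_env_rules(
    env: Mapping[str, str],
    remove_exact: Sequence[str] = (),
    remove_re: Sequence[str] = (),
    replace: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of env with variables removed and then replaced per the rules."""
    patterns = [re.compile(pattern) for pattern in remove_re]
    result = {}
    for name, value in env.items():
        if name in remove_exact:
            continue
        # Assumption: a pattern must match the entire variable name.
        if any(pattern.fullmatch(name) for pattern in patterns):
            continue
        result[name] = value
    result.update(replace or {})
    return result
```

With `remove_exact=["PATH"]` the child process has no PATH to search, so a PEX that cannot use the bundled interpreter fails immediately instead of finding some other Python on the machine.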
I like it! Even though this breaks the "PEX scie works just like the PEX" ~guarantee, it breaks the one part about a PEX this fixes, which is sealing in the interpreter. The only reason the PEX needs to bounce around to find a compatible Python, if there even is one, is because of that 1 glaring bit of non-hermeticity in traditional PEXes.
In my case, I wasn't trying to generate an exe or script - I was just trying to make a packaged repl with fastapi, uvicorn, and my foo.py (which seemed to work, as far as I could tell). I grabbed that example from something I was doing a couple of weeks ago on one of my many weird side-tangents. I'm sure there's a better way, but it worked one time I tried it, and I just ran with it since it's just a scratchpad.
Alright, yeah, my behavioural expectation test was presuming the goal was: "PEX scie works just like the PEX" - which it does. BUT, having said that, I think being hermetic is preferable. Building with and bundling different interpreters is an easy blunder to make, and the last place you want to find that error is after deployment.
@sureshjoshi it did not. The foo.py was not included. I think you are confused by how Pex works when you don't specify |
🤦🏽 It was just loading the local foo.py all along. Whelp, at least my pain and suffering led to a hermetic scie.
@sureshjoshi yes. Thanks for that though - as you said, everything is better as a result - except perhaps your sanity. So, people seem to never Hopefully very (power?) user friendly:
Yeah, I unzipped and grepped, but I had the file referenced otherwise - so it showed up in my grep, but it was just a filename, not the file itself. As I said, very weird tangents I was messing around with 🤦🏽
@benjyw I'm headed to the hills for a bit; so I'm going to proceed to merge this and get out a release. I feel good about the current commitments, but I'll circle back if you spot bugs or have questions.
Sounds fine, I'll take a look ASAP - I'm on vacation in Europe so code reviews are backing up.