Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for --scie-busybox. #2468

Merged
merged 16 commits into from
Aug 1, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions build-backend/pex_build/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,9 @@

import os

# When running under MyPy, this will be set to True for us automatically; so we can use it as a typing module import
# guard to protect Python 2 imports of typing - which is not normally available in Python 2.
# When running under MyPy, this will be set to True for us automatically; so we can use it as a
# typing module import guard to protect Python 2 imports of typing - which is not normally available
# in Python 2.
TYPE_CHECKING = False


Expand Down
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -31,5 +31,6 @@ Guide:

whatispex
buildingpex
scie
recipes
api/vars
310 changes: 310 additions & 0 deletions docs/scie.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,310 @@
# PEX with included Python interpreter

You can include a CPython interpreter in your PEX by adding `--scie eager` to your `pex` command
line. Instead of a traditional [PEP-441](https://peps.python.org/pep-0441/) PEX zip file, you'll
get a native executable that contains both a CPython interpreter and your PEX'd code.

## Background

Traditional PEX files allow you to build and ship a hermetic Python application environment to other
machines by just copying the PEX file there. There is a major caveat though: the machine must have a
Python interpreter installed and on the `PATH` that is compatible with the application for the PEX
to be able to run. Complicating things further, when executing the PEX file directly (e.g.:
`./my.pex`), the PEX's shebang must align with the names of Python binaries installed on the
machine. If the shebang is looking for `python` but the machine only has `python3` - even if the
underlying Python interpreter would be compatible - the operating system will fail to launch the PEX
file. This usually can be mitigated by using `--sh-boot` to alter the boot mechanism from Python to
a Posix-compatible shell at `/bin/sh`. Although almost all Posix-compatible systems have a
`/bin/sh` shell, that still leaves the problem of having a compatible Python pre-installed on that
system as well.

When you add the `--scie eager` option to your `pex` command line, Pex uses the [science](
https://science.scie.app/) [projects](https://github.com/a-scie/) to produce what is known as a
`scie` (pronounced like "ski") binary powered by the [Python Standalone Builds](
https://github.com/indygreg/python-build-standalone) CPython distributions. The end product looks
and behaves like a traditional PEX except in two aspects:
+ The PEX scie file is larger than the equivalent PEX file since it contains a CPython distribution.
+ The PEX scie file is a native executable binary.

For example, here we create a traditional PEX, a `--sh-boot` PEX and a PEX scie and examine the
resulting files:
```sh
# Create a cowsay PEX in each style:
:; pex cowsay -c cowsay --inject-args=-t --venv -o cowsay.pex
:; pex cowsay -c cowsay --inject-args=-t --venv --sh-boot -o cowsay-sh-boot.pex
:; pex cowsay -c cowsay --inject-args=-t --venv --scie eager -o cowsay

# See what these files look like:
:; head -1 cowsay*
==> cowsay <==
ELF>��@�8

==> cowsay-sh-boot.pex <==
#!/bin/sh

==> cowsay.pex <==
#!/usr/bin/env python3.11

:; file cowsay*
cowsay: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, BuildID[sha1]=f1f01ca2ad165fed27f8304d4b2fad02dcacdffe, stripped
cowsay-sh-boot.pex: POSIX shell script executable (binary data)
cowsay.pex: Zip archive data, made by v2.0 UNIX, extract using at least v2.0, last modified, last modified Sun, Jan 01 1980 00:00:00, uncompressed size 0, method=deflate

:; ls -sSh1 cowsay*
31M cowsay
728K cowsay-sh-boot.pex
724K cowsay.pex
```

The PEX scie can even be inspected like a traditional PEX file:
```sh
:; for pex in cowsay*; do echo $pex; unzip -l $pex | tail -7; echo; done
cowsay
warning [cowsay]: 31525759 extra bytes at beginning or within zipfile
(attempting to process anyway)
7 1980-01-01 00:00 .deps/cowsay-6.1-py3-none-any.whl/cowsay-6.1.dist-info/top_level.txt
873 1980-01-01 00:00 PEX-INFO
7588 1980-01-01 00:00 __main__.py
0 1980-01-01 00:00 __pex__/
7561 1980-01-01 00:00 __pex__/__init__.py
--------- -------
2634753 217 files

cowsay-sh-boot.pex
7 1980-01-01 00:00 .deps/cowsay-6.1-py3-none-any.whl/cowsay-6.1.dist-info/top_level.txt
873 1980-01-01 00:00 PEX-INFO
7588 1980-01-01 00:00 __main__.py
0 1980-01-01 00:00 __pex__/
7561 1980-01-01 00:00 __pex__/__init__.py
--------- -------
2634753 217 files

cowsay.pex
7 1980-01-01 00:00 .deps/cowsay-6.1-py3-none-any.whl/cowsay-6.1.dist-info/top_level.txt
873 1980-01-01 00:00 PEX-INFO
7588 1980-01-01 00:00 __main__.py
0 1980-01-01 00:00 __pex__/
7561 1980-01-01 00:00 __pex__/__init__.py
--------- -------
2634753 217 files
```

The performance of the PEX scie compares favorably, as you'd hope.
```sh
:; hyperfine -w2 './cowsay.pex Moo!' './cowsay-sh-boot.pex Moo!' './cowsay Moo!'
Benchmark 1: ./cowsay.pex Moo!
Time (mean ± σ): 99.2 ms ± 3.7 ms [User: 86.4 ms, System: 13.7 ms]
Range (min … max): 96.1 ms … 110.7 ms 30 runs

Benchmark 2: ./cowsay-sh-boot.pex Moo!
Time (mean ± σ): 17.6 ms ± 0.3 ms [User: 15.2 ms, System: 2.2 ms]
Range (min … max): 16.8 ms … 18.7 ms 165 runs

Benchmark 3: ./cowsay Moo!
Time (mean ± σ): 16.3 ms ± 0.4 ms [User: 13.4 ms, System: 2.7 ms]
Range (min … max): 15.5 ms … 18.6 ms 180 runs

Summary
./cowsay Moo! ran
1.08 ± 0.03 times faster than ./cowsay-sh-boot.pex Moo!
6.09 ± 0.27 times faster than ./cowsay.pex Moo!
```

But, unlike traditional PEXes, you can run the PEX scie anywhere:
```sh
# Traditional Python shebang boot:
:; env -i PATH= ./cowsay.pex Moo!
/usr/bin/env: 'python3.11': No such file or directory

# A --sh-boot /bin/sh boot:
:; env -i PATH= ./cowsay-sh-boot.pex Moo!
Failed to find any of these python binaries on the PATH:
python3.11
python3.13
...
python3
python2
pypy3
pypy2
python
pypy
Either adjust your $PATH which is currently:

Or else install an appropriate Python that provides one of the binaries in this list.

# A hermetic scie boot:
:; env -i PATH= ./cowsay Moo!
____
| Moo! |
====
\
\
^__^
(oo)\_______
(__)\ )\/\
||----w |
|| ||
```

## Lazy scies

Specifying `--scie eager` includes a full CPython distribution in your PEX scie. If you ship more
than one PEX scie to a machine using the same Python version, this can be wasteful in transfer
bandwidth and disk space. If your deployment machines have internet access, you can specify
`--scie lazy` and the Python distribution will then be fetched from the internet, but only if
needed. If a PEX scie (whether eager or lazy) using the same Python distribution has run previously
on the machine, the fetch will be skipped and the local distribution used instead. This lazy
fetching feature is powered by the [`ptex` binary](https://github.com/a-scie/ptex) from the science
projects, and you can read more there if you're curious.

If your network access is restricted, you can re-point the download location of the Python
distribution by ensuring the machine has the environment variable `PEX_BOOTSTRAP_URLS` set to the
path of a json file containing the new Python distribution URL. That file should look like:
```json
{
"ptex": {
"cpython-3.12.4+20240713-x86_64-unknown-linux-gnu-install_only.tar.gz": "<internal URL>"
}
}
```

You can run `SCIE=inspect <your PEX scie> | jq '{ptex:.ptex}'` to get a starter file with the
correct default entries for your scie. You can then just edit the URLs. URLs of the form
`file://<absolute path>` are accepted. The only restriction for any custom URL is that it returns a
bytewise-identical copy of the Python distribution pointed to by the original URL. If the file
content hash does not match, the PEX scie will fail to boot. For example:
```sh
# Build a lazy PEX scie:
:; pex cowsay -c cowsay --inject-args=-t --scie lazy -o cowsay

# Generate a starter file for the alternate URLs:
:; SCIE=inspect ./cowsay | jq '{ptex:.ptex}' > starter.json

# Copy to pythons.json and edit it to point to a file that does not contain the original Python
# distribution:
:; jq 'first(.ptex | .[]) = "file:///etc/hosts"' starter.json > pythons.json
:; diff -u --label starter.json starter.json --label pythons.json pythons.json
--- starter.json
+++ pythons.json
@@ -1,5 +1,5 @@
{
"ptex": {
- "cpython-3.11.9+20240726-x86_64-unknown-linux-gnu-install_only.tar.gz": "https://github.com/indygreg/python-build-standalone/releases/download/20240726/cpython-3.11.9%2B20240726-x86_64-unknown-linux-gnu-install_only.tar.gz"
+ "cpython-3.11.9+20240726-x86_64-unknown-linux-gnu-install_only.tar.gz": "file:///etc/hosts"
}
}

# Clear the scie cache and try to run the lazy PEX scie:
:; rm -rf ~/.cache/nce
:; PEX_BOOTSTRAP_URLS=pythons.json ./cowsay Moo!
Error: Failed to establish atomic directory /home/jsirois/.cache/nce/0770bcb55edb6b8089bcc8cbe556d3f737f4a5e3a5cbc45e716206de554c0df9/locks/configure-ce4ae7966f25868830154e6fa8d56b0dd6e09cd2902ab837a4af55d51dc84d92. Population of work directory failed: Failed to establish atomic directory /home/jsirois/.cache/nce/f6e955dc9ddfcad74e77abe6f439dac48ebca14b101ed7c85a5bf3206ed2c53d/cpython-3.11.9+20240726-x86_64-unknown-linux-gnu-install_only.tar.gz. Population of work directory failed: The tar.gz destination /home/jsirois/.cache/nce/f6e955dc9ddfcad74e77abe6f439dac48ebca14b101ed7c85a5bf3206ed2c53d/cpython-3.11.9+20240726-x86_64-unknown-linux-gnu-install_only.tar.gz of size 410 had unexpected hash: 16183c427758316754b82e4d48d63c265ee46ec5ae96a40d9092e694dd5f77ab

The ./cowsay scie contains no alternate boot commands.
```

Here we see the error `.../cpython-3.11.9+20240726-x86_64-unknown-linux-gnu-install_only.tar.gz of
size 410 had unexpected hash: 16183c427758316754b82e4d48d63c265ee46ec5ae96a40d9092e694dd5f77ab`. We
can correct this by re-pointing to a valid file:
```sh
# Download the expected dstribution:
:; curl -fL https://github.com/indygreg/python-build-standalone/releases/download/20240726/cpython-3.11.9%2B20240726-x86_64-unknown-linux-gnu-install_only.tar.gz > /tmp/example

# Re-point to the now valid copy of the expected Python distribution:
:; jq 'first(.ptex | .[]) = "file:///tmp/example"' starter.json > pythons.json
:; diff -u --label starter.json starter.json --label pythons.json pythons.json
--- starter.json
+++ pythons.json
@@ -1,5 +1,5 @@
{
"ptex": {
- "cpython-3.11.9+20240726-x86_64-unknown-linux-gnu-install_only.tar.gz": "https://github.com/indygreg/python-build-standalone/releases/download/20240726/cpython-3.11.9%2B20240726-x86_64-unknown-linux-gnu-install_only.tar.gz"
+ "cpython-3.11.9+20240726-x86_64-unknown-linux-gnu-install_only.tar.gz": "file:///tmp/example"
}
}

# And lazy bootstrapping from the Python distribution in /tmp/example now works:
:; PEX_BOOTSTRAP_URLS=pythons.json ./cowsay Moo!
____
| Moo! |
====
\
\
^__^
(oo)\_______
(__)\ )\/\
||----w |
|| ||
```

## BusyBox scies

Scies support multiple commands, but, by default, `pex --scie ...` generates a PEX scie that always
executes the entry point you configured for your PEX. You can, of course, run the scie using
`PEX_INTERPRETER`, `PEX_MODULE` and `PEX_SCRIPT` to modify the entry point just like you can with
a normal PEX, but sometimes it can be convenient to seal in a small set of commands you wish to use
for easier access. You do this by adding `--scie-busybox` to your `pex` command line with a list of
entry points you wish to expose. These entry points can be arbitrary modules or functions within a
module. They can also be console scripts from distributions in the PEX. The BusyBox entry point
specifications accepted are detailed below:

| Form | Example | Effect |
|--------------------------|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------|
| `<name>=<module>` | `json=json.tool` | Add a `json` command that invokes the `json.tool` module. |
| `<name>=<module>:<func>` | `uuid=uuid:uuid4` | Add a `uuid` command that invokes the `uuid4` function in the `uuid` module. |
| `<name>` | `cowsay` | Add a `cowsay` command for the `cowsay` console script found anywhere in the PEX distributions. |
| `!<name>` | `!cowsay` | Exclude all `cowsay` console scripts found in the PEX distributions (For use with `@` and `@<project>`). |
| `<name>@<project>` | `ansible@ansible-core` | Add an `ansible` command for the `ansible` console script found in the `ansible-core` distributions in the PEX. |
| `!<name>@<project>` | `!ansible@ansible-core` | Exclude the `ansible` console script found in the `ansible-core` distributions in the PEX (For use with `@` and `@ansible-core`). |
| `@<project>` | `@ansible-core` | Add a command for all console scripts found in the `ansible-core` distributions in the PEX. |
| `!@<project>` | `!@ansible-core` | Exclude all console scripts found in the `ansible-core` distributions in the PEX (For use with `@`). |
| `@` | `@` | Add a command for all console scripts found in all project distributions in the PEX. |

For example, to build a BusyBox with tools both useful and frivolous:
```sh
# Build a PEX scie BusyBox with 3 commands:
:; pex cowsay -c cowsay --inject-args=-t --scie lazy --scie-busybox json=json.tool,uuid=uuid:uuid4,cowsay -otools

# Run the BusyBox to discover what commands it contains:
:; ./tools
Error: Could not determine which command to run.

Please select from the following boot commands:

cowsay
json
uuid

You can select a boot command by setting the SCIE_BOOT environment variable or else by passing it as the 1st argument.

# Use the tools:
:; ./tools uuid
16269f0f-76f5-4374-9da2-e0e873c40835
:; ./tools uuid
00e2584d-a5d3-40d9-9217-9a873fe7cac8
:; echo '{"Hello":"World!"}' | ./tools json
{
"Hello": "World!"
}

# Install the tools on the $PATH individually for convenient access:
:; mkdir /tmp/bin
:; export PATH=/tmp/bin:$PATH
:; SCIE=install ./tools /tmp/bin
:; ls -1 /tmp/bin/
cowsay
json
uuid
:; which cowsay
/tmp/bin/cowsay
:; cowsay Moo!
____
| Moo! |
====
\
\
^__^
(oo)\_______
(__)\ )\/\
||----w |
|| ||
```
Loading