-
-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make pex output deterministic / reproducible #716
Comments
This will be an excellent property to have. We can break out issues linked here as we pick off individual bits. There may or may not need to be upstream work to support stability of the archives embedded in built pexes - namely exploded wheels in .deps/ that pex sometimes has to build via |
Some initial thoughts on how to approach this.
Rather than first doing a complete audit, does a more whack-a-mole approach sound reasonable? Steps:
Reason I'd prefer this approach is that I anticipate becoming much more familiar with Pex and reproducibility through actually modifying the code, rather than trying to study it at a high level and only then changing things after the audit. |
The whack-a-mole approach sounds reasonable, but I'm not sure that adding the failing acceptance tests at the beginning will be significantly more useful than doing some manual diffing of exploded pexes using something like:
... there are diminishing returns on making the output of the acceptance tests insanely useful when they fail when it is so easy to inspect a failure using unix tools. But up to you! |
Agreed. The intent of starting with an acceptance test in #717 is to:
Also agreed. I think what you propose in #717 (comment) sounds reasonable, but we shouldn't go more than that in acceptance tests, especially because I think we will want unit tests to make regressions easy to debug. |
So first major issue: Only targeting Python 3.7+ is not acceptable because Pants must support 3.6 in CI. The Pep proposes that the way forward is to simply not distribute Note the Pep references python/cpython#296 as a way to get deterministic timestamps, but it looks like that was a PR to modify CPython itself that was rejected, and that we can't leverage its work. I can look more closely if we're interested, but it would dramatically increase our complexity. |
I think a flag to turn off precompile (already an option in the code of PEXBuilder.freeze et. al.) is the way to go. Better even might be a flag to turn on reproduceable builds as a whole that can serve as an umbrella for all the knobs you end up discovering we need. |
Agreed re: the umbrella flag. |
Per #716, we want to make Pex builds reproducible / deterministic. As the first step in that process, we add failing acceptance tests that will act as the indicator of whether the issue is complete or not. As we get tests passing, such as the base test of no args, we will unskip those tests to avoid any regressions. Here, we test the major permutations of building a Pex via `-o my-name.pex`, per https://pex.readthedocs.io/en/stable/buildingpex.html#saving-pex-files. We use Python's [`filecmp`](https://docs.python.org/3/library/filecmp.html) for a byte-by-byte comparison. We must also add a `safe_sleep()` function to ensure Python 2 is not flaky, as `time.sleep()` is not guaranteed to run for the specified time until Python 3.5.
Not to move the goalposts too much (and perhaps this should even be a separate issue?), but: another level of reproducibility that is useful is the ability to ignore the timestamps on input files. In particular, in the context of the remote execution API (usage-of-which I owe you a http://github.com/pantsbuild/pants/ ticket for; EDIT: pantsbuild/pants#7649), timestamps are not preserved: only names, contents, type, and the "execute" bit (no file ownership, no timestamps, etc). This means that if you run the same command twice with the remote execution API, the files that will be extracted into the workspace that pex will be running in will have different timestamps each time. @Eric-Arellano : Having typed this out, I believe that fixing the issue of input timestamps would be fine as a followup to the remoting-in-pantsbuild-pants work (to further optimize a relatively rare case). @illicitonion : My feeling is that ignoring input timestamps primarily helps in the "recovering from a dirty node" case, rather than our ability to get cache hits in the first place... so it doesn't feel like it blocks remoting for tests. Would you agree? |
I think this is a non-issue, just a test proving my assertion is all that's needed. No part of PEX or the tools it uses looks at input file timestamps. Excepting:
|
…e to its non-determinism (#718) ### Problem Until Python 3.7 via https://www.python.org/dev/peps/pep-0552/, `.pyc` files were never reproducible because they included the timestamp. See the discussion at #716 (comment). We are aiming to ensure reproducible builds a la #716 so that we can safely remote PEXes in CI. Reproducible builds also play an important role in security, per https://reproducible-builds.org. ### Solution Allow users to toggle on and off `.pyc` files via the new `--compile` / `--no-compile` flag. We currently default to including `.pyc`, because this has been the precedent for 9 years. In a future release, e.g. 1.70, we will change the default to not include `.pyc` so that builds are reproducible by default, but people can still opt into including the files if they'd like via the flag. #### Alternative flag considered: `--reproducible` Originally, we were going to use `--reproducible` to toggle on multiple behaviors like not including `.pyc` and using a hardcoded timestamp instead of system time. However, after deciding that in a future release we are going to default to reproducible, it doesn't make sense for people to ever call `--not-reproducible`, but it does make sense to opt out of specific behaviors like wanting to use `--include-pyc`. So, we use behavior-specific flags instead of a universal flag. #### Alternative solution: ensure our own `.pyc` timestamp It may be possible to still include `.pyc` _and_ have their timestamp be deterministic #718 (comment). This is not pursued for now because it is more complex than necessary, especially because currently we do not ship `.pyc` files if multiple platforms are used in the PEX. But it could be a followup PR if deemed worth it. ### Result When `--no-compile` is set, PEXes will no longer include `.pyc`. This results in all of the [reproducible build acceptance tests](https://github.com/pantsbuild/pex/blob/master/tests/test_integration.py#L1378) passing the exploded pex portion of their tests, excluding the `bdist_wheel` test. #### Impact on performance Not using `.pyc` has a slight speedup of ~0.2 seconds to the _creation_ of simple PEXes. * `time python -m pex -o normal.pex --compile` averaged around 0.49 seconds * `time python -m pex -o repro.pex --no-compile` averaged around 0.31 seconds Not using `.pyc` has a slight slowdown of ~0.06 seconds to the startup time of simple PEXes. * `time ./normal.pex --version` averaged around 0.26 seconds * `time ./repro.pex --version` averaged around 0.32 seconds `python -m pex -m pydoc -o pydoc.pex` likewise ran 0.07 seconds slower when not including `.pyc`. Note that the inclusion of `.pyc` only impacts startup time, not actual runtime performance. See https://stackoverflow.com/questions/2998215/if-python-is-interpreted-what-are-pyc-files for what `.pyc` files do / why they exist.
I'm not sure I follow what this is proposing to be the problem. Are you saying that the remote (and local) execution system should use fixed timestamps when materialising files? Or that we have some pieces of code in pants/pex/... which read the ctime/mtime of files which we need to fix? Or that tools we invoke read ctime/mtime in problematic ways? Or something else? |
My thinking was that pex included input source files in the output zip mostly verbatim, so if it was given differing input timestamps, those would end up in the output. @jsirois implied above that that is not the case, but would be good to test. |
…in built PEX (#722) ### Problem Part of #716 to make built PEXes be reproducible. One of the major sources of unreproducibility is timestamps embedded into the zipfile. Python's `zipfile.py` uses the current system time when creating a new zipfile, per https://github.com/python/cpython/blob/45e92fc02d57a7219f4f4922179929f19b3d1c28/Lib/zipfile.py#L509. So, the Pex cannot be reproducible if built after the original run. ### Solution Introduce a new flag `--use-system-time` / `--no-use-system-time` that will toggle whether we use a deterministic timestamp or keep the default behavior of using the current system time. We will default to `--use-system-time` for now to keep prior behavior, and then in a future release like 1.70 will switch over to default to reproducible builds. If `--no-use-system-time` is specified, we use midnight at January 1, 1980, which is the start of the MS-DOS time used by Zipfiles. We then use this value to construct a `ZipInfo` with the desired date. #### Not respecting `SOURCE_DATE_EPOCH` We do not respect the standard env var `$SOURCE_DATE_EPOCH` often used by tools for deterministic timestamps, as doing so risks reducing the reproducibility of built pexes across different platforms, e.g. if two platforms set the env var differently. Usually, this is supposed to be set for the sake of security to check that a shipped binary was not tampered with, but that is not our primary use case, so we do not respect it both for simplicity of our codebase and to avoid this potential reduction in reproducibility. Refer to https://reproducible-builds.org/docs/source-date-epoch/ for more information. ### Result 4 / 7 of the acceptance tests for reproducibility now pass consistently! When setting CI to always use a deterministic timestamp, [CI was green](https://travis-ci.org/pantsbuild/pex/builds/527997029) so things will work as expected when we change the default to deterministic timestamps.
This has been confirmed to work both in multiple acceptance tests and in the wild with Pants creating a |
…7734) Pex 1.6.7 added support for reproducible Pex builds (pex-tool/pex#716), meaning that the output of `pex -o 1.pex` will be byte for byte identical to `pex -o 2.pex`. This specifically means that the PEX will use a deterministic timestamp of January 1, 1980 and that the built PEX will not include `.pyc`, which is inherently non-reproducible. In Pex 1.7.0, Pex will default to this reproducible behavior. Originally we were going to wait for that to land to avoid making any changes to Pex. However, a [user has a strong desire](pex-tool/pex#718 (comment)) for reproducible PEX builds so we are turning on the behavior earlier. Note that we do not introduce any flags to turn off reproducible builds, because we expect that all users will want this behavior. If we get feedback that users do not, e.g. that they want to include `.pyc` files, then we will add options to turn this behavior off.
To have deterministic / reproducible output, pex should produce a byte-for-byte identical output given identical inputs.
There are a few common cases where this breaks down:
It is out of scope (for this ticket) to stabilize the input files to pex (ie, adding lockfile support). So in cases where the network is involved, structures should be sorted.
It's not clear which combination of these issues might be in play in pex, so it would be good to start by getting a reproducibility test/experiment harness in place that makes it easy to compare two pex outputs and identify the above issues.
The text was updated successfully, but these errors were encountered: