Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pex output deterministic / reproducible #716

Closed
stuhood opened this issue Apr 29, 2019 · 12 comments
Closed

Make pex output deterministic / reproducible #716

stuhood opened this issue Apr 29, 2019 · 12 comments
Assignees

Comments

@stuhood
Copy link

stuhood commented Apr 29, 2019

To have deterministic / reproducible output, pex should produce a byte-for-byte identical output given identical inputs.

There are a few common cases where this breaks down:

  1. unstable timestamps included in output archives (zip/tar files, generally)
    • This can be fixed by ensuring that pex uses hardcoded timestamps for the entries that it places in archives.
  2. unstable shas/timestamps intentionally included in metadata
    • Fixing things like this might involve adding an option to either disable including this info, or to stabilize it.
  3. unstable ordering of hash iteration between machines
    • Harder to hunt down, and harder to defend against. But fixing it involves using order preserving or sorted structures.
  4. use of absolute paths, or paths that are host specific.

It is out of scope (for this ticket) to stabilize the input files to pex (ie, adding lockfile support). So in cases where the network is involved, structures should be sorted.


It's not clear which combination of these issues might be in play in pex, so it would be good to start by getting a reproducibility test/experiment harness in place that makes it easy to compare two pex outputs and identify the above issues.

@jsirois
Copy link
Member

jsirois commented Apr 29, 2019

This will be an excellent property to have. We can break out issues linked here as we pick off individual bits. There may or may not need to be upstream work to support stability of the archives embedded in built pexes - namely exploded wheels in .deps/ that pex sometimes has to build via setup.py execution.

@Eric-Arellano Eric-Arellano self-assigned this Apr 29, 2019
@Eric-Arellano
Copy link
Contributor

Some initial thoughts on how to approach this.

  • Use TDD. Add failing tests that we skip.
  • Use Python stdlib's filecmp and maybe difflib modules. Former is like shell's cmp program, will compare byte by byte. Latter will help to print diffs for easier to understand tests.
  • Acceptance tests:
    • pex -o 1.pex; pex -o 2.pex; cmp 1.pex 2.pex
    • pex requests cryptography -o 1.pex; pex requests cryptography -o 2.pex; cmp 1.pex 2.pex
      • The actual dependencies used is subject to change. Idea is to test a universal wheel, ABI wheel, and we should also test a wheel that gets built from sdist.
    • pex -m pydoc -o 1.pex; pex -m pydoc -o 2.pex; cmp 1.pex 2.pex
    • Test -c scenario. Haven't grokked what -c does yet, but seems important.
    • Specify --python and --python-shebang.
  • Over time, identify which unit tests will be meaningful to include.
    • It's going to be very difficult to debug the acceptance tests, because Pex files are bytes.
    • Unit tests can make sure each of the important pieces, like the timestamp, shebang, etc, are reproducible.

Rather than first doing a complete audit, does a more whack-a-mole approach sound reasonable? Steps:

  1. Add the failing acceptance tests.
  2. Start by fixing one simple thing, like the timestamp. Add a unit test for this.
  3. Keep going until acceptance tests pass.

Reason I'd prefer this approach is that I anticipate becoming much more familiar with Pex and reproducibility through actually modifying the code, rather than trying to study it at a high level and only then changing things after the audit.

@stuhood
Copy link
Author

stuhood commented Apr 30, 2019

The whack-a-mole approach sounds reasonable, but I'm not sure that adding the failing acceptance tests at the beginning will be significantly more useful than doing some manual diffing of exploded pexes using something like:

ls -l $exploded_one > one.txt
ls -l $exploded_two > two.txt
diff -u3 one.txt two.txt

... there are diminishing returns on making the output of the acceptance tests insanely useful when they fail when it is so easy to inspect a failure using unix tools. But up to you!

@Eric-Arellano
Copy link
Contributor

but I'm not sure that adding the failing acceptance tests at the beginning will be significantly more useful than doing some manual diffing of exploded pexes using something like:

Agreed. The intent of starting with an acceptance test in #717 is to:

  1. Put to pen (i.e. code) the requirements of this project. Helps me to understand all the edge cases we need to handle.
  2. Act as a milestone for my motivation. For example, my first goal is to get the base case of no args passing. Only then will I start to work on the 3rd party dependencies case.
    • This is just like our Py3 CI blacklist idea.
    • Once we get the base case going, we can temporarily comment out the cases that still fail so that we can avoid any regressions.

there are diminishing returns on making the output of the acceptance tests insanely useful

Also agreed. I think what you propose in #717 (comment) sounds reasonable, but we shouldn't go more than that in acceptance tests, especially because I think we will want unit tests to make regressions easy to debug.

@Eric-Arellano
Copy link
Contributor

So first major issue: .pyc is not deterministic until Python 3.7 via PEP 552.

Only targeting Python 3.7+ is not acceptable because Pants must support 3.6 in CI.

The Pep proposes that the way forward is to simply not distribute .pyc. We seem to distribute the original source code for every file, so this is tenable, although it means losing the benefit of caching. We can default to shipping .pyc and only turn off with an env var or CLI flag at build time.

Note the Pep references python/cpython#296 as a way to get deterministic timestamps, but it looks like that was a PR to modify CPython itself that was rejected, and that we can't leverage its work. I can look more closely if we're interested, but it would dramatically increase our complexity.

@jsirois
Copy link
Member

jsirois commented Apr 30, 2019

I think a flag to turn off precompile (already an option in the code of PEXBuilder.freeze et. al.) is the way to go. Better even might be a flag to turn on reproduceable builds as a whole that can serve as an umbrella for all the knobs you end up discovering we need.

@stuhood
Copy link
Author

stuhood commented Apr 30, 2019

Agreed re: the umbrella flag.

Eric-Arellano added a commit that referenced this issue May 1, 2019
Per #716, we want to make Pex builds reproducible / deterministic.

As the first step in that process, we add failing acceptance tests that will act as the indicator of whether the issue is complete or not. As we get tests passing, such as the base test of no args, we will unskip those tests to avoid any regressions.

Here, we test the major permutations of building a Pex via `-o my-name.pex`, per https://pex.readthedocs.io/en/stable/buildingpex.html#saving-pex-files. We use Python's [`filecmp`](https://docs.python.org/3/library/filecmp.html) for a byte-by-byte comparison.

We must also add a `safe_sleep()` function to ensure Python 2 is not flaky, as `time.sleep()` is not guaranteed to run for the specified time until Python 3.5.
@stuhood
Copy link
Author

stuhood commented May 1, 2019

Not to move the goalposts too much (and perhaps this should even be a separate issue?), but: another level of reproducibility that is useful is the ability to ignore the timestamps on input files.

In particular, in the context of the remote execution API (usage-of-which I owe you a http://github.com/pantsbuild/pants/ ticket for; EDIT: pantsbuild/pants#7649), timestamps are not preserved: only names, contents, type, and the "execute" bit (no file ownership, no timestamps, etc). This means that if you run the same command twice with the remote execution API, the files that will be extracted into the workspace that pex will be running in will have different timestamps each time.

@Eric-Arellano : Having typed this out, I believe that fixing the issue of input timestamps would be fine as a followup to the remoting-in-pantsbuild-pants work (to further optimize a relatively rare case). @illicitonion : My feeling is that ignoring input timestamps primarily helps in the "recovering from a dirty node" case, rather than our ability to get cache hits in the first place... so it doesn't feel like it blocks remoting for tests. Would you agree?

@jsirois
Copy link
Member

jsirois commented May 2, 2019

I think this is a non-issue, just a test proving my assertion is all that's needed. No part of PEX or the tools it uses looks at input file timestamps. Excepting:

  • zipfile: this is being fixed anyhow
  • pex/compiler.py: indirectly via py_compile, but since there will be no .pyc lying around in the compile case (which remote exec won't use anyway!) bytecode will always be compiled.

Eric-Arellano added a commit that referenced this issue May 2, 2019
…e to its non-determinism (#718)

### Problem
Until Python 3.7 via https://www.python.org/dev/peps/pep-0552/, `.pyc` files were never reproducible because they included the timestamp. See the discussion at #716 (comment).

We are aiming to ensure reproducible builds a la #716 so that we can safely remote PEXes in CI. Reproducible builds also play an important role in security, per https://reproducible-builds.org.

### Solution
Allow users to toggle on and off `.pyc` files via the new `--compile` / `--no-compile` flag.

We currently default to including `.pyc`, because this has been the precedent for 9 years. In a future release, e.g. 1.70, we will change the default to not include `.pyc` so that builds are reproducible by default, but people can still opt into including the files if they'd like via the flag.

#### Alternative flag considered: `--reproducible`
Originally, we were going to use `--reproducible` to toggle on multiple behaviors like not including `.pyc` and using a hardcoded timestamp instead of system time. However, after deciding that in a future release we are going to default to reproducible, it doesn't make sense for people to ever call `--not-reproducible`, but it does make sense to opt out of specific behaviors like wanting to use `--include-pyc`. So, we use behavior-specific flags instead of a universal flag.

#### Alternative solution: ensure our own `.pyc` timestamp
It may be possible to still include `.pyc` _and_ have their timestamp be deterministic #718 (comment). This is not pursued for now because it is more complex than necessary, especially because currently we do not ship `.pyc` files if multiple platforms are used in the PEX. But it could be a followup PR if deemed worth it.

### Result
When `--no-compile` is set, PEXes will no longer include `.pyc`. This results in all of the [reproducible build acceptance tests](https://github.com/pantsbuild/pex/blob/master/tests/test_integration.py#L1378) passing the exploded pex portion of their tests, excluding the `bdist_wheel` test.

#### Impact on performance
Not using `.pyc` has a slight speedup of ~0.2 seconds to the _creation_ of simple PEXes.

* `time python -m pex -o normal.pex --compile` averaged around 0.49 seconds
* `time python -m pex -o repro.pex --no-compile` averaged around 0.31 seconds

Not using `.pyc` has a slight slowdown of ~0.06 seconds to the startup time of simple PEXes.

* `time ./normal.pex --version` averaged around 0.26 seconds
* `time ./repro.pex --version` averaged around 0.32 seconds

`python -m pex -m pydoc -o pydoc.pex` likewise ran 0.07 seconds slower when not including `.pyc`.

Note that the inclusion of `.pyc` only impacts startup time, not actual runtime performance. See https://stackoverflow.com/questions/2998215/if-python-is-interpreted-what-are-pyc-files for what `.pyc` files do / why they exist.
@illicitonion
Copy link
Contributor

Not to move the goalposts too much (and perhaps this should even be a separate issue?), but: another level of reproducibility that is useful is the ability to ignore the timestamps on input files.

In particular, in the context of the remote execution API (usage-of-which I owe you a http://github.com/pantsbuild/pants/ ticket for; EDIT: pantsbuild/pants#7649), timestamps are not preserved: only names, contents, type, and the "execute" bit (no file ownership, no timestamps, etc). This means that if you run the same command twice with the remote execution API, the files that will be extracted into the workspace that pex will be running in will have different timestamps each time.

@Eric-Arellano : Having typed this out, I believe that fixing the issue of input timestamps would be fine as a followup to the remoting-in-pantsbuild-pants work (to further optimize a relatively rare case). @illicitonion : My feeling is that ignoring input timestamps primarily helps in the "recovering from a dirty node" case, rather than our ability to get cache hits in the first place... so it doesn't feel like it blocks remoting for tests. Would you agree?

I'm not sure I follow what this is proposing to be the problem. Are you saying that the remote (and local) execution system should use fixed timestamps when materialising files? Or that we have some pieces of code in pants/pex/... which read the ctime/mtime of files which we need to fix? Or that tools we invoke read ctime/mtime in problematic ways? Or something else?

@stuhood
Copy link
Author

stuhood commented May 2, 2019

Or that we have some pieces of code in pants/pex/... which read the ctime/mtime of files which we need to fix?

My thinking was that pex included input source files in the output zip mostly verbatim, so if it was given differing input timestamps, those would end up in the output. @jsirois implied above that that is not the case, but would be good to test.

Eric-Arellano added a commit that referenced this issue May 8, 2019
…in built PEX (#722)

### Problem
Part of #716 to make built PEXes be reproducible.

One of the major sources of unreproducibility is timestamps embedded into the zipfile. Python's `zipfile.py` uses the current system time when creating a new zipfile, per https://github.com/python/cpython/blob/45e92fc02d57a7219f4f4922179929f19b3d1c28/Lib/zipfile.py#L509. So, the Pex cannot be reproducible if built after the original run.

### Solution
Introduce a new flag `--use-system-time` / `--no-use-system-time` that will toggle whether we use a deterministic timestamp or keep the default behavior of using the current system time. We will default to `--use-system-time` for now to keep prior behavior, and then in a future release like 1.70 will switch over to default to reproducible builds.

If `--no-use-system-time` is specified, we use midnight at January 1, 1980, which is the start of the MS-DOS time used by Zipfiles. We then use this value to construct a `ZipInfo` with the desired date.


#### Not respecting `SOURCE_DATE_EPOCH`
We do not respect the standard env var `$SOURCE_DATE_EPOCH` often used by tools for deterministic timestamps, as doing so risks reducing the reproducibility of built pexes across different platforms, e.g. if two platforms set the env var differently. Usually, this is supposed to be set for the sake of security to check that a shipped binary was not tampered with, but that is not our primary use case, so we do not respect it both for simplicity of our codebase and to avoid this potential reduction in reproducibility. Refer to https://reproducible-builds.org/docs/source-date-epoch/ for more information.

### Result
4 / 7 of the acceptance tests for reproducibility now pass consistently!

When setting CI to always use a deterministic timestamp, [CI was green](https://travis-ci.org/pantsbuild/pex/builds/527997029) so things will work as expected when we change the default to deterministic timestamps.
@Eric-Arellano
Copy link
Contributor

This has been confirmed to work both in multiple acceptance tests and in the wild with Pants creating a pants.pex binary.

stuhood pushed a commit to pantsbuild/pants that referenced this issue May 16, 2019
…7734)

Pex 1.6.7 added support for reproducible Pex builds (pex-tool/pex#716), meaning that the output of `pex -o 1.pex` will be byte for byte identical to `pex -o 2.pex`. This specifically means that the PEX will use a deterministic timestamp of January 1, 1980 and that the built PEX will not include `.pyc`, which is inherently non-reproducible.

In Pex 1.7.0, Pex will default to this reproducible behavior. Originally we were going to wait for that to land to avoid making any changes to Pex. However, a [user has a strong desire](pex-tool/pex#718 (comment)) for reproducible PEX builds so we are turning on the behavior earlier.

Note that we do not introduce any flags to turn off reproducible builds, because we expect that all users will want this behavior. If we get feedback that users do not, e.g. that they want to include `.pyc` files, then we will add options to turn this behavior off.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants