-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding tests to ensure pip install is reproducible #7808
Comments
This makes sense if during the build you were specifying |
We build wheels using
and install them using
|
Thanks. A straightforward way to fix this without requiring too much change is to recurse over the unpacked wheel directory explicitly and use |
This should be fixed by #8541, however we will need tests before calling this done. Tests would look like:
and would probably go in |
Am I missing something here? There's no mention of |
It's not us, but the standard library. If we're using either of I don't have a stake in this, so I won't drive the discussion more than that. Anyone can feel free to pick this up and put time into it. :) |
Ah, OK. In which case, I'd say that pip's test should simply be that we pass None to the stdlib function (and hence that we follow stdlib behaviour). We shouldn't test that the stdlib does what it says it will do. If, however, we want to document for pip what guarantees we give around reproducibility, then yes we should have a test that ensures we deliver that behaviour. But I'd say document first, then test - we don't need any more tests that check undocumented behaviour 🙁
Nor do I, beyond not wanting to be expected to support a behaviour that I don't understand and personally have no need for. So I'll say no more for now. As you say, someone who has an interest in this can define what they want and push the discussion on that basis. |
The previous attempt wasn't covering all of the bases. It relied on invoking that pip-install-hook, and didn't apply to pip itself. The core issue is that the generated .pyc files embed some of the temporary paths, which are randomly generated. See https://r13y.com/diff/bf8c3ca3148ebff9ecf41f294cc60b9f209c006d49699e356969ff32d736f1c6-8806a7cca91fdd300e48736bfcd57c4d0b54c1cc2fd61609f35143170862b59c.html In this new attempt, the approach is to patch the TempFile implementation directly, so that it creates stable temporary directories. We also assume that if SOURCE_DATE_EPOCH is set, we are in a scenario where reproducible builds are desirable and enter that branch. See also pypa/pip#7808
The previous attempt wasn't covering all of the bases. It relied on invoking that pip-install-hook, and didn't apply to pip itself. The core issue is that the generated .pyc files embed some of the temporary paths, which are randomly generated. See https://r13y.com/diff/bf8c3ca3148ebff9ecf41f294cc60b9f209c006d49699e356969ff32d736f1c6-8806a7cca91fdd300e48736bfcd57c4d0b54c1cc2fd61609f35143170862b59c.html In this new attempt, the approach is to patch the TempFile implementation directly, so that it creates stable temporary directories. We also assume that if SOURCE_DATE_EPOCH is set, we are in a scenario where reproducible builds are desirable and enter that branch. See also pypa/pip#7808
@pfmoore if PR was made that would make argument of said compile functions None to use standard library convention instead, would it be accepted? This is where it's documented https://docs.python.org/3/whatsnew/3.7.html#py-compile |
The end result is essentially checked hash so bytecode is still validated against source for changes but not based on mtime but instead through hash. |
Looks like https://github.com/pypa/pip/blob/master/src/pip/_vendor/distlib/util.py#L595 is fine as is. It will result in whatever standard library uses. |
Similarly https://github.com/pypa/pip/blob/master/src/pip/_internal/operations/install/wheel.py#L715 looks fine as is. The default is checked hash assuming Python works as intended. |
Any chance the title could be reflected that this issue is not in fact about fixing pip but adding tests? :) |
I can confirm the issue is indeed resolved with 20.2. Thank you. It would be great to have tests to avoid this issue from occurring again. |
Reproducible builds of pyproject projects using pip is resolved. Fixes pypa/pip#7808 Fixes #81441 The more recent c409f69 caused trouble with pyproject troubles and had to be reverted anyway. #102222 (comment) Revert "pythonPackages.pip: make reproducible (#102222)" This reverts commit c409f69. Revert "python3Packages.pip: allow setting reproducible temporary directory via NIX_PIP_INSTALL_TMPDIR" This reverts commit aedbade.
Reproducible builds of pyproject projects using pip is resolved. Fixes pypa/pip#7808 Fixes #81441 The more recent c409f69 caused trouble with pyproject troubles and had to be reverted anyway. #102222 (comment) Revert "pythonPackages.pip: make reproducible (#102222)" This reverts commit c409f69. Revert "python3Packages.pip: allow setting reproducible temporary directory via NIX_PIP_INSTALL_TMPDIR" This reverts commit aedbade.
PRs adding tests are welcome! |
Uagh. I just read through a reproducible report out of our build system and apparently without the nixos patch pip using non-deterministic pip-req-build path results in eg in debuginfo for cython ending up with these paths hardcoded. I think the core thing here is that you can't externally affect the seed of mkdtemp. (which is not really a problem in pip but in CPython) |
There was a bug open about this https://bugs.python.org/issue32276 but it was rejected. IMHO it's silly to blame everything around CPython about not being reproducible when the source of unreproducibility is in CPython. |
We compile from tar.gz, instead of using pre-built binary wheels from PyPI. (or if the dep is pure-python, use tar.gz instead of "source-only" wheel) ----- Some unorganised things below for future reference. ``` $ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory --- triple: 'x86_64-apple-darwin' binary-path: 'dist1/hid.cpython-39-darwin.so' ... ``` ``` $ nm -pa dist1/hid.cpython-39-darwin.so ``` - https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable - pypa/pip#6505 - pypa/pip#7808 (comment) - NixOS/nixpkgs#91272 - cython/cython#1576 - https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
We compile from tar.gz, instead of using pre-built binary wheels from PyPI. (or if the dep is pure-python, use tar.gz instead of "source-only" wheel) ----- Some unorganised things below for future reference. ``` $ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory --- triple: 'x86_64-apple-darwin' binary-path: 'dist1/hid.cpython-39-darwin.so' ... ``` ``` $ nm -pa dist1/hid.cpython-39-darwin.so ``` - https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable - pypa/pip#6505 - pypa/pip#7808 (comment) - NixOS/nixpkgs#91272 - cython/cython#1576 - https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
We compile from tar.gz, instead of using pre-built binary wheels from PyPI. (or if the dep is pure-python, use tar.gz instead of "source-only" wheel) ----- Some unorganised things below for future reference. ``` $ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory --- triple: 'x86_64-apple-darwin' binary-path: 'dist1/hid.cpython-39-darwin.so' ... ``` ``` $ nm -pa dist1/hid.cpython-39-darwin.so ``` - https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable - pypa/pip#6505 - pypa/pip#7808 (comment) - NixOS/nixpkgs#91272 - cython/cython#1576 - https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
We compile from tar.gz, instead of using pre-built binary wheels from PyPI. (or if the dep is pure-python, use tar.gz instead of "source-only" wheel) ----- Some unorganised things below for future reference. ``` $ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory --- triple: 'x86_64-apple-darwin' binary-path: 'dist1/hid.cpython-39-darwin.so' ... ``` ``` $ nm -pa dist1/hid.cpython-39-darwin.so ``` - https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable - pypa/pip#6505 - pypa/pip#7808 (comment) - NixOS/nixpkgs#91272 - cython/cython#1576 - https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
Environment
Description
Using pip at 19.3.1 building of Python packages was reproducible, however, since we upgraded to 20.0.2 that is no longer the case (NixOS/nixpkgs#81441). The issue is that bytecode now refers to a unpacked-wheel folder (see e.g. https://r13y.com/diff/b03e875784ba16e60ec59d1d5e720b13fc7ae1311edf409b819b0b264acd5393-aba2cac73c5234af8fbf694100bc631800ca397ad7afeed335d38895322a7c71.html).
The function for this unpacked-wheel was added in #7483, I have not bisected this any further though cc @chrahunt @raboof
Expected behavior
Reproducible build.
How to Reproduce
The text was updated successfully, but these errors were encountered: