Deterministic and verifiable operation #60

Closed
indygreg opened this issue Feb 2, 2020 · 3 comments

indygreg commented Feb 2, 2020

get-pip.py pins neither its dependencies nor their SHA-256 hashes. At run time, it does the equivalent of pip install --upgrade pip setuptools wheel, which pulls in the latest versions of these packages (plus dependencies, if any) without hash checking, relying only on TLS X.509 certificate checking. So despite claims that python get-pip.py is secure, it is only as secure as the trusted root CA system. You are still vulnerable to a MitM attack by any server with a valid X.509 certificate chaining up to a trusted root CA. And since the common root CA lists contain some CAs associated with governments with... questionable practices, this trust only goes so far. (If I were one of these questionable governments or a nefarious actor, I could MitM PyPI and inject malware into pip, setuptools, or wheel, because get-pip.py doesn't verify the SHA-256 hashes of the files it downloads off the Internet at run time. That would be a very attractive attack target given the sheer volume of machines that would run the poisoned code within minutes and the potential to spread malware by infecting packages built with a poisoned version of pip/setuptools.)

If you don't buy into the tin foil hat arguments, another issue with the current approach is that it isn't deterministic over time. Even if I download a specific version of get-pip.py today and verify that its SHA-256 is a trusted value, the result of running it today could differ from the result of running it tomorrow, because a new version of pip, setuptools, or wheel may have been published on PyPI in between. This lack of reproducibility can be extremely annoying. For example, I try to enforce deterministic and reproducible tests and CI in my projects to the maximal extent possible. If I'm using pip 20 in commit X, I want tests/CI to use pip 20 for all of time. I don't want pip 21 to be silently used when I check out this commit 1 year from now. (A related issue: whenever a new version of pip, setuptools, wheel, or get-pip.py is published, random processes that don't pin dependencies can break due to incompatibilities in the new version.)
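The "verify its SHA-256 is a trusted value" step mentioned above can be sketched roughly as follows. This is only a minimal illustration: the function names `sha256_of_file` and `verify_file` are made up for this example, and the trusted digest is assumed to come from somewhere you already trust (for instance, a value checked into your own repository).

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True only if the file's digest matches the trusted hex digest."""
    actual = sha256_of_file(path)
    # Normalize the expected value; hexdigest() is always lowercase.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Note that this only pins the bootstrap script itself. As the paragraph above points out, it says nothing about what the script downloads when it runs.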

So, I have a feature request for get-pip.py: deterministic and verifiable mode.

In this mode, executing get-pip.py would install a fixed, pinned version of every package and would specify SHA-256 hashes for all of them. That would make get-pip.py resistant to MitM attacks against the package repository it downloads pip, setuptools, wheel, etc. from. It would also (hopefully) guarantee reproducible execution.

The way I see this working is that get-pip.py gains a new CLI flag, say --reproducible. In this mode, the invocation of pip behind the scenes specifies a requirements/constraints file with pinned SHA-256 hashes, and --require-hashes mode is enabled.
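To make the proposal a bit more concrete, here is a rough sketch of what generating such a pinned requirements file and the matching pip arguments could look like. It is not how get-pip.py works today: the helper names are hypothetical, the version numbers are only illustrative, and the all-zero style digests are placeholders rather than real hashes. The `==` pins, the `--hash=sha256:...` syntax, and the `--require-hashes` and `-r` options are existing pip features.

```python
import re

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def pinned_requirement(name, version, sha256_hexes):
    """Render one requirements-file entry pinned to a version and its hashes."""
    if not sha256_hexes:
        raise ValueError(f"{name}: at least one hash is required")
    for digest in sha256_hexes:
        if not _SHA256_HEX.fullmatch(digest):
            raise ValueError(f"{name}: not a SHA-256 hex digest: {digest!r}")
    parts = [f"{name}=={version}"]
    parts += [f"--hash=sha256:{digest}" for digest in sha256_hexes]
    # pip accepts a trailing backslash as a line continuation.
    return " \\\n    ".join(parts)


def render_requirements(pins):
    """Render a whole requirements file from (name, version, hashes) tuples."""
    return "\n".join(pinned_requirement(*pin) for pin in pins) + "\n"


def pip_install_args(requirements_path):
    """Arguments a hypothetical --reproducible mode might hand to pip."""
    return ["install", "--upgrade", "--require-hashes", "-r", requirements_path]


if __name__ == "__main__":
    # Placeholder digests and illustrative versions, NOT real release hashes.
    pins = [
        ("pip", "20.0.2", ["0" * 64]),
        ("setuptools", "45.1.0", ["1" * 64]),
        ("wheel", "0.34.2", ["2" * 64]),
    ]
    print(render_requirements(pins), end="")
    print(pip_install_args("requirements.txt"))
```

A real file would list one hash per distribution file (sdist and each wheel) that is allowed to satisfy the pin, which is why the helper takes a list of digests.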

Establishing this feature would require updating a pip requirements file/manifest at release time or whenever a dependency version is bumped. So it is a bit of extra work for the get-pip.py maintainers.

I think it is worthwhile to implement this feature in get-pip.py itself because bootstrapping packaging tools from a Python distribution that doesn't have them (e.g. the Windows embeddable zip file distributions) in a deterministic and reproducible manner is really difficult. You have to install setuptools, pip, wheel, etc. from source, taking care to download and verify a pinned version of each. I would prefer for Python's packaging tools to offer high levels of security and guarantees of determinism by default. pip itself can already achieve this with --require-hashes mode. But get-pip.py cannot, and that undermines the security and integrity of the whole packaging chain.

indygreg commented Feb 2, 2020

FWIW I've spent the better part of 2 hours trying to figure out how to bootstrap the Python packaging tools in a secure, deterministic, and reproducible manner with the Windows embeddable zip distribution (this distribution lacks ensurepip and venv for some reason). Since setuptools can't be self-installed post version 33.1, I first reached for ez_setup.py to bootstrap setuptools, which could then be used to install pip.

But looking at the ez_setup.py source code, it downloads a file from the Internet without strong integrity checking (read: verifying hashes). So that's a non-starter because it doesn't guarantee security. It is looking like I'll need to use setuptools 33 to bootstrap modern setuptools somehow. But as of this writing, I still can't figure out how to coerce setuptools 33 to install on this Windows distribution.

And I am pretty familiar with a lot of low-level Python packaging details. If I can't figure this out, there's little hope for most Python users.

indygreg added a commit to indygreg/PyOxidizer that referenced this issue Feb 2, 2020
get-pip.py is not deterministic nor does it validate content hashes
when downloading files from the Internet. See
pypa/get-pip#60. This makes naive usage
inappropriate for PyOxidizer, which wants to ensure downstream
consumers can achieve determinism and isn't the weak link in the
security chain.

Way too much effort was spent developing this commit and figuring
out how to get the packaging tools to install securely and
deterministically. See the long comment in packaging_tool.rs for
details.

indygreg commented Feb 2, 2020

The commit referenced above has a detailed comment about how I finally got this working and what didn't work. tl;dr I had to modify get-pip.py to allow usage of -r requirements.txt with pip, setuptools, and wheel pinned. Naive usage of get-pip.py -r requirements.txt didn't work because get-pip.py always includes pip as a CLI argument, and pip then complains about the lack of hash pinning. There's arguably a bug in pip where it doesn't merge command line and requirements file entries if they are duplicates. But that's for another issue.
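One way to picture the kind of modification described above: before handing arguments to pip, drop any implicitly added package name that the user's requirements file already pins. The sketch below is purely illustrative and is not the code from the referenced commit or from get-pip.py. The helper names are hypothetical, and the requirements parsing is deliberately simplistic (it handles `name==version` entries, backslash continuations, and `#` comments only).

```python
import re


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_names(requirements_text):
    """Collect the project names that appear in a requirements file."""
    joined = requirements_text.replace("\\\n", " ")  # fold continuation lines
    names = set()
    for line in joined.splitlines():
        line = line.split("#", 1)[0].strip()  # crude comment stripping
        if not line or line.startswith("-"):
            continue  # skip blank lines and option lines such as -r or --hash
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def drop_already_pinned(implicit_args, requirements_text):
    """Remove bare package arguments that the requirements file already pins."""
    pinned = pinned_names(requirements_text)
    return [arg for arg in implicit_args if normalize(arg) not in pinned]
```

With pip and setuptools pinned in the file, `drop_already_pinned(["pip", "setuptools", "wheel"], text)` would leave only the bare `wheel` argument, which pip in --require-hashes mode would still reject until it is pinned too.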

pradyunsg (Member) commented

#73 describes the hack you currently need in order to do this.
