Select file to download based on existing hashes when --require-hashes #3634

Closed
methane opened this issue Apr 29, 2016 · 28 comments
Labels
auto-locked (Outdated issues that have been locked by automation)
type: bug (A confirmed bug or unintended behavior)

Comments

@methane
Contributor

methane commented Apr 29, 2016

  • Pip version: 8.1.1
  • Python version: Any
  • Operating System: Linux

Description:

I released a manylinux1 wheel this week.
It broke the crossbar build 1.

They used hash checking pinned to the sdist, and pip downloaded the manylinux1 wheel instead.
Could pip retry the sdist when a hash mismatch happens?

cc: @oberstet

@oberstet

FWIW, regarding this specific incident, I don't care anymore. We'll replace msgpack-python with u-msgpack-python: crossbario/autobahn-python#656

Note, dstufft explained some things on IRC:

<oberstet_> because this is what I think just happened with one of our dependencies, and that broke our (version and hash pinned) builds https://github.com/msgpack/msgpack-python/issues/183
* anthony25_ is now known as anthony25
<dstufft> oberstet_: correct
<oberstet_> dstufft: thanks. too bad. we're gonna switch to u-msgpack, as upstream doesn't want to change what he is doing =(
<dstufft> oberstet_: Ideally pip would be smarter here, and if there are 5 files available and one of them doesn't match a hash, it'll just ignore that file
<oberstet_> right. or seal the whole set of artifacts for a package version. or, what does ToR (was it?) have for "doing watertight releases" again?
<oberstet_> but thanks for confirming!
<oberstet_> dstufft: meejah pointed me to something state of the art rgd all that (how to move bits to users tamperfree etc), but I forgot

@methane
Contributor Author

methane commented Apr 29, 2016

<oberstet_> dstufft: thanks. too bad. we're gonna switch to u-msgpack, as upstream doesn't want to change what he is doing =(

It's OK to switch to u-msgpack.
But I can't stand being treated as if I were an ill-mannered person.
Is releasing wheels after the sdist such a bad thing?

For this specific version, I just tried out the manylinux1 wheel. (pip install msgpack-python is VERY FAST, yay!)
But I build wheels on other machines I usually use, especially Windows. So I sometimes release wheels a few days after the sdist.

Sometimes wheels are released much later because of a dependency problem.
For example, I'm waiting for MySQL Connector/C to support VC14. After it is released, I'll add cp35-win wheels to PyPI.

@methane
Contributor Author

methane commented Apr 29, 2016

As an example other than my own project, Cython added manylinux1 wheels:
https://pypi.python.org/pypi/Cython/0.22

Cython 0.22 was released on 2015-02-12, and the manylinux1 wheels were added on 2016-04-22.
That makes me happy, since it makes builds on Travis much faster.

Thanks to manylinux1, many projects are adding binary wheels for Linux, and CI on Linux is getting faster and faster.

@methane
Contributor Author

methane commented Apr 29, 2016

To stop creating this kind of dissension, I think:

  • pip should fall back to the sdist when a hash mismatch happens
  • Officially recommend the --no-binary option when using --require-hashes (see the sketch after this list).
  • Officially recommend releasing wheels at the same time as the sdist, and bumping the version number when wheels are added later.

@oberstet

oberstet commented Apr 30, 2016

Is releasing wheels after the sdist such a bad thing?

In my opinion, changing the contents of a released package after the fact (4 months in this case) is bad practice. It is similar to moving a published Git release tag, which is even worse, granted. Technically, you can move tags around on a public repo, but you really shouldn't.

A released package on PyPI is identified by its version. Adding wheels to a release changes the set of artifacts inside this package, and that effectively changes the meaning of the release: msgpack==0.4.7 meant something different in 2016/01 (when it was released) than it means today.

And the problems with such a practice are manifold. For example, if we (Crossbar.io) were already using hashed/pinned requirements instead of only open-ended requirements for the PyPI release (which luckily we don't yet do - but we totally want to), then the above would have broken not only our CI, but also our released package! And it would have forced us to push a new release - only because upstream added a new wheel to an older release.

Looking at https://pypi.python.org/pypi/Cython/0.23, they do the same: the release artifacts are dated 2015-08-08, 2015-08-09 and 2016-04-22!

I uphold my view that this is really bad practice. @methane you are right then, it is probably even "common practice" - but it is bad common practice IMO.

Changing the set of release artifacts does change the meaning of a release (as it is identified by the version only), and that should only be done in a new release, under a new version of the package.

So instead of adding new wheels to an old release, you could have pushed a new release (0.4.8) with an extended set of artifacts.

Note that analyzing this took me about an hour alone, as the new artifacts only broke some of our CI platforms. E.g. CPython 3.3 continued to work, because wheels were only published for other versions (3.4/3.5).

@oberstet

oberstet commented Apr 30, 2016

Now, that being said, here is my take on how pip could improve things:

pip install --require-hashes -r requirements.txt

should only ever consider the artifacts for which there are hashes in requirements.txt. It should ignore artifacts that it discovers as eligible when looking on PyPI, but for which it doesn't have hashes.

Currently, it will bail out. By changing that behavior, a package releaser can effectively seal the set of permissible dependencies.

So, for example, in our case, we had a hash for the sdist of msgpack-python-0.4.7 inside our requirements.txt - but only that single hash (because the sdist was the only thing available at the time I ran pip freeze for our package).

Now, when a new artifact suddenly appears for 0.4.7 (like msgpack_python-0.4.7-cp35-cp35m-manylinux1_x86_64.whl), there is no hash for it in requirements.txt, and pip would simply have ignored it, even though the wheel would in principle be the preferred artifact (preferred over the sdist).
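
As a rough illustration of that selection rule, here is a small Python sketch (not pip code; the file names and digests are made up) that keeps only the candidates whose digest is pinned:

# Hashes pinned in requirements.txt for msgpack-python==0.4.7 (placeholder values).
pinned_hashes = {"sha256:aaaa1111"}

# Candidate files discovered on the index, with the digest each one would have
# (hypothetical data; today pip only learns the digest after downloading the file).
candidates = {
    "msgpack-python-0.4.7.tar.gz": "sha256:aaaa1111",
    "msgpack_python-0.4.7-cp35-cp35m-manylinux1_x86_64.whl": "sha256:bbbb2222",
}

# Files without a pinned hash are simply skipped instead of aborting the install.
eligible = [name for name, digest in candidates.items() if digest in pinned_hashes]
print(eligible)  # ['msgpack-python-0.4.7.tar.gz']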

@oberstet

As to my view on why not sealing a released package is bad: I am concerned about security. E.g. by allowing "open ended" dependencies, this effectively creates an uncontrolled attack vector. It's just a matter of pushing a new artifact to an already released package (extending the set) which is preferred over other artifacts - this of course allows injecting anything.

I do understand that the PyPI/pip packaging infrastructure isn't (yet?) designed for such watertight releases. As a publisher of security-sensitive networking software, I want to protect my users. I want to assure them that they run exactly the bits that we have tested.

We do release binary OS packages for that matter (deb/rpm/pkg) which provide more of that confidence. These packages, for the same reason, contain not only Crossbar.io, but everything except libc! They bundle PyPy. They bundle OpenSSL. And hence, they are nearly self-contained (other than the C run-time library, which simply isn't possible to bundle).

FWIW, in my impression (thanks to @meejah for pointing me to this!), this is going in the right direction: https://theupdateframework.github.io/

@methane
Contributor Author

methane commented Apr 30, 2016

In my opinion, changing the contents of a released package after the fact (4 months in this case) is bad practice.

No, I didn't change the contents of the package.
I added the same version in another package format.
The hash mismatch happens because pip compares the tar.gz hash against the .whl.

@oberstet

oberstet commented Apr 30, 2016

You did change the contents of the package (which is the set of all artifacts as far as PyPI is concerned) by adding new artifacts (the wheels) that weren't there before.

And yes, I do understand how hashing works.

@methane
Contributor Author

methane commented Apr 30, 2016

Hash checking is a very new feature of pip. I think this issue is a problem with current pip behavior.
When pinning a package hash, it should pin the package format too.

@oberstet

See, we simply disagree on this. And that is just fine! I do not want to insult you. I just disagree. And I care enough about this stuff that I am willing to bite the bullet and move to a different dependency.

@methane
Contributor Author

methane commented Apr 30, 2016

I don't care about the move to u-msgpack. It's a good pure-Python implementation. Bye-bye.

I'm talking about how the hash pinning feature should work.

@methane
Contributor Author

methane commented Apr 30, 2016

You did change the contents of the package (which is the set of all artifacts as far as PyPI is concerned) by adding new artifacts (the wheels) that weren't there before.

pip installs one artifact, not a set of artifacts. Do you mean "pip installs not the package, but one part of the package"? I don't want to play such a word game.

If uploading one artifact first and another artifact later is bad practice, it should be officially deprecated on packaging.python.org, and PyPI should provide some way to publish a new version atomically.

If it's not bad practice, pip should behave in a more friendly way.

@oberstet

PyPI should provide some way to publish a new version atomically

I agree with that.

Either that, or being able to "seal" a release.

Like as in: have a hash of hashes of all artifacts of a release associated with the release version. Then I could get away with sticking that single hash into my requirements.txt (instead of having to stick many hashes in there).
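
A toy sketch of that "hash of hashes" seal, assuming the per-artifact sha256 digests of a release are already known (the digests below are placeholders):

import hashlib

# Per-artifact sha256 digests of one release (placeholder values).
artifact_hashes = [
    "sha256:1111aaaa",
    "sha256:2222bbbb",
    "sha256:3333cccc",
]

# Sort so the seal does not depend on upload order, then hash the concatenation.
seal = hashlib.sha256("\n".join(sorted(artifact_hashes)).encode()).hexdigest()
print("release seal:", seal)
# Adding any artifact to the release afterwards would change this single value.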

@pfmoore
Member

pfmoore commented Apr 30, 2016

I added the same version in another package format.

In my view, this is a perfectly legitimate thing to do. The last thing we want is to discourage people from publishing wheels, simply because they have to publish them with the initial release, or inflict a version bump containing no changes on users.

The hash mismatch happens because pip compares the tar.gz hash against the .whl.

This sounds to me like it's simply a bug in the hash pinning code. No big deal, just something that needs fixing. I'd say the hash checking code needs to be more clever about looking for "the same thing it downloaded originally" (whether that's an sdist, or a wheel that has since been superseded by a more specific wheel). But I don't use the hash pinning feature, so I'll leave the design to those who do - just don't impose new rules like "you can't add wheels to an already released version", which would affect people who don't use hash pinning, in the process of fixing the issue.

@oberstet

oberstet commented Apr 30, 2016

The hash mismatch happens because pip compares the tar.gz hash against the .whl.

It happens because pip decides the wheel is superior to the source dist, and if the hash for the wheel isn't in the requirements.txt, it bails out.

Put differently, --require-hashes requires the hashes for all build artifacts to be in the requirements.txt. I can do that - but if a dependency chooses to upload more artifacts, I have to update the requirements.txt and do another release myself. So essentially, this just moves the problem further down the line ..

If pip --require-hashes instead only considered artifacts for which there is a hash in the requirements.txt as eligible, all would be fine. But that is not how it works today:

Hash verification is an all-or-nothing proposition.. which imposes several other security restrictions: hashes are required for all requirements. Hashes are required for all dependencies.

Source: https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode

I can't find where it says: "Hashes are required for all release artifacts" though ..

@dstufft
Member

dstufft commented Sep 7, 2016

This is harder to do than it appears on the surface, because hash checking mode is based on the hash of the file we're downloading, but we don't know that hash until we've downloaded it, so there is a Catch-22. We could possibly let PyPI optionally tell us the hashes of the files; then, if PyPI provides them, we can use that to make a better guess about which files are acceptable (while obviously still verifying the file we finally download).
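
For what it's worth, PyPI's JSON API (https://pypi.org/pypi/<project>/<version>/json) does report per-file digests, so a client could pre-filter candidates before downloading. A rough sketch of that idea, not pip's actual implementation (the all-zero digest below is a placeholder for a real pinned hash):

import json
import urllib.request

def eligible_files(project, version, pinned_sha256s):
    """Return the files of a release whose reported sha256 is in the pinned set."""
    url = "https://pypi.org/pypi/{}/{}/json".format(project, version)
    with urllib.request.urlopen(url) as resp:
        release = json.load(resp)
    return [
        f["filename"]
        for f in release["urls"]
        if f["digests"]["sha256"] in pinned_sha256s
    ]

# With a real sdist digest in the set, this would return only the tar.gz;
# the downloaded file must of course still be verified after the fact.
print(eligible_files("msgpack-python", "0.4.7", {"0" * 64}))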

@dstufft dstufft changed the title "Hash-Checking Mode" fails when wheel package added Select file to download based on existing hashes when --require-hashes Sep 7, 2016
@taion

taion commented Nov 27, 2017

I just got bit by this. The maintainers of protobuf at 3.5.0.post1 added some wheels 5 days after the version was originally uploaded: https://pypi.python.org/pypi/protobuf/3.5.0.post1.

As I'd generated my Pipfile.lock between the initial publish and the wheel upload, my Pipfile.lock did not include the hashes for the new wheels. As such, my builds started failing once the new wheels were available, and Pip ended up downloading those instead of the earlier source distribution.

@ncoghlan
Member

For the "All packages for a release should be released as a single unit" option, see pypi/warehouse#720 The general concept there is to start releases in some form of staging mode that allows artifacts to be freely added and removed (but doesn't make them available through the regular download APIs), and then locks them against further changes (including addition of new artifacts) once they're promoted to fully released.

I don't think we want to prohibit backfilling wheel archives for earlier releases though, so I like the idea of falling back to checking the sdist hash if the wheel hash doesn't match. (Adding a query API for artifact hashes wouldn't be as helpful, since it would take a while for caching proxies to catch up, whereas "try the sdist if the wheel hash fails to validate" is a client-side only change)
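
A client-side sketch of that fallback, assuming candidate URLs are tried in pip's usual preference order (wheel before sdist); the helper names here are made up and this is not the pip codebase:

import hashlib
import urllib.request

def sha256_of(url):
    """Download a file and return its hex sha256 digest."""
    with urllib.request.urlopen(url) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def pick_artifact(candidate_urls, pinned_sha256s):
    """Keep trying candidates (wheel first, sdist last) until one actually
    matches a pinned hash, instead of failing on the first mismatch."""
    for url in candidate_urls:
        if sha256_of(url) in pinned_sha256s:
            return url
    raise RuntimeError("no candidate matched the pinned hashes")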

@taion

taion commented Nov 29, 2017

That sounds reasonable.

Is there some sort of hole there where it would make builds less reproducible, though? Say initially we resolve to hash A for some wheel, then later another wheel makes us resolve to hash B, in which case the fallback logic might point us to the sdist with hash C?

For the build to be fully reproducible, I think there ought to be the option to get back exactly the same wheel.

@taion

taion commented Nov 29, 2017

Thinking about this a bit more, it's possible that the implementations for getting hashes in pip-tools and Pipenv are just not quite right. Currently, both tools lock down all hashes available for a package version on PyPI.

For packages with an sdist or a universal wheel, I probably do want to lock down the hash for that specific artifact. If my dev environment (or original build environment) is using the universal wheel, I want to keep using that universal wheel – I don't want to fall back to the sdist without action on my part, because the point of using hashes is to not do so.

It's tricky in a pragmatic sense for packages that aren't pure Python, though. For a typical dev setup where dev boxes use OS X, but deployment boxes use Linux, there isn't really a good answer, I think, unless I want to always use the sdist, but not all packages even have that (e.g. tensorflow-gpu, but also building NumPy from source is just not a great idea).

I don't really have an answer here in general, but I do think that if I'm currently using a universal wheel, I should be able to continue using that universal wheel, even if a more specific wheel becomes available.
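
For reference, pip already allows pinning several hashes for one requirement and accepts whichever artifact matches any of them; a lock tool could list only the artifacts actually in use. A sketch with a made-up package name and placeholder digests:

# requirements.txt - only the sdist and the universal wheel are pinned here,
# so anything else fails hash verification
somepackage==1.0 \
    --hash=sha256:1111111111111111111111111111111111111111111111111111111111111111 \
    --hash=sha256:2222222222222222222222222222222222222222222222222222222222222222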

@Kentzo

Kentzo commented Dec 22, 2017

@taion

I don't want to fall back to the sdist without action on my part, because the point of using hashes is to not do so.

You should be able to control this with the --only-binary option.
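
For example (the flag combination itself is valid; whether the install then succeeds still depends on which hashes are pinned):

pip install --only-binary :all: --require-hashes -r requirements.txt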

@taion

taion commented Dec 22, 2017

@Kentzo That wouldn't solve the problem where a new wheel could cause a build failure when using --require-hashes, right? Because I'd get the new wheel, and I wouldn't have the hash for that wheel.

@Kentzo

Kentzo commented Dec 22, 2017

@taion I'd expect pip to always pick a distribution with a matching hash. That was the topic of my original issue.

The --only-binary / --no-binary options are good enough to pick between wheels and sdists.
Picking among wheels is more complicated. I believe pip should prefer the most specific wheel by default, but when the user specifies a hash it makes sense to respect that choice. At most a warning is acceptable.

@pradyunsg pradyunsg added the type: feature request Request for a new feature label Mar 4, 2018
@pradyunsg
Member

I am a +1 to this. This is basically a bug in pip's current version pinning logic.

I'm willing to fix this as a part of bringing in a proper resolver, so I'll assign this to myself. If someone else wants to fix this before that, I'm happy to help with that too. :)

@pradyunsg pradyunsg added type: bug A confirmed bug or unintended behavior and removed type: feature request Request for a new feature labels Mar 5, 2018
@uranusjr
Member

I took a look at this. The patch required (if my assessment is correct) would benefit from #5971, so I’ll probably wait to see whether that works out.

A question to people interested in this: What would be the expected error if hashes are provided, but no files match any of them? Hash mismatch (same as the one raised currently), or No matching distribution found (the one raised if there are no applicable files)?

@oberstet

Thanks for reopening and working on this! Cool.

So my main goal is to get to 100% reproducible builds (down to the single bit .. I am aware there is probably more to it than only this issue).

pip install --require-hashes -r requirements.txt

should only ever consider the artifacts for which there are hashes in requirements.txt. It should ignore artifacts that it discovers as technically eligible when looking on PyPI, but for which it doesn't have hashes in requirements.txt.

Essentially, my argument is: when there is no hash, it shouldn't be considered eligible no matter what.


Now, regarding your question: I don't care much about the specific error raised: reusing the existing (a) "hash mismatch" or (b) "no matching distribution" are both fine. As long as pip install comes back with a non-zero return code, breaking the build, I am cool. A build broken because of this should require user/developer investigation ..

That being said: (b) seems more explicit, since the fixed and sealed set of hashes in requirements.txt does not contain a hash for the new wheel added after the fact (the premise/situation this issue is about). Hence, it is not exactly a "mismatch", but a "no match" ..

@cjerdonek
Member

FYI, this has been addressed by PR #6699 (also see issue #5874 for more recent discussion).

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Aug 14, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Aug 14, 2019