Select file to download based on existing hashes when --require-hashes #3634
Comments
FWIW, regarding this specific incident, I don't care anymore. We'll replace the dependency.
It's OK to switch to u-msgpack. About this specific version, I just tried releasing a manylinux1 wheel. Sometimes wheels are released much later because of dependency problems.
For example, other than my project, Cython also added manylinux1 wheels. Cython 0.22 was released on 2015-02-12, and the manylinux1 wheels were added on 2016-04-22. Thanks to manylinux1, many projects are adding binary wheels for Linux, and CI on Linux keeps getting faster.
To avoid creating such dissension, I think:
In my opinion, changing the contents of a released package after the fact (4 months in this case) is bad practice. It is similar to moving a published Git release tag (which is even worse, granted). Technically, you can move tags around on a public repo, but you really shouldn't. A released package on PyPI is identified by its version. Adding wheels to a release changes the set of artifacts inside that package, and that effectively changes the meaning of the release.

The problems with such a practice are manifold. For example, if we (Crossbar.io) already used hashed/pinned requirements instead of only open-ended requirements for the PyPI release (which, luckily, we don't yet do, but we totally want to), then the above would have broken not only our CI but also our released package, and it would have forced us to push a new release, only because upstream added a new wheel to an older release.

Looking at https://pypi.python.org/pypi/Cython/0.23, they do the same: the release artifacts are dated 2015-08-08, 2015-08-09 and 2016-04-22! I uphold my view that this is really bad practice. @methane, you are right then, it is probably even "common practice", but it is bad common practice IMO. Changing the set of release artifacts does change the meaning of a release (since a release is identified by its version only), and that should only be done in a new release, on a new release version of the package. So instead of adding new wheels to an old release, you could have pushed a new release (0.4.8) with an extended set of artifacts.
Now, that being said, here is my take on how pip could improve things:
pip should only ever consider the artifacts for which there are hashes in the requirements file. Currently, it will bail out instead. By changing that behavior, a package releaser can effectively seal the set of permissible dependencies. So, for example, in our case, we had a hash for the sdist of 0.4.7. Now, when a new artifact suddenly appears for 0.4.7 (like the manylinux1 wheel), pip would simply keep using the artifact it has a hash for, instead of failing.
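To make that rule concrete, here is a minimal, hypothetical sketch (not pip's actual internals; the names and the data shape are assumptions) of considering only artifacts whose hash is pinned:

```python
def eligible_candidates(candidates, pinned_hashes):
    """Keep only artifacts whose sha256 digest appears in the
    requirements file; everything else is simply not considered.

    `candidates` is assumed to be a list of (filename, sha256_hex)
    pairs, already ordered by pip's usual preference (wheels first).
    """
    return [(name, digest) for name, digest in candidates
            if digest in pinned_hashes]
```

Under that rule, a wheel uploaded after the fact is invisible to a requirements file that only pins the sdist hash, instead of turning into a hard failure.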
As to my view on why not sealing a released package is bad: I am concerned about security. By allowing "open-ended" dependencies, this effectively creates an uncontrolled attack vector. It's just a matter of pushing a new artifact (one that is preferred over the existing artifacts) to an already released package, extending the set; that allows injecting anything, of course. I do understand that the PyPI/pip packaging infrastructure isn't (yet?) designed for such watertight releases.

As a publisher of security-sensitive networking software, I want to protect my users. I want to assure them that they run exactly the bits that we have tested. We do release binary OS packages for that matter (deb/rpm/pkg), which provide more of that confidence. These packages, for the same reasons, contain not only Crossbar.io, but everything but the operating system itself.

FWIW, in my impression (thanks to @meejah for pointing me to this!), this is going in the right direction: https://theupdateframework.github.io/
No, I didn't change the contents of the package.
You did change the contents of the package (which, as far as PyPI is concerned, is the set of all artifacts) by adding new artifacts (the wheels) that weren't there before. And yes, I do understand how hashing works.
Hash checking is a very new feature of pip. I think this issue is a problem with pip's current behavior.
See, we simply disagree on this. And that is just fine! I do not want to insult you. I just disagree. And I care enough about this stuff that I am willing to bite the bullet and move to a different dependency.
I don't care about moving away from umsgpack. It's a good pure-Python implementation. Bye-bye. I'm talking about how the hash pinning feature should work.
pip installs one artifact, not a set of artifacts. Do you mean "pip installs not the package, but one part of the package"? I don't want to play such word games. If uploading one artifact first and another artifact later is bad practice, it should be officially deprecated on packaging.python.org, and PyPI should provide some way to publish a new version atomically. If it's not bad practice, pip should behave in a friendlier way.
I agree with that. Either that, or being able to "seal" a release, as in: have a hash of the hashes of all artifacts of a release associated with the release version. Then I could get away with sticking that single hash into my requirements.txt (instead of having to put many hashes in there).
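A minimal sketch of that "hash of hashes" idea, assuming the per-artifact sha256 digests are already known; sorting makes the seal independent of upload order:

```python
import hashlib

def release_seal(artifact_sha256_digests):
    """Combine the sha256 digests of all artifacts in a release into a
    single hash that identifies the sealed set of artifacts."""
    h = hashlib.sha256()
    for digest in sorted(artifact_sha256_digests):
        h.update(bytes.fromhex(digest))
    return h.hexdigest()
```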
In my view, this is a perfectly legitimate thing to do. The last thing we want is to discourage people from publishing wheels, simply because they have to publish them with the initial release, or inflict a version bump containing no changes on users.
This sounds to me like it's simply a bug in the hash pinning code; no big deal, just something that needs fixing. I'd say the hash checking code needs to be more clever about looking for "the same thing it downloaded originally" (whether that's an sdist, or a wheel that has since been superseded by a more specific wheel). But I don't use the hash pinning feature, so I'll leave the design to those that do; just don't impose new rules like "you can't add wheels to an already released version" that affect people who don't use hash pinning in the process of fixing the issue.
It happens because pip decides the wheel is superior to the source dist, and if the hash for that wheel isn't in the requirements file, the install fails. Put differently, if only the sdist hash is pinned, a wheel uploaded later breaks the install.
Source: https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode. I can't find where it says "hashes are required for all release artifacts", though.
This is harder to do than it appears on the surface, because hash-checking mode is based on the hash of the file we're downloading, but we don't know that hash until we've downloaded it, so there is a Catch-22. We could possibly let PyPI optionally tell us the hashes of the files; if PyPI provides them, we can use that to make a better guess about which files are acceptable (while obviously still verifying the file we finally download).
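For what it's worth, PyPI's JSON API does expose per-file digests, which is roughly the kind of server-provided hint described above. A small sketch, for illustration only (this is not what pip does internally):

```python
import json
from urllib.request import urlopen

def pypi_file_digests(project, version):
    """Return {filename: sha256} for every artifact of a release,
    as reported by PyPI's JSON API."""
    url = "https://pypi.org/pypi/{}/{}/json".format(project, version)
    with urlopen(url) as resp:
        data = json.load(resp)
    return {f["filename"]: f["digests"]["sha256"] for f in data["urls"]}
```

With digests known up front, a client could decide which file to fetch before downloading anything, and still verify the downloaded bytes afterwards.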
I just got bit by this. The maintainers of ... As I'd generated my ...
For the "All packages for a release should be released as a single unit" option, see pypi/warehouse#720 The general concept there is to start releases in some form of staging mode that allows artifacts to be freely added and removed (but doesn't make them available through the regular download APIs), and then locks them against further changes (including addition of new artifacts) once they're promoted to fully released. I don't think we want to prohibit backfilling wheel archives for earlier releases though, so I like the idea of falling back to checking the sdist hash if the wheel hash doesn't match. (Adding a query API for artifact hashes wouldn't be as helpful, since it would take a while for caching proxies to catch up, whereas "try the sdist if the wheel hash fails to validate" is a client-side only change) |
That sounds reasonable. Is there some sort of hole there where it would make builds less reproducible, though? Say initially we resolve to hash A for some wheel, then later another wheel makes us resolve to hash B, in which case the fallback logic might point us to the sdist with hash C? For the build to be fully reproducible, I think there ought to be the option to get back exactly the same wheel. |
Thinking about this a bit more, it's possible that the implementations for getting hashes in pip-tools and Pipenv are just not quite right. Currently, both tools lock down all hashes available for a package version on PyPI. For packages with an sdist or a universal wheel, I probably do want to lock down the hash for that specific artifact. If my dev environment (or original build environment) is using the universal wheel, I want to keep using that universal wheel; I don't want to fall back to the sdist without action on my part, because the point of using hashes is to prevent exactly that kind of silent change.

It's tricky in a pragmatic sense for packages that aren't pure Python, though. For a typical setup where dev boxes run OS X but deployment boxes run Linux, there isn't really a good answer, I think, unless I always use the sdist, and not all packages even have one (e.g. tensorflow-gpu; also, building NumPy from source is just not a great idea). I don't really have an answer here in general, but I do think that if I'm currently using a universal wheel, I should be able to continue using that universal wheel, even if a more specific wheel becomes available.
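For reference, pip's hash-checking mode already accepts multiple --hash options per requirement, so a requirements file can explicitly list, say, both the sdist and a specific wheel for the same pinned version (the digests below are placeholders, not real values):

```
msgpack-python==0.4.7 \
    --hash=sha256:<sdist-digest-placeholder> \
    --hash=sha256:<wheel-digest-placeholder>
```

The debate above is essentially about which subset of the available digests such a file should contain, and how pip should pick among the files that match.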
You should be able to control this with the ...
@Kentzo That wouldn't solve the problem where a new wheel could cause a build failure when using hash-checking mode.
@taion I'd expect pip to always install a distribution with a matching hash. That was the topic of my original issue.
I am a +1 to this. This is basically a bug in pip's current version pinning logic. I'm willing to fix this as part of bringing in a proper resolver, so I'll assign this to myself. If someone else wants to fix this before that, I'm happy to help with that too. :)
I took a look at this. The patch required (if my assessment is correct) would benefit from #5971, so I'll probably wait to see whether that works out. A question for people interested in this: what would be the expected error if hashes are provided but no files match any of them? "Hash mismatch" (the one raised currently), or "No matching distribution found" (the one raised when there are no applicable files)?
Thanks for reopening and working on this! Cool. My main goal is to get to 100% reproducible builds (down to the single bit; I am aware there is probably more to that than only this issue).

pip should only ever consider the artifacts for which there are hashes in the requirements file. Essentially, my argument is: when there is no hash, an artifact shouldn't be considered eligible, no matter what.

Now, regarding your question: I don't care much about the specific error raised; reusing either the existing (a) "hash mismatch" or (b) "no matching distribution" is fine. That being said, (b) seems more explicit, given the fixed and sealed set of hashes in the requirements file.
Description:
I released a manylinux1 wheel this week.
It broke the crossbar build [1].
They used hash checking against the sdist, and pip downloaded the manylinux1 wheel instead.
Could pip retry the sdist when a hash mismatch happens?
cc: @oberstet