Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shouldn't pip fail installing modified wheel file? (checking hash against RECORD) #2752

Closed
m1keil opened this issue May 5, 2015 · 15 comments
Labels
auto-locked Outdated issues that have been locked by automation C: wheel The wheel format and 'pip wheel' command type: security Has potential security implications

Comments

@m1keil
Copy link

m1keil commented May 5, 2015

Hello,
I've noticed that unpacking wheel, making some edits in its files and zipping everything right back without updating the RECORD file, doesn't result on any warnings or errors from pip.

Reading PEP 0427 it's not clear to me if the hash records in dist-info/RECORD should always be verified or only when wheel is signed. In case it's the second option, sorry for the noise :-)

Small example just in case:

(tmpenv)wheel > curl -sLO https://pypi.python.org/packages/2.7/c/certifi/certifi-2015.04.28-py2.py3-none-any.whl

(tmpenv)wheel > unzip certifi-2015.04.28-py2.py3-none-any.whl
Archive:  certifi-2015.04.28-py2.py3-none-any.whl
  inflating: certifi/__init__.py
  ...
  ...

(tmpenv)wheel > mv certifi-2015.04.28-py2.py3-none-any.whl{,.orig}

(tmpenv)wheel > echo -e '\n#just a friendly comment' >> certifi/__init__.py
(tmpenv)wheel > cat certifi/__init__.py
from .core import where
#just a friendly comment

(tmpenv)wheel > zip -r certifi-2015.04.28-py2.py3-none-any.whl certifi certifi-2015.04.28.dist-info/
  adding: certifi/ (stored 0%)
  adding: certifi/__init__.py (stored 0%)
  ...
  ...

(tmpenv)wheel > pip install certifi-2015.04.28-py2.py3-none-any.whl
Processing ./certifi-2015.04.28-py2.py3-none-any.whl
Installing collected packages: certifi
Successfully installed certifi-2015.4.28
(tmpenv)wheel > python -m certifi
certifi/cacert.pem

(tmpenv)wheel > pip list
certifi (2015.4.28)
pip (6.1.1)
setuptools (15.0)
@pfmoore
Copy link
Member

pfmoore commented May 5, 2015

The RECORD file is intended to provide an integrity check (by means of the checksums) as you say. There are two places where checks can be made:

  1. On installation, to verify that the contents of the wheel match the embedded RECORD file.
  2. On uninstallation, to verify that the files (from RECORD) that are being removed have not been modified since installation.

However, installers are tot required by the PEP to do these checks, and indeed pip currently does neither check. I don't think there's any problem in principle with adding such checks, it's just not been done. They are of limited value in practice, of course, because anyone with access to the files to modify them also has access to the RECORD file to update the hash. So it provides no protection against attackers, and little protection against accidental change (it's hard to imagine how someone could accidentally unzip a wheel, modify a file, and re-zip the wheel...)

One possible issue with including hash checks is that for a large wheel (think something like scipy at over 85MB for Christoph Gohlke's Windows wheels) calculating hashes for every file would take a non-trivial amount of time.

Basically, not enough value to justify the cost of adding the checks.

@m1keil
Copy link
Author

m1keil commented May 5, 2015

I see, thank you very much for clear and detailed answer.

@m1keil m1keil closed this as completed May 5, 2015
@dholth
Copy link
Member

dholth commented May 5, 2015

The specification does require hash checks on install. I know we didn't use the word MUST in all caps. It doesn't take very long to compute SHA256.

https://www.python.org/dev/peps/pep-0427/#the-dist-info-directory

"During extraction, wheel installers verify all the hashes in RECORD against the file contents. Apart from RECORD and its signatures, installation will fail if any file in the archive is not both mentioned and correctly hashed in RECORD. "

@pfmoore
Copy link
Member

pfmoore commented May 5, 2015

Patches would be welcome, obviously :-) Personally, I'd like to see a benchmark against an 80MB+ wheel as well, but I could probably run that test for myself if a PR was submitted.

As a data point, it looks to me as if pip and "wheel install" don't check hashes on install but distlib does. The fact that distlib does is empirical evidence that the overhead of checking isn't an issue. (I'm still not 100% sure precisely what damage it's intended to protect against, though - after all, it appears that the OP was trying to do something that the check would have prevented unless he updated the hashes, so in that case the check would actually have made his life harder).

@dstufft
Copy link
Member

dstufft commented May 5, 2015

For the record, sha256 can get well over 100 MiB/s.

I don't believe it provides any meaning full protection against any attack. It may have another purpose

On May 5, 2015, at 11:37 AM, Paul Moore [email protected] wrote:

Patches would be welcome, obviously :-) Personally, I'd like to see a benchmark against an 80MB+ wheel as well, but I could probably run that test for myself if a PR was submitted.

As a data point, it looks to me as if pip and "wheel install" don't check hashes on install but distlib does. The fact that distlib does is empirical evidence that the overhead of checking isn't an issue. (I'm still not 100% sure precisely what damage it's intended to protect against, though - after all, it appears that the OP was trying to do something that the check would have prevented unless he updated the hashes, so in that case the check would actually have made his life harder).


Reply to this email directly or view it on GitHub.

@dholth
Copy link
Member

dholth commented May 5, 2015

It is designed this way so that you can sign RECORD to verify the integrity of the archive and include the signature inside the wheel file so that it travels with the archive and is much more likely to be available when you install, similar to jar files.

wheel install itself does check the hashes by using a ZipFile subclass. https://bitbucket.org/pypa/wheel/src/tip/wheel/install.py?at=default#cl-413

@dholth dholth reopened this May 5, 2015
@pfmoore
Copy link
Member

pfmoore commented May 5, 2015

wheel install itself does check the hashes by using a ZipFile subclass.

Ah. I thought it only did that for signed wheels. It never occurred to me that ZipFile.read() would raise KeyError if a member didn't exist in the zipfile. That's bizarre :-( But now that I've worked that out, I see what's going on. Apologies for the misinformation.

@m1keil
Copy link
Author

m1keil commented May 5, 2015

Thanks for further explanations.

Just to clarify my initial intentions: I wasn't trying to do anything
smart. It was just a hacky solution to some problem I was trying to solve.

But after I saw this actually works, it made me wonder why it did... And if
it works, should I worry if it will break in the future due to someone
fixing/implementing this bug/feature.

So the bottom line for me is: if I edit wheel file directly, I should
update RECORD. :-)
On May 5, 2015 21:14, "Paul Moore" [email protected] wrote:

wheel install itself does check the hashes by using a ZipFile subclass.

Ah. I thought it only did that for signed wheels. It never occurred to me
that ZipFile.read() would raise KeyError if a member didn't exist in the
zipfile. That's bizarre :-( But now that I've worked that out, I see
what's going on. Apologies for the misinformation.


Reply to this email directly or view it on GitHub
#2752 (comment).

@dholth
Copy link
Member

dholth commented May 6, 2015

We should build a simple tool to zip up a directory while updating the RECORD

@Russell-Jones
Copy link

pip wheel --dismantle <wheel url/path> and pip wheel --reassemble <local wheel path> ?

@pfmoore But they're wheels, not boxes :(
@dholth ah yes, python -m wheel -h Oh well, best to be consistent.

@pfmoore
Copy link
Member

pfmoore commented May 8, 2015

... or wheel unpack and wheel pack. I don't think they are a good fit as pip subcommands, but they seem like a reasonable idea for subcommands provided by the wheel project.

@dholth
Copy link
Member

dholth commented May 8, 2015

wheel unpack is already there. we would just have to add the pack command or even 'update record' in which case you could zip it yourself.

@m1keil
Copy link
Author

m1keil commented May 8, 2015

wheel pack would be ideal from user's side view.

On Fri, May 8, 2015 at 10:51 PM, Daniel Holth [email protected]
wrote:

wheel unpack is already there. we would just have to add the pack command
or even 'update record' in which case you could zip it yourself.


Reply to this email directly or view it on GitHub
#2752 (comment).

@pradyunsg pradyunsg added type: security Has potential security implications C: wheel The wheel format and 'pip wheel' command labels Oct 17, 2017
@byronyi
Copy link

byronyi commented Jun 9, 2019

Any updates from this thread? We recently hit an issue for reorganizing the packed wheel and rezipping wheel isn't the best way to do it. See tensorflow/tensorflow#29561 for context.

@chrahunt
Copy link
Member

chrahunt commented Sep 2, 2019

Hi @byronyi. The wheel pack command was implemented in pypa/wheel#157 so should be usable now as part of the wheel tool.

The general issue of checking hashes on installation is also tracked by #4705. I'll close this issue in favor of that one since it is a bit more focused.

@chrahunt chrahunt closed this as completed Sep 2, 2019
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Oct 2, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation C: wheel The wheel format and 'pip wheel' command type: security Has potential security implications
Projects
None yet
Development

No branches or pull requests

8 participants