Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get hash from link #1670

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions piptools/repositories/pypi.py
Original file line number Diff line number Diff line change
Expand Up @@ -389,6 +389,9 @@ def _get_matching_candidates(
return candidates_by_version[matching_versions[0]]

def _get_file_hash(self, link: Link) -> str:
if link.hash_name == FAVORITE_HASH:
Copy link
Member

@atugushev atugushev Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this with pip-22.2 and it works fine:

# requirements.in
--index-url https://m.devpi.net/root/pypi/+simple/

numpy
root@py311:/# pip --version
pip 22.2 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)

root@py311:/# pip-compile --generate-hashes -v
Using indexes:
  https://m.devpi.net/root/pypi/+simple/

                          ROUND 1
Current constraints:
  numpy (from -r requirements.in (line 3))

Finding the best candidates:
  found candidate numpy==1.23.5 (constraint was <any>)

Finding secondary dependencies:
  numpy==1.23.5             requires -
------------------------------------------------------------
Result of round 1: stable, done

Generating hashes:
  numpy
    Getting hash from link numpy-1.23.5-cp311-cp311-macosx_11_0_arm64.whl
    Getting hash from link numpy-1.23.5.tar.gz
    Getting hash from link numpy-1.23.5-cp39-cp39-win_amd64.whl
    Getting hash from link numpy-1.23.5-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
...

But with pip-22.3 it doesn't work:

root@py311:/# pip --version
pip 22.3.1 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)

root@py311:/# pip-compile --generate-hashes -v
Using indexes:
  https://m.devpi.net/root/pypi/+simple/

                          ROUND 1
Current constraints:
  numpy (from -r requirements.in (line 3))

Finding the best candidates:
  found candidate numpy==1.23.5 (constraint was <any>)

Finding secondary dependencies:
  numpy==1.23.5             requires -
------------------------------------------------------------
Result of round 1: stable, done

Generating hashes:
  numpy
    Hashing numpy-1.23.5-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
      |██                              |   6%^C

Using git bisect I've tracked down the problematic commit pypa/pip@bad03ef from pypa/pip#11111

Copy link
Member

@atugushev atugushev Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc'ing @cosmicexplorer for the insights. What could go wrong with Link.hash in 22.3?

Copy link

@hajapy hajapy Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a curious onlooker (not someone intimately familiar with pip internals), I think the above behaviour is coming from this part of the change: pypa/pip@bad03ef#diff-3919ca53335487395177edeffff8b60bd360f1ad18c95593287f6814b9ecefb8R403-L209.

In pip 22.3 and pip 22.2, I think the internal Link._hashes dict contains the "sha256" key but as of 22.3, that inner dict is not checked for the hash in the Link.hash_name property.

I think in the simple api, in the json form the hashes will come from the hashes dict, whereas in the http form it will be in the url fragment, and I think your above test (with index https://m.devpi.net/root/pypi/+simple/) is returning the json form of the simple api. Only for the latter will HashLink (as of pip 22.3) be populated with a hash/hash_name, so I think there is a regression in functionality for the simple api + json response introduced by pypa/pip#11111.

I think the check here could be if FAVORITE_HASH in link._hashes and that would work for both pip 22.2 and pip 22.3. Accessing the value would be link._hashes[FAVORITE_HASH]. Having to use that internal dict seems suboptimal :/

FWIW, the current implementation is still working on pip 22.3 with aws codeartifact as the index, which seems to be going down the path of responding not as json but as http and using Link.from_element rather than Link.from_json.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it and update the PR to support both pip < 22.3 and >= 22.3.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pypa/pip@0233bf2 once again changes the internal logic for retrieving the hash_name which should restore the functionality as before.

Could we keep the change as-is and note that there is a still an inefficiency for affected pip versions and artifact stores that use the json form of the simple api? IMO, this is no worse than what we have today in this one specific exception and better in all other scenarios.

Alternatively we could use a workaround for the affected pip versions to have parity across all pip versions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this comment seems to confirm that hashes were missing from json pypa/pip#11692 (comment) and it was an issue in more ways for pip than just this proposed change.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update on this / interest in reviving it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, haven't had time to work on this. Our automation basically uses this branch therefore the pressure to get it upstream is low :/

I'll try to find some time, but I'm also happy if someone else wants to take over.

log.debug(f"Getting hash from link {link.show_url}")
return ":".join([FAVORITE_HASH, link.hash])
log.debug(f"Hashing {link.show_url}")
h = hashlib.new(FAVORITE_HASH)
with open_local_or_remote_file(link, self.session) as f:
Expand Down
13 changes: 13 additions & 0 deletions tests/test_repository_pypi.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,19 @@ def test_get_file_hash_without_interfering_with_each_other(from_line, pypi_repos
)


def test_get_file_hash_from_link(pypi_repository):
assert (
pypi_repository._get_file_hash(
Link(
"https://example.com/simple/click/8.1.3/"
"click-8.1.3-py3-none-any.whl"
"#sha256=bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48"
)
)
== "sha256:bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48"
)


def test_get_hashes_editable_empty_set(from_editable, pypi_repository):
ireq = from_editable("git+https://github.com/django/django.git#egg=django")
assert pypi_repository.get_hashes(ireq) == set()
Expand Down