Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the pip-powered resolve in pex 2 will re-tokenize --find-links pages on each transitive requirement #887

Closed
cosmicexplorer opened this issue Feb 12, 2020 · 2 comments · Fixed by pex-tool/pip#5 or #890

Comments

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Feb 12, 2020

We've been running into an issue when trying to test the feature in pantsbuild/pants#8793 (inspired by #789) with pex versions later than 2. When the vendored pip resolver fetches html pages provided as --find-links arguments, it appears to always fetch with the cachecontrol header max-age: 0, which appears to mean it always re-fetches and re-tokenizes every --find-links html page every time it tries to resolve any requirement. This leads to extremely long resolve times when resolving against a large remote --find-links html page in pex 2 (12 minutes vs 1.5 minutes for a particular intransitive resolve).

Application of the max-age header: https://github.com/pantsbuild/pex/blob/e078b88659b2992c839c110ca3447f64e6838f08/pex/vendor/_vendored/pip/pip/_internal/index/collector.py#L130-L143

Output excerpt with -vvvvvvvvv:

Fetching project page and analyzing links: https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO
  Getting page https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO
  Looking up "https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO" in the cache
  Request header has "max_age" as 0, cache bypassed
  Resetting dropped connection: INTERNAL-URL.com
  https://INTERNAL-URL.com:443 "GET /INTERNAL-FIND-LINKS-REPO HTTP/1.1" 200 858
  Updating cache with response from "https://INTERNAL-URL.com/INTERNAL-FIND-LINKS-REPO"

Size of our large --find-links page (when saved to disk):

> wc -c wow.html
1577275 wow.html

Will try to post a repro if possible which avoids dumping our internal repo into a public issue. Note that pex 1 does not appear to support local fileystem paths to html pages with --find-links, so --find-links=wow.html in pex 2 is not directly comparable to the result in pex 1.

@cosmicexplorer
Copy link
Contributor Author

Ok, have managed to find a nice repro for this by generating a massive --find-links page. Executing chmod +x wow.sh && ./wow.sh from this gist: https://gist.github.com/cosmicexplorer/7066947bcb37d3f44863796841c76998 should demonstrate a sizable regression when resolving tensorflow==1.14.0 from pex 1.6.12 to pex 2.1.2.

@cosmicexplorer
Copy link
Contributor Author

Ok, a PR has been created upstream at pypa/pip#7729, and when the changes from our pip fork at pex-tool/pip#5 are merged in, this issue will be resolved.

@benjyw benjyw mentioned this issue Feb 14, 2020
2 tasks
benjyw pushed a commit to pex-tool/pip that referenced this issue Feb 14, 2020
benjyw added a commit to benjyw/pex that referenced this issue Feb 14, 2020
New version is still a patched 20.0.dev0, with a new patch
to fix a performance issue.

Fixes pex-tool#887.
@benjyw benjyw mentioned this issue Feb 14, 2020
@benjyw benjyw reopened this Feb 14, 2020
benjyw added a commit that referenced this issue Feb 15, 2020
New version is still a patched 20.0.dev0, with a new patch
to fix a performance issue.

Fixes #887.
jsirois pushed a commit to jsirois/pip that referenced this issue Feb 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants