send HTTP caching headers for index pages to further reduce bandwidth usage #12257
Conversation
Force-pushed from 011f02e to b658989.
@ewdurbin @uranusjr (not sure who to tag): this change adds …
I'm actually going to take out the interpreter compatibility caching, since that part adds significant complexity without any effect on network bandwidth usage.
Force-pushed from 4e361c4 to 2c0405b.
Force-pushed from 5830ea0 to 28d65ce.
The primary bandwidth concern for PyPI is files.pythonhosted.org, by about 15 times. But everything is marginal, so this would have a consequential impact over time. Nice stuff!
Force-pushed from 2ecc6c4 to 3987be0.
Force-pushed from d935efa to 5cc8a36.
Force-pushed from 5cc8a36 to 03490f9.
Force-pushed from 6dd7837 to 5c40e19.
This looks reasonable to me, assuming we accept the dependent PRs. (Still some discussion needed on #12186, it seems.)
When performing `install --dry-run` and PEP 658 `.metadata` files are available to guide the resolve, do not download the associated wheels. Rather, use the distribution information directly from the `.metadata` files when reporting the results on the CLI and in the `--report` file.

- describe the new `--dry-run` behavior
- finalize linked requirements immediately after resolve
- introduce `is_concrete`
- funnel `InstalledDistribution` through `_get_prepared_distribution()` too
- add a test for the new `install --dry-run` functionality (no downloading)
- catch an exception when parsing metadata which only occurs in CI
- handle `--no-cache-dir`
- call `os.makedirs()` before writing to the cache too
- catch `InvalidSchema` when attempting git URLs with `BatchDownloader`
- fix other test failures
- reuse the `should_cache(req)` logic
- gzip-compress link metadata for a slight reduction in disk space
- only cache built sdists
- don't check `should_cache()` when fetching
- cache lazy wheel dists
- add news
- turn debug logs in fetching from cache into exceptions
- use `scandir` over `listdir` when searching the normal wheel cache
- handle metadata email parsing errors
- correctly handle a mutable cached requirement
- use bz2 over gzip for an extremely slight improvement in disk usage
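One of the bullets above swaps gzip for bz2 when compressing cached link metadata. A minimal, self-contained sketch of that trade-off (the sample metadata text is made up for illustration and is not pip's actual cache format):

```python
import bz2
import gzip

# Hypothetical PEP 658-style metadata text, repeated to simulate a cache of
# many similar entries (not real cached data).
metadata = (
    b"Metadata-Version: 2.1\nName: tensorflow\nVersion: 2.13.0\n"
    b"Requires-Dist: numpy (>=1.22)\n"
) * 50

gz = gzip.compress(metadata)
bz = bz2.compress(metadata)

# Both round-trip losslessly; the PR reports bz2 coming out very slightly
# smaller than gzip for its cached metadata.
assert gzip.decompress(gz) == metadata
assert bz2.decompress(bz) == metadata
```

The difference between the two codecs is tiny at these sizes, which matches the "extremely slight improvement" wording in the commit message.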
- pipe in the `headers` arg
- provide full context in `Link.comes_from`
- pull in `ETag` and `Date` and cache the outputs
- handle `--no-cache-dir`
- add NEWS
- remove quotes from the `ETag` and use a binary checksum to save a few bytes
- parse the HTTP modified date to compress the cached representation
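The last two bullets describe shrinking the cached `ETag`/`Date` pair by stripping quotes and storing the date as a number rather than text. A rough sketch of that idea, using only the standard library (function names are hypothetical, not pip's actual code):

```python
import email.utils

def pack_validators(etag: str, date_header: str) -> bytes:
    """Pack the two HTTP cache validators into a compact binary form."""
    tag = etag.strip('"')  # drop the surrounding quotes to save two bytes
    # Parse the textual HTTP date into a unix timestamp: 8 bytes vs ~29.
    ts = int(email.utils.parsedate_to_datetime(date_header).timestamp())
    return ts.to_bytes(8, "big") + tag.encode("ascii")

def unpack_validators(blob: bytes) -> tuple[str, str]:
    """Recover the header values to send on the next request."""
    ts = int.from_bytes(blob[:8], "big")
    date_header = email.utils.formatdate(ts, usegmt=True)
    etag = '"%s"' % blob[8:].decode("ascii")
    return etag, date_header
```

A quick round-trip check: `pack_validators('"abc123"', "Tue, 15 Nov 1994 08:12:31 GMT")` unpacks to the same quoted `ETag` and the same GMT date string.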
Force-pushed from 5c40e19 to f28ecfd.
This PR is on top of #12256; see the +316/-36 diff against it at https://github.com/cosmicexplorer/pip/compare/link-metadata-cache...cosmicexplorer:pip:link-parsing-cache?expand=1.

Background: Learning More about HTTP Requests
After taking up a suggestion from @dholth in #12208 to consider handling a `304 Not Modified` response in HTTP requests, I began to consider whether we could make use of HTTP caching headers to further reduce the time we wait for the network, without reintroducing the delays to see new package uploads described in #5670.

Proposal: Send HTTP Caching Headers and Record `ETag`
This change records the `ETag` and `Date` headers from the HTTP response, then sets the `If-None-Match` and `If-Modified-Since` headers on future requests against project pages (e.g. https://pypi.org/simple/tensorflow). This allows the server to respond with a zero-length `304 Not Modified` instead of a several-hundred-KB HTML page.

Result: Slight Performance Improvement
Recording these HTTP headers adds only ~3KB of disk space after a large resolve. It has only a very slight (3.4%) performance benefit on top of #12256, converting a 6.1-second resolve to 5.9 seconds.
But more importantly, as described above, it avoids making multiple ~600KB requests against pypi on each resolve.
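The conditional-request flow described in the proposal can be sketched as two small helpers: one that turns the recorded validators into request headers, and one that decides whether to reuse the cached page. This is an illustrative sketch, not pip's actual implementation:

```python
def conditional_headers(cached: dict) -> dict:
    """Turn a stored (ETag, Date) pair into conditional request headers."""
    headers = {}
    if "etag" in cached:
        headers["If-None-Match"] = cached["etag"]
    if "date" in cached:
        headers["If-Modified-Since"] = cached["date"]
    return headers

def resolve_body(status: int, cached_body: bytes, response_body: bytes) -> bytes:
    """On 304 Not Modified the response has no body, so serve the cached page;
    otherwise use (and re-cache) the fresh response."""
    if status == 304:
        return cached_body
    return response_body
```

On a cache hit the server's answer is zero-length, which is exactly where the bandwidth saving over a ~600KB project page comes from.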
TODO