PDM doesn't cache wheels based on ETag/Last-Modified headers #1496

Closed
1 task done
tgolsson opened this issue Nov 5, 2022 · 2 comments
Labels
🐛 bug Something isn't working

Comments

@tgolsson
Contributor

tgolsson commented Nov 5, 2022

  • I have searched the issue tracker and believe that this is not a duplicate.

Make sure you run commands with -v flag before pasting the output.

Context

I'm trying to build a file-cache plugin for CI to avoid downloading from PyPI, instead hitting a local GCS bucket (or other file-based backend). This works by running a minimal resolve + link resolution (mimicking what happens up to and during PreparedCandidate.obtain), then filling the cache with only those files. It works well 99% of the time, except for a single 200 MB wheel, which is unfortunately the most important one because of its size. Even if I fill the cache properly, PDM replaces the file with a freshly downloaded copy on each run.

I'm 99% sure at this point that this is a misconfiguration somewhere leading PDM/Unearth/cachecontrol to ignore the ETag and/or Last-Modified headers. I'm posting here while investigating (and in case it's a deliberate choice).

This is an example header configuration:

accept-ranges: bytes
age: 683
content-length: 199261216
content-type: binary/octet-stream
date: Sat, 05 Nov 2022 12:35:14 GMT
etag: "7dd8fd1edf6f171be8efc30773e1af3d-24"
last-modified: Thu, 27 Jan 2022 19:11:45 GMT
server: AmazonS3
via: 1.1 432d52d55ad517cddd9081b248b2f116.cloudfront.net (CloudFront)
x-amz-cf-id: 3AIysyU1PZd2Nu7QrgTD12CqDwVjkH-F5WhbifPv8nNCwaVm2sSQgA==
x-amz-cf-pop: ARN54-C1
x-amz-version-id: uTkfkquaVRwOzsDfqejmY0gUwSY.ZOer
x-cache: Hit from cloudfront

It looks like etag and last-modified have been constant across invocations from a bunch of different locations in Europe since yesterday evening, and I'd expect that to be enough to cache the file.
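
For reference, here is a minimal sketch of how a client could turn those two validators into a conditional request, so that an unchanged file comes back as a 304 instead of a full download. The helper name `conditional_headers` is hypothetical and is not part of PDM, Unearth, or cachecontrol; only the `If-None-Match` and `If-Modified-Since` header names are standard HTTP.

```python
from typing import Dict, Mapping


def conditional_headers(cached: Mapping[str, str]) -> Dict[str, str]:
    """Build revalidation headers from a previously stored response's headers."""
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in cached.items()}
    headers: Dict[str, str] = {}
    if "etag" in lowered:
        headers["If-None-Match"] = lowered["etag"]
    if "last-modified" in lowered:
        headers["If-Modified-Since"] = lowered["last-modified"]
    return headers


# Values copied from the header dump above.
stored = {
    "etag": '"7dd8fd1edf6f171be8efc30773e1af3d-24"',
    "last-modified": "Thu, 27 Jan 2022 19:11:45 GMT",
}
print(conditional_headers(stored))
```

If the server still holds the same object, it can answer such a request with 304 Not Modified and no body.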

Furthermore, since the content of the file on disk (as written by cachecontrol) changes due to the date header, my cache also gets busted: I cache on the sha256 sum, so if PDM changes a file on disk during the run, my cache is invalidated as well.
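
To illustrate why a changing date header is enough to bust a sha256-keyed cache, here is a small sketch. It assumes a cache entry that stores the response headers and the body in one file; the JSON layout and the names `entry_digest` and `body_digest` are stand-ins for illustration, not cachecontrol's actual on-disk format.

```python
import hashlib
import json


def entry_digest(headers: dict, body: bytes) -> str:
    """Hash a hypothetical cache entry that stores headers and body together."""
    blob = json.dumps(headers, sort_keys=True).encode() + b"\n" + body
    return hashlib.sha256(blob).hexdigest()


def body_digest(body: bytes) -> str:
    """Hash only the payload, ignoring any response metadata."""
    return hashlib.sha256(body).hexdigest()


body = b"stand-in for the wheel bytes"
first = {"etag": '"abc"', "date": "Sat, 05 Nov 2022 12:35:14 GMT"}
second = {"etag": '"abc"', "date": "Sat, 05 Nov 2022 12:39:02 GMT"}

# Same body and ETag, different date: the whole-entry digest differs.
print(entry_digest(first, body) == entry_digest(second, body))  # False
# Hashing only the body is unaffected by header changes.
print(body_digest(body) == body_digest(body))  # True
```

Keying an external cache on the payload digest rather than the whole file would sidestep the problem, at the cost of having to parse the entry format.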

Steps to reproduce

Checkout this repository: https://github.com/tgolsson/pdm-repros/tree/main/constant-redownload

Following the readme, run:

$ pdm lock
$ pdm sync -v
$ pdm sync -v

Note that there is a freeze when it downloads torch, indicating that the download is actually happening even during the second run. (It also does an unnecessary reinstall, which might be a different issue, but maybe not.)

Actual behavior

The following is a double run. Note that Unearth downloads both torch and pdm-pep517. At least torch is an actual download; I'm not sure whether pdm-pep517 hits the cache.

$ pdm install -v
Synchronizing working set with lock file: 0 to add, 1 to update, 0 to remove

pdm.termui: Removing distribution torch
unearth: Downloading <Link https://download.pytorch.org/whl/cpu/torch-1.10.2%2Bcpu-cp38-cp38-linux_x86_64.whl (from None)> to /tmp/pdm-build-uo1er8rh/torch-1.10.2+cpu-cp38-cp38-linux_x86_64.whl
  ✔ Update torch 1.10.2+cpu -> 1.10.2 successful
Installing the project as an editable package...
pdm.termui: Preparing isolated env for PEP 517 build...
pdm.termui: Running PEP 517 backend to build a wheel for <Link file:///home/ts/Repositories/pdm-repros/constant-redownload (from None)>
pdm.termui: ======== Start resolving requirements ========
pdm.termui:   pdm-pep517>=1.0.0
pdm.termui:   python>=3.8.10,<3.8.11
pdm.termui:   Adding requirement pdm-pep517>=1.0.0
unearth: Skip https://download.pytorch.org/whl/cpu/pdm-pep517/ because of Client Error(403): Forbidden.
pdm.termui:   Adding requirement python>=3.8.10,<3.8.11
pdm.termui: ======== Starting round 0 ========
pdm.termui: Pinning: python None
pdm.termui: ======== Ending round 0 ========
pdm.termui: ======== Starting round 1 ========
pdm.termui: Pinning: pdm-pep517 1.0.5
pdm.termui: ======== Ending round 1 ========
pdm.termui: ======== Starting round 2 ========
pdm.termui: ======== Resolution Result ========
pdm.termui: Stable pins:
pdm.termui:       python None
pdm.termui:   pdm-pep517 1.0.5
pdm.termui: Installing pdm-pep517 1.0.5
unearth: Downloading <Link https://files.pythonhosted.org/packages/fb/7b/249d92feb0f897b0a163776f45564a8e0ef95a378157fea1ed91887dba5c/pdm_pep517-1.0.5-py3-none-any.whl (from None)> to /tmp/pdm-build-jja4g4xd/pdm_pep517-1.0.5-py3-none-any.whl
pdm.termui: /tmp/pdm-build-env-fbhez7l0-shared/lib/python3.8/site-packages/pdm/pep517/wheel.py:231: PDMWarning: No license files are matched with glob patterns ['LICENSES/*', 'LICEN[CS]E*', 'COPYING*', 'NOTICE*', 'AUTHORS*'].
pdm.termui:   for license_file in self.find_license_files():
pdm.termui:  - Adding wasteful_redownload.pth
pdm.termui:  - Adding wasteful_redownload-0.0.0.dist-info/WHEEL
pdm.termui:  - Adding wasteful_redownload-0.0.0.dist-info/METADATA
  ✔ Install wasteful-redownload 0.0.0 successful

🎉 All complete!
$ pdm install -v
Synchronizing working set with lock file: 0 to add, 1 to update, 0 to remove

pdm.termui: Removing distribution torch
unearth: Downloading <Link https://download.pytorch.org/whl/cpu/torch-1.10.2%2Bcpu-cp38-cp38-linux_x86_64.whl (from None)> to /tmp/pdm-build-6eki9_rp/torch-1.10.2+cpu-cp38-cp38-linux_x86_64.whl
  ✔ Update torch 1.10.2+cpu -> 1.10.2 successful
Installing the project as an editable package...
pdm.termui: Removing distribution wasteful-redownload
pdm.termui: Preparing isolated env for PEP 517 build...
pdm.termui: Running PEP 517 backend to build a wheel for <Link file:///home/ts/Repositories/pdm-repros/constant-redownload (from None)>
pdm.termui: ======== Start resolving requirements ========
pdm.termui:   pdm-pep517>=1.0.0
pdm.termui:   python>=3.8.10,<3.8.11
pdm.termui:   Adding requirement pdm-pep517>=1.0.0
unearth: Skip https://download.pytorch.org/whl/cpu/pdm-pep517/ because of Client Error(403): Forbidden.
pdm.termui:   Adding requirement python>=3.8.10,<3.8.11
pdm.termui: ======== Starting round 0 ========
pdm.termui: Pinning: python None
pdm.termui: ======== Ending round 0 ========
pdm.termui: ======== Starting round 1 ========
pdm.termui: Pinning: pdm-pep517 1.0.5
pdm.termui: ======== Ending round 1 ========
pdm.termui: ======== Starting round 2 ========
pdm.termui: ======== Resolution Result ========
pdm.termui: Stable pins:
pdm.termui:       python None
pdm.termui:   pdm-pep517 1.0.5
pdm.termui: Installing pdm-pep517 1.0.5
unearth: Downloading <Link https://files.pythonhosted.org/packages/fb/7b/249d92feb0f897b0a163776f45564a8e0ef95a378157fea1ed91887dba5c/pdm_pep517-1.0.5-py3-none-any.whl (from None)> to /tmp/pdm-build-s2z37fii/pdm_pep517-1.0.5-py3-none-any.whl
pdm.termui: /tmp/pdm-build-env-x5gxh5dz-shared/lib/python3.8/site-packages/pdm/pep517/wheel.py:231: PDMWarning: No license files are matched with glob patterns ['LICENSES/*', 'LICEN[CS]E*', 'COPYING*', 'NOTICE*', 'AUTHORS*'].
pdm.termui:   for license_file in self.find_license_files():
pdm.termui:  - Adding wasteful_redownload.pth
pdm.termui:  - Adding wasteful_redownload-0.0.0.dist-info/WHEEL
pdm.termui:  - Adding wasteful_redownload-0.0.0.dist-info/METADATA
  ✔ Update wasteful-redownload 0.0.0 -> 0.0.0 successful

This isn't just a visual issue; the cache file is being updated and changing:

$ ls -alh ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
-rw-r--r-- 1 ts ts 191M Nov  5 13:16 ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
$ sha256sum ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
504202d3f9e7f717ac2217d7919d31b1a9c30524e015ad10cae101e799132d8d  ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301

# snip reinstall

$ ls -alh ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
-rw-r--r-- 1 ts ts 191M Nov  5 13:20 ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
$ sha256sum ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
7aa7433b42fe341efaacbef5087195cd17b4d52efaeb02cde8a708bfe95616e9  ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301

So the file is being touched and changing.
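
The manual ls/sha256sum comparison above can be automated for a whole cache directory. The following is a rough sketch; `snapshot` and `changed_files` are hypothetical helper names, and the demonstration runs against a throwaway directory rather than the real cache.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its sha256 hex digest."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            hasher = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large wheels are not loaded at once.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    hasher.update(chunk)
            digests[path.relative_to(root).as_posix()] = hasher.hexdigest()
    return digests


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List files present in both snapshots whose digest differs."""
    return sorted(
        name for name in before.keys() & after.keys() if before[name] != after[name]
    )


# Self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "entry").write_bytes(b"headers-v1 + body")
    before = snapshot(root)
    (root / "entry").write_bytes(b"headers-v2 + body")
    after = snapshot(root)
    print(changed_files(before, after))  # ['entry']
```

To check the real cache, one would take a snapshot of `Path.home() / ".cache/pdm/http"` before and after a `pdm sync -v` run and compare the two.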

Expected behavior

Files downloaded with ETag or Last-Modified headers should be cached properly.
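
One way to picture the expected behavior is a policy that leaves the on-disk entry untouched when a revalidation returns 304 and only volatile headers differ. This is a sketch of one possible policy, not how cachecontrol works; the `VOLATILE` set and the function names are assumptions made for illustration. Note that skipping the header refresh trades some freshness bookkeeping for byte-stable cache files.

```python
from typing import Dict, Mapping

# Assumption: headers expected to change on every response even when the
# resource itself is unchanged.
VOLATILE = frozenset({"date", "age"})


def stable_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Lower-case header names and drop the volatile ones."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() not in VOLATILE
    }


def needs_rewrite(
    status: int, stored: Mapping[str, str], received: Mapping[str, str]
) -> bool:
    """Decide whether the cached entry on disk has to be rewritten."""
    if status != 304:
        # Any other response carries a body that has to be stored.
        return True
    old = stable_headers(stored)
    merged = {**old, **stable_headers(received)}
    return merged != old


stored = {"etag": '"abc"', "date": "Sat, 05 Nov 2022 12:35:14 GMT"}
fresh = {"etag": '"abc"', "date": "Sat, 05 Nov 2022 12:39:02 GMT"}
print(needs_rewrite(304, stored, fresh))  # False
print(needs_rewrite(200, stored, fresh))  # True
```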

Environment Information

# Paste the output of `pdm info && pdm info --env` below:
PDM version:
  2.2.1
Python Interpreter:
  /home/ts/Repositories/pdm-repros/constant-redownload/.venv/bin/python (3.8)
Project Root:
  /home/ts/Repositories/pdm-repros/constant-redownload
Project Packages:
  None
{
  "implementation_name": "cpython",
  "implementation_version": "3.8.10",
  "os_name": "posix",
  "platform_machine": "x86_64",
  "platform_release": "5.15.68.1-microsoft-standard-WSL2",
  "platform_system": "Linux",
  "platform_version": "#1 SMP Mon Sep 19 19:14:52 UTC 2022",
  "python_full_version": "3.8.10",
  "platform_python_implementation": "CPython",
  "python_version": "3.8",
  "sys_platform": "linux"
}
@tgolsson tgolsson added the 🐛 bug Something isn't working label Nov 5, 2022
@tgolsson
Contributor Author

tgolsson commented Nov 5, 2022

Interestingly, the issue goes away (at least locally) if I change my requirements to match the local tag that gets resolved in that case. Does PDM actively delete files from the cache if reinstalling or some such?

@tgolsson
Contributor Author

tgolsson commented Nov 5, 2022

OK, so there are two bugs interacting here:

  • In PDM, if I depend on torch==1.10.2 and 1.10.2+cpu is installed, it reinstalls completely, even if that is the same version being reinstalled.
  • In cachecontrol, new headers are written even if it got a 304 back.

The first is the cause of the stall; the second is the source of the file modification. I'll open a new bug for the specific issue of reinstalling the same version.
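
As background for the first bug: PEP 440 says that when a specifier names a public version (no local label such as +cpu), the local label of candidate versions is ignored for matching. A deliberately simplified sketch of that rule follows. `matches_pin` is a hypothetical helper, not PDM's implementation, and it skips version normalization, wildcards, and every operator other than a plain `==` pin.

```python
def matches_pin(installed: str, pinned: str) -> bool:
    """Simplified `==` check following PEP 440's rule for local version labels."""
    if "+" in pinned:
        # The pin names a local version, so the full string has to match.
        return installed == pinned
    # The pin is a public version: ignore any local label on the installed one.
    return installed.split("+", 1)[0] == pinned


print(matches_pin("1.10.2+cpu", "1.10.2"))      # True
print(matches_pin("1.10.2+cpu", "1.10.2+cpu"))  # True
print(matches_pin("1.10.2", "1.10.2+cpu"))      # False
```

Under this rule an installed 1.10.2+cpu already satisfies torch==1.10.2, which is consistent with the reinstall being unnecessary.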
