I have searched the issue tracker and believe that this is not a duplicate.
Make sure you run commands with -v flag before pasting the output.
Context
I'm trying to build a file-cache plugin for CI to avoid downloading from PyPI, instead hitting a local GCS bucket (or other file-based backend). This works by running a minimal resolve + link resolution (mimicking what happens up to and during PreparedCandidate.obtain), then filling the cache with only those files. It works great 99% of the time, except for a single 200 MB wheel, which is unfortunately the most important one due to its size. Even if I fill the cache properly, PDM replaces the file with a freshly downloaded copy on each run.
I'm 99% sure at this point that this is a misconfiguration somewhere leading PDM/Unearth/cachecontrol to ignore the ETag and/or Last-Modified headers. I'm posting here while investigating (and in case it's a deliberate choice).
This is an example header configuration:
accept-ranges: bytes
age: 683
content-length: 199261216
content-type: binary/octet-stream
date: Sat, 05 Nov 2022 12:35:14 GMT
etag: "7dd8fd1edf6f171be8efc30773e1af3d-24"
last-modified: Thu, 27 Jan 2022 19:11:45 GMT
server: AmazonS3
via: 1.1 432d52d55ad517cddd9081b248b2f116.cloudfront.net (CloudFront)
x-amz-cf-id: 3AIysyU1PZd2Nu7QrgTD12CqDwVjkH-F5WhbifPv8nNCwaVm2sSQgA==
x-amz-cf-pop: ARN54-C1
x-amz-version-id: uTkfkquaVRwOzsDfqejmY0gUwSY.ZOer
x-cache: Hit from cloudfront
It looks like etag and last-modified have been constant across invocations from a bunch of different locations in Europe since yesterday evening, and I'd expect that to be enough to cache the file.
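To make the expectation concrete, here is a minimal sketch of how a client can turn stored validators into a conditional request. The helper name `conditional_headers` is hypothetical and this is not how PDM, Unearth, or cachecontrol are implemented; it only shows which request headers the stored `etag` and `last-modified` values map to. A server that still has the same representation would normally answer such a request with 304 Not Modified and no body.

```python
from typing import Dict, Mapping


def conditional_headers(cached: Mapping[str, str]) -> Dict[str, str]:
    """Build revalidation request headers from a stored response's headers.

    Hypothetical helper for illustration only.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in cached.items()}
    headers: Dict[str, str] = {}
    if "etag" in lowered:
        headers["If-None-Match"] = lowered["etag"]
    if "last-modified" in lowered:
        headers["If-Modified-Since"] = lowered["last-modified"]
    return headers


# Validators copied from the example response above.
stored = {
    "etag": '"7dd8fd1edf6f171be8efc30773e1af3d-24"',
    "last-modified": "Thu, 27 Jan 2022 19:11:45 GMT",
    "date": "Sat, 05 Nov 2022 12:35:14 GMT",
}
print(conditional_headers(stored))
```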
Furthermore, since the content of the file on disk (as written by cachecontrol) changes due to the date header, my cache also gets busted: I cache on the sha256 sum, so if PDM changes a file on disk during the run, I also invalidate my cache.
Note that there is a freeze when it downloads torch, indicating that the download is actually happening even during the second run. (It also does an unnecessary reinstall, which might be a different issue, but maybe not.)
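A small sketch of why a changed date header is enough to break a hash-keyed cache. The on-disk layout used here (JSON headers, a newline, then the body) is an assumption made purely for illustration and is not cachecontrol's real serialization; the point is only that hashing headers and body together is sensitive to any header change, while hashing the body alone is not.

```python
import hashlib
import json
from typing import Mapping


def entry_digest(headers: Mapping[str, str], body: bytes) -> str:
    """Hash a whole cache entry (headers + body).

    Assumed layout for illustration, not cachecontrol's actual format.
    """
    blob = json.dumps(dict(headers), sort_keys=True).encode("utf-8") + b"\n" + body
    return hashlib.sha256(blob).hexdigest()


def body_digest(body: bytes) -> str:
    """Hash only the payload, ignoring any stored headers."""
    return hashlib.sha256(body).hexdigest()


body = b"stand-in for the wheel bytes"
first_run = {"etag": '"abc"', "date": "Sat, 05 Nov 2022 12:16:00 GMT"}
second_run = {"etag": '"abc"', "date": "Sat, 05 Nov 2022 12:20:00 GMT"}

# Same payload and ETag, different date header: the entry hash changes.
print(entry_digest(first_run, body) == entry_digest(second_run, body))  # False
# The payload hash is unaffected by header rewrites.
print(body_digest(body) == body_digest(body))  # True
```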
Actual behavior
The following is a double run. Note that Unearth downloads both torch and pdm-pep517. At least torch is an actual download; I'm not sure whether pdm-pep517 hits the cache.
$ pdm install -v
Synchronizing working set with lock file: 0 to add, 1 to update, 0 to remove
pdm.termui: Removing distribution torch
unearth: Downloading <Link https://download.pytorch.org/whl/cpu/torch-1.10.2%2Bcpu-cp38-cp38-linux_x86_64.whl (from None)> to /tmp/pdm-build-uo1er8rh/torch-1.10.2+cpu-cp38-cp38-linux_x86_64.whl
✔ Update torch 1.10.2+cpu -> 1.10.2 successful
Installing the project as an editable package...
pdm.termui: Preparing isolated env for PEP 517 build...
pdm.termui: Running PEP 517 backend to build a wheel for <Link file:///home/ts/Repositories/pdm-repros/constant-redownload (from None)>
pdm.termui: ======== Start resolving requirements ========
pdm.termui: pdm-pep517>=1.0.0
pdm.termui: python>=3.8.10,<3.8.11
pdm.termui: Adding requirement pdm-pep517>=1.0.0
unearth: Skip https://download.pytorch.org/whl/cpu/pdm-pep517/ because of Client Error(403): Forbidden.
pdm.termui: Adding requirement python>=3.8.10,<3.8.11
pdm.termui: ======== Starting round 0 ========
pdm.termui: Pinning: python None
pdm.termui: ======== Ending round 0 ========
pdm.termui: ======== Starting round 1 ========
pdm.termui: Pinning: pdm-pep517 1.0.5
pdm.termui: ======== Ending round 1 ========
pdm.termui: ======== Starting round 2 ========
pdm.termui: ======== Resolution Result ========
pdm.termui: Stable pins:
pdm.termui: python None
pdm.termui: pdm-pep517 1.0.5
pdm.termui: Installing pdm-pep517 1.0.5
unearth: Downloading <Link https://files.pythonhosted.org/packages/fb/7b/249d92feb0f897b0a163776f45564a8e0ef95a378157fea1ed91887dba5c/pdm_pep517-1.0.5-py3-none-any.whl (from None)> to /tmp/pdm-build-jja4g4xd/pdm_pep517-1.0.5-py3-none-any.whl
pdm.termui: /tmp/pdm-build-env-fbhez7l0-shared/lib/python3.8/site-packages/pdm/pep517/wheel.py:231: PDMWarning: No license files are matched with glob patterns ['LICENSES/*', 'LICEN[CS]E*', 'COPYING*', 'NOTICE*', 'AUTHORS*'].
pdm.termui: for license_file in self.find_license_files():
pdm.termui: - Adding wasteful_redownload.pth
pdm.termui: - Adding wasteful_redownload-0.0.0.dist-info/WHEEL
pdm.termui: - Adding wasteful_redownload-0.0.0.dist-info/METADATA
✔ Install wasteful-redownload 0.0.0 successful
🎉 All complete!
$ pdm install -v
Synchronizing working set with lock file: 0 to add, 1 to update, 0 to remove
pdm.termui: Removing distribution torch
unearth: Downloading <Link https://download.pytorch.org/whl/cpu/torch-1.10.2%2Bcpu-cp38-cp38-linux_x86_64.whl (from None)> to /tmp/pdm-build-6eki9_rp/torch-1.10.2+cpu-cp38-cp38-linux_x86_64.whl
✔ Update torch 1.10.2+cpu -> 1.10.2 successful
Installing the project as an editable package...
pdm.termui: Removing distribution wasteful-redownload
pdm.termui: Preparing isolated env for PEP 517 build...
pdm.termui: Running PEP 517 backend to build a wheel for <Link file:///home/ts/Repositories/pdm-repros/constant-redownload (from None)>
pdm.termui: ======== Start resolving requirements ========
pdm.termui: pdm-pep517>=1.0.0
pdm.termui: python>=3.8.10,<3.8.11
pdm.termui: Adding requirement pdm-pep517>=1.0.0
unearth: Skip https://download.pytorch.org/whl/cpu/pdm-pep517/ because of Client Error(403): Forbidden.
pdm.termui: Adding requirement python>=3.8.10,<3.8.11
pdm.termui: ======== Starting round 0 ========
pdm.termui: Pinning: python None
pdm.termui: ======== Ending round 0 ========
pdm.termui: ======== Starting round 1 ========
pdm.termui: Pinning: pdm-pep517 1.0.5
pdm.termui: ======== Ending round 1 ========
pdm.termui: ======== Starting round 2 ========
pdm.termui: ======== Resolution Result ========
pdm.termui: Stable pins:
pdm.termui: python None
pdm.termui: pdm-pep517 1.0.5
pdm.termui: Installing pdm-pep517 1.0.5
unearth: Downloading <Link https://files.pythonhosted.org/packages/fb/7b/249d92feb0f897b0a163776f45564a8e0ef95a378157fea1ed91887dba5c/pdm_pep517-1.0.5-py3-none-any.whl (from None)> to /tmp/pdm-build-s2z37fii/pdm_pep517-1.0.5-py3-none-any.whl
pdm.termui: /tmp/pdm-build-env-x5gxh5dz-shared/lib/python3.8/site-packages/pdm/pep517/wheel.py:231: PDMWarning: No license files are matched with glob patterns ['LICENSES/*', 'LICEN[CS]E*', 'COPYING*', 'NOTICE*', 'AUTHORS*'].
pdm.termui: for license_file in self.find_license_files():
pdm.termui: - Adding wasteful_redownload.pth
pdm.termui: - Adding wasteful_redownload-0.0.0.dist-info/WHEEL
pdm.termui: - Adding wasteful_redownload-0.0.0.dist-info/METADATA
✔ Update wasteful-redownload 0.0.0 -> 0.0.0 successful
This isn't just a visual issue; the cache file is being updated and changing:
$ ls -alh ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
-rw-r--r-- 1 ts ts 191M Nov 5 13:16 ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
$ sha256sum ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
504202d3f9e7f717ac2217d7919d31b1a9c30524e015ad10cae101e799132d8d ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
# snip reinstall
$ ls -alh ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
-rw-r--r-- 1 ts ts 191M Nov 5 13:20 ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
$ sha256sum ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
7aa7433b42fe341efaacbef5087195cd17b4d52efaeb02cde8a708bfe95616e9 ~/.cache/pdm/http/2/d/a/f/2/2daf262723282e1487be1ac3815fbad33fb0020aa797bb6442a2f301
So the file is being touched and changing.
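The same before/after check can be scripted, which is handy when the entry is close to 200 MB. This is a minimal sketch: `sha256_of` and `changed_between` are hypothetical helper names, and the file is read in chunks so the whole entry never has to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_between(path, previous_digest: str) -> bool:
    """True if the file's current digest differs from one recorded earlier."""
    return sha256_of(path) != previous_digest
```

Record `sha256_of(entry)` before the second `pdm install -v`, then call `changed_between(entry, recorded)` afterwards; a result of `True` corresponds to the differing sums shown above.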
Expected behavior
Files downloaded with ETag or Last-Modified headers should be cached properly.
Interestingly, the issue goes away (at least locally) if I change my requirements to match the local tag that gets resolved in that case. Does PDM actively delete files from the cache if reinstalling or some such?
OK, so there are two bugs interacting here:
In PDM, if I'm depending on torch==1.10.2 and 1.10.2+cpu is installed, it'll reinstall completely even if that is the version being reinstalled.
In cachecontrol, it'll write new headers to the cache even if it got a 304 back.
The first is the cause of the stall, the latter is the source of the file-modification. I'll open a new bug for the specific issue of reinstalling the same version...
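The reinstall bug comes down to how a version with a local label (`+cpu`) is compared against a pin without one. The sketch below is not PDM's comparison logic; `split_local` and `same_release` are hypothetical helpers that work on plain strings and skip PEP 440 normalization. It only illustrates the PEP 440 rule that a specifier without a local label ignores the local label of the candidate, whereas a naive string comparison treats the two versions as different.

```python
from typing import Tuple


def split_local(version: str) -> Tuple[str, str]:
    """Split 'X.Y.Z+local' into (public, local); local is '' when absent."""
    public, _, local = version.partition("+")
    return public, local


def same_release(installed: str, locked: str) -> bool:
    """Decide whether an installed version already satisfies an exact pin.

    Hypothetical helper: string-based, no PEP 440 normalization.
    """
    installed_public, installed_local = split_local(installed)
    locked_public, locked_local = split_local(locked)
    if installed_public != locked_public:
        return False
    # A pin without a local label matches any local label on the candidate;
    # a pin with a local label requires the labels to agree.
    return locked_local == "" or locked_local == installed_local


print("1.10.2+cpu" == "1.10.2")              # False: naive comparison sees a change
print(same_release("1.10.2+cpu", "1.10.2"))  # True: nothing needs reinstalling
```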
Steps to reproduce
Check out this repository: https://github.com/tgolsson/pdm-repros/tree/main/constant-redownload
Following the README, run the install twice (the double run under Actual behavior shows the output of pdm install -v).
Environment Information