
Poetry downloading same wheels multiple times within a single invocation #2415

Closed
bb opened this issue May 13, 2020 · 18 comments · Fixed by #7693
Labels
kind/enhancement Not a bug or feature, but improves usability or performance

Comments

@bb

bb commented May 13, 2020

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).

Issue

When adding a new dependency, it is downloaded multiple times; I observed three downloads, two of which are unnecessary.

Starting with a pyproject.toml as in the Gist given above, I run

poetry add https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl

Then I see the following output (XXX added as markers for explanation below):

Updating dependencies XXX
Resolving dependencies... (276.1s)

Writing lock file
XXX

Package operations: 0 installs, 7 updates, 0 removals

  - Updating certifi (2019.11.28 -> 2020.4.5.1)
  - Updating urllib3 (1.25.8 -> 1.25.9)
  - Updating asgiref (3.2.3 -> 3.2.7)
  - Updating pytz (2019.3 -> 2020.1)
  - Updating django (3.0.4 -> 3.0.6)
  - Updating hu-core-ud-lg (0.3.1 -> 0.3.1 https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl)
XXX  - Updating psycopg2-binary (2.8.4 -> 2.8.5)

At the positions where the marker XXX is inserted, the same 1.3GB download is done again and again.

Similarly, when adding another package later, XXX again marks the cursor position where the big download happens:

$ poetry add djangorestframework

Using version ^3.11.0 for djangorestframework

Updating dependencies
Resolving dependencies... (0.4s)

Writing lock file
XXX

Package operations: 1 install, 1 update, 0 removals

  - Installing djangorestframework (3.11.0)
  - Updating hu-core-ud-lg (0.3.1 -> 0.3.1 https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.3.1/hu_core_ud_lg-0.3.1-py3-none-any.whl)
XXX

I'd expect the file to be downloaded at most once and reused.

Slightly related but different issues: #999, #2094

@bb added the kind/bug and status/triage labels May 13, 2020
@finswimmer added the kind/enhancement label and removed the kind/bug and status/triage labels May 14, 2020
@dimbleby
Contributor

The first two downloads happen in

def get_package_from_url(cls, url: str) -> Package:
    file_name = os.path.basename(urllib.parse.urlparse(url).path)
    with tempfile.TemporaryDirectory() as temp_dir:
        dest = Path(temp_dir) / file_name
        download_file(url, str(dest))
        package = cls.get_package_from_file(dest)

which indeed uses a temporary location that is immediately thrown away.

Presumably the right thing to share with would be the artifact cache as used by the Chef?
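
For illustration only, a minimal sketch of the idea (download_once and its arguments are hypothetical names, not Poetry's actual API): download into a persistent directory and skip the fetch when the file is already there, instead of using a throwaway TemporaryDirectory.

import os
import urllib.parse
from pathlib import Path
from typing import Callable

def download_once(url: str, dest_dir: Path, download_file: Callable[[str, str], None]) -> Path:
    # Reuse a previously downloaded artifact if it is already in dest_dir;
    # only hit the network when the file is missing.
    file_name = os.path.basename(urllib.parse.urlparse(url).path)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / file_name
    if not dest.exists():
        download_file(url, str(dest))
    return dest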

@tall-josh

tall-josh commented Sep 15, 2022

What's the reasoning for dumping the downloads to a temp_dir as @dimbleby shows in the snippet? Is it so the cache doesn't blow out to a massive size?

I'd be happy to try and contribute. Naively, I'd check a cache wherever download_file is called (puzzle/provider.py and repositories/http.py), but there are likely some considerations I'm missing. If someone could advise, I could put together a PR.

@dimbleby
Contributor

Suspect that code fragment uses a temporary directory for no particularly good reason.

poetry has a cache of downloaded files that it uses during installation, as managed by the curiously named Chef class. I'd think that is the right thing to share with.

Couple of problems though:

  • the chef uses such things as the current interpreter version to decide where to put these files, which is an unwanted complication
  • it's not entirely clear how to refactor to make the chef cache available during solving

I'd start with an MR that updates the chef so that get_cache_directory_for_link only cares about the URL that the link is downloaded from - that should be straightforward, and will get maintainer opinion on whether this is a sensible track.

Then, if that's accepted, follow up with some sort of rearrangement so that this cache can be shared by the chef and the solving code.
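
A rough sketch of what "only cares about the URL" could look like (the function below is hypothetical, not the Chef's real get_cache_directory_for_link): keying the directory on a hash of the link URL alone means the solver and the installer would compute the same location for the same download.

import hashlib
from pathlib import Path

def cache_directory_for_url(cache_root: Path, url: str) -> Path:
    # Key purely on the download URL, ignoring interpreter version and other
    # environment details, so different parts of Poetry can share the cache.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_root / digest[:2] / digest[2:4] / digest[4:]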

@tall-josh

Thanks @dimbleby I'll take a look and see what I can do.

@tall-josh tall-josh mentioned this issue Sep 24, 2022
@dhdaines

This is a serious problem with packages like PyTorch, which are extremely large. Unless there's a workaround for this, I will definitely never use Poetry.

@rbracco

rbracco commented Oct 12, 2022

Any update on a fix for this? I really like poetry but locking or adding a new dependency now takes > 5 minutes because I have to download wheels for torch, torchaudio, and torchvision. Is there a short-term workaround while a more permanent fix is made? Thank you.

@neersighted
Member

I suspect many are reading this issue without actually having experienced the issue -- Poetry downloads Torch once for metadata + hashing, and a second time for actual installation. After the cache is created, Poetry will not re-download Torch. We are downloading distfiles more often than needed as two parts of the code do not share a common cache, but we are not downloading every time poetry add occurs or anything similar.

@rbracco

rbracco commented Oct 12, 2022

I suspect many are reading this issue without actually having experienced the issue -- Poetry downloads Torch once for metadata + hashing, and a second time for actual installation. After the cache is created, Poetry will not re-download Torch. We are downloading distfiles more often than needed as two parts of the code do not share a common cache, but we are not downloading every time poetry add occurs or anything similar.

Thanks for the reply. I am an active user of poetry running 1.2.1, and I do experience the issue: the pytorch wheel downloads every time I run add or lock, and it takes around 80 seconds each time.

Kazam_screencast_00002.mp4

@chopeen
Contributor

chopeen commented Oct 13, 2022

Every time I run poetry update in my project, a large spaCy model gets downloaded.

It is added to [tool.poetry.dependencies] this way:

en_core_web_lg = { url = "https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz" }

@nicolascedilnik

I think this is related: in a project I have these conditional URL dependencies defined

torch = [
    {url = "https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp39-cp39-linux_x86_64.whl", markers = "sys_platform == 'linux'"},
    {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_10_9_x86_64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'x86_64'"},
    {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_11_0_arm64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'arm64'"}
]

Every poetry lock operation ends up redownloading the 3 wheels, which are quite large. Isn't there a way to have them cached by poetry?

@a-gn

a-gn commented Oct 24, 2022

I think this is related: in a project I have these conditional URL dependencies defined

torch = [
    {url = "https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp39-cp39-linux_x86_64.whl", markers = "sys_platform == 'linux'"},
    {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_10_9_x86_64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'x86_64'"},
    {url = "https://download.pytorch.org/whl/cpu/torch-1.12.1-cp39-none-macosx_11_0_arm64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'arm64'"}
]

Every poetry lock operation ends up redownloading the 3 wheels, which are quite large. Isn't there a way to have them cached by poetry?

On my system also, this seemed to make Poetry re-download torch every time it resolved dependencies. It did not happen with other dependencies that were given by name (to be downloaded from PyPI, no URL).

Since PyTorch URLs have to be hard-coded to install properly and PyTorch's wheel is more than 1 GB, this prevents me from migrating the team to Poetry.

@neersighted
Member

Ah, looking at this, I realize that all the metadata caching happens in the repository layer. So if you're using direct URL dependencies, Poetry has no caching whatsoever. I personally got turned around here on whether this was a bug or as-designed behavior (currently, the latter is true).

Ideally the artifacts cache could be made agnostic to repositories so that it is keyed on URLs only and we can share it, as @dimbleby has mentioned. On top of that, I wonder if some mechanism to cache metadata (maybe a direct CachedRepository?) could be implemented, as all that code is currently tied to indexes.
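
Purely as an illustration of the metadata-caching idea (the class below is hypothetical, not an existing Poetry component): once a direct-URL wheel has been downloaded and inspected once, its name, version and hash could be stored keyed by URL, so later solves never need the wheel itself.

import json
from pathlib import Path
from typing import Optional

class UrlMetadataCache:
    # Hypothetical URL-keyed store for the minimal metadata the solver needs
    # (name, version, sha256), persisted as a small JSON file.
    def __init__(self, path: Path) -> None:
        self._path = path
        self._data = json.loads(path.read_text()) if path.exists() else {}

    def get(self, url: str) -> Optional[dict]:
        return self._data.get(url)

    def put(self, url: str, name: str, version: str, sha256: str) -> None:
        self._data[url] = {"name": name, "version": version, "sha256": sha256}
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps(self._data, indent=2))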

@jace-ys
Contributor

jace-ys commented Nov 10, 2022

I'm also experiencing this issue, and it's unfortunate: I now have to choose between installing a specific torch = { url = "https://download.pytorch.org/whl/cpu/torch-1.9.0-cp38-none-macosx_11_0_arm64.whl", markers = "platform_machine == 'arm64' and platform_system == 'Darwin'" } wheel for my architecture, which is quicker to install but makes dependency resolution super slow, or installing the full torch = "1.9.0", which takes longer to install but avoids the slow resolution times.

It would be great if there were caching for direct URL dependencies as well, as neither option is ideal right now 😞

@leoitcode

I'm having the same problem in my project: if you have any packages with { url = ... }, every poetry add, poetry lock, or poetry update downloads them again. As a temporary workaround, I'm using requirements.txt for URL packages and pyproject.toml for the rest while waiting for a fix.

@neersighted
Member

I think we've pretty firmly established what is going on and what is needed to improve -- I'd ask that people please refrain from "me too" as it's just adding noise right now.



This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024