
Use installed packages to solve dependency graph #1596

Closed

sfarina opened this issue Mar 7, 2022 · 12 comments
Labels
cache Related to dependency cache

Comments

@sfarina commented Mar 7, 2022

What's the problem this feature will solve?

I'm trying to build stable requirements.txt files for docker containers built on top of existing containers (specifically pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime). To save on image size, these containers don't keep the pip cache intact, so pip-compile takes a few minutes to download a large (~1 GB) wheel, defeating part of the purpose of using a base docker image. I run pip-compile inside a docker build -f pipcompile.dockerfile.

Describe the solution you'd like

Any of the following (see the hypothetical flag sketch below):

  1. an option to tell pip-compile not to solve for package X, or anything it depends on (the docker container already manages package X)
  2. an option to tell pip-compile not to solve for any package that is already installed
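For illustration, the two options might look something like this (both flags are hypothetical, sketched only to make the request concrete; neither exists in pip-compile today):

$ pip-compile --assume-installed torch   # option 1 (hypothetical): don't resolve torch or its dependencies
$ pip-compile --assume-installed-all     # option 2 (hypothetical): trust every already-installed package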

Alternative Solutions

  1. wait for the wheel to be downloaded every time
  2. cache the large wheel in the docker build before the pip-compile step

Additional context

(user@host)$ docker run --rm -it pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime bash
(root@docker)# pip install pip-tools
(root@docker)# pip install torch==1.10.0
    Requirement already satisfied: torch==1.10.0 in /opt/conda/lib/python3.7/site-packages (1.10.0)
(root@docker)# echo "torch==1.10.0" > requirements.in
(root@docker)# python3 -m piptools compile -v
#...
  torch==1.10.0 not in cache, need to check index
# takes 3 minutes to download a large wheel that's already installed
@AndydeCleyre (Contributor)

OK, I don't have a great handle on every aspect of the caching, and I don't know if this is 100% satisfactory, but this can be worked around somewhat by copying/mounting/sharing a very small JSON cache file (not the wheel itself).

I ran pip-compile in the container, which indeed takes too long and does too much work. This generated /root/.cache/pip-tools/depcache-cp3.7.json (formatted for this comment):

{
  "__format__": 1,
  "dependencies": {
    "torch": {
      "1.10.0": [
        "typing-extensions"
      ]
    },
    "typing-extensions": {
      "4.1.1": []
    }
  }
}

And now:

$ podman run --rm -it -v $PWD/depcache-cp3.7.json:/root/.cache/pip-tools/depcache-cp3.7.json:rw docker://pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime bash
# pip install pip-tools
# echo "torch==1.10.0" >requirements.in
# time pip-compile
#
# This file is autogenerated by pip-compile with python 3.7
# To update, run:
#
#    pip-compile
#
torch==1.10.0
    # via -r requirements.in
typing-extensions==4.1.1
    # via torch

real    0m1.662s
user    0m0.427s
sys     0m0.048s

@sfarina (Author) commented Mar 8, 2022

Thanks for looking into this so quickly!

I'll try this out. It's not ideal, since I'll have to explain the magic config file, but it's a better workaround than mine, which downloads and caches the wheel in the docker build before installing pip-tools.

@sfarina (Author) commented Mar 8, 2022

Maybe this is a problem with how pip/PyPI works. The dependency graph should be solvable without downloading ANY packages, if PyPI exposed some small metadata file(s) or an API.
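In fact, PyPI's JSON API already exposes a release's declared dependencies without the wheel, for example (note that requires_dist can be null when a package's uploaded metadata is incomplete, so this isn't universally reliable):

$ curl -s https://pypi.org/pypi/torch/1.10.0/json | python3 -c "import json, sys; print(json.load(sys.stdin)['info']['requires_dist'])"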

I think it would still be nice to have requirements.in parse something like

torch==installed # or ignored, or existing, ...
transformers

to ignore/trust an existing package, but maybe I'm alone in that.

Update: it would also do away with the need for the magic config-file mount (depcache-cp3.7.json:/root/.cache/pip-tools/depcache-cp3.7.json).

@AndydeCleyre (Contributor)

I'm still interested in this issue, and I still don't have all the answers. But I'll now add another workaround, oriented toward your last suggestion.

Be warned: it sacrifices the total locking guarantees, but will "probably" (😓) be fine:

# echo "transformers" >>locked.in
# pip-compile locked.in
# echo "-r locked.txt" >>requirements.txt

# echo "torch" >>installed.txt
# echo "-r installed.txt" >>requirements.txt

# pip install -r requirements.txt

@sfarina (Author) commented Mar 8, 2022

I'm OK with sacrificing "total locking" guarantees, as some of the locking is done by pinning a docker base image.

Thanks for the new workaround, but I don't think it would fix the situation, since transformers depends on torch (which you wouldn't have known a priori). I'm in a meeting but can test in a bit.

@AndydeCleyre (Contributor)

Oh yeah sorry, this method won't help in this case.

@AndydeCleyre (Contributor)

> Maybe this is a problem with how pip/PyPI works. The dependency graph should be solvable without downloading ANY packages, if PyPI exposed some small metadata file(s) or an API.

I think this is it, really. Unfortunately, especially with setup.py packages, arbitrary code can be run on the installing system to determine the requirements, so we can't rely on simple static dependency declarations universally.

That said, in the container you're using, the relevant info does seem to be available in /opt/conda/pkgs/*/info/*.json.

And in normally installed packages, we may find the details needed in e.g. /usr/lib/python3/dist-packages/*.egg-info/{PKG-INFO,requires.txt}.

So maybe we can update our cache file with data from those sources. If we do, I don't know if it should be done by default, as it has different security implications than using the PyPI data/packages.
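As a quick illustration of reading that metadata from an installed package (a sketch; importlib.metadata is standard library on Python 3.8+, while the python 3.7 in this container would need the importlib_metadata backport or pkg_resources instead):

# python3 -c "import importlib.metadata as m; print(m.requires('torch'))"

That prints the raw Requires-Dist strings, environment markers included, so anything seeding a cache from them would still have to evaluate the markers.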

I'll also link some related issues:

@sfarina
Copy link
Author

sfarina commented Mar 21, 2022

> So maybe we can update our cache file with data from those sources. If we do, I don't know if it should be done by default, as it has different security implications than using the PyPI data/packages.

Having an option to look through /path/to/python/{dist,site}-packages/*.egg-info/<files> to update the cache would be nice, but the future is probably the .whl.METADATA issue you linked, whenever that is finished.
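For what it's worth, here's a rough sketch of what such an option could do, assuming the depcache-cp3.7.json format shown earlier in this thread (that format is a pip-tools internal and may change; the packaging library must be installed, and the name lowercasing is only an approximation of real name canonicalization):

# python3 - <<'EOF'
import json
import importlib.metadata as md  # stdlib on Python 3.8+; use the importlib_metadata backport on 3.7
from packaging.requirements import Requirement

# Build a pip-tools-style dependency cache from the installed distributions.
cache = {"__format__": 1, "dependencies": {}}
for dist in md.distributions():
    name = dist.metadata["Name"].lower()  # approximate canonicalization
    deps = []
    for req_str in dist.requires or []:
        req = Requirement(req_str)
        # Keep only requirements whose environment markers apply here;
        # passing extra="" makes 'extra == "..."' markers evaluate false.
        if req.marker is None or req.marker.evaluate({"extra": ""}):
            deps.append(req.name.lower())
    cache["dependencies"].setdefault(name, {})[dist.version] = sorted(deps)

with open("depcache-seed.json", "w") as f:
    json.dump(cache, f, indent=2)
EOF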

@atugushev added the cache (Related to dependency cache) label on Apr 6, 2022
@EpicWink
I have a similar problem, but in my case the dependency[^1] I have installed in my Docker image is not available on PyPI or our internal index (though it may be in our internal index in the future[^2]), due to it requiring system libraries. This means I need one of the first two options in the OP (so I can lock all dependencies): consider installed (preferred) or ignore specific packages.

Footnotes

[^1]: If you want to know, the dependency is (a fork of) OpenSfM, requiring OpenCV (and its Python bindings, which we manage manually) and Ceres-solver.

[^2]: We'll likely turn the opensfm wheels into manylinux wheels, then distribute them in our internal index, but I'd like to find a way to use the OpenCV distributed with opencv-python.

@sfarina (Author) commented Aug 17, 2022

Since this is an edge case relevant to my docker workflow, I'll post my docker workaround: use a BuildKit cache mount for the pip cache:

RUN --mount=type=cache,target=/root/.cache/pip python3 -m piptools compile -v

That should avoid repeatedly downloading big wheels (until the cache is cleared).

It might be better to use a real mount instead of a cache mount, but I'm no docker expert, so by all means experiment.
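In context, the whole pipcompile.dockerfile might look roughly like this (a sketch only; the base image is the one from the original post, and everything else is illustrative):

# syntax=docker/dockerfile:1
FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
RUN pip install pip-tools
COPY requirements.in .
# BuildKit persists /root/.cache/pip in a named cache across builds,
# so the big torch wheel is downloaded at most once.
RUN --mount=type=cache,target=/root/.cache/pip python3 -m piptools compile -v

Building it requires BuildKit, e.g. DOCKER_BUILDKIT=1 docker build -f pipcompile.dockerfile .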

@sfarina closed this as completed on May 16, 2024
@EpicWink commented May 16, 2024

@sfarina you closed this as completed: could you please link the pull request which completed this issue? Or are you thinking that your Docker cache mount solves your use case, in which case could you please instead close as won't fix?

@sfarina reopened this on May 16, 2024
@sfarina (Author) commented May 16, 2024

won't fix / stale

@sfarina closed this as not planned (won't fix / stale) on May 16, 2024