PERF: proof of concept, parallel installation of wheels #12816

Open · wants to merge 1 commit into main
Conversation

@morotti (Contributor) commented Jul 1, 2024

Hello,

This is a proof of concept to install wheels in parallel, mostly to demonstrate that it's possible.

It took a whole 10 minutes with ChatGPT to parallelize the code :D (and a whole week of debugging pip for other performance issues to understand how to go about it).
I don't expect this to be merged as-is; however, if somebody wants to add a proper optional argument --parallel-install=N, it should be mergeable.

There are two ways to go about parallelizing the pip installation:

  • Either you parallelize the extraction of files within a wheel (the save() function). I don't think that is worthwhile: many files are just kilobytes and a wheel might only contain a few files, so there would be a lot of overhead managing threads and other bits. Besides, the ZipFile extractor is very suboptimal and would be better rewritten, maybe with a bit of async I/O.
  • Or you parallelize the extraction of whole wheels. This is what this PR does.

It's pretty easy to do with a ThreadPool. It could be done with a ProcessPool for much larger performance improvements, but error handling with subprocesses is problematic and I'm not sure fork/ProcessPool is available on all systems pip has to support.
So let's investigate what gain you can get with a simple ThreadPool :)
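For illustration only, here is a minimal sketch of the ThreadPool approach (not the actual diff); install_one_wheel() and the wheel filenames are hypothetical placeholders standing in for pip's per-requirement install step:

```python
# Minimal sketch of the ThreadPool approach, not the actual PR diff.
# `install_one_wheel` and the wheel names are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed


def install_one_wheel(wheel_path):
    # Placeholder: unpack the wheel and write its files into site-packages.
    print(f"installing {wheel_path}")
    return wheel_path


def install_all(wheel_paths, max_workers=2):
    installed = []
    # Threads mostly overlap filesystem latency (e.g. NFS); the GIL limits
    # CPU-bound gains, which matches the benchmarks below.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(install_one_wheel, path) for path in wheel_paths]
        for future in as_completed(futures):
            # .result() re-raises any exception from the worker thread,
            # so install failures are not silently swallowed.
            installed.append(future.result())
    return installed


if __name__ == "__main__":
    install_all(["numpy-2.0.0-py3-none-any.whl", "pandas-2.2.2-py3-none-any.whl"])
```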

Turns out, there isn't much to gain because of the GIL, unless your filesystem has very high latency (NFS).
You get a small gain with 2 threads, usually no further improvement with 3 threads, and it gets worse with 4+ threads.

I note there are multiple critical bugs in pip that make the extraction very slow and inefficient and that hold the GIL.
There should be room for more improvements after all these PRs are merged:

  • pending PR: #12803
  • pending PR: #12782
  • pending PR for download: #12810 (don't try to parallelize downloads without that fix)

Some benchmarks below on different types of disks:

[benchmark screenshot] Above: run on NFS; NFS is high latency and the gains are substantial.

[benchmark screenshot] Above: run on two different disks: /tmp, which should be in memory or close to it, and a local volume that is block storage on a virtual machine.

[benchmark screenshot] Above: run with larger packages to compare. tensorflow is 1-2 GB extracted; pandas, numpy, numba, llvmlite, and scikit-learn are around 50-100 MB each.

I do note that pip seems to install packages alphabetically, which is not ideal.
tensorflow (and torch) fall toward the very end, and I'd rather they start toward the beginning: a lot of the extraction time is spent waiting at the end for the final package, tensorflow, to finish extracting. The run would complete sooner if that extraction started sooner (see the ordering sketch below).
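Purely as an illustration of that ordering idea (pip exposes no such option as far as I know; order_largest_first() is a made-up helper), sorting by on-disk wheel size would start the largest extractions first:

```python
# Hypothetical sketch: schedule the biggest wheels first so a large package
# (tensorflow, torch) is not left until the end of the run.
import os


def order_largest_first(wheel_paths):
    # Sort by wheel file size, largest first; extracted size would be an
    # even better proxy but is not known until the archive is opened.
    return sorted(wheel_paths, key=os.path.getsize, reverse=True)
```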

@notatallshaw (Member) commented Jul 1, 2024

I recently opened an issue on this, given that uv did this with almost no reported issues.

One issue I did notice since I posted was this one: astral-sh/uv#4328

So it's definitely a test case that should be checked.

There are other edge cases I would be concerned about as well, e.g. what about two packages of the same name? What if one is editable and the other is not?

@morotti (Contributor, Author) commented Jul 1, 2024

> astral-sh/uv#4328

That one is an issue on Windows. It's hard to tell what uv is doing, because it works from some cache with hardlinks or symlinks.

pip extraction always erases an existing file (a call to unlink()) and writes it again.
If two threads concurrently try to erase/rewrite the same init file:

  • I think it works on Linux, because Linux allows an opened file to be removed.
  • I think it doesn't work on Windows, because Windows doesn't allow an opened file to be deleted while it's being written to.

Quick thought: I wonder if a solution would be to have a write lock per package directory, venv/python3.x/site-packages/<packagename>. That would allow parallel extraction while handling packages that erase each other's files. (I think we have to avoid a lock per file, because the locking would be slower than the extraction.)
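A rough sketch of that per-directory lock idea, assuming a hypothetical write_file_locked() step; none of these names are pip internals:

```python
# Hypothetical sketch of a write lock per site-packages/<packagename>
# directory: writers touching the same package directory are serialized,
# while different packages still extract in parallel.
import os
import threading
from collections import defaultdict

_dir_locks = defaultdict(threading.Lock)   # one lock per package directory
_dir_locks_guard = threading.Lock()        # protects the dict itself


def lock_for(package_dir):
    # Creating the per-directory lock must itself be thread-safe.
    with _dir_locks_guard:
        return _dir_locks[package_dir]


def write_file_locked(package_dir, relative_path, data):
    # Placeholder write step: unlink-and-rewrite under the directory lock,
    # so two threads never race on the same file (the Windows failure mode).
    target = os.path.join(package_dir, relative_path)
    with lock_for(package_dir):
        os.makedirs(os.path.dirname(target), exist_ok=True)
        if os.path.exists(target):
            os.unlink(target)
        with open(target, "wb") as f:
            f.write(data)
```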

> There are other edge cases I would be concerned about as well, e.g. what about two packages of the same name? What if one is editable and the other is not?

I think that makes no sense from the installation perspective. The installation loop runs after the resolver.

The resolver should already have resolved which packages to install, i.e. one package per name.
I'm not sure there is a way to force pip to install conflicting packages like pip install ./pandas-1.2.3.whl ./pandas-1.2.4.whl ./pandas-1.2.5.whl -e ./gitclone/pandas
