PERF: proof of concept, parallel installation of wheels #12816
Hello,
This is a proof of concept to install wheels in parallel.
Mostly to demonstrate that it's possible.
It took a whole 10 minutes of ChatGPT to parallelize your code :D (and a whole week of debugging pip for other performance issues to understand how to go about it).
I don't expect this to be merged, however if somebody wants to add a proper optional argument
--parallel-install=N
, it should be possible to merge. There are two ways to go about parallelizing the pip installation:
It's pretty easy to do with a ThreadPool. It could be done with a ProcessPool for much larger performance improvements, but it's problematic to manage errors with subprocesses, and I'm not sure fork/ProcessPool is available on all systems that pip has to support.
So let's investigate what gain you can get with a simple ThreadPool :)
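A minimal sketch of the ThreadPool approach described above; `install_one` is a hypothetical stand-in for pip's per-wheel install step, not pip's actual API, and error collection happens in the parent thread so failures are easy to report:

```python
from concurrent.futures import ThreadPoolExecutor

def install_one(wheel):
    # Placeholder for the real work: extract the wheel and copy its
    # files into site-packages. In real pip this is the expensive,
    # mostly-I/O step that the threads would overlap.
    ...

def install_all(wheels, workers=2):
    """Install wheels concurrently; return a list of (wheel, error) pairs."""
    errors = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(install_one, w): w for w in wheels}
        for fut, wheel in futures.items():
            try:
                fut.result()  # re-raises any exception from the worker
            except Exception as exc:
                errors.append((wheel, exc))
    return errors
```

Because the workers are threads, exceptions propagate naturally through `Future.result()`, which is exactly the error handling that gets awkward with subprocesses.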
Turns out, there isn't much to get because of the GIL, unless your filesystem has very high latency (NFS).
You get a small gain with 2 threads, usually no more improvement with 3 threads, it gets worse with 4+ threads.
I note there are multiple critical bugs in pip that make the extraction very slow and inefficient and hold the GIL.
There should be room for more improvements after all these PRs are merged:
pending PR: #12803
pending PR: #12782
pending PR for download (don't try to parallelize downloads without that fix): #12810
Some benchmarks below on different types of disks:
above: run on NFS, NFS is high latency and the gains are substantial
above: run on two different disks: /tmp, which should be in memory (or close to it), and a local volume that is block storage on a virtual machine.
above: run with larger packages to compare.
tensorflow is 1-2 GB extracted
pandas+numpy, numba+llvmlite, scikit-learn are around 50-100 MB each.
I do note that pip seems to install packages alphabetically, which is not ideal.
tensorflow (and torch) fall toward the very end, and I'd rather they start toward the beginning. A lot of the extraction time is spent waiting at the end for the final package, tensorflow, to finish extracting. It would complete sooner if it started sooner.
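The ordering fix could be as simple as scheduling the largest wheels first, so a multi-gigabyte package starts extracting immediately instead of last. A hedged sketch (`order_by_size` is a hypothetical helper, not pip code):

```python
import os

def order_by_size(wheel_paths):
    """Return wheel paths sorted largest-first, so big packages like
    tensorflow begin extracting at the start of the batch rather than
    being the straggler everyone waits on at the end."""
    # Negative size sorts descending; the path breaks ties deterministically.
    return sorted(wheel_paths, key=lambda p: (-os.path.getsize(p), p))
```

With a thread pool this is the classic longest-job-first heuristic: it minimizes the chance that one huge extraction is still running alone after all the small ones have finished.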