Parallelize Resolver #3627
Conversation
Very cool!
Tagging @konstin and @BurntSushi to review.
Force-pushed from 0a03bca to 411bdb4.
Force-pushed from 411bdb4 to 33fd177.
This is pretty awesome. I think this largely makes sense to me overall. I am a little concerned about the switch to unbounded channels/streams though. I convinced us a while back to switch from unbounded channels to bounded channels. I believe this was my argument: #1163 (comment)
.map(|request| self.process_request(request).boxed_local())
// Allow as many futures as possible to start in the background.
// Backpressure is provided at a more granular level by `DistributionDatabase`
// and `SourceDispatch`, as well as the bounded request channel.
Can this comment be unpacked a bit more? Also, which bounded request channel is this referring to?
This is an old comment; I didn't touch any of the `fetch` code. It's referring to the channel between the prefetcher and solver. I'll update it to make it clearer.
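For context on the backpressure point, here's a minimal sketch of how a bounded tokio channel between a solver and a prefetcher keeps the producer from running arbitrarily far ahead. The `Request` enum, buffer size, and task names are illustrative, not uv's actual types:

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum Request {
    Prefetch(String),
}

#[tokio::main]
async fn main() {
    // Small buffer: the solver can only run ahead of the prefetcher by 16 requests.
    let (tx, mut rx) = mpsc::channel::<Request>(16);

    // "Prefetcher": drains requests and performs the (simulated) I/O.
    let prefetcher = tokio::spawn(async move {
        while let Some(Request::Prefetch(name)) = rx.recv().await {
            println!("prefetching {name}");
        }
    });

    // "Solver": emits prefetch requests. `send().await` suspends when the
    // channel buffer is full, which is the backpressure being discussed.
    for i in 0..64 {
        tx.send(Request::Prefetch(format!("package-{i}")))
            .await
            .unwrap();
    }
    drop(tx);

    prefetcher.await.unwrap();
}
```

With an unbounded channel, `send` never waits, so the only limit on queued work is memory; the bounded variant trades a little latency for a cap on how far the producer can outrun the consumer.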
Force-pushed from 484da3d to a447910.
Love to see a 50% improvement :)
pub struct PythonEnvironment(Arc<SharedPythonEnvironment>);

#[derive(Debug, Clone)]
struct SharedPythonEnvironment {
Just for my knowledge, is this the typical naming scheme for this pattern?
I usually use `Foo` and `FooInner` personally. I don't think I've seen `SharedFoo` much? I like adding a suffix, so I'd prefer `FooShared` (or whatever). But I don't have a strong opinion.
Cool, thanks! `Inner` makes a bit more sense to me.
I sort of avoid `Inner` because it feels like a catch-all naming convention. A suffix seems slightly better for readability; I'll switch to that.
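For reference, a minimal sketch of the shared-inner newtype pattern with the suffix naming discussed here. The field, method, and path names are illustrative, not uv's actual API:

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Cheap-to-clone handle: cloning bumps the `Arc` refcount rather than
/// copying the inner data, so the value can be shared across threads
/// without borrowing or lifetimes.
#[derive(Debug, Clone)]
pub struct PythonEnvironment(Arc<PythonEnvironmentShared>);

#[derive(Debug)]
struct PythonEnvironmentShared {
    root: PathBuf,
}

impl PythonEnvironment {
    pub fn new(root: PathBuf) -> Self {
        Self(Arc::new(PythonEnvironmentShared { root }))
    }

    pub fn root(&self) -> &Path {
        &self.0.root
    }
}

fn main() {
    let env = PythonEnvironment::new(PathBuf::from("/usr/local/python3"));
    // A clone is just a refcount bump; both handles point at the same data.
    let handle = env.clone();
    std::thread::spawn(move || println!("{}", handle.root().display()))
        .join()
        .unwrap();
}
```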
Amazing work! Do you know why it gets so much faster, i.e. how the solver is blocking? I've been looking at the spans, but I don't really understand why the prefetches don't get queued anyway on main. Airflow, main vs. PR:

Here are some perf numbers from my machine:
Love it. I like the lifetime removal a lot too. Nice work.
@konstin The elapsed user time doesn't change much, and if you look at the profile for the resolver thread you'll see a lot of time spent in pubgrub. That suggests the prefetches may have been queued on the single-threaded version, but we simply didn't have enough time to get to them, or if we did, they took time away from the solver. My hunch is that the solver and prefetcher were fighting for time slices.
## Summary

Move completely off tokio's multi-threaded runtime. We've slowly been making changes to be smarter about scheduling in various places instead of depending on tokio's general-purpose work-stealing, notably #3627 and #4004. We now no longer benefit from the multi-threaded runtime, as we run all I/O on the main thread. There's one remaining instance of `block_in_place` that can be swapped for `rayon::spawn`.

This change is a small performance improvement due to removing some unnecessary overhead of the multi-threaded runtime (e.g. spawning threads), but nothing major. It also removes some noise from profiles.

## Test Plan

```
Benchmark 1: ./target/profiling/uv (resolve-warm)
  Time (mean ± σ):      14.9 ms ±  0.3 ms    [User: 3.0 ms, System: 17.3 ms]
  Range (min … max):    14.1 ms … 15.8 ms    169 runs

Benchmark 2: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):      16.1 ms ±  0.3 ms    [User: 3.9 ms, System: 18.7 ms]
  Range (min … max):    15.1 ms … 17.3 ms    162 runs

Summary
  ./target/profiling/uv (resolve-warm) ran
    1.08 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```
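As a rough illustration of what "moving off the multi-threaded runtime" means, not uv's actual setup, a current-thread tokio runtime can be built explicitly so that no worker-thread pool is spawned:

```rust
use tokio::runtime::Builder;

fn main() -> std::io::Result<()> {
    // One OS thread drives all async I/O; no work-stealing worker pool.
    let runtime = Builder::new_current_thread().enable_all().build()?;

    runtime.block_on(async {
        // All I/O futures run here, on the main thread. CPU-heavy work (like
        // the resolver) lives on its own thread and communicates over
        // channels, rather than going through `block_in_place`.
        println!("running on the current-thread runtime");
    });

    Ok(())
}
```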
Summary

This PR introduces parallelism to the resolver. Specifically, we can perform PubGrub resolution on a separate thread, while keeping all I/O on the tokio thread. We already have the infrastructure set up for this with the channel and `OnceMap`, which makes this change relatively simple. The big change needed to make this possible is removing the lifetimes on some of the types that need to be shared between the resolver and pubgrub thread.

A related PR, #1163, found that adding `yield_now` calls improved throughput. With optimal scheduling we might be able to get away with everything on the same thread here. However, in the ideal pipeline with perfect prefetching, the resolution and prefetching can run completely in parallel without depending on one another. While this would be very difficult to achieve, even with our current prefetching pattern we see a consistent performance improvement from parallelism.

This does also require reverting a few of the changes from #3413, but not all of them. The sharing is isolated to the resolver task.
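A minimal sketch of the shape described above, with a dedicated solver thread talking to the tokio I/O side over a channel. The `Request` type and all names are illustrative, not uv's actual implementation:

```rust
use tokio::sync::{mpsc, oneshot};

enum Request {
    // The solver asks the I/O side for package metadata and waits for a reply.
    Metadata {
        name: String,
        reply: oneshot::Sender<String>,
    },
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Request>(16);

    // Solver thread: a plain OS thread, so CPU-bound resolution never blocks
    // the async executor. It talks to the I/O side through the channel.
    let solver = std::thread::spawn(move || {
        for name in ["a", "b", "c"] {
            let (reply_tx, reply_rx) = oneshot::channel();
            tx.blocking_send(Request::Metadata {
                name: name.to_string(),
                reply: reply_tx,
            })
            .unwrap();
            // Block this thread (not the runtime) until the metadata arrives.
            let metadata = reply_rx.blocking_recv().unwrap();
            println!("resolved {name}: {metadata}");
        }
    });

    // I/O side: stays on the tokio runtime, serving the solver's requests.
    while let Some(Request::Metadata { name, reply }) = rx.recv().await {
        // A real implementation would hit the network or a cache here.
        let _ = reply.send(format!("metadata for {name}"));
    }

    solver.join().unwrap();
}
```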
Test Plan
On smaller tasks performance is mixed with ~2% improvements/regressions on both sides. However, on medium-large resolution tasks we see the benefits of parallelism, with improvements anywhere from 10-50%.