
Drop the im-rc dependency #9878

Closed
wants to merge 3 commits into from

Conversation

decathorpe
Contributor

This PR drops cargo's dependency on im-rc. The im-rc and im crates seem to be unmaintained, with no code changes, releases, or issue / PR triage in over a year. The linked issue also lists some logic / panic bugs that are still present in the latest im/im-rc releases, which have gone unfixed for at least a year, as well.

A RUSTSEC advisory to mark im and im-rc as unmaintained has been proposed. The crate's author has apparently responded somewhere that the crates are still "maintained" (I cannot confirm this, since that conversation has not been linked in the issue), but the GitHub project - both code and issues / PRs - has remained untouched since.

The im-rc crate has now been gathering bug reports for over a year, while also having increasingly outdated crate dependencies (bitmaps 2 vs. 3, rand_core 0.5 vs. 0.6, rand_xoshiro 0.4 vs. 0.6, arbitrary 0.4 vs. 1, proptest 0.9 vs. 1, quickcheck 0.9 vs. 1, and dev-dependencies pretty_assertions 0.6 vs. 0.7, proptest_derive 0.1 vs. 0.3, rand 0.7 vs. 0.8).

I'm not sure whether the original reason to use immutable data structures in cargo is still valid. In the PR where this dependency was introduced, the reasoning is that cloning some data structures is expensive, but benchmarks for comparing data structures from std and those from im-rc yielded mixed results. Some of the performance issues that the introduction of im-rc was supposed to work around have since been resolved in simpler ways.


This PR removes the dependency on im-rc and makes the following changes:

  • replace im_rc::HashMap with std::collections::HashMap
  • replace im_rc::HashSet with std::collections::HashSet
  • replace im_rc::OrdMap with std::collections::BTreeMap
  • replace im_rc::OrdSet with std::collections::BTreeSet
  • replace im_rc::hashmap::Entry with std::collections::hash_map::Entry
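To illustrate the last bullet, here is a minimal sketch (a hypothetical word-count example, not cargo code) of the std Entry API that the im_rc::hashmap::Entry call sites map onto almost one-to-one:

```rust
use std::collections::HashMap;

// Hypothetical example: count occurrences with the std Entry API.
// im_rc::HashMap exposes an entry() method with the same shape, so
// call sites like this need little more than a changed import.
fn count_words<'a>(words: &[&'a str]) -> HashMap<&'a str, u32> {
    let mut counts = HashMap::new();
    for &word in words {
        // entry() returns a std::collections::hash_map::Entry
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_words(&["serde", "rand", "serde"]);
    assert_eq!(counts["serde"], 2);
    assert_eq!(counts["rand"], 1);
}
```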

Additionally, data from a BTreeSet is temporarily moved into a VecDeque in RemainingDeps::pop_most_constrained so that pop_front and push_back operations stay fast (BTreeSet::pop_first is still nightly-only under the experimental map_first_last feature flag). The original code, before im-rc was introduced, used a reversed std::collections::BinaryHeap max-heap for this purpose, which should also work without temporarily moving data around (though I have never used a BinaryHeap that way before).
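As a rough sketch of the "reversed max-heap" idea: std's BinaryHeap is a max-heap, so wrapping keys in std::cmp::Reverse makes pop() yield the smallest key first. The (constraint_count, name) key below is a hypothetical stand-in for cargo's real ordering, not the actual DepsFrame type:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Pop the entry with the lowest key (i.e. "most constrained" under this
// hypothetical ordering). Reverse flips the comparison, turning the
// max-heap into a min-heap.
fn pop_most_constrained(
    heap: &mut BinaryHeap<Reverse<(usize, &'static str)>>,
) -> Option<(usize, &'static str)> {
    heap.pop().map(|Reverse(frame)| frame)
}

fn main() {
    let mut heap = BinaryHeap::new();
    heap.push(Reverse((3, "log")));
    heap.push(Reverse((1, "serde")));
    heap.push(Reverse((2, "rand")));
    // Entries come out smallest-first, no VecDeque shuffling needed.
    assert_eq!(pop_most_constrained(&mut heap), Some((1, "serde")));
    assert_eq!(pop_most_constrained(&mut heap), Some((2, "rand")));
}
```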


I have run some simple benchmarks against my change to make sure performance does not regress. Since there are no official benches in the cargo project, I just ran cargo test in both debug and release mode under hyperfine, before and after applying the changes from this PR, with the CFG_DISABLE_CROSS_TESTS=1 environment variable set, against the current nightly Rust toolchain (rustc 1.57.0-nightly (e30b68353 2021-09-05)).

Before this PR:

  • cargo test (debug)
  Time (mean ± σ):     75.710 s ±  3.423 s    [User: 620.908 s, System: 246.731 s]
  Range (min … max):   69.566 s … 81.616 s    10 runs
  • cargo test --release
  Time (mean ± σ):     66.515 s ±  2.600 s    [User: 567.307 s, System: 238.467 s]
  Range (min … max):   62.159 s … 70.810 s    10 runs

After this PR:

  • cargo test (debug)
  Time (mean ± σ):     76.524 s ±  5.503 s    [User: 620.339 s, System: 253.578 s]
  Range (min … max):   67.246 s … 87.058 s    10 runs
  • cargo test --release
  Time (mean ± σ):     66.460 s ±  4.264 s    [User: 562.187 s, System: 238.884 s]
  Range (min … max):   62.246 s … 77.469 s    10 runs

Assuming the unit and integration tests cover the changed code at all, the performance impact of dropping im-rc in favor of collections from the standard library is very small (within the margin of error of the benchmark results) and, if anything, ever so slightly in favor of the std collections in release mode.

As a nice side effect, the compiled cargo binary is also slightly smaller with this change, probably not only because im-rc itself is dropped, but also due to the removal / de-duplication of old crate versions from im-rc's dependency tree.

  • debug: 223.9 MB vs. 209.1 MB
  • release: 19.2 MB vs. 19.0 MB

(compared on x86_64 linux target, with same rust toolchain)


Alternatively, there is now a fork of the im project (imbl), where a brave soul has started to triage all logic bugs and panic! issues that have not been triaged in the old project.

@rust-highfive

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 6, 2021
@ehuss
Contributor

ehuss commented Sep 6, 2021

@Eh2406 or @alexcrichton would you be able to review this? I was not involved in the original decision.

Regarding benchmarking, cargo test probably isn't a good proxy for resolver benchmarking, since 99% of the work is doing other stuff. It would definitely be nice to have a more formal benchmarking system. For now, one way to benchmark the resolver is to run something like cargo generate-lockfile in some larger projects.

@Eh2406
Contributor

Eh2406 commented Sep 6, 2021

To start with, thank you for working on this! I have been watching the abandonment of im for a while, wondering when we should stop using it. Also thank you for the long and clear write-up; it makes all the relevant context easy to find!

From the introductory PR you linked, my reading is that the main advantage of im over std is that it makes development easier, in that performance mistakes are an extra O(ln(N)) vs O(N), which can be the difference between "get it correct then make it fast" and having to "fix perf in order to run tests". Furthermore, the work on public and private dependencies has never been benchmarked on std. That being said, approximately no one is working on the resolver, and the P&PD work is unstable and stalled.

pop_most_constrained should probably go back to a BinaryHeap.

The cargo tests are probably going to mostly be benchmarking file IO, and only use the resolver in small ways. resolver-tests may be a little better, but it is also synthetic and not deterministic. Eric's suggestion of cargo generate-lockfile may be the correct approach. Maybe the Cargo.toml from here (without -Zminimal-versions; we don't care about the hang, we do care about the large tree)?

Some of the type substitutions will probably need to be im_rc::T -> Rc<std::T>, but I am not sure which ones.

Sorry I ran out of time for a longer review.

@decathorpe
Contributor Author

Thanks for the initial review!

Regarding benchmarking, cargo test probably isn't a good proxy for resolver benchmarking, since 99% of the work is doing other stuff. It would definitely be nice to have a more formal benchmarking system. For now, one way to benchmark the resolver is to run something like cargo generate-lockfile in some larger projects.

Yeah, running initial benchmarks with cargo test was just a basic smoke test to make sure I didn't make things significantly worse, and I needed to run cargo test before and after anyway :) To come back to your suggestion, what should better benchmarks with generate-lockfile look like?

  • build the cargo binary with and without my changes
  • run / time cargo generate-lockfile for projects (which ones would be good candidates?)
  • do I need to remove Cargo.lock between runs of cargo generate-lockfile?

pop_most_constrained should probably go back to a BinaryHeap

I can try to follow up on that, but after an initial look into the git history of the affected file, it appears not to be as simple as reverting the PR that switched from BinaryHeap to im_rc::OrdSet, because there were other changes to this function after the switch which I don't fully understand yet.

@ehuss
Contributor

ehuss commented Sep 6, 2021

That seems like a reasonable set of steps. You don't need to remove Cargo.lock with generate-lockfile.

A recent discussion came up with the following as some good real-world projects (counts of dependencies given):

  • cargo: 130
  • rust: 518
  • tikv: 552
  • firefox: 577
  • diem: 653
  • servo: 658
  • paritytech/substrate: 896

@decathorpe
Contributor Author

Alright, I've compiled cargo with im-rc as cargo-old and without im-rc as cargo-new, and performance results are mixed. Benchmarks were run with hyperfine -w 1 -p "rm -f Cargo.lock" -c "rm -f Cargo.lock" -r 20 "cargo-old generate-lockfile" (and the equivalent call for cargo-new). I added commands to remove Cargo.lock between runs, because otherwise results did not reproduce well.

  • rustc 1.54.0 from release tarball:

new version slightly faster.

Benchmark #1: cargo-old generate-lockfile
  Time (mean ± σ):      2.174 s ±  0.697 s    [User: 134.7 ms, System: 36.9 ms]
  Range (min … max):    1.698 s …  4.653 s    20 runs

Benchmark #1: cargo-new generate-lockfile
  Time (mean ± σ):      1.909 s ±  0.164 s    [User: 199.4 ms, System: 54.7 ms]
  Range (min … max):    1.734 s …  2.341 s    20 runs
  • cargo from git master:

both versions have almost identical performance

Benchmark #1: cargo-old generate-lockfile
  Time (mean ± σ):     169.2 ms ±  11.0 ms    [User: 36.7 ms, System: 6.9 ms]
  Range (min … max):   154.0 ms … 196.9 ms    20 runs

Benchmark #1: cargo-new generate-lockfile
  Time (mean ± σ):     174.1 ms ±  10.5 ms    [User: 35.1 ms, System: 7.8 ms]
  Range (min … max):   153.6 ms … 196.1 ms    20 runs
  • paritytech/substrate:

significantly faster with im-rc

Benchmark #1: cargo-old generate-lockfile
  Time (mean ± σ):     342.9 ms ±   8.5 ms    [User: 189.6 ms, System: 30.9 ms]
  Range (min … max):   327.5 ms … 363.7 ms    20 runs

Benchmark #1: cargo-new generate-lockfile
  Time (mean ± σ):     815.3 ms ±  24.7 ms    [User: 596.8 ms, System: 85.2 ms]
  Range (min … max):   794.6 ms … 908.4 ms    20 runs

So performance with my changes does not seem to correlate directly with the size of the dependency tree; otherwise the benchmarks for rustc should have been slower than those for cargo, yet rustc showed the best results.

I'll try using a BinaryHeap in pop_most_constrained and will then run these benchmarks again.

@decathorpe
Contributor Author

Alright, I've switched the BTreeSet back to a BinaryHeap; performance seems slightly better, but still a bit worse than with im_rc::OrdSet for projects with very large / deep dependency trees:

  • rustc 1.54.0 from release tarball:

slightly faster than with im-rc (!)

Benchmark #1: cargo-old generate-lockfile
  Time (mean ± σ):      1.892 s ±  0.289 s    [User: 140.1 ms, System: 36.8 ms]
  Range (min … max):    1.643 s …  2.688 s    20 runs
 
Benchmark #1: cargo-new generate-lockfile
  Time (mean ± σ):      1.798 s ±  0.047 s    [User: 169.9 ms, System: 56.7 ms]
  Range (min … max):    1.719 s …  1.897 s    20 runs
  • cargo from git master:

both versions have almost identical performance, within margin of error

Benchmark #1: cargo-old generate-lockfile
  Time (mean ± σ):     166.8 ms ±   6.7 ms    [User: 33.8 ms, System: 7.0 ms]
  Range (min … max):   156.3 ms … 177.8 ms    20 runs
 
Benchmark #1: cargo-new generate-lockfile
  Time (mean ± σ):     169.4 ms ±   6.6 ms    [User: 36.0 ms, System: 8.3 ms]
  Range (min … max):   151.8 ms … 179.2 ms    20 runs
  • paritytech/substrate:

was faster by ~40% with im-rc

Benchmark #1: cargo-old generate-lockfile
  Time (mean ± σ):     350.6 ms ±   7.4 ms    [User: 190.7 ms, System: 30.9 ms]
  Range (min … max):   335.6 ms … 363.3 ms    20 runs
 
Benchmark #1: cargo-new generate-lockfile
  Time (mean ± σ):     597.6 ms ±  10.4 ms    [User: 378.6 ms, System: 90.1 ms]
  Range (min … max):   579.6 ms … 623.3 ms    20 runs
  • servo/servo from git master:

both versions have almost identical performance, within margin of error

Benchmark #1: cargo-old generate-lockfile
  Time (mean ± σ):      2.885 s ±  0.240 s    [User: 239.7 ms, System: 70.6 ms]
  Range (min … max):    2.710 s …  3.785 s    20 runs
 
Benchmark #1: cargo-new generate-lockfile
  Time (mean ± σ):      2.958 s ±  0.161 s    [User: 295.8 ms, System: 96.0 ms]
  Range (min … max):    2.777 s …  3.291 s    20 runs

I also benchmarked this against some small-to-medium-sized projects, and the performance of cargo generate-lockfile was almost identical in all cases, so paritytech/substrate seems to be the only outlier.

@Eh2406
Contributor

Eh2406 commented Sep 6, 2021

If you are having fun with this kind of perf work, I have two follow up questions.

  1. How bad is this PR when the P&PD unstable feature is enabled (by adding cargo-features = ["public-dependency"] to the top of Cargo.toml and adding a -Z flag to the command line)?
  2. What is different about a flame graph of paritytech/substrate with and without this PR?

If you are not having fun I understand.

@decathorpe
Contributor Author

With cargo-features = ["public-dependency"] enabled in Cargo.toml, the results are very similar (with one exception, noted below):

  • rustc 1.54.0:

still very similar or slightly faster without im-rc

Benchmark #1: cargo-old -Z unstable-options generate-lockfile
  Time (mean ± σ):      1.815 s ±  0.177 s    [User: 138.1 ms, System: 38.9 ms]
  Range (min … max):    1.654 s …  2.298 s    20 runs
 
Benchmark #1: cargo-new -Z unstable-options generate-lockfile
  Time (mean ± σ):      1.810 s ±  0.088 s    [User: 170.4 ms, System: 65.4 ms]
  Range (min … max):    1.692 s …  2.018 s    20 runs
  • cargo git master:

slightly slower without im-rc, but within margin of error

Benchmark #1: cargo-old -Z unstable-options generate-lockfile
  Time (mean ± σ):     164.8 ms ±   9.4 ms    [User: 31.4 ms, System: 6.6 ms]
  Range (min … max):   150.2 ms … 177.9 ms    20 runs

Benchmark #1: cargo-new -Z unstable-options generate-lockfile
  Time (mean ± σ):     174.9 ms ±   9.3 ms    [User: 32.7 ms, System: 9.7 ms]
  Range (min … max):   165.4 ms … 208.8 ms    20 runs
  • substrate git master:

still ~40% slower without im-rc

Benchmark #1: cargo-old -Z unstable-options generate-lockfile
  Time (mean ± σ):     362.6 ms ±  12.7 ms    [User: 196.2 ms, System: 32.9 ms]
  Range (min … max):   343.4 ms … 390.7 ms    20 runs
 
Benchmark #1: cargo-new -Z unstable-options generate-lockfile
  Time (mean ± σ):     671.0 ms ±  11.6 ms    [User: 418.0 ms, System: 120.8 ms]
  Range (min … max):   649.2 ms … 693.2 ms    20 runs

Looks like these benchmarks aren't much affected by the feature at all; the performance numbers are almost identical whether it is enabled or not. The only exception is servo, where enabling the feature makes cargo -Z unstable-options generate-lockfile run forever (I killed it after 10 minutes).


Here's some flamegraphs I generated from running my "old" and "new" versions of cargo against substrate:

  • with im-rc collections:

cargo-im-rc

  • with std collections:

cargo-std

It's a bit hard to read because flamegraph truncates the names of the frames, but it looks like the command spends a lot longer somewhere in alloc::collections::btree::* (knowing what I changed in between, it's probably something to do with cloning some big BTreeSet or BTreeMap ...)


PS: I'm starting to feel like I've had enough benchmarking fun for now ;)

@alexcrichton
Member

I don't have a ton to add over what @Eh2406 has already mentioned, but I'll reiterate that these types were used specifically for quick cloning capabilities. As seen on some of the crate graphs this can have a significant impact on Cargo's runtime.

Also, though, I don't think that cargo generate-lockfile is necessarily what should be benchmarked here. The best benchmark is what Cargo does when it's using the old Cargo.lock file as guidance but generate-lockfile doesn't do that. The idea is that Cargo rebuilds Cargo.lock on all invocations, but this only needs to be fast insofar that if there's a previous Cargo.lock that still works we basically just use that. Unfortunately we don't have a great way to benchmark this today, the closest equivalent is cargo update -p some-path-dependency-crate. Another possible candidate is cargo build -Z unstable-options --build-plan (I forget the exact flags) but that also benchmarks creation of units themselves which can be somewhat expensive.

One option for some of these usages is to use Rc with make_mut, but if benchmarks are regressing here then there's clearly cloning which would lose out on Rc's fast path in make_mut, so it may not work for all usages (just some perhaps). I don't recall the precise behavior of what's cloned where, even on the fast path.
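A minimal sketch of the Rc::make_mut behavior described above (with Vec<u32> standing in for cargo's actual structures): make_mut mutates in place when the refcount is 1 (the fast path), and clones the inner value first when the allocation is shared:

```rust
use std::rc::Rc;

// Copy-on-write push: mutates in place when this Rc is the sole owner,
// clones the Vec first when other handles to the allocation exist.
fn cow_push(v: &mut Rc<Vec<u32>>, x: u32) {
    Rc::make_mut(v).push(x);
}

fn main() {
    let mut a: Rc<Vec<u32>> = Rc::new(vec![1, 2, 3]);

    // Fast path: refcount is 1, so no clone happens.
    cow_push(&mut a, 4);
    assert_eq!(*a, vec![1, 2, 3, 4]);

    // Slow path: `b` shares the allocation, so make_mut clones the Vec
    // before mutating; `b` still sees the old contents.
    let b = Rc::clone(&a);
    cow_push(&mut a, 5);
    assert_eq!(*a, vec![1, 2, 3, 4, 5]);
    assert_eq!(*b, vec![1, 2, 3, 4]);
}
```

As the comment notes, a workload that clones often and then mutates every clone would pay for the deep copy anyway, which matches the caveat about losing the fast path.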

@decathorpe
Contributor Author

Well ... if there's no meaningful way to benchmark my changes, then what can I do to move this forward?

True, this change makes runtime a few fractions of a second worse (still below one second though) for some cargo commands in projects with very very large dependency trees. Is that really bad enough that you want to continue depending on a seemingly-abandoned crate that has confirmed correctness and panic! safety issues in the exact data types that are in use by cargo?

PS: I've now also looked more closely into the imbl fork, and I don't think it would be a good replacement for im-rc either, since it only provides the thread-safe / "slow" variants from the im crate, not the single-threaded / non-Rc / "fast" variants from im-rc.

@alexcrichton
Member

Sorry I don't mean to say there's no meaningful way to benchmark your changes. My point is that these benchmarks, as-is, may not be benchmarking the most critical item which is resolution with a lock file already in place.

Additionally, yes, the few fractions of a second here matter quite a lot. Cargo needs to get out of the way extremely quickly for incremental rebuilds, even of larger projects. Incremental rebuilds aren't always going to involve tons of rustc invocations that are super slow. For example if you cargo run and the binary is already built everything Cargo does is pure overhead while it figures out the binary doesn't need to be rebuilt. If that's 500 vs 100 ms that's a huge difference.

Yes it's not great to use unmaintained code, but a problem with a library does not guarantee that Cargo's usage is problematic. There may still be a problem but mere existence isn't enough to convince me to rip it out immediately without further investigation for how it affects Cargo.

@decathorpe
Contributor Author

I agree, for "CLI interactivity", a few hundred milliseconds can make a pretty big difference.

I just wanted to say that for most projects I tested, the runtime differences were within the margin of error (rustc, cargo, servo, and a few other, smaller projects), but one project (paritytech/substrate) was a bit of an outlier, because with my changes, cargo generate-lockfile took almost twice as long. Still, the difference between "this takes 300 ms" and "this takes 600 ms" is not that noticeable to an end user, since neither qualifies as "instantaneous", and neither is in "I can go grab a coffee while this runs" territory.

But as you noted, those were not really good targets for benchmarking. So could you give me a better, concrete example of a cargo command (or series of commands) that I could target for benchmarking / profiling my changes?

Then, I can try to find performance bottlenecks, and if I find any, I could try to fix those on top of the current patch, and see if I can't improve runtime behaviour such that getting rid of im-rc wouldn't cause performance regressions.

@Eh2406
Contributor

Eh2406 commented Sep 8, 2021

It's a bit hard to read because flamegraph truncates the names of the frames, but it looks like the command spends a lot longer somewhere in alloc::collections::btree::* (knowing what I changed in between, it's probably some thing to do with cloning some big BTreeSet or BTreeMap ...)

Looks to me like it is in btree::map... The only place you added a BTreeMap is in Graph. So I wonder if it should use Rc<BTreeMap<_, Rc<BTreeMap<_, _>>>> and make_mut. But I don't think that gets cloned. Unless that was with P&PD?
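A hypothetical sketch of that nested layout (the node/edge types here are illustrative stand-ins, not cargo's actual Graph): both the outer node map and each inner edge map sit behind an Rc, so cloning the whole graph copies only two pointers, and make_mut deep-copies only the parts a later mutation actually touches:

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

// Illustrative types: outer map from node to its edge map, both Rc-wrapped.
type Edges = Rc<BTreeMap<&'static str, u32>>;
type Nodes = Rc<BTreeMap<&'static str, Edges>>;

// Insert an edge, copy-on-write at both levels via Rc::make_mut.
fn add_edge(nodes: &mut Nodes, from: &'static str, to: &'static str, w: u32) {
    let inner = Rc::make_mut(nodes)
        .entry(from)
        .or_insert_with(|| Rc::new(BTreeMap::new()));
    Rc::make_mut(inner).insert(to, w);
}

fn main() {
    let mut nodes: Nodes = Rc::new(BTreeMap::new());
    add_edge(&mut nodes, "cargo", "serde", 1);

    // A cheap snapshot: only the Rc pointers are copied, not the maps.
    let snapshot = Rc::clone(&nodes);
    add_edge(&mut nodes, "cargo", "rand", 2);

    assert_eq!(nodes["cargo"].len(), 2);
    assert_eq!(snapshot["cargo"].len(), 1); // snapshot is untouched
}
```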

The best benchmark is what Cargo does when it's using the old Cargo.lock file as guidance but generate-lockfile doesn't do that.

I wonder if we can call std::process::exit after the resolver is done, then use cargo check as a benchmark?

One option for some of these usages is to use Rc with make_mut, but if benchmarks are regressing here then there's clearly cloning which would lose out on Rc's fast path in

Using make_mut makes the cost proportional to how many times the structure is modified, not how many times it is cloned. Which can be a big difference, especially if a big structure is cloned often and the modifications touch only a small part.

PS: I've now also looked more closely into the imbl fork, and I don't think it would be a good replacement for im-rc either,

That is very good to know. Thank you for looking into it!

@Eh2406
Contributor

Eh2406 commented Sep 8, 2021

Yes it's not great to use unmaintained code, but a problem with a library does not guarantee that Cargo's usage is problematic. There may still be a problem but mere existence isn't enough to convince me to rip it out immediately without further investigation for how it affects Cargo.

I have been keeping an eye as new issues come in to the im crate. I have not yet seen one where I recognized "cargo is doing exactly that", but no one is looking closely and im uses unsafe. I don't think it is a smoking gun that we need to be out now, but I would like to be moving away from it.

Edit: having gone back to review it, bodil/im-rs#124 is not so different from our use case.

@decathorpe
Contributor Author

In the meantime, I've gone ahead and implemented the suggestion to wrap Graph.nodes in an Rc and use Rc::make_mut where necessary. I've re-run the previous "benchmarks" and they're all slightly better, and much better in the case of paritytech/substrate (about 33% faster than without the Rc). The flamegraph also shows much less time spent manipulating BTreeMaps:

  • before applying this PR:

flamegraph-old

  • initial changes (dropping im-rc):

cargo-std

  • wrapping Graph nodes in Rc:

flamegraph-rcd

In fact, time spent inside alloc::collections::btree* is reduced so much that it doesn't even show up on the flamegraph any longer.

@alexcrichton
Member

So could you give me a better, concrete example of a cargo command (or series of commands) that I could target for benchmarking / profiling my changes?

I listed two above:

  • cargo update -p some-path-dependency-crate
  • cargo build -Z unstable-options --build-plan

Additionally since you're editing Cargo anyway you can add something like an early return Ok(()) instead of actually running the build once the resolution process has completed.


To reiterate, though, without concrete evidence pointing out that Cargo's usage of im-rc is indeed flawed as-is today, I am not personally going to be convinced to merge changes to switch to something else so long as it has a regression. I'm particularly worried about big projects, precisely the case that may be regressing here in the case of substrate.

@decathorpe
Contributor Author

Ok, I re-ran benchmarks with those two commands.

  1. cargo update -p $foo-path-dep

With all crates I tested, this was actually slightly faster with my changes.
On average, my patch made this command run ~3.8% faster.

  • substrate: random path-dependency sp-authority-discovery chosen
Benchmark #1: cargo-old update -p sp-authority-discovery
  Time (mean ± σ):     349.8 ms ±   7.6 ms    [User: 197.9 ms, System: 26.6 ms]
  Range (min … max):   340.4 ms … 364.2 ms    20 runs

Benchmark #1: cargo-rcd update -p sp-authority-discovery
  Time (mean ± σ):     341.8 ms ±   7.2 ms    [User: 188.6 ms, System: 28.2 ms]
  Range (min … max):   332.5 ms … 357.1 ms    20 runs
  • rustc 1.54.0: random path-dependency alloc chosen
Benchmark #1: cargo-old update -p alloc
  Time (mean ± σ):     216.8 ms ±   7.4 ms    [User: 68.9 ms, System: 17.1 ms]
  Range (min … max):   203.6 ms … 227.8 ms    20 runs

Benchmark #1: cargo-rcd update -p alloc
  Time (mean ± σ):     210.2 ms ±   8.1 ms    [User: 63.6 ms, System: 16.2 ms]
  Range (min … max):   191.5 ms … 223.0 ms    20 runs
  • servo: random path-dependency servo_config chosen
Benchmark #1: cargo-old update -p servo_config
  Time (mean ± σ):     482.7 ms ±  10.3 ms    [User: 195.6 ms, System: 54.9 ms]
  Range (min … max):   450.5 ms … 494.3 ms    20 runs

Benchmark #1: cargo-rcd update -p servo_config
  Time (mean ± σ):     478.9 ms ±  12.7 ms    [User: 189.2 ms, System: 53.6 ms]
  Range (min … max):   458.2 ms … 499.6 ms    20 runs
  • cargo: random path-dependency cargo-test-macro chosen
Benchmark #1: cargo-old update -p cargo-test-macro
  Time (mean ± σ):      15.3 ms ±   0.3 ms    [User: 10.9 ms, System: 4.3 ms]
  Range (min … max):    14.8 ms …  16.0 ms    20 runs

Benchmark #1: cargo-rcd update -p cargo-test-macro
  Time (mean ± σ):      13.9 ms ±   0.2 ms    [User: 9.7 ms, System: 4.2 ms]
  Range (min … max):    13.7 ms …  14.4 ms    20 runs
  2. cargo build -Z unstable-options --build-plan

This command was also slightly faster with my changes, for all tested crates.
On average, my patch made this command run ~3.9% faster.

  • substrate:
Benchmark #1: cargo-old build -Z unstable-options --build-plan
  Time (mean ± σ):     373.7 ms ±   2.4 ms    [User: 336.2 ms, System: 98.1 ms]
  Range (min … max):   368.6 ms … 377.0 ms    20 runs

Benchmark #1: cargo-rcd build -Z unstable-options --build-plan
  Time (mean ± σ):     346.5 ms ±   3.0 ms    [User: 309.8 ms, System: 98.1 ms]
  Range (min … max):   341.5 ms … 352.0 ms    20 runs
  • rustc 1.54.0:
Benchmark #1: cargo-old build -Z unstable-options --build-plan
  Time (mean ± σ):     156.5 ms ±   1.6 ms    [User: 134.8 ms, System: 53.1 ms]
  Range (min … max):   154.3 ms … 160.6 ms    20 runs

Benchmark #1: cargo-rcd build -Z unstable-options --build-plan
  Time (mean ± σ):     147.7 ms ±   1.9 ms    [User: 125.3 ms, System: 53.4 ms]
  Range (min … max):   144.9 ms … 153.3 ms    20 runs
  • servo:
Benchmark #1: cargo-old build -Z unstable-options --build-plan
  Time (mean ± σ):     303.4 ms ±   2.7 ms    [User: 243.0 ms, System: 96.9 ms]
  Range (min … max):   297.3 ms … 311.2 ms    20 runs

Benchmark #1: cargo-rcd build -Z unstable-options --build-plan
  Time (mean ± σ):     293.2 ms ±   5.0 ms    [User: 233.4 ms, System: 95.8 ms]
  Range (min … max):   287.0 ms … 304.0 ms    20 runs
  • cargo:
Benchmark #1: cargo-old build -Z unstable-options --build-plan
  Time (mean ± σ):      34.9 ms ±   0.3 ms    [User: 27.5 ms, System: 14.3 ms]
  Range (min … max):    34.2 ms …  35.5 ms    20 runs

Benchmark #1: cargo-rcd build -Z unstable-options --build-plan
  Time (mean ± σ):      33.6 ms ±   0.3 ms    [User: 26.1 ms, System: 14.3 ms]
  Range (min … max):    33.1 ms …  34.4 ms    20 runs

I understand that you're wary of merging changes like this one where the benefit might be small and there might be performance regressions for some projects. Even if this PR doesn't end up getting merged, it can serve as research for anybody who will be looking into replacing im-rc in the future, and that's fine with me, too.

@alexcrichton
Member

That's great news! That's definitely one of the biggest things to make sure doesn't regress, but it's also not necessarily everything. Do you plan on continuing to benchmark and evaluate this change, or are you waiting for a decision from someone else about whether this change is acceptable?

@decathorpe
Contributor Author

I can continue evaluating this change, if there's more things to evaluate. But sure, it would be good to know if this PR (or something like it) might eventually be acceptable for merging, or if continuing to work on it will just be keeping me from doing more fruitful things with my time.

@alexcrichton
Member

I don't think anyone will have a problem with the idea behind this change, which is to maintain perf and drop an unmaintained dependency. The tricky part at this point is evaluating it. Are you willing to evaluate this? That would involve trying to stress the resolver in interesting ways. If you are instead waiting for us to tell you how to stress the resolver and benchmark this, I would like to know.

@decathorpe
Contributor Author

Willing - to some extent, yes. However, since I assume I am the one person in this thread who knows least about how the cargo resolver works and how to stress it in different ways, I would need some pointers about what to do next.

@alexcrichton
Member

Ok, so just to put this into context: you're changing Cargo's core data structure representations here, which is highly likely to have an impact on performance, something we care a great deal about. At the same time, though, you're not really providing much leadership from your end. I don't mean to diminish what you've already done, but so far it's not far above the minimum where things compile, tests pass, and you run the benchmarks we tell you to run.

Conversely, though, I would expect that changes to a core part of a project are accompanied by an equivalent amount of understanding and willingness to investigate and explore the ramifications of the changes being made. You don't sound too willing to do this, and are happy to do what we tell you but not much else. This places a lot of work on the maintainers of Cargo, because we basically end up doing the change ourselves vicariously through you, which is both inefficient and draining on us. It would be much more helpful, if you'd like to see this change through to completion, if you could put in some time and effort to analyze the resolver and determine the performance impact of this. This frees us, the Cargo team, from doing a large portion of the work of determining how to evaluate this change. We can of course offer assistance and try to find blind spots, but ideally this is a collaborative process that doesn't involve the Cargo team doing all of the evaluation.

For example, one concrete thing I would expect to be evaluated is performance on graphs that may not have lock files. There seems to be some good data here showing that performance with a lock file is not regressing, but we also don't want Cargo to take seconds-to-minutes to complete when there isn't one. The generate-lockfile benchmarks you were running earlier are a good way to test this, but I think it would also be prudent to test on synthetic graphs, such as ones that need a lot of backtracking to resolve correctly. There are a few tests within Cargo where backtracking is required, and they took ages to complete until correct backtracking was added. I think it would be good to track the performance difference on some resolution graphs like that.

This change by no means needs to be performance-neutral-or-better across the board. My goal is to understand the performance impact this has. The documentation of the structures here specifically say they get cloned a lot and performance is important, and with this PR those comments are basically stale and ignored. We unfortunately didn't do a great job of recording benchmarks in the past, which isn't great because it offloads work to this PR, but I don't think that it's necessarily an excuse to eschew things.

As a final note, when you post benchmarks, it'd be great if you mostly posted just a summary. Reading over dozens of outputs of hyperfine isn't really interesting, and it feels like you're just giving us data without actually synthesizing or reading it much yourself. We trust you to measure things well and do your math right, so we don't need to reach all the same conclusions ourselves from the same data.

@decathorpe
Contributor Author

While this is not exactly the response I was hoping for, I understand that time is one thing we could all use more of. Honestly, I expected this to be a simpler change, since it essentially reverts a previous PR whose performance characteristics weren't explored at all beyond "it's a bit faster" - so putting this burden on "unlucky me" seems a bit unfair. I'm not working on Rust stuff full time, I'm doing this for fun (<sarcasm />), and with exams and more university courses coming up soon, I won't have the time to go spelunking for performance bottlenecks in the entire cargo code base anytime soon. Maybe this PR can serve as "prior research" if somebody wants to tackle dropping im-rc again. Sorry for taking up your time.

PS: Independent of this PR, it would probably make sense to add a few simple benchmarks (maybe with example Cargo.toml fixtures that expose some bottlenecks in the crate's code) to cargo, so future changes like this one might have an easier time ...

@decathorpe decathorpe closed this Sep 16, 2021
@Eh2406
Contributor

Eh2406 commented Sep 16, 2021

This does need to be done with care, but it is unfair to require that it be done by you. It would be better if we had isolated and repeatable benchmarks; it is not good to claim that we care about performance without them. The current implementation of the resolver is a tangled mess, and isolating benchmarks is hard. As one of the people who understands the performance of the resolver best, let me give this a close look and see if that gives Alex the confidence to merge.

The main reason this needs to be done carefully is that while #6336 was perf-neutral, changes since then did rely on it. For example, digging into why we needed to add Rc to the Graph when it did not have them before #6336: it turns out we added a parents: Graph<PackageId, _> to Context in #6653 and removed the RcList versions in #6860. This was a big readability improvement, but it relied on Graph being cheap to clone.

Next, let us look at RemainingDeps: it is large, ~O(# of crates with dependencies), and only modified in small ways. But we are switching from a BTreeMap to a BinaryHeap, so that is better and much easier to read. The contents are DepsFrames, which have an Rc around the key parts. So I think this will be OK.

resolve_features is small, O(# of packages with activated features), and its values are FeaturesSet, which wraps an Rc. It is rarely changed, so it is better as an Rc<HashMap<.

links are changed even more rarely. Definitely an Rc<HashMap<.

Activations is large, O(# of packages), and only modified in small ways. But it will be modified every time it is copied, so I don't think an Rc will help.

PublicDependency only matters when P&PD is in use. It is large, O(# of crates with dependencies), and can be modified a lot on each tick. I would guess that the outer HashMap should be plain, as we will modify at least one package every tick, but the inner HashMap should be an Rc<HashMap<, as only a small fraction are likely to change.

@decathorpe I should have left the resolver in a state that was easier to contribute to. I am sorry this has been a frustrating experience. I want to see this replacement happen, and will make sure your commits get used. Thank you for getting the ball rolling.

@Eh2406
Contributor

Eh2406 commented Sep 23, 2021

@decathorpe as one piece of follow-up, Eric opened #9935 for us to discuss how best to add benchmarks.

Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.