Slightly optimize hash map stable hashing #89404
(rust-highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 6fd9f81c09c27edcf702f3be7cedaa5b465a16a9 with merge 6ed7c0251b508f4b31a6b15623addde734ae2172...
Branch updated from 6fd9f81 to 3b9da30.
Well, this is embarrassing 😅 I assumed that iterating a
@jackh726 OK, I'm not actually sure whether that last failure is caused by the PR or is spurious.
@bors try
⌛ Trying commit 3b9da308ebfd15a41bf400a14ab99f8a61427963 with merge 871b2837e9311a4b8b20f8ec57ee7463d1714c64...
☀️ Try build successful - checks-actions
Queued 871b2837e9311a4b8b20f8ec57ee7463d1714c64 with parent 6dc08b9, future comparison URL.
No problem. Take a look at the "Heap Sort" algorithm :)
I assumed that
Finished benchmarking commit (871b2837e9311a4b8b20f8ec57ee7463d1714c64): comparison url. Summary: This change led to very large relevant regressions 😿 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Branch updated from 3b9da30 to 8031200.
Sigh, back to the drawing board I guess. It would really help to make this sort faster, but my usual tricks (parallelizing the sort or using radix sort) probably can't be applied here. Let's see if the SmallVec alone helps with anything; if not, I'll close the PR. Can someone please run another perf run? :) Thanks.
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 8031200d5bc4f4b8356a514e2b765d3ccfaa6870 with merge b66e21c62a53fcc9fd91130c8b45d414d64e0616...
📌 Commit e4b4d18 has been approved by
🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened. |
I just saw this now. Very interesting! Note that we don't actually need to "sort" the hash table contents before hashing. We just need to have them in a deterministic order. Maybe that can be exploited somehow?
Hmm, so, any idea how to produce a deterministic order without sorting? :D And without depending on the insertion order, I suppose. If all hash map insertions happen in the same order and the hash function is deterministic, we could probably just iterate over the items in the order they are laid out in the map. But that would depend on the insertion order, which seems too brittle (this was basically the reason why we decided to postpone https://perf.rust-lang.org/compare.html?start=55ccbd090d96ec3bb28dbcb383e65bbfa3c293ff&end=7b55f66571824a0ba6e2ee9303d94854dcc8f378 until we are sure that the insertion order won't break anything).
An idea for producing a stable hash without an order: use a commutative function. The problem is that it requires the hash of each item to be computed separately before the hashes can be merged with the commutative function. The initialization and
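A minimal sketch of the commutative idea, assuming `DefaultHasher` as a stand-in for rustc's `StableHasher` and wrapping addition as the commutative function (both are illustrative choices, not taken from the thread). Each entry gets its own fresh hasher, which is exactly the per-item cost mentioned above, and the per-entry hashes are summed modulo 2^64 so that iteration order no longer matters.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hashes one (key, value) entry with a fresh hasher.
/// `DefaultHasher` is only a stand-in for rustc's `StableHasher` here.
fn entry_hash<K: Hash, V: Hash>(key: &K, value: &V) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    value.hash(&mut hasher);
    hasher.finish()
}

/// Combines per-entry hashes with wrapping addition, which is commutative
/// and associative, so the result does not depend on iteration order.
fn order_independent_hash<K: Hash, V: Hash>(map: &HashMap<K, V>) -> u64 {
    let combined = map
        .iter()
        .map(|(k, v)| entry_hash(k, v))
        .fold(0u64, |acc, h| acc.wrapping_add(h));

    // Fold the length and the combined value into one final hash.
    let mut hasher = DefaultHasher::new();
    map.len().hash(&mut hasher);
    combined.hash(&mut hasher);
    hasher.finish()
}
```

Two maps with the same contents hash identically even if they were filled in different orders, because no step depends on the order in which `iter()` yields entries.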
☀️ Test successful - checks-actions |
Finished benchmarking commit (58457bb): comparison url. Summary: This benchmark run did not return any relevant changes. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression |
@the8472 From my benchmarks, it looked like hashing and sorting take a similar amount of time here, depending on the workload. So keeping the hashing and adding a commutative function might theoretically work. I'll try it, thanks.
…the8472 Avoid sorting in hash map stable hashing Suggested by `@the8472` [here](rust-lang#89404 (comment)). I hope that I understood it right: I replaced the sort with modular multiplication, which should be commutative. Can I ask for a perf run? However, locally it didn't help at all. Creating the `StableHasher` all over again is probably slowing it down quite a lot. And using `FxHasher` is not straightforward, because the keys and values only implement `HashStable` (and they probably shouldn't be hashed via `Hash` anyway if the result is to actually be stable). Maybe the `StableHash` interface could be changed somehow to better support these scenarios where the hasher is short-lived. Or the `StableHasher` implementation could have variants with, e.g., a shorter buffer for these scenarios.
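The multiplication variant described in this commit can be sketched in the same way. This is an illustration under assumptions: `entry_hash` and `product_of_entry_hashes` are hypothetical names, and `DefaultHasher` stands in for `StableHasher`. Multiplication modulo 2^64 (`u64::wrapping_mul`) is commutative and associative, so the order in which entries are visited does not affect the result.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hashes one entry with a freshly created hasher.
/// This per-entry setup is the cost the commit message suspects.
fn entry_hash<K: Hash, V: Hash>(key: &K, value: &V) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    value.hash(&mut hasher);
    hasher.finish()
}

/// Combines entry hashes by multiplication modulo 2^64, starting from 1
/// (the multiplicative identity), so the visiting order does not matter.
fn product_of_entry_hashes<K: Hash, V: Hash>(entries: &[(K, V)]) -> u64 {
    entries
        .iter()
        .map(|(k, v)| entry_hash(k, v))
        .fold(1u64, |acc, h| acc.wrapping_mul(h))
}
```

One property to keep in mind with multiplication: a single entry hash of 0 makes the whole product 0, and each even factor adds trailing zero bits, so for larger maps the product tends to collapse toward 0. Wrapping addition does not have this problem.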
…cjgillot Change several HashMaps to IndexMap to improve incremental hashing performance Stable hashing of hash maps in incremental mode takes a lot of time, especially for some benchmarks like `clap`. As noted by `@Mark-Simulacrum` [here](rust-lang#89404 (comment)), this cost could be reduced by replacing some hash maps with indexmaps. I gathered some statistics, found several hash maps that took a lot of time to hash, and replaced them with indexmaps. However, for this to work, we need to make sure that these indexmaps have a deterministic insertion order. These three are only used in visitors as far as I can see, which seems deterministic. Can we enforce this somehow? Or should an explanatory comment be added to these maps?
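With an order-preserving map, the hash depends only on insertion order, which is why that order has to be deterministic. The sketch below uses a hypothetical, minimal `InsertionOrderedMap` (a `Vec` of entries plus a `HashMap` from key to index) as a stand-in for `indexmap::IndexMap`, and `DefaultHasher` in place of `StableHasher`. It shows that equal insertion orders give equal hashes without any sorting, while a different insertion order gives a different hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, minimal stand-in for `indexmap::IndexMap`: entries are
/// stored in insertion order and a HashMap maps each key to its position.
struct InsertionOrderedMap<K, V> {
    entries: Vec<(K, V)>,
    index: HashMap<K, usize>,
}

impl<K: Hash + Eq + Clone, V> InsertionOrderedMap<K, V> {
    fn new() -> Self {
        Self { entries: Vec::new(), index: HashMap::new() }
    }

    /// Inserts a new entry at the end; an existing key keeps its position
    /// and only has its value overwritten.
    fn insert(&mut self, key: K, value: V) {
        if let Some(&i) = self.index.get(&key) {
            self.entries[i].1 = value;
        } else {
            self.index.insert(key.clone(), self.entries.len());
            self.entries.push((key, value));
        }
    }
}

impl<K: Hash, V: Hash> InsertionOrderedMap<K, V> {
    /// Hashes entries in insertion order. No sort is needed, but the result
    /// is only deterministic if the insertion order is deterministic.
    fn ordered_hash(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.entries.len().hash(&mut hasher);
        for (k, v) in &self.entries {
            k.hash(&mut hasher);
            v.hash(&mut hasher);
        }
        hasher.finish()
    }
}
```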
I was profiling some of the `rustc-perf` benchmarks locally and noticed that a significant amount of time is spent inside the stable hashing of hashmaps. I tried to use a `SmallVec` instead of a `Vec` there, which helped very slightly. Then I tried to remove the sorting, which was a bottleneck, and replaced it with insertion into a binary heap. Locally, this yielded nice improvements in instruction counts and RSS in several benchmarks for incremental builds. The implementation could probably be much nicer and possibly extended to other stable hashes, but first I wanted to test the perf impact properly.
Can I ask someone to do a perf run? Thank you!
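The two ways of getting a deterministic order (sorting a `Vec` versus going through a binary heap) can be compared with a small sketch. It is not the rustc code: it uses std's `HashMap`, requires keys and values to be `Ord` (rustc's real entries only implement `HashStable`), and uses `DefaultHasher` as a stand-in for `StableHasher`. Both functions feed the entries to the hasher in ascending order, so they produce the same hash. One pitfall worth noting: `BinaryHeap::iter` visits elements in arbitrary order, so the heap has to be drained with `pop` (or converted with `into_sorted_vec`) to get a sorted sequence, and draining it is itself a heap sort with the same O(n log n) cost as sorting.

```rust
use std::cmp::Reverse;
use std::collections::hash_map::DefaultHasher;
use std::collections::{BinaryHeap, HashMap};
use std::hash::{Hash, Hasher};

/// Baseline: collect the entries, sort them, and hash in sorted order.
fn hash_sorted<K: Hash + Ord, V: Hash + Ord>(map: &HashMap<K, V>) -> u64 {
    let mut entries: Vec<(&K, &V)> = map.iter().collect();
    entries.sort_unstable();
    hash_in_order(entries)
}

/// Alternative: push entries into a min-heap (via `Reverse`) and pop them
/// one by one, which yields them in ascending order.
fn hash_via_heap<K: Hash + Ord, V: Hash + Ord>(map: &HashMap<K, V>) -> u64 {
    let mut heap: BinaryHeap<Reverse<(&K, &V)>> = map.iter().map(Reverse).collect();
    let mut ordered = Vec::with_capacity(heap.len());
    while let Some(Reverse(entry)) = heap.pop() {
        ordered.push(entry);
    }
    hash_in_order(ordered)
}

/// Feeds the length and then every entry to the hasher in the given order.
fn hash_in_order<K: Hash, V: Hash>(entries: Vec<(&K, &V)>) -> u64 {
    let mut hasher = DefaultHasher::new();
    entries.len().hash(&mut hasher);
    for (k, v) in entries {
        k.hash(&mut hasher);
        v.hash(&mut hasher);
    }
    hasher.finish()
}
```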