Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slightly optimize hash map stable hashing #89404

Merged
merged 1 commit into from
Dec 12, 2021

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Sep 30, 2021

I was profiling some of the rustc-perf benchmarks locally and noticed that quite some time is spent inside the stable hash of hashmaps. I tried to use a SmallVec instead of a Vec there, which helped very slightly.

Then I tried to remove the sorting, which was a bottleneck, and replaced it with insertion into a binary heap. Locally, it yielded nice improvements in instruction counts and RSS in several benchmarks for incremental builds. The implementation could probably be much nicer and possibly extended to other stable hashes, but first I wanted to test the perf impact properly.

Can I ask someone to do a perf run? Thank you!

@rust-highfive
Copy link
Collaborator

r? @matthewjasper

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 30, 2021
@jackh726
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 30, 2021
@bors
Copy link
Contributor

bors commented Sep 30, 2021

⌛ Trying commit 6fd9f81c09c27edcf702f3be7cedaa5b465a16a9 with merge 6ed7c0251b508f4b31a6b15623addde734ae2172...

@rust-log-analyzer

This comment has been minimized.

@Kobzol
Copy link
Contributor Author

Kobzol commented Sep 30, 2021

Well, this is embarrassing 😅 I assumed that iterating a BinaryHeap would return the items in a sorted order (a bit of an API footgun, although I understand why it doesn't do that). And I was wondering why it was so fast.. now it probably won't be nearly as good, but let's see.

@rust-log-analyzer

This comment has been minimized.

@Kobzol
Copy link
Contributor Author

Kobzol commented Sep 30, 2021

@jackh726 Ok I'm not actually sure if that last failure is caused by the PR or if it's spurious?

@jackh726
Copy link
Member

@bors try

@bors
Copy link
Contributor

bors commented Sep 30, 2021

⌛ Trying commit 3b9da308ebfd15a41bf400a14ab99f8a61427963 with merge 871b2837e9311a4b8b20f8ec57ee7463d1714c64...

@bors
Copy link
Contributor

bors commented Sep 30, 2021

☀️ Try build successful - checks-actions
Build commit: 871b2837e9311a4b8b20f8ec57ee7463d1714c64 (871b2837e9311a4b8b20f8ec57ee7463d1714c64)

@rust-timer
Copy link
Collaborator

Queued 871b2837e9311a4b8b20f8ec57ee7463d1714c64 with parent 6dc08b9, future comparison URL.

@leonardo-m
Copy link

I assumed that iterating a BinaryHeap would return the items in a sorted order (a bit of an API footgun, although I understand why it doesn't do that).

No problem. Take a look at the "Heap Sort" algorithm :)

@Kobzol
Copy link
Contributor Author

Kobzol commented Sep 30, 2021

No problem. Take a look at the "Heap Sort" algorithm :)

I assumed that into_iter() would behave like into_iter_sorted. Which I didn't know existed, otherwise it would be obvious :) At least I'm not alone :D

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (871b2837e9311a4b8b20f8ec57ee7463d1714c64): comparison url.

Summary: This change led to very large relevant regressions 😿 in compiler performance.

  • Very large regression in instruction counts (up to 8.6% on incr-full builds of clap-rs)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 30, 2021
@Kobzol
Copy link
Contributor Author

Kobzol commented Oct 1, 2021

Sigh, back to the drawing board I guess. It would really help to make this sort faster, but my usual tricks (parallelizing the sort or using radix sort) probably can't be applied here. Let's see if the SmallVec alone helps with anything, if not, I'll close the PR.

Can someone please run another perf run? :) Thanks.

@joshtriplett
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 1, 2021
@bors
Copy link
Contributor

bors commented Oct 1, 2021

⌛ Trying commit 8031200d5bc4f4b8356a514e2b765d3ccfaa6870 with merge b66e21c62a53fcc9fd91130c8b45d414d64e0616...

@joshtriplett
Copy link
Member

@bors r=@Mark-Simulacrum

@bors
Copy link
Contributor

bors commented Dec 10, 2021

📌 Commit e4b4d18 has been approved by Mark-Simulacrum

@bors
Copy link
Contributor

bors commented Dec 10, 2021

🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 10, 2021
@michaelwoerister
Copy link
Member

I just saw this now. Very interesting! Note that we don't actually need to "sort" the hash table contents before hashing. We just need to have them in a deterministic order. Maybe that can be exploited somehow?

@Kobzol
Copy link
Contributor Author

Kobzol commented Dec 10, 2021

Hmm, so, any idea how to produce a deterministic order without sorting? :D And without depending on the insertion order I suppose.

If all hash map insertions have the same order and the hash function is deterministic, we could probably just iterate the items of the hashmap as they are laid out in the map. But that would depend on the insertion order, which seems to be too brittle (basically this was the reason why we decided to postpone https://perf.rust-lang.org/compare.html?start=55ccbd090d96ec3bb28dbcb383e65bbfa3c293ff&end=7b55f66571824a0ba6e2ee9303d94854dcc8f378 until we make sure that the insertion order won't break anything).

@the8472
Copy link
Member

the8472 commented Dec 12, 2021

Hmm, so, any idea how to produce a deterministic order without sorting?

An idea how to produce a stable hash without order: Use a commutative function.

The problem with that is that it requires the hash for each item to be computed separately before it can be merged with a commutative function. The initialization and finish() costs of SipHasher might be significant enough that the overhead of hashing elements individually might outweigh the gains from not having to sort. In that case some FastStableHasher would be needed.

@bors
Copy link
Contributor

bors commented Dec 12, 2021

⌛ Testing commit e4b4d18 with merge 58457bb...

@bors
Copy link
Contributor

bors commented Dec 12, 2021

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 58457bb to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 12, 2021
@bors bors merged commit 58457bb into rust-lang:master Dec 12, 2021
@rustbot rustbot added this to the 1.59.0 milestone Dec 12, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (58457bb): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@rustbot rustbot removed the perf-regression Performance regression. label Dec 12, 2021
@Kobzol Kobzol deleted the hash-stable-sort branch December 12, 2021 09:53
@Kobzol
Copy link
Contributor Author

Kobzol commented Dec 12, 2021

@the8472 from my benchmarks it looked like hashing and sorting takes a similar amount of time here, based on the workload. So keeping the hashing + a commutative function might theoretically work. I'll try it, thanks.

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 18, 2021
…the8472

Avoid sorting in hash map stable hashing

Suggested by `@the8472` [here](rust-lang#89404 (comment)). I hope that I understood it right, I replaced the sort with modular multiplication, which should be commutative.

Can I ask for a perf. run? However, locally it didn't help at all. Creating the `StableHasher` all over again is probably slowing it down quite a lot. And using `FxHasher` is not straightforward, because the keys and values only implement `HashStable` (and probably they shouldn't be just hashed via `Hash` anyway for it to actually be stable).

Maybe the `StableHash` interface could be changed somehow to better suppor these scenarios where the hasher is short-lived. Or the `StableHasher` implementation could have variants with e.g. a shorter buffer for these scenarios.
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 11, 2022
…cjgillot

Change several HashMaps to IndexMap to improve incremental hashing performance

Stable hashing hash maps in incremental mode takes a lot of time, especially for some benchmarks like `clap`. As noted by `@Mark-Simulacrum` [here](rust-lang#89404 (comment)), this cost could be reduced by replacing some hash maps by indexmaps.

I gathered some statistics and found several hash maps that took a lot of time to hash and replaced them by indexmaps. However, in order for this to work, we need to make sure that these indexmaps have deterministic insertion order. These three are used only in visitors as far as I can see, which seems deterministic. Can we enforce this somehow? Or should some explaining comment be included for these maps?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.