manually implement Hash for DefId #91660
Conversation
r? @jackh726 (rust-highfive has picked a reviewer for you, use r? to override)

@bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit ad29818c9fa32552de852d1656757960a5487cf0 with merge abe401b21ff392a1e7153686e94b22d7949b9a0c...

☀️ Try build successful - checks-actions

Queued abe401b21ff392a1e7153686e94b22d7949b9a0c with parent ce0f7ba, future comparison URL.

Finished benchmarking commit (abe401b21ff392a1e7153686e94b22d7949b9a0c): comparison url. Summary: This change led to moderate relevant improvements 🎉 in compiler performance.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. @bors rollup=never
Those results appear to be a mixed bag. I suppose this might be because the …
compiler/rustc_span/src/def_id.rs (outdated):

```rust
// hash one u64 instead of two u32s
impl Hash for DefId {
    fn hash<H: Hasher>(&self, h: &mut H) {
        ((u64::from(u32::from(self.krate)) << 32) | u64::from(u32::from(self.index))).hash(h)
    }
}
```
Suggested change:

```rust
// before:
((u64::from(u32::from(self.krate)) << 32) | u64::from(u32::from(self.index))).hash(h)
// after:
((u64::from(u32::from(self.index)) << 32) | u64::from(u32::from(self.krate))).hash(h)
```
It may be faster this way: https://rust.godbolt.org/z/EMeebaGb3
Yes but that would exacerbate the problem of def index bit distribution further (at least my theory is that we usually have more defs in a crate than we have crates within one build, but I may be wrong).
I'd rather reorder the fields in the type.
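For illustration, a minimal sketch of what such a field reordering could look like. The plain u32 fields are stand-ins for the real CrateNum/DefIndex newtypes, and the exact ordering chosen here is an assumption, not the code from this PR:

```rust
// Hypothetical sketch, not the actual rustc definition: reorder the
// struct fields instead of swapping the operands in the hash impl,
// so the hash can keep putting the higher-entropy `index` in the low bits.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct DefId {
    pub index: u32, // moved before `krate` in this sketch
    pub krate: u32,
}

fn main() {
    // behavior is unchanged; only the declaration order differs
    let d = DefId { index: 7, krate: 1 };
    assert_eq!((d.index, d.krate), (7, 1));
    println!("{:?}", d);
}
```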
This is a little-endian only improvement, right?
Yeah (probably 64-bit LE only to boot) -- if you throw a --target on the godbolt for a BE arch you can see what it looks like.

In practice the most popular architectures these days are LE (x86-64 and aarch64), but I guess if one were really motivated they could use #[cfg] to keep it optimal on both.
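A hedged sketch of what such a #[cfg] might look like: pick the concatenation order per target endianness so the combined u64 matches the struct's in-memory byte order on either target. The helper name and the exact conditions are assumptions for illustration, not code from this PR:

```rust
// Assumed helper, not from the PR: with a struct laid out as
// (krate, index), the first field occupies the LOW bytes of the
// equivalent u64 on little-endian targets and the HIGH bytes on
// big-endian ones, so the shift direction flips per target.
#[cfg(target_endian = "little")]
fn combine(krate: u32, index: u32) -> u64 {
    (u64::from(index) << 32) | u64::from(krate)
}

#[cfg(target_endian = "big")]
fn combine(krate: u32, index: u32) -> u64 {
    (u64::from(krate) << 32) | u64::from(index)
}

fn main() {
    // on either target, distinct field values give distinct results
    assert_ne!(combine(1, 2), combine(2, 1));
    assert_eq!(combine(0, 0), 0);
    println!("{:#x}", combine(1, 2));
}
```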
Thanks! I always find endianness tricky to think about, just checking my reasoning :)
Are they? The instruction count changes of significance are all small improvements.
I don't understand this. Can you elaborate?
I guess I was reading too much into the results. After looking into the hash function again, I think it's probably noise. I'd like to see if reordering the fields brings more benefit. What's your take? Should we try to optimize the …
I will experiment today with different variations. I have local benchmarking set up which is much faster than CI for doing multiple perf runs.
Ok, some very surprising results.

v1: Like the original in this PR (with slightly different syntax for the integer conversions):

Gave some tiny instruction count wins (I only measured …

v2: Alex's suggestion of swapping the order:
Gave gigantic slowdowns. Here's the first part of the results:
I did a Cachegrind diff and confirmed that it was due to huge numbers of collisions. E.g. this …

changed to this: …
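The elided profile snippets aside, the collision blow-up is easy to reproduce with a simplified model. Treat an Fx-style hash of a single u64 as one wrapping multiply (the multiplier constant and this one-multiply reduction are simplifying assumptions about the hasher). Carries in a multiply only propagate upward, so with the index in the high 32 bits (the v2 layout), every DefId within one crate produces identical low hash bits — and a table that derives its bucket index from the low bits puts them all in one bucket:

```rust
// Simplified, assumed model of an Fx-style hasher: hashing one u64
// from an empty state reduces to a single wrapping multiply.
const K: u64 = 0x517cc1b727220a95; // multiplier constant (assumption)

fn fx_like(x: u64) -> u64 {
    x.wrapping_mul(K)
}

fn main() {
    let krate = 5u64; // every DefId from one crate shares this value
    // v2 layout: index in the high 32 bits, krate in the low 32 bits
    let low_bits = |index: u64| fx_like((index << 32) | krate) & 0xffff_ffff;
    // changing only the high input bits can never change the low bits
    // of the product, so all indices share the same low hash bits
    let b0 = low_bits(0);
    assert!((0..10_000u64).all(|i| low_bits(i) == b0));
    println!("10000 consecutive indices share low hash bits {:#x}", b0);
}
```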
Wow.

v3: Like v1, but with the fields in …

But I get the same test failures seen on CI, which I don't understand. I get those same failures even if I write …

So now I'm really confused.
Perhaps those tests rely on the field order of DefId? I'm not sure how (I haven't had the time to look into them), but given your suggestion, I'll do that next.
Also, that's what I suspected regarding the drawback of the large shift. Perhaps shifting (or rotating?) by a smaller amount and doing xor instead of or might give us even better results? Just a hunch. That one could even work for 32-bit systems.
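A minimal sketch of that hunch; the function name and the rotation amount are made up for illustration, not taken from the PR:

```rust
// Hypothetical mixer, not from the PR: a smaller rotate plus xor
// spreads both fields across the word while staying entirely within
// 32-bit operations, so it would also work on 32-bit targets.
fn mix32(krate: u32, index: u32) -> u32 {
    index.rotate_left(16) ^ krate
}

fn main() {
    // both fields influence the result
    assert_eq!(mix32(0, 1), 1 << 16);
    assert_eq!(mix32(3, 0), 3);
    assert_ne!(mix32(1, 2), mix32(2, 1));
    println!("{:#x}", mix32(1, 2));
}
```

As the reply below notes, a fixed rotate-and-xor is really just a pre-hash in front of the real hash function rather than a hash in itself.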
By "large shift" do you mean the left shift by 32? It's the obvious way to combine two 32-bit values into a 64-bit value. If you're talking about doing something more complex like xor or rotate, then that's just a kind of pre-hash before the real hash, which doesn't seem worthwhile; might as well just hash the two fields in the normal way. I did try another thing: changing …
The folded multiply is clever: it folds the overflow bytes from the multiply back into the result. This avoids the … It doesn't go bad when the …
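A hedged reconstruction of the folded-multiply technique being described (this is the general idea as used by hashers such as aHash; the constant shown and where exactly it would sit in FxHasher's code path are assumptions):

```rust
// Widen to 128 bits, multiply, then fold the overflow (high) half
// back into the low half so no input bits are simply discarded.
fn folded_multiply(x: u64, k: u64) -> u64 {
    let wide = (x as u128) * (k as u128); // cannot overflow a u128
    (wide as u64) ^ ((wide >> 64) as u64)
}

fn main() {
    // small inputs produce no overflow half: behaves like a plain multiply
    assert_eq!(folded_multiply(1, 5), 5);
    // large inputs: the overflow half folds back in instead of vanishing
    assert_eq!(folded_multiply(u64::MAX, 2), u64::MAX);
    println!("{:#x}", folded_multiply(0xdead_beef, 0x517cc1b727220a95));
}
```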
Thinking some more: within the compiler …
Except the bottom 2-3 bits, which are always 0. (I'm not sure if this meaningfully impacts your point.) (Sorry for the confusion of commenting from two different accounts!)

The UI tests appear to check trait coherence and ambiguity. Probably a good idea to dig into that code, as it may be less robust than we'd like.

Having had a deeper look at the failing tests, the difference appears to hinge on the iteration order of the trait lookup, which is a HashMap, so it's kind of expected that the order would change when the hashing function does. I think updating the tests here should be fine.

Ok, so I guess we want to avoid doing such optimizations generically. The risk of getting an avalanche of collisions (pardon the pun) is very real. For this PR, before merging I think I should …
(force-pushed from fda4652 to 7b914cc)
This should be ready now. In practice it boils down to @nnethercote's v3 (which gave slightly better results in his benchmarks) on 64-bit systems, keeping the derived impls on all other pointer widths. Do we need another perf run or can we merge this, @jackh726?
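A hedged sketch of what that combination might look like. The u32 fields stand in for the real CrateNum/DefIndex newtypes, the cfg condition is an assumed way to express "64-bit systems", and the concatenation order follows v1 (which v3 is described as matching):

```rust
use std::hash::{Hash, Hasher};

// Stand-in struct; the real DefId uses CrateNum/DefIndex newtypes.
struct DefId {
    krate: u32,
    index: u32,
}

// On 64-bit targets, hash both halves as a single u64 (one hasher call).
#[cfg(target_pointer_width = "64")]
impl Hash for DefId {
    fn hash<H: Hasher>(&self, h: &mut H) {
        ((u64::from(self.krate) << 32) | u64::from(self.index)).hash(h)
    }
}

// Elsewhere, the derive-equivalent: hash the fields one after another.
#[cfg(not(target_pointer_width = "64"))]
impl Hash for DefId {
    fn hash<H: Hasher>(&self, h: &mut H) {
        self.krate.hash(h);
        self.index.hash(h);
    }
}

fn main() {
    use std::collections::hash_map::DefaultHasher;
    let d = DefId { krate: 3, index: 7 };
    let mut a = DefaultHasher::new();
    d.hash(&mut a);
    #[cfg(target_pointer_width = "64")]
    {
        // on 64-bit, the impl is equivalent to hashing one combined u64
        let mut b = DefaultHasher::new();
        ((3u64 << 32) | 7u64).hash(&mut b);
        assert_eq!(a.finish(), b.finish());
    }
    println!("ok");
}
```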
Extracting …
(force-pushed from 7b914cc to 52496b6)
@camelid, with the comments I added, this should hopefully now be sufficiently clear.

I'd suggest making the comments into doc comments so rustdoc picks them up. Also, what's the disadvantage of extracting local variables? I'm not sure why adding comments helps with that.

I even made some ASCII art to show where each bit ends up. So what's unclear? 😄 My reasoning for not making those doc comments is that they are for folks working on exactly that file, whereas docs are for people using the type. If I just use a …
Finished benchmarking commit (03e04a6f4e077238d1936bfd25c5b6450327e583): comparison url. Summary: This change led to large relevant improvements 🎉 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. @bors rollup=never
Oh, I wasn't saying it's unclear, just that the code would be a little easier to read if local variables were extracted. Your comment is definitely helpful :)
This also reorders the fields to reduce the assembly operations for hashing and changes two UI tests that depended on the former ordering because of hashmap iteration order.
(force-pushed from 52496b6 to 635533b)
@bors r+

📌 Commit 635533b has been approved by …
Is it actually a good idea to change the order of the fields, relying on current behavior? As for the default …

We only rely on the field order for performance, not correctness. Nothing to worry about here.
☀️ Test successful - checks-actions
Finished benchmarking commit (a2d25b4): comparison url. Summary: This change led to large relevant improvements 🎉 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
Field order could be guaranteed by …

Probably not. Although at that point we may also want to … I'll create a follow-up PR.
This might speed up hashing for hashers that can work on individual u64s. Just as an experiment, suggested in a reddit thread on FxHasher. cc @nnethercote

Note that this should not be merged as is without cfg-ing the code path for 64 bits.