-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized HashMap. RawTable exposes a safe interface. #15720
Conversation
} | ||
} | ||
|
||
fn as_mut_ptrs(&self) -> RawBucket<K, V> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function looks pretty similar to calculate_offsets
. Can you try to share as much code as possible? It was pretty subtle for me to get right in the first place, and bugs in this type of code can be pretty catastrophic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to reuse the code or write a comprehensive test, at least.
calculate_offsets
returns more than we need here. I found it difficult to control its inlining. This function is called millions of times from rustc
. It's an important part of every search
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has mut
in the name, but takes &self
: would taking &mut self
be more appropriate? (Is that even possible?)
Had to write a branchless |
Are you sure that that is from this patch? |
That's right, let's compare two stage1s.
|
// 1111111b | ||
// Then AND with the capacity: & 1000000b | ||
// ------------ | ||
// 1000000b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... and it's zero at all other times."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This is nearly ready for a merge. I'm going check a few details and document them.
|
|
The build passes once again. Everything is resolved so far and ready for a merge. I hope I didn't accidentally remove anything during rebase. |
I was running some JSON tests today and got these results:
So it's an improvement but it still doesn't match the JSON decoder using TreeMap. I don't have the expertise to check the reason, but someone more experienced may find it worth a look. Code
JSON used:
|
@arthurprs Benchmarking insertion is not reliable since it depends on the allocator. Does serialization create an empty map for every structure? The difference would be best measured with a profiler. |
I tried to, but since I have little experience with rust internals and C-ish profilers I couldn't identify the reason. The object building code is bellow, it's pretty simple. The only changes I made was replacing TreeMap with HashMap.
|
* branchless `bucket.next()` * updated documentation after interface changes
Closing due to inactivity, but feel free to reopen with a rebase! |
Argh! @pczarn I completely forgot about this pull request -- if you do rebase, please ping me and I will try to review promptly. |
This is #15720, rebased and reopened. cc @nikomatsakis
First, a benchmark of the original hashmap implementation on Intel i3.
bucket_distance
andpop_internal
A branchless implementation of probe count calculation. A branch just avoided unsigned underflow. Since the
capacity
is a power of 2, we can ignore underflow and simply return the difference modulo capacity. We can use the fact that 'an index argument >= capacity' is acceptable later on.pop_internal
returnsV
instead ofOption<V>
.No more low hanging fruits...
iteration over buckets
Let's use external iterators for their greatest advantage in Rust: to avoid bounds checks and indexing. Thus
{Empty,Full}Index
becomes{Empty,Full}Bucket
.The growing is done an optimized reinsertion algorithm based on
insert_hashed_ordered
.HashMap::new()
returns a table with the capacity set to 0. Hashmaps that never had an element inserted won't allocate, as measured bybench::new_drop
.Removed two pointers from
RawTable
as they can be recalculated once per iteration rather than every indexing operation.RawTable
struct is as small asVec
with 24 bytes.Reduced code duplication between
swap
andinsert_hashed_nocheck
. Created a new methodinsert_or_replace_with
that accepts a simple closure for this. Moreover,robin_hood
will most likely get inlined.safe interface
This is possible thanks to a lot of prior work by @nikomatsakis! Relevant commit: nikomatsakis@2fcb95b
To eliminate the double-take issue, we must tie a bucket pointer to a reference to the table within a private structure. Only a bucket that holds a unique reference can be updated.
GapThenFull
uses a similar strategy to encapsulate two consecutive buckets and a single reference. Some of HashMap's methods became functions parameterized over mutability.We could use an uninit reference or drop trickery to replace one zeroing of a bucket's hash per
shift
ofGapThenFull
with a single memory access. The former language feature is not planned and the latter seems inefficient.split hashmap.rs into separate files
Total number of lines approaches 3000.
final refactoring
Standalone
robin_hood
.I realized that
insert_or_update_with
can callinsert_or_replace_with
directly.Inlining now started happening in microbenchmarks for some reason. Looks like
search_hashed_generic
is quite small:conclusion
Further optimizations are increasingly difficult, usually they would make the code unreadable or would require new language features.
It's possible to build cache aware HashMap on top of a vector of SoAs, such as
Vec<([u64, ..N], [K, ..N], [V, ..N])>
. Possibility of in-place growing is the only performance advantage of this approach. Unfortunately, an increase in iteration complexity surpasses benefits of storage reuse.Documentation is not finished. Most methods need more comments and tests.
Robin hood hashing scheme was introduced in #12081.
cc @cgaebel