Robin hood hashmap #12081

cgaebel · 2014-02-07T02:26:47Z

Partially addresses #11783.

Previously, rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open allocation.

The old hashtable had very high variance in lookup time. For an example,
see the 'find_nonexisting' benchmark below. This is fixed by keys in
'lucky' spots with a low probe sequence length getting their good spots
stolen by keys with long probe sequence lengths. This reduces hashtable
probe length variance, while maintaining the same mean.

Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.

Benchmarks:

comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.17x faster

old_hashmap_find_existing         50 ns/iter (+/- 12)
new_hashmap_find_existing         35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting      49 ns/iter (+/- 49)
new_hashmap_find_notexisting      34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):

aligned(8+sizeof(Option)+sizeof(K)+sizeof(V))/0.75 + 48ish bytes

Memory usage of new hashtable:

(aligned(sizeof(K))
+ aligned(sizeof(V))
+ 8)/0.9 + 112ish bytes

Timing of building librustc:

compile_and_link: x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.457 s   parsing
time: 0.028 s   gated feature checking
time: 0.000 s   crate injection
time: 0.108 s   configuration 1
time: 1.049 s   expansion
time: 0.219 s   configuration 2
time: 0.222 s   maybe building test harness
time: 0.223 s   prelude injection
time: 0.268 s   assinging node ids and indexing ast
time: 0.075 s   external crate/lib resolution
time: 0.026 s   language item collection
time: 1.016 s   resolution
time: 0.038 s   lifetime resolution
time: 0.000 s   looking for entry point
time: 0.030 s   looking for macro registrar
time: 0.061 s   freevar finding
time: 0.138 s   region resolution
time: 0.110 s   type collecting
time: 0.072 s   variance inference
time: 0.126 s   coherence checking
time: 9.110 s   type checking
time: 0.186 s   const marking
time: 0.049 s   const checking
time: 0.418 s   privacy checking
time: 0.057 s   effect checking
time: 0.033 s   loop checking
time: 1.293 s   compute moves
time: 0.182 s   match checking
time: 0.242 s   liveness checking
time: 0.866 s   borrow checking
time: 0.150 s   kind checking
time: 0.013 s   reachability checking
time: 0.175 s   death checking
time: 0.461 s   lint checking
time: 13.112 s  translation
  time: 4.352 s llvm function passes
  time: 96.702 s    llvm module passes
  time: 50.574 s    codegen passes
time: 154.611 s LLVM passes
  time: 2.821 s running linker
time: 15.750 s  linking


compile_and_link: x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.422 s   parsing
time: 0.031 s   gated feature checking
time: 0.000 s   crate injection
time: 0.126 s   configuration 1
time: 1.014 s   expansion
time: 0.251 s   configuration 2
time: 0.249 s   maybe building test harness
time: 0.273 s   prelude injection
time: 0.279 s   assinging node ids and indexing ast
time: 0.076 s   external crate/lib resolution
time: 0.033 s   language item collection
time: 1.028 s   resolution
time: 0.036 s   lifetime resolution
time: 0.000 s   looking for entry point
time: 0.029 s   looking for macro registrar
time: 0.063 s   freevar finding
time: 0.133 s   region resolution
time: 0.111 s   type collecting
time: 0.077 s   variance inference
time: 0.565 s   coherence checking
time: 8.953 s   type checking
time: 0.176 s   const marking
time: 0.050 s   const checking
time: 0.401 s   privacy checking
time: 0.063 s   effect checking
time: 0.032 s   loop checking
time: 1.291 s   compute moves
time: 0.172 s   match checking
time: 0.249 s   liveness checking
time: 0.831 s   borrow checking
time: 0.121 s   kind checking
time: 0.013 s   reachability checking
time: 0.179 s   death checking
time: 0.503 s   lint checking
time: 14.385 s  translation
  time: 4.495 s llvm function passes
  time: 92.234 s    llvm module passes
  time: 51.172 s    codegen passes
time: 150.809 s LLVM passes
  time: 2.542 s running linker
time: 15.109 s  linking

BUT accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stoerd keys and values, and would be more cache
unfriendly the bigger the stored values got.

And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.

Please review this very closely. This is my first major contribution to Rust. Sorry for the ugly diff!

huonw · 2014-02-07T02:31:07Z

src/libstd/hashmap.rs

 use num;
-use option::{None, Option, Some};
-use rand::Rng;
+use option::*;


We're trying to avoid glob imports.

I import so, so many things from iter that it makes more sense to glob.

All other globs have been removed.

nikomatsakis · 2014-03-01T02:54:25Z

I'm not even sure how much controversy there is at this point, just thought I'd throw out my 2 cents ;)

huonw · 2014-03-01T03:20:44Z

src/libcollections/hashmap.rs

+mod table {
+    use std::clone::Clone;
+    use std::cmp::Eq;
+    use std::hash::{Hash, Hasher, sip};


I think this is the (legitimately) unused import, rather than the outer one.

Dear god that's embarassing.

Thanks, though..

huonw · 2014-03-01T03:23:02Z

Re assertions: #12616 gives a precedent for writing

// FIXME #12049
if cfg!(test) { assert!(...) }

cgaebel · 2014-03-01T03:29:31Z

Added the "debug asserts" referencing #12049.

emberian · 2014-03-02T01:39:06Z

@cgaebel it's probably a staging thing. You might need a #[cfg(not(stage0))] use ... or something.

cgaebel · 2014-03-02T01:40:02Z

Naw huon got it. It was just me not looking at the line numbers and being confused. This should be good to merge.

emberian · 2014-03-02T01:40:39Z

/me quietly sits on his hands

huonw · 2014-03-02T01:42:32Z

BTW, are the numbers in the PR description still current with the various changes that have been made?

cgaebel · 2014-03-02T01:52:17Z

No, but should be same ballpark. Nothing major that should affect performance has changed.

nikomatsakis · 2014-03-03T13:11:38Z

On a recent flight, I finally found the time to read the PR in
depth. This is some excellent work. I am satisfied with the
refactoring of the unsafe bits combined with the usage assertions,
it's even better than what I had in mind. I think though I have some
ideas for how to statically verify the usage of FullIndex and friends
are used properly, but this could be a follow-up PR (and I have to try
it out, there might be a complication I have not foreseen).

I will look over my notes later today and add them into the PR. I
seem to recall that mostly I just had nits, apart from my thoughts on
how to make it safer.

Regarding the mutable iterator in specific -- I think you could make
it "safe" along the model of the vector iterators, by having the
iterator take mutable slices that it shrinks for each step. But this
is just trading one unsafety for another, since those slices would
contain partially initialized data, and the compiler would be unaware
of it. I think the current setup (artifically prolonging the lifetime)
is a smaller sin.

nikomatsakis · 2014-03-03T15:14:41Z

(So I investigated my "safety" ideas a bit more. I think it can be done, but it's a non-trivial patch, and it's unclear if the resulting code is clearer. It would be somewhat nicer if we could parameterize over mutability, though using traits it might be possible to write the relevant code generically. Regardless, as the patch'd be non-trivial, it might affect perf, and it may not be more readable really, so I think it ought to be considered as a follow up.)

amaranth · 2014-03-09T05:54:02Z

src/libcollections/hashmap.rs

 //! # Example
 //!
 //! ```rust
-//! use collections::HashMap;
+//! use std::hashmap::HashMap;


Looks like a merge ended up with the old version here?

Fixed. Good catch!

brson · 2014-03-11T20:27:34Z

The unsafety is unfortunate but we can't ignore this perf improvement. Let's land this but please do keep trying to abstract out the unsafe code here.

@cgaebel Thanks.

cgaebel · 2014-03-11T20:39:57Z

The public interface is safe. What pieces of unsafety should I focus on? The "double-take bug"?

alexcrichton · 2014-03-12T02:59:53Z

Feel free to comment on the PR whenever you force push, sadly none of us get notifications!

cgaebel · 2014-03-12T11:13:06Z

Oops. /pushed

nikomatsakis · 2014-03-12T16:07:31Z

The public interface is safe. What pieces of unsafety should I focus on? The "double-take bug"?

Note: I have an in-progress branch addressing the double take bug. I'll push it somewhere public.

Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.

cgaebel · 2014-03-12T22:30:26Z

Force pushed.

Partially addresses #11783. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: ``` comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(Option)+sizeof(K)+sizeof(V))/0.75 + 48ish bytes Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 112ish bytes Timing of building librustc: compile_and_link: x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc time: 0.457 s parsing time: 0.028 s gated feature checking time: 0.000 s crate injection time: 0.108 s configuration 1 time: 1.049 s expansion time: 0.219 s configuration 2 time: 0.222 s maybe building test harness time: 0.223 s prelude injection time: 0.268 s assinging node ids and indexing ast time: 0.075 s external crate/lib resolution time: 0.026 s language item collection time: 1.016 s resolution time: 0.038 s lifetime resolution time: 0.000 s looking for entry point time: 0.030 s looking for macro registrar time: 0.061 s freevar finding time: 0.138 s region resolution time: 0.110 s type collecting time: 0.072 s variance inference time: 0.126 s coherence checking time: 9.110 s type checking time: 0.186 s const marking time: 0.049 s const checking time: 0.418 s privacy checking time: 0.057 s effect checking time: 0.033 s loop checking time: 1.293 s compute moves time: 0.182 s match checking time: 0.242 s liveness checking time: 0.866 s borrow checking time: 0.150 s kind checking time: 0.013 s reachability checking time: 0.175 s death checking time: 0.461 s lint checking time: 13.112 s translation time: 4.352 s llvm function passes time: 96.702 s llvm module passes time: 50.574 s codegen passes time: 154.611 s LLVM passes time: 2.821 s running linker time: 15.750 s linking compile_and_link: x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc time: 0.422 s parsing time: 0.031 s gated feature checking time: 0.000 s crate injection time: 0.126 s configuration 1 time: 1.014 s expansion time: 0.251 s configuration 2 time: 0.249 s maybe building test harness time: 0.273 s prelude injection time: 0.279 s assinging node ids and indexing ast time: 0.076 s external crate/lib resolution time: 0.033 s language item collection time: 1.028 s resolution time: 0.036 s lifetime resolution time: 0.000 s looking for entry point time: 0.029 s looking for macro registrar time: 0.063 s freevar finding time: 0.133 s region resolution time: 0.111 s type collecting time: 0.077 s variance inference time: 0.565 s coherence checking time: 8.953 s type checking time: 0.176 s const marking time: 0.050 s const checking time: 0.401 s privacy checking time: 0.063 s effect checking time: 0.032 s loop checking time: 1.291 s compute moves time: 0.172 s match checking time: 0.249 s liveness checking time: 0.831 s borrow checking time: 0.121 s kind checking time: 0.013 s reachability checking time: 0.179 s death checking time: 0.503 s lint checking time: 14.385 s translation time: 4.495 s llvm function passes time: 92.234 s llvm module passes time: 51.172 s codegen passes time: 150.809 s LLVM passes time: 2.542 s running linker time: 15.109 s linking ``` BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance. Please review this very closely. This is my first major contribution to Rust. Sorry for the ugly diff!

huonw · 2014-03-17T14:00:37Z

BTW, this reduced rustc's memory use (for building librustc) by ~100MB: http://huonw.github.io/isrustfastyet/mem/#2eebeb8,4d64441

cgaebel · 2014-03-17T15:03:47Z

NICE! Btw is that a CPU usage regression, or just noise?

emberian · 2014-03-17T16:06:18Z

noise. I was probably building another copy of rustc at the time, or
playing a game.

On Mon, Mar 17, 2014 at 11:03 AM, Clark Gaebel [email protected]:

NICE! Btw is that a CPU usage regression, or just noise?

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/12081#issuecomment-37825672
.

http://octayn.net/

fix: incorrect suggestions generated by `manual_retain` lint fixes rust-lang#10393, fixes rust-lang#11457, fixes rust-lang#12081 rust-lang#10393: In the current implementation of `manual_retain`, if the argument to the closure is matched using tuple, they are all treated as the result of a call to `map.into_iter().filter(<f>)`. However, such tuple pattern matching can also occur in many different containers that stores tuples internally. The correct approach is to apply different lint policies depending on whether the receiver of `into_iter` is a map or not. rust-lang#11457 and rust-lang#12081: In the current implementation of `manual_retain`, if the argument to the closure is `Binding`, the closure will be used directly in the `retain` method, which will result in incorrect suggestion because the first argument to the `retain` closure may be of a different type. In addition, if the argument to the closure is `Ref + Binding`, the lint will simply remove the `Ref` part and use the `Binding` part as the argument to the new closure, which will lead to bad suggestion for the same reason. The correct approach is to detect each of these cases and apply lint suggestions conservatively. changelog: [`manual_retain`] refactor and add check for various patterns

huonw reviewed Feb 7, 2014
View reviewed changes

huonw reviewed Mar 1, 2014
View reviewed changes

amaranth reviewed Mar 9, 2014
View reviewed changes

bors merged commit 5bdbd21 into rust-lang:master Mar 13, 2014

pczarn mentioned this pull request Jul 16, 2014

Optimized HashMap. RawTable exposes a safe interface. #15720

Closed

ThomasWaldmann mentioned this pull request Jan 28, 2017

WIP: Robin Hood hashing borgbackup/borg#1429

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Robin hood hashmap #12081

Robin hood hashmap #12081

cgaebel commented Feb 7, 2014

huonw Feb 7, 2014

cgaebel Feb 8, 2014

nikomatsakis commented Mar 1, 2014

huonw Mar 1, 2014

cgaebel Mar 1, 2014

huonw commented Mar 1, 2014

cgaebel commented Mar 1, 2014

emberian commented Mar 2, 2014

cgaebel commented Mar 2, 2014

emberian commented Mar 2, 2014

huonw commented Mar 2, 2014

cgaebel commented Mar 2, 2014

nikomatsakis commented Mar 3, 2014

nikomatsakis commented Mar 3, 2014

amaranth Mar 9, 2014

cgaebel Mar 9, 2014

brson commented Mar 11, 2014

cgaebel commented Mar 11, 2014

alexcrichton commented Mar 12, 2014

cgaebel commented Mar 12, 2014

nikomatsakis commented Mar 12, 2014

cgaebel commented Mar 12, 2014

huonw commented Mar 17, 2014

cgaebel commented Mar 17, 2014

emberian commented Mar 17, 2014

Robin hood hashmap #12081

Robin hood hashmap #12081

Conversation

cgaebel commented Feb 7, 2014

huonw Feb 7, 2014

Choose a reason for hiding this comment

cgaebel Feb 8, 2014

Choose a reason for hiding this comment

nikomatsakis commented Mar 1, 2014

huonw Mar 1, 2014

Choose a reason for hiding this comment

cgaebel Mar 1, 2014

Choose a reason for hiding this comment

huonw commented Mar 1, 2014

cgaebel commented Mar 1, 2014

emberian commented Mar 2, 2014

cgaebel commented Mar 2, 2014

emberian commented Mar 2, 2014

huonw commented Mar 2, 2014

cgaebel commented Mar 2, 2014

nikomatsakis commented Mar 3, 2014

nikomatsakis commented Mar 3, 2014

amaranth Mar 9, 2014

Choose a reason for hiding this comment

cgaebel Mar 9, 2014

Choose a reason for hiding this comment

brson commented Mar 11, 2014

cgaebel commented Mar 11, 2014

alexcrichton commented Mar 12, 2014

cgaebel commented Mar 12, 2014

nikomatsakis commented Mar 12, 2014

cgaebel commented Mar 12, 2014

huonw commented Mar 17, 2014

cgaebel commented Mar 17, 2014

emberian commented Mar 17, 2014