Changed HashMap's internal layout. Cleanup. #21973

pczarn · 2015-02-05T18:24:03Z

Changes HashMap's memory layout from [hhhh...KKKK...VVVV...] to [KVKVKVKV...hhhh...]. This makes in-place growth easy to implement (and efficient).
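
To make the two layouts concrete, here is a minimal sketch of where bucket `i` lives in each one. The names (`Offsets`, `old_layout`, `new_layout`) and the fixed 8-byte key and value sizes are assumptions for illustration only, not the code in this PR, and padding is ignored.

```rust
/// Byte offsets of one bucket's hash, key, and value within a single allocation.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Offsets {
    hash: usize,
    key: usize,
    value: usize,
}

const HASH_SIZE: usize = 8; // hashes are stored as u64
const KEY_SIZE: usize = 8; // assumption: K = u64
const VAL_SIZE: usize = 8; // assumption: V = u64

/// Old layout: [hhhh...KKKK...VVVV...]
fn old_layout(cap: usize, i: usize) -> Offsets {
    let keys_start = cap * HASH_SIZE;
    let vals_start = keys_start + cap * KEY_SIZE;
    Offsets {
        hash: i * HASH_SIZE,
        key: keys_start + i * KEY_SIZE,
        value: vals_start + i * VAL_SIZE,
    }
}

/// New layout: [KVKVKVKV...hhhh...]
fn new_layout(cap: usize, i: usize) -> Offsets {
    let pair = KEY_SIZE + VAL_SIZE;
    let hashes_start = cap * pair;
    Offsets {
        hash: hashes_start + i * HASH_SIZE,
        key: i * pair,
        value: i * pair + KEY_SIZE,
    }
}

fn main() {
    // Doubling the capacity changes the key and value offsets in the old layout...
    println!("{:?} -> {:?}", old_layout(8, 3), old_layout(16, 3));
    // ...but only the hash offset in the new one.
    println!("{:?} -> {:?}", new_layout(8, 3), new_layout(16, 3));
}
```

In the new layout the key and value offsets of a bucket do not depend on the capacity, which is presumably what makes an in-place `realloc` convenient: only the hash array has to be relocated before rehashing.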

The removal of find_with_or_insert_with has made more cleanup possible.

20 benchmark runs, averaged:

                           before         after
bench::find_existing       40573.50 ns    41227.65 ns
bench::find_nonexisting    41815.45 ns    42362.60 ns
bench::get_remove_insert     197.85 ns      198.60 ns
bench::grow_by_insertion     171.05 ns      154.05 ns
bench::hashmap_as_queue      112.85 ns      112.65 ns
bench::new_drop               79.40 ns       79.20 ns
bench::new_insert_drop       179.40 ns      149.05 ns

thanks to @gankro for the Entry interface, and to @thestinger for improving jemalloc's in-place realloc!
cc @cgaebel
r? @gankro

cgaebel · 2015-02-05T18:27:08Z

src/libstd/collections/hash/map.rs

+// middle of a cache line, this strategy pulls in one cache line of hashes on
+// most lookups (64-byte cache line with 8-byte hash). I think this choice is
+// pretty good, but α could go up to 0.95, or down to 0.84 to trade off some
+// space.
 //
 // > Wait, what? Where did you get 1-α^k from?


This should be updated, too.

cgaebel · 2015-02-05T20:46:14Z

src/libstd/collections/hash/map.rs

    let size = table.size();
-    let mut probe = Bucket::new(table, hash);
+    let mut probe = if let Some(probe) = Bucket::new(table, hash) {


s/if let/match

cgaebel · 2015-02-05T20:51:29Z

Can this PR be split up into "In-place growth for HashMap" and "Cleanup"?

Gankra · 2015-02-05T21:05:15Z

Just leaving a note: I am concerned about how this change will negatively affect memory consumption for certain choices of K and V.

That said, I think speed is more important than memory consumption, to some limit.

cgaebel · 2015-02-05T21:13:08Z

It will also affect the performance of the "keys" and "values" iterators.

pnkfelix · 2015-02-12T13:18:49Z

Don't the posted benchmark results indicate that insertion (grow_by_insertion and new_insert_drop) has become faster at the expense of making lookup (find_existing, find_nonexisting) slower?

(This outcome makes some sense to me, at least for a heavily loaded table, since the unused interleaved values for non-matching keys are going to occupy portions of the cache line when we are doing a probe sequence over a series of keys.)

(( Well, maybe this explanation is a little too simplistic; the sequence of hhhh ... at the end, I guess, indicates that in the common case we should only need to look at a series of hashcodes before we start inspecting the keys themselves, so it's not quite as dire as the above explanation made it out to be. ))
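
The point about scanning hashcodes first can be sketched as follows. This is a toy linear-probing table, not the Robin Hood implementation in libstd; `Table`, `sample_table`, and the "hash 0 means empty" convention are assumptions of the sketch. It counts how many keys are actually compared during a probe.

```rust
/// A toy open-addressing table with a separate hash array.
/// A stored hash of 0 marks an empty bucket (assumption of this sketch).
struct Table {
    hashes: Vec<u64>,
    pairs: Vec<(u64, u32)>,
}

impl Table {
    /// Linear-probing lookup. Returns the value (if any) and the number of
    /// key comparisons performed; only buckets whose hash matches touch `pairs`.
    fn get(&self, hash: u64, key: u64) -> (Option<u32>, usize) {
        let cap = self.hashes.len();
        let mut key_compares = 0;
        let mut i = (hash as usize) % cap;
        for _ in 0..cap {
            let h = self.hashes[i];
            if h == 0 {
                return (None, key_compares); // empty bucket ends the probe
            }
            if h == hash {
                key_compares += 1;
                if self.pairs[i].0 == key {
                    return (Some(self.pairs[i].1), key_compares);
                }
            }
            i = (i + 1) % cap;
        }
        (None, key_compares)
    }
}

/// Three entries whose hashes (9, 17, 25) all map to bucket 1 of an 8-bucket table.
fn sample_table() -> Table {
    let mut t = Table { hashes: vec![0; 8], pairs: vec![(0, 0); 8] };
    let entries = [(9u64, 100u64, 1u32), (17, 200, 2), (25, 300, 3)];
    for (slot, &(hash, key, val)) in entries.iter().enumerate() {
        t.hashes[1 + slot] = hash;
        t.pairs[1 + slot] = (key, val);
    }
    t
}

fn main() {
    // Two non-matching hashes are skipped without touching any key.
    println!("{:?}", sample_table().get(25, 300)); // prints "(Some(3), 1)"
}
```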

I don't know how to evaluate whether the gain is worth the cost here. I just want to make sure that everyone is on board for this shift (or find out if my interpretation of the results is wrong).

Gankra · 2015-02-12T13:31:46Z

Just a note that I believe @pczarn is currently reworking this PR.

pczarn · 2015-02-16T13:04:51Z

I did the reworking. Some small details still need attention.

> It will also affect the performance of the "keys" and "values" iterators.

Sounds bad. Can you find a real world example of this? To keep the performance the same, I could make hashmap use two allocations, [VVVV…hhhh…] and [KKKK…].

Here's a relevant post on data layout: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/
Unfortunately, it's of little use for us, because it's not about Robin Hood (and for C++, not Rust).

This is how benchmark results have changed:

[40.573, 41.706, 209.1, 179.4, 122.8, 78.4, 182.7]
  Cleanup.
[41.989, 43.615, 212.4, 167.8, 125.0, 78.7, 163.3]
  Changed HashMap's internal layout.
[41.145, 42.298, 206.8, 162.5, 121.7, 78.9, 159.3]
  In-place growth for HashMap.
[41.055, 42.427, 208.3, 156.0, 124.2, 78.4, 161.6]

@pnkfelix: Lookup has become slower after the first commit because of refactoring. Keep in mind that the benchmark does 1000 lookups per iteration and the difference between 41.7ns and 42.4ns per lookup is small.

The improvement from in-place growth is surprisingly low. I'll have to check why.

cgaebel · 2015-02-16T15:11:21Z

@pczarn The performance of key/value iterators is just because with the current design they walk over a compact array, and with your proposed design they eat twice as much cache as they do this. If doing a small per-key or per-value operation, this essentially halves memory bandwidth.

Gankra · 2015-02-16T16:58:25Z

Well strictly speaking it's already walking over the hashes checking for hash != 0 at the same time.
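
A rough way to put numbers on both remarks, assuming a sequential scan pulls every byte of the traversed range through the cache (a simplification that stops holding once values span several cache lines). The function names are hypothetical.

```rust
const HASH_SIZE: usize = 8; // one u64 hash per bucket, checked for `hash != 0`

/// Bytes scanned by a `keys()` walk over [hhhh...KKKK...VVVV...]:
/// the hash array plus the compact key array.
fn keys_iter_bytes_split(cap: usize, key_size: usize) -> usize {
    cap * (HASH_SIZE + key_size)
}

/// Bytes scanned by a `keys()` walk over [KVKVKVKV...hhhh...]:
/// the hash array plus the interleaved pairs, values included.
fn keys_iter_bytes_interleaved(cap: usize, key_size: usize, val_size: usize) -> usize {
    cap * (HASH_SIZE + key_size + val_size)
}

fn main() {
    // HashMap<u64, u64> with 1000 buckets.
    let split = keys_iter_bytes_split(1000, 8);
    let interleaved = keys_iter_bytes_interleaved(1000, 8, 8);
    println!("{} vs {} bytes", split, interleaved); // prints "16000 vs 24000 bytes"
}
```

Under these assumptions the key data alone doubles (8 to 16 bytes per bucket), but once the hash check is counted the total grows by 1.5x rather than 2x for `u64` keys and values.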

Gankra · 2015-02-21T04:50:59Z

Oh shoot, I let this slip through the cracks. Needs a rebase (hopefully the last one, since we should be done with crazy API churn).

pczarn · 2015-02-21T23:09:40Z

Updated. I'm going to test and make corrections when new snapshots land.

So iteration over small keys/values is already 2x-4x more cache-intensive than in an array. With larger values, like in HashMap<usize, (String, String)>, it gets much worse.

To avoid the issue, keys/values can be stored in an array such as [([K; 16], [V; 16]); n].
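
A minimal sketch of how bucket indices could map onto such a chunked array. The `Chunked` type and its methods are hypothetical and leave out the hash array entirely; they only show the index arithmetic.

```rust
const CHUNK: usize = 16;

/// Keys and values grouped in chunks of 16: [([K; 16], [V; 16]); n].
struct Chunked<K, V> {
    chunks: Vec<([K; CHUNK], [V; CHUNK])>,
}

impl<K: Copy + Default, V: Copy + Default> Chunked<K, V> {
    fn with_chunks(n: usize) -> Self {
        Chunked {
            chunks: vec![([K::default(); CHUNK], [V::default(); CHUNK]); n],
        }
    }

    /// Bucket `i` lives in chunk `i / 16`, slot `i % 16`.
    fn set(&mut self, i: usize, k: K, v: V) {
        let (keys, vals) = &mut self.chunks[i / CHUNK];
        keys[i % CHUNK] = k;
        vals[i % CHUNK] = v;
    }

    fn key(&self, i: usize) -> &K {
        &self.chunks[i / CHUNK].0[i % CHUNK]
    }

    fn value(&self, i: usize) -> &V {
        &self.chunks[i / CHUNK].1[i % CHUNK]
    }

    /// Walks the keys chunk by chunk; within a chunk the keys are contiguous.
    fn keys(&self) -> impl Iterator<Item = &K> + '_ {
        self.chunks.iter().flat_map(|(ks, _)| ks.iter())
    }
}

fn main() {
    let mut t: Chunked<u64, u32> = Chunked::with_chunks(2);
    t.set(17, 5, 50);
    println!("{} {} {}", t.key(17), t.value(17), t.keys().count()); // prints "5 50 32"
}
```

Within a chunk, 16 keys sit next to each other, so a key walk skips over values in runs of 16 instead of after every key, while a key's value stays within the same chunk.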

pczarn · 2015-02-27T17:23:31Z

Done, tested.

Gankra · 2015-02-27T17:35:15Z

Great! Will review tonight.

Gankra · 2015-02-27T23:20:26Z

Ack, a few of my issues are addressed by later commits. Doing this commit-by-commit isn't the right strategy here. Shifting gears.

Gankra · 2015-02-27T23:37:14Z

src/libstd/collections/hash/table.rs

-    // inform rustc that in fact instances of K and V are reachable from here.
-    marker:   marker::PhantomData<(K,V)>,
+    // NB. The table will probably need manual impls of Send and Sync if this
+    // field ever changes.


I don't think that needs saying? That's true for pretty much all Uniques.

bors · 2015-04-22T06:44:34Z

☔ The latest upstream changes (presumably #24674) made this pull request unmergeable. Please resolve the merge conflicts.

* use of NonZero hashes
* refactoring
* correct explanation of the load factor
* better nomenclature
* 'probe distance' -> 'displacement'

Manishearth · 2015-05-10T10:33:36Z

@gankro (this was rebased and needs review)

Gankra · 2015-05-10T19:33:59Z

@Manishearth Yes I tried to get someone else to review over two months ago. :/

Today's the last day of my pseudo-vacation so I'm free to tackle this again this week.

Gankra · 2015-05-12T15:14:41Z

Alright I've started just reading the full source at the last commit's hash just because the changes have been so comprehensive that it borders on a rewrite.

Was Deref vs Borrow ever fully addressed? Borrow is generally just a trait for Deref-like+Hash/Eq/Ord equivalence.

Gankra · 2015-05-12T17:24:51Z

src/libstd/collections/hash/table.rs

-    pub fn into_bucket(self) -> Bucket<K, V, M> {
+    /// Duplicates the current position. This can be useful for operations
+    /// on two or more buckets.
+    pub fn stash(self) -> Bucket<K, V, Bucket<K, V, M, S>, S> {


What about -> Bucket<K, V, Self, S>?

Gankra · 2015-05-19T17:58:58Z

So, to the best of my knowledge this code seems to be correct, but I have strong reservations about where we're going with respect to the proliferation of type-complexity. I used to be able to grok this code pretty ok: We have HashMap, RawTable, and some Buckets. Now everything's super generic over some "M" type and there's Partial* types all over. It's not clear that these abstractions are pulling their weight: are they preventing real bugs or enabling simpler or more maintainable code?

CC @nikomatsakis @huonw @cgaebel

cgaebel · 2015-05-19T23:20:14Z

I share that sentiment.

Gankra · 2015-05-28T20:45:09Z

It's been a couple weeks with no response on any of my comments, and I'm not a huge fan of the general design changes. As such I'm closing this for now. We can continue discussion on the PR and maybe re-open if we get somewhere.

…chton

Cache conscious hashmap table

Right now the internal HashMap representation is 3 unzipped arrays hhhkkkvvv, I propose to change it to hhhkvkvkv (in further iterations kvkvkvhhh may allow in-place grow). A previous attempt is at rust-lang#21973.

This layout is generally more cache conscious as it makes the value immediately accessible after a key matches. The separated hash array is a _no-brainer_ because of how the RH algorithm works and that's unchanged.

**Lookups**: Upon a successful match in the hash array the code can check the key and immediately have access to the value in the same or next cache line (effectively saving a L[1,2,3] miss compared to the current layout).

**Inserts/Deletes/Resize**: Moving values in the table (robin hooding it) is faster because it touches consecutive cache lines and uses fewer instructions.

Some backing benchmarks (besides the ones below) for the benefits of this layout can be seen here as well: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/

The obvious drawback is: padding can be wasted between the key and value. Because of that keys(), values() and contains() can consume more cache and be slower.

Total wasted padding between items (C being the capacity of the table).

* Old layout: C * (K-K padding) + C * (V-V padding)
* Proposed: C * (K-V padding) + C * (V-K padding)

In practice padding between K-K and V-V *can* be smaller than K-V and V-K. The overhead is capped(ish) at sizeof u64 - 1 so we can actually measure the worst case (u8 at the end of key type and value with alignment of 1, _hardly the average case in practice_).

Starting from the worst case the memory overhead is:

* `HashMap<u64, u8>` 46% memory overhead. (aka *worst case*)
* `HashMap<u64, u16>` 33% memory overhead.
* `HashMap<u64, u32>` 20% memory overhead.
* `HashMap<T, T>` 0% memory overhead
* Worst case based on sizeof K + sizeof V:

| x              | 16     | 24     | 32     | 64    | 128   |
|----------------|--------|--------|--------|-------|-------|
| (8+x+7)/(8+x)  | 1.29   | 1.22   | 1.18   | 1.1   | 1.05  |

I've a test repo here to run benchmarks https://github.com/arthurprs/hashmap2/tree/layout

```
➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                            hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 grow_10_000                     922,064           783,933               -138,131  -14.98%
 grow_big_value_10_000           1,901,909         1,171,862             -730,047  -38.38%
 grow_fnv_10_000                 443,544           418,674                -24,870   -5.61%
 insert_100                      2,469             2,342                     -127   -5.14%
 insert_1000                     23,331            21,536                  -1,795   -7.69%
 insert_100_000                  4,748,048         3,764,305             -983,743  -20.72%
 insert_10_000                   321,744           290,126                -31,618   -9.83%
 insert_int_bigvalue_10_000      749,764           407,547               -342,217  -45.64%
 insert_str_10_000               337,425           334,009                 -3,416   -1.01%
 insert_string_10_000            788,667           788,262                   -405   -0.05%
 iter_keys_100_000               394,484           374,161                -20,323   -5.15%
 iter_keys_big_value_100_000     402,071           620,810                218,739   54.40%
 iter_values_100_000             424,794           373,004                -51,790  -12.19%
 iterate_100_000                 424,297           389,950                -34,347   -8.10%
 lookup_100_000                  189,997           186,554                 -3,443   -1.81%
 lookup_100_000_bigvalue         192,509           189,695                 -2,814   -1.46%
 lookup_10_000                   154,251           145,731                 -8,520   -5.52%
 lookup_10_000_bigvalue          162,315           146,527                -15,788   -9.73%
 lookup_10_000_exist             132,769           128,922                 -3,847   -2.90%
 lookup_10_000_noexist           146,880           144,504                 -2,376   -1.62%
 lookup_1_000_000                137,167           132,260                 -4,907   -3.58%
 lookup_1_000_000_bigvalue       141,130           134,371                 -6,759   -4.79%
 lookup_1_000_000_bigvalue_unif  567,235           481,272                -85,963  -15.15%
 lookup_1_000_000_unif           589,391           453,576               -135,815  -23.04%
 merge_shuffle                   1,253,357         1,207,387              -45,970   -3.67%
 merge_simple                    40,264,690        37,996,903          -2,267,787   -5.63%
 new                             6                 5                           -1  -16.67%
 with_capacity_10e5              3,214             3,256                       42    1.31%
```

```
➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                           hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 iter_keys_100_000              391,677           382,839                 -8,838   -2.26%
 iter_keys_1_000_000            10,797,360        10,209,898            -587,462   -5.44%
 iter_keys_big_value_100_000    414,736           662,255                247,519   59.68%
 iter_keys_big_value_1_000_000  10,147,837        12,067,938           1,920,101   18.92%
 iter_values_100_000            440,445           377,080                -63,365  -14.39%
 iter_values_1_000_000          10,931,844        9,979,173             -952,671   -8.71%
 iterate_100_000                428,644           388,509                -40,135   -9.36%
 iterate_1_000_000              11,065,419        10,042,427          -1,022,992   -9.24%
```
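
The padding cost described above can be checked with `std::mem::size_of`. This is a small sketch with hypothetical helper names, assuming a typical 64-bit target where `u64` is 8-byte aligned and counting one 8-byte hash per bucket in both layouts.

```rust
use std::mem::size_of;

/// Bytes per bucket when keys and values live in separate arrays (no K-V padding).
fn split_bytes<K, V>() -> usize {
    size_of::<K>() + size_of::<V>()
}

/// Bytes per bucket when each key is stored next to its value.
/// The tuple's size is rounded up to its alignment, which is where padding appears.
fn paired_bytes<K, V>() -> usize {
    size_of::<(K, V)>()
}

/// Ratio of interleaved to split storage, including an 8-byte hash per bucket.
fn overhead_with_hash<K, V>() -> f64 {
    (8 + paired_bytes::<K, V>()) as f64 / (8 + split_bytes::<K, V>()) as f64
}

fn main() {
    println!("u64/u8 : {} vs {} bytes", split_bytes::<u64, u8>(), paired_bytes::<u64, u8>());
    println!("u64/u16: {:.2}", overhead_with_hash::<u64, u16>());
    println!("u64/u32: {:.2}", overhead_with_hash::<u64, u32>());
    println!("u64/u64: {:.2}", overhead_with_hash::<u64, u64>());
}
```

Under these assumptions the `u16` and `u32` cases come out at about 1.33 and 1.20, in line with the 33% and 20% quoted above. The `u8` case comes out near 1.41, so the 46% figure was presumably computed differently.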

rust-highfive assigned Gankra Feb 5, 2015

pczarn force-pushed the hash_map-mem-layout branch from dc6bc2d to 64e8b91 Compare February 5, 2015 18:26

cgaebel reviewed Feb 5, 2015

pczarn force-pushed the hash_map-mem-layout branch from 64e8b91 to 407572c Compare February 5, 2015 18:30

cgaebel reviewed Feb 5, 2015

pczarn force-pushed the hash_map-mem-layout branch from 49bfe41 to f495fb8 Compare February 7, 2015 18:55

pczarn force-pushed the hash_map-mem-layout branch from f495fb8 to 6218462 Compare February 16, 2015 12:53

pczarn force-pushed the hash_map-mem-layout branch from 6218462 to 316b300 Compare February 21, 2015 18:01

pczarn force-pushed the hash_map-mem-layout branch from 627837f to edcddb1 Compare February 26, 2015 18:19

pczarn mentioned this pull request Feb 26, 2015

llvm::StructType::getElementType: Assertion `N < NumContainedTys && "Element number out of range!"' failed. #21721

Closed

pczarn force-pushed the hash_map-mem-layout branch from edcddb1 to b7317cd Compare February 27, 2015 17:23

Gankra reviewed Feb 27, 2015

Cleanup. Changed HashMap's internal layout.

776d23e

* use of NonZero hashes
* refactoring
* correct explanation of the load factor
* better nomenclature
* 'probe distance' -> 'displacement'

pczarn force-pushed the hash_map-mem-layout branch from 3f67ec1 to 776d23e Compare April 29, 2015 10:33

Gankra reviewed May 12, 2015

alexcrichton added the T-libs-api label May 26, 2015

Gankra closed this May 28, 2015

This was referenced Sep 20, 2016

Use usize instead of u64 for hashes in HashMap #36567

Closed

Revisit HashMap memory layout #36660

Closed

Cache conscious hashmap table #36692

Merged
