
Changed HashMap's internal layout. Cleanup. #21973

Closed
wants to merge 1 commit

Conversation


@pczarn pczarn commented Feb 5, 2015

Changes HashMap's memory layout from [hhhh...KKKK...VVVV...] to [KVKVKVKV...hhhh...]. This makes in-place growth easy to implement (and efficient).
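A hypothetical sketch of the two layouts (type and field names are illustrative, not the actual `RawTable` internals), which also shows the padding cost that interleaving can introduce:

```rust
use std::mem::size_of;

// Old layout: three parallel arrays in one allocation, hhhh...KKKK...VVVV...
#[allow(dead_code)]
struct OldLayout<K, V> {
    hashes: Vec<u64>,
    keys: Vec<K>,
    vals: Vec<V>,
}

// Proposed layout: interleaved pairs first, hashes at the end,
// KVKVKVKV...hhhh... Growing in place only needs to extend the one
// allocation and move the hash array.
#[allow(dead_code)]
struct NewLayout<K, V> {
    pairs: Vec<(K, V)>,
    hashes: Vec<u64>,
}

fn main() {
    // Interleaving can waste padding that split arrays avoid:
    assert_eq!(size_of::<u64>() + size_of::<u8>(), 9);
    assert_eq!(size_of::<(u64, u8)>(), 16); // padded up to the u64 alignment
}
```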

The removal of find_with_or_insert_with has made more cleanup possible.

20 benchmark runs, averaged:

                           before         after
bench::find_existing       40573.50 ns    41227.65 ns
bench::find_nonexisting    41815.45 ns    42362.60 ns
bench::get_remove_insert     197.85 ns      198.60 ns
bench::grow_by_insertion     171.05 ns      154.05 ns
bench::hashmap_as_queue      112.85 ns      112.65 ns
bench::new_drop               79.40 ns       79.20 ns
bench::new_insert_drop       179.40 ns      149.05 ns

Thanks to @gankro for the Entry interface, and to @thestinger for improving jemalloc's in-place realloc!
cc @cgaebel
r? @gankro

// middle of a cache line, this strategy pulls in one cache line of hashes on
// most lookups (64-byte cache line with 8-byte hash). I think this choice is
// pretty good, but α could go up to 0.95, or down to 0.84 to trade off some
// space.
//
// > Wait, what? Where did you get 1-α^k from?

This should be updated, too.

let size = table.size();
let mut probe = Bucket::new(table, hash);
let mut probe = if let Some(probe) = Bucket::new(table, hash) {

s/if let/match
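The suggestion, illustrated on a stand-in `Option`-returning constructor (the real `Bucket::new` context isn't shown in full here): a `match` makes the `None` arm explicit where `if let` would need an `else`.

```rust
// Hypothetical fallible constructor standing in for Bucket::new:
fn fallible_new(x: i32) -> Option<i32> {
    if x >= 0 { Some(x) } else { None }
}

fn main() {
    // `if let` form, as in the diff:
    let probe = if let Some(p) = fallible_new(7) { p } else { -1 };

    // Equivalent `match` form, as the reviewer suggests:
    let probe2 = match fallible_new(7) {
        Some(p) => p,
        None => -1,
    };

    assert_eq!(probe, probe2);
    assert_eq!(probe2, 7);
}
```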


cgaebel commented Feb 5, 2015

Can this PR be split up into "In-place growth for HashMap" and "Cleanup"?


Gankra commented Feb 5, 2015

Just leaving a note: I am concerned about how this change will negatively affect memory consumption for certain choices of K and V.

That said, I think speed is more important than memory consumption, to some limit.


cgaebel commented Feb 5, 2015

It will also affect the performance of the "keys" and "values" iterators.

@pnkfelix

Don't the posted benchmark results indicate that insertion (grow_by_insertion and new_insert_drop) has become faster at the expense of making lookup (find_existing, find_nonexisting) slower?

(This outcome makes some sense to me, at least for a heavily loaded table, since the unused interleaved values for non-matching keys are going to occupy portions of the cache line when we are doing a probe sequence over a series of keys.)

  • (( Well, maybe this explanation is a little too simplistic; the sequence of hhhh ... at the end, I guess, indicates that in the common case we should only need to look at a series of hash codes before we start inspecting the keys themselves, so it's not quite as dire as the above explanation made it out to be. ))

I don't know how to evaluate whether the gain is worth the cost here. I just want to make sure that everyone is on board for this shift (or find out if my interpretation of the results is wrong).


Gankra commented Feb 12, 2015

Just a note that I believe @pczarn is currently reworking this PR.


pczarn commented Feb 16, 2015

I did the reworking. Some small details still need attention.

It will also affect the performance of the "keys" and "values" iterators.

Sounds bad. Can you find a real-world example of this? To keep the performance the same, I could make the hashmap use two allocations, [VVVV…hhhh…] and [KKKK…].

Here's a relevant post on data layout: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/
Unfortunately, it's of little use for us, because it's not about Robin Hood (and for C++, not Rust).

This is how benchmark results have changed:

[40.573, 41.706, 209.1, 179.4, 122.8, 78.4, 182.7]
  Cleanup.
[41.989, 43.615, 212.4, 167.8, 125.0, 78.7, 163.3]
  Changed HashMap's internal layout.
[41.145, 42.298, 206.8, 162.5, 121.7, 78.9, 159.3]
  In-place growth for HashMap.
[41.055, 42.427, 208.3, 156.0, 124.2, 78.4, 161.6]

@pnkfelix: Lookup has become slower after the first commit because of refactoring. Keep in mind that the benchmark does 1000 lookups per iteration and the difference between 41.7ns and 42.4ns per lookup is small.

The improvement from in-place growth is surprisingly low. I'll have to check why.


cgaebel commented Feb 16, 2015

@pczarn The performance of key/value iterators suffers because with the current design they walk over a compact array, while with your proposed design they pull in twice as much cache while doing so. If doing a small per-key or per-value operation, this essentially halves memory bandwidth.
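A back-of-the-envelope model of this point (assumed u64 keys and values; not a measurement): iterating only the keys of an interleaved table still drags the values through cache.

```rust
use std::mem::size_of;

fn main() {
    type K = u64;
    type V = u64;

    // Split layout: a keys() iterator walks a compact [K] array.
    let split_bytes_per_key = size_of::<K>();

    // Interleaved layout: each key sits next to a value it doesn't need,
    // so the iterator streams whole (K, V) pairs through cache.
    let interleaved_bytes_per_key = size_of::<(K, V)>();

    // For u64/u64, keys() touches twice the memory -- halved bandwidth:
    assert_eq!(interleaved_bytes_per_key, 2 * split_bytes_per_key);
}
```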


Gankra commented Feb 16, 2015

Well strictly speaking it's already walking over the hashes checking for hash != 0 at the same time.


Gankra commented Feb 21, 2015

Oh shoot, I let this slip through the cracks. Needs a rebase (hopefully the last one, since we should be done with crazy API churn).


pczarn commented Feb 21, 2015

Updated. I'm going to test and make corrections when new snapshots land.

So iteration over small keys/values is already 2x-4x more cache-intensive than in an array. With larger values, like in HashMap<usize, (String, String)>, it gets much worse.

To avoid the issue, keys/values can be stored in an array such as [([K; 16], [V; 16]); n].
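A sketch of that chunked layout (chunk size 16 as in the comment; the struct name is made up): keys stay contiguous within a chunk, so key iteration touches whole cache lines of keys while each key still lives near its value.

```rust
use std::mem::size_of;

// One chunk of the proposed ([K; 16], [V; 16]) layout:
#[allow(dead_code)]
struct Chunk<K, V> {
    keys: [K; 16],
    vals: [V; 16],
}

fn main() {
    // 16 u64 keys fill exactly two 64-byte cache lines, with no values
    // interleaved between them:
    assert_eq!(size_of::<[u64; 16]>(), 128);
    // And alignment padding is paid at most once per chunk, not once
    // per entry as in the [KVKV...] layout:
    assert_eq!(size_of::<Chunk<u64, u8>>(), 128 + 16);
}
```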


pczarn commented Feb 27, 2015

Done, tested.


Gankra commented Feb 27, 2015

Great! Will review tonight.


Gankra commented Feb 27, 2015

Ack, a few of my issues are addressed by later commits. Doing this commit-by-commit isn't the right strategy here. Shifting gears.

// inform rustc that in fact instances of K and V are reachable from here.
marker: marker::PhantomData<(K,V)>,
// NB. The table will probably need manual impls of Send and Sync if this
// field ever changes.

I don't think that needs saying? That's true for pretty much all Uniques.


bors commented Apr 22, 2015

☔ The latest upstream changes (presumably #24674) made this pull request unmergeable. Please resolve the merge conflicts.

* use of NonZero hashes
* refactoring
* correct explanation of the load factor
* better nomenclature
* 'probe distance' -> 'displacement'
@Manishearth

@gankro (this was rebased and needs review)


Gankra commented May 10, 2015

@Manishearth Yes I tried to get someone else to review over two months ago. :/

Today's the last day of my pseudo-vacation, so I'm free to tackle this again this week.


Gankra commented May 12, 2015

Alright, I've started reading the full source at the last commit's hash, just because the changes have been so comprehensive that it borders on a rewrite.

Was Deref vs Borrow ever fully addressed? Borrow is generally just a trait for Deref-like+Hash/Eq/Ord equivalence.

pub fn into_bucket(self) -> Bucket<K, V, M> {
/// Duplicates the current position. This can be useful for operations
/// on two or more buckets.
pub fn stash(self) -> Bucket<K, V, Bucket<K, V, M, S>, S> {

What about -> Bucket<K, V, Self, S>?
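For illustration (a heavily simplified `Bucket` with made-up fields): `stash` nests the bucket inside its own table parameter, which is why `Self` could abbreviate the spelled-out `Bucket<K, V, M, S>` in the return type.

```rust
// Simplified stand-in for the PR's Bucket type: `M` is whatever it
// treats as its table, so a stashed bucket uses the old bucket as M.
struct Bucket<M> {
    idx: usize,
    table: M,
}

impl<M> Bucket<M> {
    // Duplicates the current position, as stash() does in the PR;
    // the return type could equivalently be written Bucket<Self>.
    fn stash(self) -> Bucket<Bucket<M>> {
        Bucket { idx: self.idx, table: self }
    }
}

fn main() {
    let b = Bucket { idx: 3, table: () };
    let stashed = b.stash();
    assert_eq!(stashed.idx, 3);
    assert_eq!(stashed.table.idx, 3);
}
```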


Gankra commented May 19, 2015

So, to the best of my knowledge this code seems to be correct, but I have strong reservations about where we're going with respect to the proliferation of type-complexity. I used to be able to grok this code pretty ok: We have HashMap, RawTable, and some Buckets. Now everything's super generic over some "M" type and there's Partial* types all over. It's not clear that these abstractions are pulling their weight: are they preventing real bugs or enabling simpler or more maintainable code?

CC @nikomatsakis @huonw @cgaebel


cgaebel commented May 19, 2015

I share that sentiment.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 26, 2015

Gankra commented May 28, 2015

It's been a couple weeks with no response on any of my comments, and I'm not a huge fan of the general design changes. As such I'm closing this for now. We can continue discussion on the PR and maybe re-open if we get somewhere.

@Gankra Gankra closed this May 28, 2015
sophiajt pushed a commit to sophiajt/rust that referenced this pull request Oct 11, 2016

Cache conscious hashmap table

Right now the internal HashMap representation is 3 unzipped arrays hhhkkkvvv. I propose to change it to hhhkvkvkv (in further iterations, kvkvkvhhh may allow in-place growth). A previous attempt is at rust-lang#21973.

This layout is generally more cache conscious, as it makes the value immediately accessible after a key matches. The separate hash array is a _no-brainer_ because of how the RH algorithm works, and that's unchanged.

**Lookups**: Upon a successful match in the hash array the code can check the key and immediately have access to the value in the same or next cache line (effectively saving a L[1,2,3] miss compared to the current layout).
**Inserts/Deletes/Resize**: Moving values in the table (robin hooding it) is faster because it touches consecutive cache lines and uses fewer instructions.

Some backing benchmarks (besides the ones below) for the benefits of this layout can be seen here as well: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/

The obvious drawback is that padding can be wasted between the key and value. Because of that, keys(), values() and contains() can consume more cache and be slower.

Total wasted padding between items (C being the capacity of the table):
* Old layout: C * (K-K padding) + C * (V-V padding)
* Proposed: C * (K-V padding) + C * (V-K padding)

In practice, padding between K-K and V-V *can* be smaller than K-V and V-K. The overhead is capped(ish) at sizeof u64 - 1, so we can actually measure the worst case (a u8 at the end of the key type and a value with alignment of 1, _hardly the average case in practice_).

Starting from the worst case the memory overhead is:
* `HashMap<u64, u8>` 46% memory overhead. (aka *worst case*)
* `HashMap<u64, u16>` 33% memory overhead.
* `HashMap<u64, u32>` 20% memory overhead.
* `HashMap<T, T>` 0% memory overhead
* Worst case based on sizeof K + sizeof V:

| x              |  16    |  24    |  32    |  64   |  128  |
|----------------|--------|--------|--------|-------|-------|
| (8+x+7)/(8+x)  |  1.29  |  1.22  |  1.18  |  1.1  |  1.05 |
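The worst-case row can be checked directly (same model as the table: an 8-byte hash plus x bytes of key+value padded up by at most 7 bytes):

```rust
use std::mem::size_of;

// Worst-case per-entry overhead from the table: (8 + x + 7) / (8 + x),
// where x = sizeof K + sizeof V and 8 is the hash.
fn worst_case_overhead(x: f64) -> f64 {
    (8.0 + x + 7.0) / (8.0 + x)
}

fn main() {
    assert!((worst_case_overhead(16.0) - 1.29).abs() < 0.005);
    assert!((worst_case_overhead(128.0) - 1.05).abs() < 0.005);
    // Concrete worst-case-style pair: 9 bytes of (u64, u8) data pad up to 16.
    assert_eq!(size_of::<(u64, u8)>(), 16);
}
```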

I have a test repo here to run the benchmarks: https://github.com/arthurprs/hashmap2/tree/layout

```
 ➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                            hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 grow_10_000                     922,064           783,933               -138,131  -14.98%
 grow_big_value_10_000           1,901,909         1,171,862             -730,047  -38.38%
 grow_fnv_10_000                 443,544           418,674                -24,870   -5.61%
 insert_100                      2,469             2,342                     -127   -5.14%
 insert_1000                     23,331            21,536                  -1,795   -7.69%
 insert_100_000                  4,748,048         3,764,305             -983,743  -20.72%
 insert_10_000                   321,744           290,126                -31,618   -9.83%
 insert_int_bigvalue_10_000      749,764           407,547               -342,217  -45.64%
 insert_str_10_000               337,425           334,009                 -3,416   -1.01%
 insert_string_10_000            788,667           788,262                   -405   -0.05%
 iter_keys_100_000               394,484           374,161                -20,323   -5.15%
 iter_keys_big_value_100_000     402,071           620,810                218,739   54.40%
 iter_values_100_000             424,794           373,004                -51,790  -12.19%
 iterate_100_000                 424,297           389,950                -34,347   -8.10%
 lookup_100_000                  189,997           186,554                 -3,443   -1.81%
 lookup_100_000_bigvalue         192,509           189,695                 -2,814   -1.46%
 lookup_10_000                   154,251           145,731                 -8,520   -5.52%
 lookup_10_000_bigvalue          162,315           146,527                -15,788   -9.73%
 lookup_10_000_exist             132,769           128,922                 -3,847   -2.90%
 lookup_10_000_noexist           146,880           144,504                 -2,376   -1.62%
 lookup_1_000_000                137,167           132,260                 -4,907   -3.58%
 lookup_1_000_000_bigvalue       141,130           134,371                 -6,759   -4.79%
 lookup_1_000_000_bigvalue_unif  567,235           481,272                -85,963  -15.15%
 lookup_1_000_000_unif           589,391           453,576               -135,815  -23.04%
 merge_shuffle                   1,253,357         1,207,387              -45,970   -3.67%
 merge_simple                    40,264,690        37,996,903          -2,267,787   -5.63%
 new                             6                 5                           -1  -16.67%
 with_capacity_10e5              3,214             3,256                       42    1.31%
```

```
➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                           hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 iter_keys_100_000              391,677           382,839                 -8,838   -2.26%
 iter_keys_1_000_000            10,797,360        10,209,898            -587,462   -5.44%
 iter_keys_big_value_100_000    414,736           662,255                247,519   59.68%
 iter_keys_big_value_1_000_000  10,147,837        12,067,938           1,920,101   18.92%
 iter_values_100_000            440,445           377,080                -63,365  -14.39%
 iter_values_1_000_000          10,931,844        9,979,173             -952,671   -8.71%
 iterate_100_000                428,644           388,509                -40,135   -9.36%
 iterate_1_000_000              11,065,419        10,042,427          -1,022,992   -9.24%
```
bors added a commit that referenced this pull request Oct 14, 2016
Cache conscious hashmap table
