Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hashmap: Store hashes as usize internally #36595

Merged
merged 1 commit into from
Nov 1, 2016

Conversation

bluss
Copy link
Member

@bluss bluss commented Sep 20, 2016

We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.

Fixes #36567

@rust-highfive
Copy link
Collaborator

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@bluss
Copy link
Member Author

bluss commented Sep 20, 2016

Needs benchmarks and testing on 32-bit.

@arthurprs
Copy link
Contributor

arthurprs commented Sep 24, 2016

-- updated benchmarks bellow

@alexcrichton
Copy link
Member

@bluss we discussed this briefly at libs triage today and were curious, do you have some representative numbers as well about this change? (or @arthurprs do you have some?) Would be useful to see the comparisons!

I was personally a little worried about losing 32 bits of a 64-bit hash on 32-bit platforms, but so long as we generally recommend a uniform distribution of bits I think it'll work out.

// We need to avoid 0 in order to prevent collisions with
// EMPTY_HASH. We can maintain our precious uniform distribution
// of initial indexes by unconditionally setting the MSB,
// effectively reducing 64-bits hashes to 63 bits.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs updating because of the 32 -> 31?

@arthurprs
Copy link
Contributor

arthurprs commented Sep 27, 2016

Here are some x86 benchmarks. Keep in mind that most of these are <usize, usize> which is the best case for this change (memory footprint is 25% smaller).

What I don't get is why in the hhkkvv layout the iteration becomes slower, I'll try to dig something from the assembly.

-- benchmarks remove look for updates bellow

@bluss
Copy link
Member Author

bluss commented Sep 27, 2016

I wanted to know how it influences bootstrap time on both 64-bit and 32-bit, it would be a good metric. I don't have the resources to do the comparison, so I left this here to ask if someone wanted to try it.

Ah, the benchmarks are for kkvv layout (current) and kvkv (your pr)

@arthurprs
Copy link
Contributor

I looked at the disassembly but I can't see anything that would make iteration that much slower. From an algorithm POV it makes no sense to me.

@alexcrichton
Copy link
Member

@arthurprs those numbers seem to be mostly related to performance, but isn't the theory behind this change that it mostly saves memory?

@arthurprs
Copy link
Contributor

@alexcrichton sure, I just happened to have these around due to the other PR. And the decrease in cache pressure shows up in these.

As for memory benefits it's easy to calculate as a function of sizeof(K) and sizeof(V) as it doesn't depend on other factors like padding etc.

sz(K)+sz(V) 4 8 16 24 32 64 128 256
(x+4)/(x+8) 0.667 0.75 0.833 0.875 0.9 0.944 0.971 0.985

@bluss
Copy link
Member Author

bluss commented Sep 29, 2016

@alexcrichton Smaller data means better performance; more hashes fit into a cache line during lookup. So performance was my main thought.

@arthurprs
Copy link
Contributor

Dealing with so many benchmarks is tricky. I found a problem so I'm repeating the benchmarks, will post fixed and expanded updates soon.

@arthurprs
Copy link
Contributor

arthurprs commented Sep 29, 2016

Here's the updated results. And the gist with even more comparisons and better visualization https://gist.github.com/arthurprs/9f28847dceee86bd5cfffcd30d9cd6cc

Code: https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize

➜  hashmap2 git:(layout_usize) ✗ cargo benchcmp hhkkvv_u64:: hhkkvv_usz:: x86.txt
 name                            hhkkvv_u64:: ns/iter  hhkkvv_usz:: ns/iter  diff ns/iter   diff % 
 grow_100_000                    1,288,426             1,347,922                   59,496    4.62% 
 grow_10_000                     1,302,671             1,351,398                   48,727    3.74% 
 grow_big_value_100_000          38,371,161            37,769,065                -602,096   -1.57% 
 grow_big_value_10_000           4,054,619             4,437,890                  383,271    9.45% 
 grow_fnv_10_000                 433,147               380,392                    -52,755  -12.18% 
 insert_100                      5,215                 4,970                         -245   -4.70% 
 insert_1000                     48,039                44,946                      -3,093   -6.44% 
 insert_100_000                  6,843,047             7,122,700                  279,653    4.09% 
 insert_10_000                   532,906               530,855                     -2,051   -0.38% 
 insert_1_000_000                124,166,274           124,684,470                518,196    0.42% 
 insert_int_bigvalue_10_000      1,440,912             1,390,695                  -50,217   -3.49% 
 insert_str_10_000               626,603               595,789                    -30,814   -4.92% 
 insert_string_10_000            1,370,791             1,371,131                      340    0.02% 
 iter_keys_100_000               371,087               383,975                     12,888    3.47% 
 iter_keys_1_000_000             8,448,411             8,579,116                  130,705    1.55% 
 iter_keys_big_value_100_000     352,423               393,528                     41,105   11.66% 
 iter_keys_big_value_1_000_000   8,386,089             8,615,999                  229,910    2.74% 
 iter_values_100_000             392,468               545,712                    153,244   39.05% 
 iter_values_1_000_000           8,797,970             11,673,560               2,875,590   32.68% 
 iterate_100_000                 394,909               552,853                    157,944   40.00% 
 iterate_1_000_000               8,778,111             11,821,943               3,043,832   34.68% 
 lookup_100_000                  328,320               313,771                    -14,549   -4.43% 
 lookup_100_000_bigvalue         319,496               316,429                     -3,067   -0.96% 
 lookup_10_000                   265,687               252,238                    -13,449   -5.06% 
 lookup_10_000_bigvalue          281,491               261,869                    -19,622   -6.97% 
 lookup_10_000_exist             257,790               246,435                    -11,355   -4.40% 
 lookup_10_000_noexist           273,700               273,483                       -217   -0.08% 
 lookup_1_000_000                257,266               251,604                     -5,662   -2.20% 
 lookup_1_000_000_bigvalue       264,313               261,944                     -2,369   -0.90% 
 lookup_1_000_000_bigvalue_unif  684,819               740,230                     55,411    8.09% 
 lookup_1_000_000_unif           598,199               619,402                     21,203    3.54% 
 merge_shuffle                   1,604,293             1,636,942                   32,649    2.04% 
 merge_simple                    61,494,395            42,660,176             -18,834,219  -30.63% 
 new                             9                     9                                0    0.00% 
 with_capacity_10e5              2,537                 1,308                       -1,229  -48.44% 
➜  hashmap2 git:(layout_usize) ✗ cargo benchcmp hhkvkv_u64:: hhkvkv_usz:: x86.txt 
 name                            hhkvkv_u64:: ns/iter  hhkvkv_usz:: ns/iter  diff ns/iter   diff % 
 grow_100_000                    1,162,181             1,138,593                  -23,588   -2.03% 
 grow_10_000                     1,158,647             1,084,507                  -74,140   -6.40% 
 grow_big_value_100_000          37,001,012            36,665,075                -335,937   -0.91% 
 grow_big_value_10_000           3,437,647             3,538,101                  100,454    2.92% 
 grow_fnv_10_000                 419,084               349,202                    -69,882  -16.67% 
 insert_100                      4,823                 4,556                         -267   -5.54% 
 insert_1000                     46,000                45,223                        -777   -1.69% 
 insert_100_000                  6,664,378             6,213,870                 -450,508   -6.76% 
 insert_10_000                   534,694               510,209                    -24,485   -4.58% 
 insert_1_000_000                123,740,627           107,358,807            -16,381,820  -13.24% 
 insert_int_bigvalue_10_000      1,651,770             1,539,210                 -112,560   -6.81% 
 insert_str_10_000               599,655               580,417                    -19,238   -3.21% 
 insert_string_10_000            1,382,560             1,363,848                  -18,712   -1.35% 
 iter_keys_100_000               361,889               339,290                    -22,599   -6.24% 
 iter_keys_1_000_000             8,312,419             8,026,248                 -286,171   -3.44% 
 iter_keys_big_value_100_000     526,549               584,094                     57,545   10.93% 
 iter_keys_big_value_1_000_000   9,904,451             9,810,863                  -93,588   -0.94% 
 iter_values_100_000             366,842               338,926                    -27,916   -7.61% 
 iter_values_1_000_000           8,532,860             8,143,969                 -388,891   -4.56% 
 iterate_100_000                 367,828               342,287                    -25,541   -6.94% 
 iterate_1_000_000               8,542,904             8,187,025                 -355,879   -4.17% 
 lookup_100_000                  323,857               307,916                    -15,941   -4.92% 
 lookup_100_000_bigvalue         342,299               323,328                    -18,971   -5.54% 
 lookup_10_000                   262,727               251,569                    -11,158   -4.25% 
 lookup_10_000_bigvalue          282,536               264,107                    -18,429   -6.52% 
 lookup_10_000_exist             252,367               244,794                     -7,573   -3.00% 
 lookup_10_000_noexist           273,349               273,088                       -261   -0.10% 
 lookup_1_000_000                284,377               248,260                    -36,117  -12.70% 
 lookup_1_000_000_bigvalue       268,109               262,584                     -5,525   -2.06% 
 lookup_1_000_000_bigvalue_unif  660,544               687,425                     26,881    4.07% 
 lookup_1_000_000_unif           517,636               535,840                     18,204    3.52% 
 merge_shuffle                   1,488,137             1,400,964                  -87,173   -5.86% 
 merge_simple                    38,786,908            31,456,907              -7,330,001  -18.90% 
 new                             9                     9                                0    0.00% 
 with_capacity_10e5              2,500                 1,350                       -1,150  -46.00%

@alexcrichton
Copy link
Member

@arthurprs hm some of those benchmarks seem worrisome indicating that iteration gets 40% slower? Is that just an outlier though?

@arthurprs
Copy link
Contributor

@alexcrichton yeah, it's super odd (and makes no sense to me) but I can reproduce it every time here. I was hopping somebody else could run the benchmarks.

It's just a matter of checking out these branches (https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize) and running something like
cargo bench hhkkvv::iter --target=i686-unknown-linux-gnu

@alexcrichton
Copy link
Member

@rfcbot fcp merge

Another neat improvement to our hash tables, and seems like the numbers back it up?

@bors
Copy link
Contributor

bors commented Oct 7, 2016

☔ The latest upstream changes (presumably #36753) made this pull request unmergeable. Please resolve the merge conflicts.

@alexbool
Copy link
Contributor

rfcbot stuck?

@bluss
Copy link
Member Author

bluss commented Oct 12, 2016

I'm waiting for @arthurprs's hashmap change to merge first before updating. I think the benchmarks said this was a universal win for that memory layout(?), so it should be a simple decision then.

@alexcrichton
Copy link
Member

Ah yeah we've discussed this as well, so it's ok to r+ when ready to go.

We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.
@nnethercote
Copy link
Contributor

Measuring with compare.py from rustc-benchmarks would be useful here.

@alexcrichton
Copy link
Member

@bluss this is ready to go now, right?

@bluss
Copy link
Member Author

bluss commented Oct 31, 2016

It is ready to go. I haven't invested more time in benchmarking this, and that wasn't the plan from the start either; I wanted to simply implement this and give to rustc developers if they were interested. @arthurprs's benchmarks show surprisingly noticeable gains, so I'm happy.

@alexcrichton
Copy link
Member

@bors: r+

Ok, thanks!

@bors
Copy link
Contributor

bors commented Oct 31, 2016

📌 Commit 13a1f21 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Nov 1, 2016

⌛ Testing commit 13a1f21 with merge 265ab65...

bors added a commit that referenced this pull request Nov 1, 2016
hashmap: Store hashes as usize internally

We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.

Fixes #36567
@bors bors merged commit 13a1f21 into rust-lang:master Nov 1, 2016
@bluss bluss deleted the hashmap-usize-for-hash branch November 1, 2016 08:42
@arthurprs
Copy link
Contributor

arthurprs commented Nov 1, 2016

The gains very likely come from smaller hash array memory footprint and cheaper displacement() calculation. The later is in a very hot code path.

Edit: actually the displacement may not affected but the patch is probably saving 64bit arithmetic and freeing an extra register in other places.

@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants