hashmap: Store hashes as usize internally #36595

bluss · 2016-09-20T07:40:53Z

We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.
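A minimal sketch of the argument above. It assumes a power-of-two capacity and a bucket index computed as `hash & (capacity - 1)`. The helper names are hypothetical and are not the std internals. Because the mask never has more bits than `usize`, truncating the 64-bit hash to `usize` before masking selects the same bucket.

```rust
/// Hypothetical helper: bucket index computed from the full 64-bit hash.
fn bucket_from_u64(hash: u64, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    (hash & (capacity as u64 - 1)) as usize
}

/// Hypothetical helper: bucket index computed from the hash truncated to usize.
fn bucket_from_usize(hash: u64, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    // On 32-bit targets this drops the upper 32 bits, which the mask
    // would have discarded anyway.
    (hash as usize) & (capacity - 1)
}

fn main() {
    let hashes = [0u64, 1, 0xdead_beef, 0x1234_5678_9abc_def0, u64::MAX];
    for &h in &hashes {
        for &cap in &[1usize, 16, 1 << 20] {
            assert_eq!(bucket_from_u64(h, cap), bucket_from_usize(h, cap));
        }
    }
    println!("same bucket for every hash and capacity tried");
}
```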

Fixes #36567

rust-highfive · 2016-09-20T07:41:05Z

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

bluss · 2016-09-20T07:44:04Z

Needs benchmarks and testing on 32-bit.

arthurprs · 2016-09-24T16:07:46Z

-- updated benchmarks below

alexcrichton · 2016-09-26T23:18:27Z

@bluss we discussed this briefly at libs triage today and were curious, do you have some representative numbers as well about this change? (or @arthurprs do you have some?) Would be useful to see the comparisons!

I was personally a little worried about losing 32 bits of a 64-bit hash on 32-bit platforms, but so long as we generally recommend a uniform distribution of bits I think it'll work out.

arthurprs · 2016-09-27T09:17:17Z

src/libstd/collections/hash/table.rs

+        // We need to avoid 0 in order to prevent collisions with
+        // EMPTY_HASH. We can maintain our precious uniform distribution
+        // of initial indexes by unconditionally setting the MSB,
+        // effectively reducing 64-bits hashes to 63 bits.


This comment needs updating because of the 32 -> 31 case on 32-bit platforms?
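A hedged sketch of what the comment describes once hashes are stored as `usize`. `make_safe_hash` and `MSB` are hypothetical names, not the exact code in table.rs. Forcing the most significant bit means a stored hash can never equal `EMPTY_HASH` (0). The cost is one bit: 64 -> 63 on 64-bit targets, and 32 -> 31 on 32-bit targets.

```rust
use std::mem::size_of;

/// Marker for an empty bucket, as in the comment under review.
const EMPTY_HASH: usize = 0;

/// Most significant bit of a usize: bit 63 on 64-bit, bit 31 on 32-bit.
const MSB: usize = 1 << (size_of::<usize>() * 8 - 1);

/// Hypothetical stand-in for the table's "safe hash": truncate the 64-bit
/// hash to usize and force the MSB so it can never equal EMPTY_HASH.
fn make_safe_hash(hash: u64) -> usize {
    (hash as usize) | MSB
}

fn main() {
    let h = make_safe_hash(0);
    assert_ne!(h, EMPTY_HASH);
    // Only (usize bits - 1) bits of the original hash survive.
    let kept_bits = size_of::<usize>() * 8 - 1;
    println!("safe hash = {:#x}, {} hash bits kept", h, kept_bits);
    // Bucket selection is unaffected while the capacity mask stays below the MSB.
    let cap = 1usize << 10;
    assert_eq!(make_safe_hash(12345) & (cap - 1), 12345 & (cap - 1));
}
```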

arthurprs · 2016-09-27T09:30:21Z

Here are some x86 benchmarks. Keep in mind that most of these use <usize, usize>, which is the best case for this change (the memory footprint is 25% smaller).

What I don't understand is why iteration becomes slower in the hhkkvv layout; I'll try to dig into the assembly.

-- benchmarks removed, look for updates below

bluss · 2016-09-27T09:46:15Z

I wanted to know how it influences bootstrap time on both 64-bit and 32-bit; it would be a good metric. I don't have the resources to do the comparison, so I left this here to ask if someone wanted to try it.

Ah, the benchmarks are for the kkvv layout (current) and kvkv (your PR).
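For readers following the benchmark names, here is a rough sketch of the two layouts being compared. It is based only on how the thread uses the names: "hhkkvv" keeps hashes, keys and values in separate arrays, and "hhkvkv" keeps hashes in one array and (K, V) pairs in another. The structs and methods are hypothetical illustrations, not the real `RawTable`. The sketch also assumes hash 0 marks an empty bucket. It only shows which memory a value iteration walks over and does not explain the iteration regression discussed above.

```rust
/// Hypothetical "hhkkvv" layout: hashes, keys and values each in their own
/// array, all indexed by the same bucket index.
struct SplitLayout<K, V> {
    hashes: Vec<usize>,
    keys: Vec<K>,
    values: Vec<V>,
}

/// Hypothetical "hhkvkv" layout: hashes in one array, keys and values
/// interleaved as (K, V) pairs in a second array.
struct PairLayout<K, V> {
    hashes: Vec<usize>,
    pairs: Vec<(K, V)>,
}

impl<K, V> SplitLayout<K, V> {
    /// Scans the hash array and touches only the values array.
    fn values(&self) -> impl Iterator<Item = &V> + '_ {
        self.hashes
            .iter()
            .zip(&self.values)
            .filter(|(h, _)| **h != 0) // 0 = empty bucket (assumption)
            .map(|(_, v)| v)
    }
}

impl<K, V> PairLayout<K, V> {
    /// Same scan, but each value sits next to its key in memory.
    fn values(&self) -> impl Iterator<Item = &V> + '_ {
        self.hashes
            .iter()
            .zip(&self.pairs)
            .filter(|(h, _)| **h != 0) // 0 = empty bucket (assumption)
            .map(|(_, (_, v))| v)
    }
}

fn main() {
    // Bucket 0 is empty; buckets 1 and 2 are occupied.
    let split = SplitLayout {
        hashes: vec![0, 5, 7],
        keys: vec![10, 20, 30],
        values: vec!["a", "b", "c"],
    };
    let pair = PairLayout {
        hashes: vec![0, 5, 7],
        pairs: vec![(10, "a"), (20, "b"), (30, "c")],
    };
    let a: Vec<&&str> = split.values().collect();
    let b: Vec<&&str> = pair.values().collect();
    assert_eq!(a, b);
    println!("{:?} (split keys len {})", a, split.keys.len());
}
```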

arthurprs · 2016-09-27T17:17:25Z

I looked at the disassembly, but I can't see anything that would make iteration that much slower. From an algorithmic point of view it makes no sense to me.

alexcrichton · 2016-09-29T00:46:23Z

@arthurprs those numbers seem to be mostly related to performance, but isn't the theory behind this change that it mostly saves memory?

arthurprs · 2016-09-29T08:24:30Z

@alexcrichton sure, I just happened to have these around due to the other PR. And the decrease in cache pressure shows up in these.

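As for the memory benefit, it's easy to calculate as a function of sizeof(K) and sizeof(V), since it doesn't depend on other factors like padding. On 32-bit, each bucket shrinks from x+8 to x+4 bytes, where x = sizeof(K)+sizeof(V), so the new size relative to the old one is: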

 x = sz(K)+sz(V) (bytes)     4      8      16     24     32     64     128    256
 new/old = (x+4)/(x+8)       0.667  0.75   0.833  0.875  0.9    0.944  0.971  0.985
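The table can be reproduced with a one-line formula. This is a small sketch that assumes, as in the comment above, a 32-bit target where the stored hash goes from an 8-byte u64 to a 4-byte usize and x is the combined key plus value size. `size_ratio_32bit` is a hypothetical name.

```rust
/// New bucket size relative to the old one on a 32-bit target:
/// x bytes of key+value payload plus a 4-byte usize hash instead of
/// an 8-byte u64 hash.
fn size_ratio_32bit(x: usize) -> f64 {
    (x as f64 + 4.0) / (x as f64 + 8.0)
}

fn main() {
    for &x in &[4usize, 8, 16, 24, 32, 64, 128, 256] {
        println!("{:>3}  {:.3}", x, size_ratio_32bit(x));
    }
}
```

For <usize, usize> on 32-bit, x = 8 gives 12/16 = 0.75, which is the 25% smaller footprint mentioned earlier in the thread. As x grows, the saving becomes negligible.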

bluss · 2016-09-29T09:16:59Z

@alexcrichton Smaller data means better performance; more hashes fit into a cache line during lookup. So performance was my main thought.
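To put a number on "more hashes fit into a cache line", here is a tiny sketch. It assumes a 64-byte cache line, which is typical for x86 but not guaranteed. `hashes_per_line` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Assumed cache-line size in bytes (typical for x86, not guaranteed).
const CACHE_LINE: usize = 64;

/// How many stored hashes of type H fit in one cache line.
fn hashes_per_line<H>() -> usize {
    CACHE_LINE / size_of::<H>()
}

fn main() {
    println!("u64 hashes per line:   {}", hashes_per_line::<u64>());
    println!("usize hashes per line: {}", hashes_per_line::<usize>());
}
```

On a 32-bit target, where usize is 4 bytes, a probe sequence can scan twice as many stored hashes per cache line as with u64. On 64-bit targets the two are the same size.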

arthurprs · 2016-09-29T12:35:13Z

Dealing with so many benchmarks is tricky. I found a problem so I'm repeating the benchmarks, will post fixed and expanded updates soon.

arthurprs · 2016-09-29T13:47:04Z

Here are the updated results, and the gist with even more comparisons and better visualization: https://gist.github.com/arthurprs/9f28847dceee86bd5cfffcd30d9cd6cc

Code: https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize

➜  hashmap2 git:(layout_usize) ✗ cargo benchcmp hhkkvv_u64:: hhkkvv_usz:: x86.txt
 name                            hhkkvv_u64:: ns/iter  hhkkvv_usz:: ns/iter  diff ns/iter   diff % 
 grow_100_000                    1,288,426             1,347,922                   59,496    4.62% 
 grow_10_000                     1,302,671             1,351,398                   48,727    3.74% 
 grow_big_value_100_000          38,371,161            37,769,065                -602,096   -1.57% 
 grow_big_value_10_000           4,054,619             4,437,890                  383,271    9.45% 
 grow_fnv_10_000                 433,147               380,392                    -52,755  -12.18% 
 insert_100                      5,215                 4,970                         -245   -4.70% 
 insert_1000                     48,039                44,946                      -3,093   -6.44% 
 insert_100_000                  6,843,047             7,122,700                  279,653    4.09% 
 insert_10_000                   532,906               530,855                     -2,051   -0.38% 
 insert_1_000_000                124,166,274           124,684,470                518,196    0.42% 
 insert_int_bigvalue_10_000      1,440,912             1,390,695                  -50,217   -3.49% 
 insert_str_10_000               626,603               595,789                    -30,814   -4.92% 
 insert_string_10_000            1,370,791             1,371,131                      340    0.02% 
 iter_keys_100_000               371,087               383,975                     12,888    3.47% 
 iter_keys_1_000_000             8,448,411             8,579,116                  130,705    1.55% 
 iter_keys_big_value_100_000     352,423               393,528                     41,105   11.66% 
 iter_keys_big_value_1_000_000   8,386,089             8,615,999                  229,910    2.74% 
 iter_values_100_000             392,468               545,712                    153,244   39.05% 
 iter_values_1_000_000           8,797,970             11,673,560               2,875,590   32.68% 
 iterate_100_000                 394,909               552,853                    157,944   40.00% 
 iterate_1_000_000               8,778,111             11,821,943               3,043,832   34.68% 
 lookup_100_000                  328,320               313,771                    -14,549   -4.43% 
 lookup_100_000_bigvalue         319,496               316,429                     -3,067   -0.96% 
 lookup_10_000                   265,687               252,238                    -13,449   -5.06% 
 lookup_10_000_bigvalue          281,491               261,869                    -19,622   -6.97% 
 lookup_10_000_exist             257,790               246,435                    -11,355   -4.40% 
 lookup_10_000_noexist           273,700               273,483                       -217   -0.08% 
 lookup_1_000_000                257,266               251,604                     -5,662   -2.20% 
 lookup_1_000_000_bigvalue       264,313               261,944                     -2,369   -0.90% 
 lookup_1_000_000_bigvalue_unif  684,819               740,230                     55,411    8.09% 
 lookup_1_000_000_unif           598,199               619,402                     21,203    3.54% 
 merge_shuffle                   1,604,293             1,636,942                   32,649    2.04% 
 merge_simple                    61,494,395            42,660,176             -18,834,219  -30.63% 
 new                             9                     9                                0    0.00% 
 with_capacity_10e5              2,537                 1,308                       -1,229  -48.44%

➜  hashmap2 git:(layout_usize) ✗ cargo benchcmp hhkvkv_u64:: hhkvkv_usz:: x86.txt 
 name                            hhkvkv_u64:: ns/iter  hhkvkv_usz:: ns/iter  diff ns/iter   diff % 
 grow_100_000                    1,162,181             1,138,593                  -23,588   -2.03% 
 grow_10_000                     1,158,647             1,084,507                  -74,140   -6.40% 
 grow_big_value_100_000          37,001,012            36,665,075                -335,937   -0.91% 
 grow_big_value_10_000           3,437,647             3,538,101                  100,454    2.92% 
 grow_fnv_10_000                 419,084               349,202                    -69,882  -16.67% 
 insert_100                      4,823                 4,556                         -267   -5.54% 
 insert_1000                     46,000                45,223                        -777   -1.69% 
 insert_100_000                  6,664,378             6,213,870                 -450,508   -6.76% 
 insert_10_000                   534,694               510,209                    -24,485   -4.58% 
 insert_1_000_000                123,740,627           107,358,807            -16,381,820  -13.24% 
 insert_int_bigvalue_10_000      1,651,770             1,539,210                 -112,560   -6.81% 
 insert_str_10_000               599,655               580,417                    -19,238   -3.21% 
 insert_string_10_000            1,382,560             1,363,848                  -18,712   -1.35% 
 iter_keys_100_000               361,889               339,290                    -22,599   -6.24% 
 iter_keys_1_000_000             8,312,419             8,026,248                 -286,171   -3.44% 
 iter_keys_big_value_100_000     526,549               584,094                     57,545   10.93% 
 iter_keys_big_value_1_000_000   9,904,451             9,810,863                  -93,588   -0.94% 
 iter_values_100_000             366,842               338,926                    -27,916   -7.61% 
 iter_values_1_000_000           8,532,860             8,143,969                 -388,891   -4.56% 
 iterate_100_000                 367,828               342,287                    -25,541   -6.94% 
 iterate_1_000_000               8,542,904             8,187,025                 -355,879   -4.17% 
 lookup_100_000                  323,857               307,916                    -15,941   -4.92% 
 lookup_100_000_bigvalue         342,299               323,328                    -18,971   -5.54% 
 lookup_10_000                   262,727               251,569                    -11,158   -4.25% 
 lookup_10_000_bigvalue          282,536               264,107                    -18,429   -6.52% 
 lookup_10_000_exist             252,367               244,794                     -7,573   -3.00% 
 lookup_10_000_noexist           273,349               273,088                       -261   -0.10% 
 lookup_1_000_000                284,377               248,260                    -36,117  -12.70% 
 lookup_1_000_000_bigvalue       268,109               262,584                     -5,525   -2.06% 
 lookup_1_000_000_bigvalue_unif  660,544               687,425                     26,881    4.07% 
 lookup_1_000_000_unif           517,636               535,840                     18,204    3.52% 
 merge_shuffle                   1,488,137             1,400,964                  -87,173   -5.86% 
 merge_simple                    38,786,908            31,456,907              -7,330,001  -18.90% 
 new                             9                     9                                0    0.00% 
 with_capacity_10e5              2,500                 1,350                       -1,150  -46.00%

alexcrichton · 2016-09-30T19:59:19Z

@arthurprs hm, some of those benchmarks seem worrisome, indicating that iteration gets 40% slower? Is that just an outlier?

arthurprs · 2016-10-01T15:05:22Z

@alexcrichton yeah, it's super odd (and makes no sense to me), but I can reproduce it every time here. I was hoping somebody else could run the benchmarks.

It's just a matter of checking out these branches (https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize) and running something like
cargo bench hhkkvv::iter --target=i686-unknown-linux-gnu

alexcrichton · 2016-10-03T23:58:52Z

@rfcbot fcp merge

Another neat improvement to our hash tables, and seems like the numbers back it up?

bors · 2016-10-07T07:55:54Z

☔ The latest upstream changes (presumably #36753) made this pull request unmergeable. Please resolve the merge conflicts.

alexbool · 2016-10-12T12:44:01Z

rfcbot stuck?

bluss · 2016-10-12T13:07:31Z

I'm waiting for @arthurprs's hashmap change to merge first before updating. I think the benchmarks said this was a universal win for that memory layout(?), so it should be a simple decision then.

alexcrichton · 2016-10-12T14:12:33Z

Ah yeah we've discussed this as well, so it's ok to r+ when ready to go.


nnethercote · 2016-10-17T21:36:36Z

Measuring with compare.py from rustc-benchmarks would be useful here.

alexcrichton · 2016-10-31T20:44:06Z

@bluss this is ready to go now, right?

bluss · 2016-10-31T21:02:27Z

It is ready to go. I haven't invested more time in benchmarking this, and that wasn't the plan from the start either; I wanted to simply implement this and give to rustc developers if they were interested. @arthurprs's benchmarks show surprisingly noticeable gains, so I'm happy.

alexcrichton · 2016-10-31T23:18:35Z

@bors: r+

Ok, thanks!

bors · 2016-10-31T23:18:36Z

📌 Commit 13a1f21 has been approved by alexcrichton

bors · 2016-11-01T04:36:39Z

⌛ Testing commit 13a1f21 with merge 265ab65...


bors · 2016-11-01T07:44:48Z

arthurprs · 2016-11-01T08:55:39Z

The gains very likely come from the smaller memory footprint of the hash array and a cheaper displacement() calculation. The latter is in a very hot code path.

Edit: actually, the displacement may not be affected, but the patch is probably saving 64-bit arithmetic and freeing an extra register in other places.
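A hedged illustration of where mixing a u64 hash with usize indices forces 64-bit arithmetic. The functions are hypothetical Robin Hood "displacement" helpers: the distance from an entry's ideal bucket (hash & mask) to the bucket it actually occupies, in a power-of-two table. This is not the real std code, and as the edit above notes, the actual displacement path may not be affected. The sketch only shows the two forms computing the same result.

```rust
/// Hypothetical displacement using a u64 hash: on a 32-bit target the
/// subtraction and mask are done in 64-bit arithmetic, which typically
/// needs two registers per value.
fn displacement_u64(idx: usize, hash: u64, capacity: usize) -> usize {
    ((idx as u64).wrapping_sub(hash) & (capacity as u64 - 1)) as usize
}

/// Hypothetical displacement using a usize hash: everything stays in
/// native word-sized arithmetic.
fn displacement_usize(idx: usize, hash: usize, capacity: usize) -> usize {
    idx.wrapping_sub(hash) & (capacity - 1)
}

fn main() {
    // Ideal bucket is 5 (the low bits of the hash); the entry sits in
    // bucket 3 after wrapping around a 16-bucket table.
    let hash: u64 = 0x8000_0000_0000_0005;
    let a = displacement_u64(3, hash, 16);
    let b = displacement_usize(3, hash as usize, 16);
    assert_eq!(a, b);
    println!("displacement = {}", a);
}
```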

rust-highfive assigned brson Sep 20, 2016

bluss mentioned this pull request Sep 20, 2016

Use usize instead of u64 for hashes in HashMap #36567

Closed

arthurprs mentioned this pull request Sep 24, 2016

Cache conscious hashmap table #36692

Merged

arthurprs reviewed Sep 27, 2016


bluss force-pushed the hashmap-usize-for-hash branch from 4283919 to 6707668 October 17, 2016 13:50

bluss force-pushed the hashmap-usize-for-hash branch from 6707668 to 13a1f21 October 17, 2016 13:55

bors merged commit 13a1f21 into rust-lang:master Nov 1, 2016

bluss deleted the hashmap-usize-for-hash branch November 1, 2016 08:42

brson added the relnotes label Nov 8, 2016
