-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hashmap: Store hashes as usize internally #36595
Conversation
r? @brson (rust_highfive has picked a reviewer for you, use r? to override) |
Needs benchmarks and testing on 32-bit. |
-- updated benchmarks bellow |
@bluss we discussed this briefly at libs triage today and were curious, do you have some representative numbers as well about this change? (or @arthurprs do you have some?) Would be useful to see the comparisons! I was personally a little worried about losing 32 bits of a 64-bit hash on 32-bit platforms, but so long as we generally recommend a uniform distribution of bits I think it'll work out. |
// We need to avoid 0 in order to prevent collisions with | ||
// EMPTY_HASH. We can maintain our precious uniform distribution | ||
// of initial indexes by unconditionally setting the MSB, | ||
// effectively reducing 64-bits hashes to 63 bits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment needs updating because of the 32 -> 31?
Here are some x86 benchmarks. Keep in mind that most of these are What I don't get is why in the hhkkvv layout the iteration becomes slower, I'll try to dig something from the assembly. -- benchmarks remove look for updates bellow |
I wanted to know how it influences bootstrap time on both 64-bit and 32-bit, it would be a good metric. I don't have the resources to do the comparison, so I left this here to ask if someone wanted to try it. Ah, the benchmarks are for kkvv layout (current) and kvkv (your pr) |
I looked at the disassembly but I can't see anything that would make iteration that much slower. From an algorithm POV it makes no sense to me. |
@arthurprs those numbers seem to be mostly related to performance, but isn't the theory behind this change that it mostly saves memory? |
@alexcrichton sure, I just happened to have these around due to the other PR. And the decrease in cache pressure shows up in these. As for memory benefits it's easy to calculate as a function of sizeof(K) and sizeof(V) as it doesn't depend on other factors like padding etc.
|
@alexcrichton Smaller data means better performance; more hashes fit into a cache line during lookup. So performance was my main thought. |
Dealing with so many benchmarks is tricky. I found a problem so I'm repeating the benchmarks, will post fixed and expanded updates soon. |
Here's the updated results. And the gist with even more comparisons and better visualization https://gist.github.com/arthurprs/9f28847dceee86bd5cfffcd30d9cd6cc Code: https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize
|
@arthurprs hm some of those benchmarks seem worrisome indicating that iteration gets 40% slower? Is that just an outlier though? |
@alexcrichton yeah, it's super odd (and makes no sense to me) but I can reproduce it every time here. I was hopping somebody else could run the benchmarks. It's just a matter of checking out these branches (https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize) and running something like |
@rfcbot fcp merge Another neat improvement to our hash tables, and seems like the numbers back it up? |
☔ The latest upstream changes (presumably #36753) made this pull request unmergeable. Please resolve the merge conflicts. |
rfcbot stuck? |
I'm waiting for @arthurprs's hashmap change to merge first before updating. I think the benchmarks said this was a universal win for that memory layout(?), so it should be a simple decision then. |
Ah yeah we've discussed this as well, so it's ok to r+ when ready to go. |
4283919
to
6707668
Compare
We can't use more than usize's bits of a hash to select a bucket anyway, so we only need to store that part in the table. This should be an improvement for the size of the data structure on 32-bit platforms. Smaller data means better cache utilization and hopefully better performance.
6707668
to
13a1f21
Compare
Measuring with compare.py from rustc-benchmarks would be useful here. |
@bluss this is ready to go now, right? |
It is ready to go. I haven't invested more time in benchmarking this, and that wasn't the plan from the start either; I wanted to simply implement this and give to rustc developers if they were interested. @arthurprs's benchmarks show surprisingly noticeable gains, so I'm happy. |
@bors: r+ Ok, thanks! |
📌 Commit 13a1f21 has been approved by |
hashmap: Store hashes as usize internally We can't use more than usize's bits of a hash to select a bucket anyway, so we only need to store that part in the table. This should be an improvement for the size of the data structure on 32-bit platforms. Smaller data means better cache utilization and hopefully better performance. Fixes #36567
The gains very likely come from smaller hash array memory footprint and cheaper displacement() calculation. The later is in a very hot code path. Edit: actually the displacement may not affected but the patch is probably saving 64bit arithmetic and freeing an extra register in other places. |
We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.
Fixes #36567