chore: Optimize `HashMap`: bucketed approach #4603
base: master
Conversation
I've been doing some testing on varying the bucket length against the overall length of the hashmap for the 75%-filled test. I've found that increasing the bucket length actually tends to be more space-effective than increasing the length of the overall hashmap. For example, the 75% test passes with:
(tested in increments of 200 length). Each of these is minimal, so decreasing the bucket or map length (by the 200 increment at least) in any of these cases would cause the test to fail. Also note that this is very dependent on how many hash collisions there are. This example uses the
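To make the layout being varied here concrete, below is a rough Rust sketch of a bucketed map of len `N` with bucket length `BUCKET_ITEMS`. The names, the `Slot` struct, and the flat layout are illustrative assumptions, not the actual Noir implementation:

```rust
const N: usize = 200;          // map length (illustrative)
const BUCKET_ITEMS: usize = 5; // bucket length (illustrative)

#[derive(Clone, Copy)]
struct Slot {
    occupied: bool,
    key: u64,
    value: u64,
}

struct BucketedMap {
    // N buckets of BUCKET_ITEMS slots each, flattened into one array,
    // so the total slot count is N * BUCKET_ITEMS.
    slots: [Slot; N * BUCKET_ITEMS],
}

impl BucketedMap {
    // Mirrors constrained-circuit semantics: every slot of the bucket is
    // visited unconditionally (no early `break`); the match is selected
    // with a flag instead of an early return.
    fn get(&self, key: u64, hash: u64) -> Option<u64> {
        let bucket_start = (hash as usize % N) * BUCKET_ITEMS;
        let mut result = None;
        for i in 0..BUCKET_ITEMS {
            let slot = self.slots[bucket_start + i];
            if slot.occupied && slot.key == key {
                result = Some(slot.value);
            }
        }
        result
    }
}
```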
This unpredictability in what bucket size is needed adds to the general unpredictability of what size is needed for the overall hashmap. Increasing the bucket length ever more makes it more likely that a hashmap of len N can actually store close to N items, but increasing the bucket len also multiplies the total number of slots. Other approaches like linear-probing hashmaps automatically store elements with conflicting hashes in the next slot. That isn't practical for this hashmap as-is, since we would never be able to break out of the loop that advances to the next slot. It could be worth trying this approach with a bounded maximum on this next-slot loop though. Theoretically, a linear-probing hashmap with a bounded maximum pseudo-bucket size of, say, 5 would have much more slot sharing than a bucketed approach. This is because in the bucketed approach each bucket only stores elements of the same hash (mod len). With linear probing, however, a random slice of say 5 slots in the hashmap may begin with 2 elements which had the same hash (mod len), followed by an empty slot, followed by two more elements with different hashes. So a linear-probing map with a maximum loop of say 10 elements could still be more space-efficient than a bucketed hashmap with a bucket size of 5. Note that for a linear-probing map to be implemented in a flat fashion with a maximum loop bound like this, we also need to add this loop count of empty slots to the end of the map, to ensure we can still probe past the last slot that many times. This small constant factor shouldn't be an issue though.
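A minimal Rust sketch of that bounded probe, with assumed names (`MAX_PROBE`, `BoundedProbingMap`), showing the padding slots appended at the end of the array:

```rust
const N: usize = 200;       // logical map length (illustrative)
const MAX_PROBE: usize = 5; // bounded pseudo-bucket size (illustrative)

#[derive(Clone, Copy)]
struct Slot {
    occupied: bool,
    key: u64,
    value: u64,
}

struct BoundedProbingMap {
    // MAX_PROBE empty slots are appended so a probe that starts at the
    // last logical slot can still walk MAX_PROBE steps without wrapping.
    slots: [Slot; N + MAX_PROBE],
}

impl BoundedProbingMap {
    fn get(&self, key: u64, hash: u64) -> Option<u64> {
        let start = hash as usize % N;
        let mut result = None;
        // Fixed iteration count, as constrained code would require.
        for i in 0..MAX_PROBE {
            let slot = self.slots[start + i];
            if slot.occupied && slot.key == key {
                result = Some(slot.value);
            }
        }
        result
    }
}
```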
So far the linear-probing hashmap has been performing much worse than expected. Putting aside the size failures, I'm not sure why a linear-probing map with 4 max iterations would perform so much worse time-wise than a bucketed map with a bucket length of 4, and thus the same 4 iterations through each element of a bucket. When you then consider that the linear-probing map has an actual length and slot count of
Changes to circuit sizes
🧾 Summary (10% most significant diffs)
Full diff report 👇
After speaking with @guipublic, I've determined the somewhat odd performance patterns here to be the result of arrays being copied in Acir if they are used again afterward. All of these approaches write to the main array of the hashmap, but I think the bucketed approach may write to it less often, since most writes go to buckets until each bucket is written back to the bucket array at the end. I've made an issue for this here: #4621. I think adding a copy-on-write optimization similar to the one brillig has will be useful for these arrays. This is partly evidenced by the speedup these hashmaps see when main is changed to
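For illustration, here is the copy-vs-mutate distinction sketched in Rust; the function names are hypothetical and this is not the actual Acir gen logic:

```rust
// If the original array is still used afterward, the write must
// produce a fresh copy so later reads still see the old values.
fn array_set_copying(arr: &[u64], index: usize, value: u64) -> Vec<u64> {
    let mut copy = arr.to_vec();
    copy[index] = value;
    copy
}

// If this write is the array's last use, it is safe to mutate in place
// instead, avoiding the copy entirely.
fn array_set_in_place(arr: &mut [u64], index: usize, value: u64) {
    arr[index] = value;
}
```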
I've looked into this some more and found that every array_set is actually already mutated rather than copied; the existing last-use analysis applies to 100% of array sets on master, for example. Manually setting acir gen to always mutate also does not improve performance. I'm back to square one on why master & linear are so much slower than the bucketed approach. I wonder if it could be related to #4629.
Description
Problem*
Resolves #4584
Summary*
`HashMap` on master is quite slow. Notably, we cannot use `break` in constrained code. I'm making this draft PR to test out a few alternatives in terms of proving time. So far I have two alternative implementations in mind:
1. Bucketed (this PR): each key hashes to a bucket of `BUCKET_ITEMS` slots. The downside is that `BUCKET_ITEMS` is the maximum number of collisions before we must assert/error. There is also a memory tradeoff, since the total amount of slots will be `N * BUCKET_ITEMS`. For reference, many probing hashmaps tend to have a capacity of roughly `N * 2` - similar to Vectors.
2. `unconstrained` (PR here: chore: Optimize HashMap: unconstrained approach #4605): using `unconstrained` for retrieving an element, we can use `break`, which gives obvious speedups (see the sketch after this list). The downside of this approach is mostly the danger of using `unconstrained` itself, and that it requires the lib to be vigilant for under-constrained code. This is also a bit unspecific, since "using unconstrained" is rather general: there will be large differences depending on the specific HashMap architecture chosen around the unconstrained functions. The current unconstrained code is more of a proof of concept as well. It isn't yet abstracted over every key and value type, for example. Its accessor methods are also different in that they assert a value is present instead of returning an optional value. Another notable difference is that the pedersen hasher is used for the other hash map types, whereas the unconstrained approach does not hash the keys at all.

All approaches are works in progress, so any suggestions on further optimizations are appreciated.
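To illustrate why `break` matters so much here, a rough Rust sketch contrasting the two lookup styles. Neither function is the PR's actual code, and as noted above the real unconstrained accessors assert presence rather than returning an optional value:

```rust
// Constrained-style lookup: every slot is visited, so the circuit pays
// for the full loop no matter where the key sits.
fn get_constrained(slots: &[(u64, u64)], key: u64) -> Option<u64> {
    let mut result = None;
    for &(k, v) in slots {
        if k == key {
            result = Some(v);
        }
    }
    result
}

// Unconstrained-style lookup: an early exit stops at the first match,
// so on average only a fraction of the slots are visited.
fn get_unconstrained(slots: &[(u64, u64)], key: u64) -> Option<u64> {
    for &(k, v) in slots {
        if k == key {
            return Some(v); // the `break`-like early exit
        }
    }
    None
}
```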
The different approaches used to be separate PRs, but they got too spammy. They're now just branches:
Tests
Simple Insert and Get
Timings:
master & linear are both getting noticeably slower as the map gets larger, which is odd & concerning.
75% full, 50% get
Timings (all use nargo compiled in debug, 1 warmup run, median of just 3 tries):
`200 * 5 = 1000` total slots for elements. In the `N = 200` case though, it is still slower than the bucketed approach with a bucket size of 5.

Documentation*
Check one:
PR Checklist*
`cargo fmt` on default settings.