when writing to disk bucket index, tune towards packing tighter #30761
Problem
see #30711
The current implementation of disk buckets (as used by the on-disk accounts index) was optimized for use as a hashmap with good speed in all cases.
The implementation in the validator synchronizes the in-memory hash map with the disk-based one in the background.
Currently, we resize data buckets when we don't find an empty spot after starting at a random offset and searching `max_search` locations, which defaults to approximately 32. This max search makes sense for the index buckets, where we have to search exhaustively on read and write to prove something does not exist. For data buckets, we just need to find any vacant bucket to store data. The offset will then be stored in the index bucket.
Summary of Changes
Search 10x as many locations before resizing disk buckets. This results in more compact data buckets, improving performance for reads and writes. Insertions or updates that grow or shrink slot lists can be slower, but these only happen in the background.
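To illustrate the trade-off, here is a minimal sketch of a bounded vacant-slot search. It is not the actual bucket map code: `DataBucket`, `find_vacant`, `occupy`, and the constants `MAX_SEARCH` and `DATA_SEARCH_MULTIPLIER` are hypothetical names, and the values 32 and 10 are only taken from the description above. The caller passes the starting offset so the example stays deterministic.

```rust
/// Hypothetical stand-in for a data bucket: `true` marks an occupied cell.
pub struct DataBucket {
    occupied: Vec<bool>,
}

/// Assumed values for illustration only ("approximately 32" and "10x").
pub const MAX_SEARCH: usize = 32;
pub const DATA_SEARCH_MULTIPLIER: usize = 10;

impl DataBucket {
    pub fn new(capacity: usize) -> Self {
        Self {
            occupied: vec![false; capacity],
        }
    }

    /// Mark a cell as occupied.
    pub fn occupy(&mut self, ix: usize) {
        self.occupied[ix] = true;
    }

    /// Probe at most `limit` cells, starting at `start` and wrapping around.
    /// Returns the first vacant index, or `None` when the caller would
    /// have to resize the bucket.
    pub fn find_vacant(&self, start: usize, limit: usize) -> Option<usize> {
        let cap = self.occupied.len();
        if cap == 0 {
            return None;
        }
        let start = start % cap;
        (0..limit.min(cap))
            .map(|i| (start + i) % cap)
            .find(|&ix| !self.occupied[ix])
    }
}
```

With a run of 100 occupied cells at the start of a 1024-cell bucket, a search of `MAX_SEARCH` cells from offset 0 gives up and would force a resize, while a search of `MAX_SEARCH * DATA_SEARCH_MULTIPLIER` cells finds the vacant cell at index 100. The longer search costs more probes on insert, which is acceptable here because the index bucket stores the resulting offset and reads never repeat the search.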
Fixes #