disk bucket stores single entry in index file #30750
I still need to re-read try_write
storage_cap_and_offset: PackedStorage,
// if the bucket doubled, the index can be recomputed using create_bucket_capacity_pow2
pub num_slots: Slot, // can this be smaller? epoch size should ~ be the max len. this is the num elements in the slot list
/// the first 'data element. This will only be meaningful if `num_slots`=1. Otherwise, all values are in the data bucket.
pub first_element: T,
Noting here so I don't forget. We had talked about doing the following in a subsequent PR:
Combining `storage_cap_and_offset`, `num_slots`, and `first_element` into an enum:
enum IndexEntrySlots {
None,
Single(T),
Many(num_slots, storage_cap_and_offset),
}
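A compilable sketch of that enum idea, for reference. `Slot` and `PackedStorage` here are hypothetical stand-ins rather than the real `bucket_map` types, the `Many` variant uses named fields purely for readability, and the two accessor methods are illustrative additions that were not part of the discussion.

```rust
// Hypothetical stand-ins for the real types; assumptions for illustration only.
type Slot = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PackedStorage(u64);

/// Sketch of the proposed enum: the slot-list state of one index entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexEntrySlots<T: Copy> {
    /// The slot list is empty.
    None,
    /// Exactly one element, stored inline in the index entry.
    Single(T),
    /// More than one element; all values live in the data bucket.
    Many {
        num_slots: Slot,
        storage_cap_and_offset: PackedStorage,
    },
}

impl<T: Copy> IndexEntrySlots<T> {
    /// Number of elements in the slot list (illustrative accessor).
    fn num_slots(&self) -> Slot {
        match self {
            IndexEntrySlots::None => 0,
            IndexEntrySlots::Single(_) => 1,
            IndexEntrySlots::Many { num_slots, .. } => *num_slots,
        }
    }

    /// The inline element, if this entry holds exactly one (illustrative accessor).
    fn single(&self) -> Option<T> {
        match self {
            IndexEntrySlots::Single(value) => Some(*value),
            _ => None,
        }
    }
}
```

With this shape, a `first_element` that is only meaningful when `num_slots` is 1 can no longer be read by accident, because the value exists only in the `Single` variant.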
bucket_map/src/bucket.rs
Outdated
elem.first_element = if num_slots == 1 {
    // replace
    *data.next().unwrap()
} else {
    // set to default for cleanliness
    T::default()
};
Do you like this better or worse? (fwiw, I don't know if Rust will allow the borrow inside the `or_else`...)
- elem.first_element = if num_slots == 1 {
-     // replace
-     *data.next().unwrap()
- } else {
-     // set to default for cleanliness
-     T::default()
- };
+ elem.first_element = *data.next().unwrap_or_else(|| &T::default())
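On the borrow question: a closure that returns `&T::default()` hands back a reference to a temporary, which the borrow checker is expected to reject. A minimal sketch of one way around it is to convert the `Option<&T>` into an `Option<T>` before falling back to the default. `first_or_default` is a hypothetical helper name, and the `T: Copy + Default` bound is an assumption about the element type.

```rust
/// Hypothetical helper: return the first element of `data` by value,
/// or `T::default()` if the iterator is empty.
fn first_or_default<'a, T, I>(mut data: I) -> T
where
    T: Copy + Default + 'a,
    I: Iterator<Item = &'a T>,
{
    // `.copied()` maps Option<&T> to Option<T>, so the fallback is an owned
    // value and no reference to a temporary is ever created.
    data.next().copied().unwrap_or_default()
}
```

Note that this takes the first element whenever one exists, so it only matches the `if num_slots == 1` version when the iterator yields at most one item.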
reworked it differently. I think you'll like it.
numbers on mnb are promising. So, this change is slightly faster than master. But, importantly, it should be half the file i/o.
Codecov Report
@@           Coverage Diff           @@
##           master   #30750   +/-   ##
=======================================
  Coverage    81.3%    81.3%
=======================================
  Files         724      724
  Lines      202941   203024    +83
=======================================
+ Hits       165108   165184    +76
- Misses      37833    37840     +7
bucket_map/src/bucket.rs
Outdated
// new data stored should be stored in elem.`first_element`
// new data len is 0 or 1
elem.num_slots = num_slots;
elem.first_element = data.next().cloned().unwrap_or_default();
Since `T` is `Copy`, I think we could also use `.copied()` here instead of `.cloned()`. Dunno how much of a difference (if any) it would make though.
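A small sketch of the two spellings side by side, using hypothetical helper names and a slice in place of the real iterator. For a `Copy` type both return the same value; the practical difference is that `.copied()` only compiles when `T: Copy`, so it documents that the operation is a plain bitwise copy.

```rust
/// Hypothetical helper using `.cloned()`; only needs `T: Clone`.
fn first_cloned<T: Clone + Default>(data: &[T]) -> T {
    data.iter().next().cloned().unwrap_or_default()
}

/// Hypothetical helper using `.copied()`; requires `T: Copy`, so it stops
/// compiling if `T` ever becomes a type with a non-trivial `Clone`.
fn first_copied<T: Copy + Default>(data: &[T]) -> T {
    data.iter().next().copied().unwrap_or_default()
}
```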
yes, this should be `.copied()`. I changed it.
Co-authored-by: Brooks
lgtm
not doing this until we get rid of the cost to the file size
Problem
See #30711
In the 99% case there is a single (slot, account info) tuple per pubkey. That tuple can be stored in the index file instead of always requiring a second data file.
This will have compounding benefits for performance.
In theory, disk i/o is halved for the common case of a single slot per entry: the data files are only read and written for entries with more than 1 slot.
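To illustrate the read path this implies, here is a minimal sketch, not the actual `bucket_map` implementation. `IndexEntry`, `Buckets`, `data_offset`, and `data_reads` are hypothetical names, and a `Vec` stands in for the data file. The data file is touched only when an entry has more than one slot.

```rust
/// Hypothetical index entry: one inline element plus a pointer into the data file.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct IndexEntry<T> {
    num_slots: u64,
    /// Only meaningful when `num_slots == 1`.
    first_element: T,
    /// Only meaningful when `num_slots > 1`.
    data_offset: usize,
}

/// Hypothetical stand-in for the data file, with a counter for accesses to it.
struct Buckets<T> {
    data: Vec<T>,
    data_reads: usize,
}

impl<T: Copy> Buckets<T> {
    /// Return the slot list for `entry`, touching the data file only when needed.
    fn read_slot_list(&mut self, entry: &IndexEntry<T>) -> Vec<T> {
        match entry.num_slots {
            0 => Vec::new(),
            // Common case: the single element lives in the index entry itself.
            1 => vec![entry.first_element],
            n => {
                // Uncommon case: all values are in the data file.
                self.data_reads += 1;
                let start = entry.data_offset;
                self.data[start..start + n as usize].to_vec()
            }
        }
    }
}
```

In this sketch a single-slot lookup leaves `data_reads` at zero, which is the "half the file i/o" effect described above.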
Summary of Changes
Fixes #