Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disk bucket stores single entry in index file #30750

Closed
wants to merge 7 commits into from

Conversation

jeffwashington
Copy link
Contributor

@jeffwashington jeffwashington commented Mar 16, 2023

Problem

See #30711

99% case is a single (slot, account info) tuple per pubkey. This can be stored in the index file instead of always requiring a second data file.
This will have compounding benefits for performance.
In theory, disk i/o will be half for the common case of a single slot per entry. The data files will only be read and written for entries with more than 1 slot.

Summary of Changes

Fixes #

@jeffwashington jeffwashington force-pushed the mm11 branch 4 times, most recently from 6a11a38 to 6131818 Compare March 16, 2023 19:35
@jeffwashington jeffwashington changed the title wip: add a single info to bucket index disk bucket stores single entry in index file Mar 16, 2023
@jeffwashington jeffwashington marked this pull request as ready for review March 16, 2023 19:36
Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to re-read try_write

Comment on lines 25 to +29
storage_cap_and_offset: PackedStorage,
// if the bucket doubled, the index can be recomputed using create_bucket_capacity_pow2
pub num_slots: Slot, // can this be smaller? epoch size should ~ be the max len. this is the num elements in the slot list
/// the first 'data element. This will only be meaningful if `num_slots`=1. Otherwise, all values are in the data bucket.
pub first_element: T,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting here so I don't forget. We had talked about doing the following in a subsequent PR:

Combing storage_cap_and_offset, num_slots, and first_element into an enum:

enum IndexEntrySlots {
    None,
    Single(T),
    Many(num_slots, storage_cap_and_offset),
}

bucket_map/src/index_entry.rs Show resolved Hide resolved
bucket_map/src/index_entry.rs Outdated Show resolved Hide resolved
bucket_map/src/index_entry.rs Outdated Show resolved Hide resolved
Comment on lines 303 to 309
elem.first_element = if num_slots == 1 {
// replace
*data.next().unwrap()
} else {
// set to default for cleanliness
T::default()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you like this better or worse? (fwiw, I don't know if rust will allow the borrow inside the or_else...)

Suggested change
elem.first_element = if num_slots == 1 {
// replace
*data.next().unwrap()
} else {
// set to default for cleanliness
T::default()
};
elem.first_element = *data.next().unwrap_or_else(|| &T::default())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworked it differently. I think you'll like it.

@brooksprumo brooksprumo self-requested a review March 16, 2023 20:28
@jeffwashington
Copy link
Contributor Author

numbers on mnb are promising.
ledger-tool verify with mnb snapshot (and incremental).
generate_index took 8.2s vs master: (8.9s and 9.5s)
ledger processed in:
1:05 vs master: (1:06, 1:12)

So, this change is slightly faster than master. But, importantly, it should be half the file i/o.

@codecov
Copy link

codecov bot commented Mar 16, 2023

Codecov Report

Merging #30750 (cc5bfda) into master (05ee068) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #30750   +/-   ##
=======================================
  Coverage    81.3%    81.3%           
=======================================
  Files         724      724           
  Lines      202941   203024   +83     
=======================================
+ Hits       165108   165184   +76     
- Misses      37833    37840    +7     

brooksprumo
brooksprumo previously approved these changes Mar 17, 2023
// new data stored should be stored in elem.`first_element`
// new data len is 0 or 1
elem.num_slots = num_slots;
elem.first_element = data.next().cloned().unwrap_or_default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since T is Copy, I think we could also use .copied() here instead of .cloned(). Dunno how much of a difference (if any) it would make though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this should be .copied(). I changed it.

Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jeffwashington
Copy link
Contributor Author

not doing this until we get rid of the cost to the file size

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants