disk bucket stores single entry in index file #30750

jeffwashington · 2023-03-16T14:34:35Z

Problem

The 99% case is a single (slot, account info) tuple per pubkey. That tuple can be stored in the index file itself instead of always requiring a second data file.
This should have compounding benefits for performance.
In theory, disk i/o is halved for the common case of a single slot per entry: the data files are only read and written for entries with more than 1 slot.
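
To make the read path concrete, here is a minimal sketch of the idea, not the actual bucket_map code. `SketchEntry`, `SketchDataFile`, `data_offset`, and `read_slot_list` are hypothetical names invented for illustration; only `num_slots` and `first_element` mirror fields from this PR. The simulated data file counts its reads so the saving for the single-slot case is visible.

```rust
use std::cell::Cell;

/// Hypothetical, simplified index entry (not the real `IndexEntry` layout).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct SketchEntry<T: Copy + Default> {
    /// number of elements in the slot list
    pub num_slots: u64,
    /// only meaningful when `num_slots == 1`
    pub first_element: T,
    /// assumed offset into the data file; only meaningful when `num_slots > 1`
    pub data_offset: usize,
}

/// Simulated data file that counts how often it is read.
pub struct SketchDataFile<T> {
    pub cells: Vec<T>,
    pub reads: Cell<usize>,
}

impl<T: Copy> SketchDataFile<T> {
    /// Return `len` elements starting at `offset`, recording one read.
    pub fn read(&self, offset: usize, len: usize) -> Vec<T> {
        self.reads.set(self.reads.get() + 1);
        self.cells[offset..offset + len].to_vec()
    }
}

/// Resolve the slot list for an entry, touching the data file only
/// when the entry holds more than one element.
pub fn read_slot_list<T: Copy + Default>(
    entry: &SketchEntry<T>,
    data: &SketchDataFile<T>,
) -> Vec<T> {
    match entry.num_slots {
        0 => Vec::new(),
        1 => vec![entry.first_element],
        n => data.read(entry.data_offset, n as usize),
    }
}
```

With this shape, a lookup for an entry with `num_slots == 1` is answered from the index entry alone, and `reads` only increases for multi-slot entries.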

Summary of Changes

Fixes #

brooksprumo

I still need to re-read try_write

brooksprumo · 2023-03-16T20:15:05Z

bucket_map/src/index_entry.rs

    storage_cap_and_offset: PackedStorage,
    // if the bucket doubled, the index can be recomputed using create_bucket_capacity_pow2
    pub num_slots: Slot, // can this be smaller? epoch size should ~ be the max len. this is the num elements in the slot list
+    /// the first 'data element. This will only be meaningful if `num_slots`=1. Otherwise, all values are in the data bucket.
+    pub first_element: T,


Noting here so I don't forget. We had talked about doing the following in a subsequent PR:

Combining storage_cap_and_offset, num_slots, and first_element into an enum:

    enum IndexEntrySlots<T> {
        None,
        Single(T),
        Many { num_slots: Slot, storage_cap_and_offset: PackedStorage },
    }
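
A rough sketch of how the enum proposed above could be consumed, assuming `Slot` is a `u64` alias and using a hypothetical `PackedStorageSketch` stand-in for the real `PackedStorage` type. The accessor names are made up for illustration; the follow-up PR may look quite different.

```rust
pub type Slot = u64;

/// Hypothetical stand-in for the packed capacity/offset field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedStorageSketch(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexEntrySlots<T> {
    /// no elements in the slot list
    None,
    /// exactly one element, stored inline in the index entry
    Single(T),
    /// more than one element, stored in a data bucket
    Many {
        num_slots: Slot,
        storage_cap_and_offset: PackedStorageSketch,
    },
}

impl<T: Copy> IndexEntrySlots<T> {
    /// Length of the slot list, derived from the variant.
    pub fn num_slots(&self) -> Slot {
        match self {
            Self::None => 0,
            Self::Single(_) => 1,
            Self::Many { num_slots, .. } => *num_slots,
        }
    }

    /// The inline element, if this entry stores exactly one.
    pub fn inline_element(&self) -> Option<T> {
        match self {
            Self::Single(element) => Some(*element),
            _ => None,
        }
    }
}
```

One appeal of this shape is that "first_element is only meaningful if num_slots is 1" stops being a comment and becomes something the type system enforces.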

brooksprumo · 2023-03-16T20:25:07Z

bucket_map/src/bucket.rs

+            elem.first_element = if num_slots == 1 {
+                // replace
+                *data.next().unwrap()
+            } else {
+                // set to default for cleanliness
+                T::default()
+            };


Do you like this better or worse? (fwiw, I don't know if rust will allow the borrow inside the or_else...)

Suggested change

-            elem.first_element = if num_slots == 1 {
-                // replace
-                *data.next().unwrap()
-            } else {
-                // set to default for cleanliness
-                T::default()
-            };
+            elem.first_element = *data.next().unwrap_or_else(|| &T::default())

reworked it differently. I think you'll like it.
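
On the borrow question in this thread: a closure written as `|| &T::default()` returns a reference to a temporary, which rustc rejects (E0515), so the suggestion would not compile as written. A minimal sketch of the two shapes that do compile is below. The function names are hypothetical, and the second form only matches the first when the iterator yields at most one element and `num_slots` agrees with its length.

```rust
/// Original shape from the diff: branch on `num_slots`.
pub fn first_element_branch<'a, T: Copy + Default + 'a>(
    mut data: impl Iterator<Item = &'a T>,
    num_slots: u64,
) -> T {
    if num_slots == 1 {
        // replace with the single provided element
        *data.next().unwrap()
    } else {
        // set to default for cleanliness
        T::default()
    }
}

/// Alternative shape: convert `Option<&T>` into an owned `Option<T>` first,
/// so the fallback is an owned default rather than a borrowed temporary.
pub fn first_element_owned<'a, T: Copy + Default + 'a>(
    mut data: impl Iterator<Item = &'a T>,
) -> T {
    data.next().copied().unwrap_or_default()
}
```

Turning the reference into a value before unwrapping is what avoids the borrow problem, because the default no longer has to outlive the closure.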

jeffwashington · 2023-03-16T21:34:08Z

Numbers on mnb are promising.
ledger-tool verify with an mnb snapshot (and incremental):
generate_index took 8.2s vs. master (8.9s and 9.5s)
ledger processed in 1:05 vs. master (1:06, 1:12)

So this change is slightly faster than master. More importantly, it should halve the file i/o.

codecov · 2023-03-16T22:58:00Z

Codecov Report

Merging #30750 (cc5bfda) into master (05ee068) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #30750   +/-   ##
=======================================
  Coverage    81.3%    81.3%           
=======================================
  Files         724      724           
  Lines      202941   203024   +83     
=======================================
+ Hits       165108   165184   +76     
- Misses      37833    37840    +7

brooksprumo · 2023-03-17T02:20:31Z

bucket_map/src/bucket.rs

+            // new data stored should be stored in elem.`first_element`
+            // new data len is 0 or 1
+            elem.num_slots = num_slots;
+            elem.first_element = data.next().cloned().unwrap_or_default();


Since T is Copy, I think we could also use .copied() here instead of .cloned(). Dunno how much of a difference (if any) it would make though.

yes, this should be .copied(). I changed it.
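
For reference, a small sketch of the difference discussed here, using a hypothetical `AccountInfoSketch` type in place of the real `T`. Both helpers return the same value for a `Copy` type; the visible difference is the trait bound, since `.copied()` only compiles while `T` is `Copy`.

```rust
/// Hypothetical element type standing in for the real `T`.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct AccountInfoSketch {
    pub offset: u64,
}

/// Works for any `Clone` type, including ones that are expensive to clone.
pub fn first_cloned<'a, T: Clone + Default + 'a>(
    mut data: impl Iterator<Item = &'a T>,
) -> T {
    data.next().cloned().unwrap_or_default()
}

/// Requires `Copy`, which documents that duplicating the value is cheap.
pub fn first_copied<'a, T: Copy + Default + 'a>(
    mut data: impl Iterator<Item = &'a T>,
) -> T {
    data.next().copied().unwrap_or_default()
}
```

Whether this changes generated code is not something this sketch measures; the main gain is that the intent is explicit and a later change that makes `T` non-`Copy` would be caught at compile time.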

brooksprumo

lgtm

jeffwashington · 2023-03-22T22:35:22Z

not doing this until we get rid of the cost to the file size

jeffwashington mentioned this pull request Mar 16, 2023

disk accounts index performance improvements #30711

Closed

jeffwashington force-pushed the mm11 branch 4 times, most recently from 6a11a38 to 6131818 Compare March 16, 2023 19:35

jeffwashington changed the title ~~wip: add a single info to bucket index~~ disk bucket stores single entry in index file Mar 16, 2023

jeffwashington marked this pull request as ready for review March 16, 2023 19:36

jeffwashington requested a review from brooksprumo March 16, 2023 19:36

brooksprumo reviewed Mar 16, 2023

brooksprumo self-requested a review March 16, 2023 20:28

brooksprumo previously approved these changes Mar 17, 2023

jeffwashington and others added 6 commits March 17, 2023 14:36

disk bucket stores single entry in index file

7c326b1

Update bucket_map/src/index_entry.rs

928d8e5

Co-authored-by: Brooks <[email protected]>

Update bucket_map/src/index_entry.rs

a642941

Co-authored-by: Brooks <[email protected]>

pr feedback

51c2ea2

get location before changing location

7e94e2c

improve bucket map test

07d5738

jeffwashington dismissed brooksprumo’s stale review via 07d5738 March 17, 2023 19:36

jeffwashington force-pushed the mm11 branch from d021cc6 to 07d5738 Compare March 17, 2023 19:36

cloned() -> copied()

cc5bfda

jeffwashington requested a review from brooksprumo March 17, 2023 19:38

brooksprumo approved these changes Mar 17, 2023

jeffwashington closed this Mar 22, 2023
