at startup, keep duplicates in in-memory index since they will be cleaned shortly #30736

jeffwashington · 2023-03-15T20:33:12Z

Problem

At startup, we scan storages to populate the index. During that scan we can easily identify pubkeys that appear in multiple slots (duplicates).
Today, these duplicates are inserted into the disk index. As soon as the first clean runs, the index entries for all of these items are loaded, the older ones are cleaned (i.e., removed), and the entries are rewritten. This write/read/rewrite cycle wastes I/O bandwidth.

Summary of Changes

Hold all items with duplicates in the in-memory index, since we know clean will shortly read/modify/write all of them. This improves the performance of both index generation and the first clean.
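
To make the idea concrete, here is a minimal sketch of the placement decision, not the code in this PR. `Pubkey`, `Slot`, `Placement`, and `place_entries` are hypothetical stand-ins (plain integers and `HashMap`s) rather than the real accounts index types, and the sketch assumes the full list of scanned `(slot, pubkey)` pairs is available up front, which the real index generation does not require.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real types; not the solana-labs API.
type Pubkey = u64;
type Slot = u64;

/// Where each index entry ends up after startup index generation (toy model).
#[derive(Debug, Default, PartialEq)]
pub struct Placement {
    /// Pubkeys seen in exactly one slot: written to the disk index.
    pub disk: HashMap<Pubkey, Slot>,
    /// Pubkeys seen in several slots: held in memory until the first clean.
    pub in_mem: HashMap<Pubkey, Vec<Slot>>,
}

/// Split scanned `(slot, pubkey)` pairs into disk-bound and memory-held entries.
pub fn place_entries(scanned: &[(Slot, Pubkey)]) -> Placement {
    // Group every slot in which each pubkey appears.
    let mut by_key: HashMap<Pubkey, Vec<Slot>> = HashMap::new();
    for &(slot, key) in scanned {
        by_key.entry(key).or_default().push(slot);
    }

    let mut placement = Placement::default();
    for (key, mut slots) in by_key {
        slots.sort_unstable();
        slots.dedup();
        if slots.len() == 1 {
            // No duplicates: clean will not need to rewrite this entry soon.
            placement.disk.insert(key, slots[0]);
        } else {
            // Duplicates: keep in memory to avoid a write/read/rewrite cycle.
            placement.in_mem.insert(key, slots);
        }
    }
    placement
}
```

In this toy model, a pubkey scanned in slots 1, 2, and 5 stays in `in_mem`, while a pubkey scanned in a single slot goes to `disk`.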

Fixes #


brooksprumo

Code looks good. A few questions below.

runtime/src/accounts_index.rs

brooksprumo · 2023-03-22T14:05:42Z

runtime/src/in_mem_accounts_index.rs

-                        // merge this in, mark as duplicate
-                        duplicates.push((slot, k));
-                        if current_slot_list.len() == 1 {
+                    Some((current_slot_list, ref_count)) => {


This comment is actually for the None arm, but I cannot comment on it (line 1068)

When inserting/updating the disk, I remember you mentioning that we don't need anything on disk with ref counts greater than 1; is that the case here with this logic (where new_ref_count is always 1 on line 1071)?

I'm not following. The None arm is for when we are inserting pubkey A but A does not exist on disk yet, so we get a None. Then we insert it directly to disk with a ref count of 1 (or 0 if the entry is cached). This is correct and what we would expect. A second attempt at inserting will leave the disk unchanged and add the duplicate to duplicates. Later, when duplicates is iterated, we will load the single entry (with its refcount of 1) from disk and merge it with the contents of duplicates, increasing the refcount on the in-memory entry to 2+. Depending on how that cache is tuned, the eviction logic will then keep the 2+ refcount entries from being flushed to disk.
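
The flow described above can be sketched as a toy model. This is an illustration under stated assumptions, not the code in runtime/src/in_mem_accounts_index.rs: `ToyIndex`, `Entry`, `startup_insert`, and `merge_duplicates` are hypothetical names, the disk and in-memory indexes are plain `HashMap`s, and every duplicate is assumed to be non-cached so that each one adds 1 to the ref count.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real types; not the solana-labs API.
type Pubkey = u64;
type Slot = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub slot_list: Vec<Slot>,
    pub ref_count: u64,
}

#[derive(Default)]
pub struct ToyIndex {
    pub disk: HashMap<Pubkey, Entry>,
    pub in_mem: HashMap<Pubkey, Entry>,
    pub duplicates: Vec<(Slot, Pubkey)>,
}

impl ToyIndex {
    /// Startup insert. `is_cached` models the "0 if the entry is cached" case.
    pub fn startup_insert(&mut self, slot: Slot, key: Pubkey, is_cached: bool) {
        match self.disk.get(&key) {
            // None arm: first sighting, so write straight to disk.
            None => {
                let ref_count = if is_cached { 0 } else { 1 };
                self.disk.insert(key, Entry { slot_list: vec![slot], ref_count });
            }
            // Some arm: leave disk unchanged and remember the duplicate.
            Some(_) => self.duplicates.push((slot, key)),
        }
    }

    /// Later pass: load the single disk entry and merge duplicates in memory.
    /// Assumption: every duplicate is non-cached and adds 1 to the ref count.
    pub fn merge_duplicates(&mut self) {
        for (slot, key) in std::mem::take(&mut self.duplicates) {
            let disk = &self.disk;
            // A key is only in `duplicates` if it is already on disk.
            let entry = self.in_mem.entry(key).or_insert_with(|| disk[&key].clone());
            entry.slot_list.push(slot);
            entry.ref_count += 1;
        }
    }
}
```

In this model, the disk entry written by the None arm always has a ref count of 1 or 0. Ref counts of 2 or more appear only on the in-memory entry after `merge_duplicates` runs.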

Basically I was trying to answer the question of "the refcount is always 1 here, right?". I didn't know/remember about the cached case, so refcount can be 0 too. Was seeing if an assert here made sense or not. I don't think anything needs to change; was endeavoring to understand more 😺

brooksprumo

lgtm

codecov · 2023-03-22T15:32:25Z

Codecov Report

Merging #30736 (c0e0fd3) into master (4285cb2) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #30736   +/-   ##
=======================================
  Coverage    81.4%    81.4%           
=======================================
  Files         723      723           
  Lines      203533   203537    +4     
=======================================
+ Hits       165845   165876   +31     
+ Misses      37688    37661   -27


jeffwashington mentioned this pull request Mar 15, 2023

disk accounts index performance improvements #30711

Closed

jeffwashington force-pushed the mm9 branch 5 times, most recently from 33e03b6 to efef26d Compare March 21, 2023 21:52

jeffwashington changed the title ~~wip: at startup, keep duplicates in in-memory index since they will be cleaned shortly~~ at startup, keep duplicates in in-memory index since they will be cleaned shortly Mar 21, 2023

jeffwashington marked this pull request as ready for review March 21, 2023 21:53

jeffwashington requested a review from brooksprumo March 21, 2023 21:53

jeffwashington force-pushed the mm9 branch from efef26d to b7ae8cf Compare March 22, 2023 13:52

at startup, keep duplicates in in-memory index since they will be cle…

c0e0fd3

…aned soon

jeffwashington force-pushed the mm9 branch from b7ae8cf to c0e0fd3 Compare March 22, 2023 13:52

brooksprumo reviewed Mar 22, 2023

View reviewed changes

brooksprumo approved these changes Mar 22, 2023

View reviewed changes

jeffwashington merged commit 9a1d5ea into solana-labs:master Mar 22, 2023

jeffwashington added a commit to jeffwashington/solana that referenced this pull request Mar 22, 2023

Revert "at startup, keep duplicates in in-memory index since they wil…

ed71286

…l be cleaned shortly (solana-labs#30736)" This reverts commit 9a1d5ea.

jeffwashington mentioned this pull request Mar 22, 2023

Revert "at startup, keep duplicates in in-memory index since they wil… #30851

Closed

jeffwashington mentioned this pull request Mar 31, 2023

reduce contention on startup index generation #31006

Merged
