Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

at startup, keep duplicates in in-memory index since they will be cleaned shortly #30736

Merged
merged 1 commit into from
Mar 22, 2023

Conversation

jeffwashington
Copy link
Contributor

@jeffwashington jeffwashington commented Mar 15, 2023

Problem

See #30711

At startup, scan storages to populate index. We can easily identify pubkeys that are in multiple slots (duplicates).
Today, these duplicates are inserted into the disk index. As soon as the first clean runs, the index entries for all these items will be loaded and the older ones will be cleaned (ie. removed) and then the entries will be rewritten. This write/read/rewrite is a waste of i/o bandwidth.

Summary of Changes

Hold all items with duplicates in in-memory index since we know clean will shortly read/modify/write all of them. This improves index generation performance and the performance of the first clean.

Fixes #

@jeffwashington jeffwashington force-pushed the mm9 branch 5 times, most recently from 33e03b6 to efef26d Compare March 21, 2023 21:52
@jeffwashington jeffwashington changed the title wip: at startup, keep duplicates in in-memory index since they will be cleaned shortly at startup, keep duplicates in in-memory index since they will be cleaned shortly Mar 21, 2023
@jeffwashington jeffwashington marked this pull request as ready for review March 21, 2023 21:53
Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. A few questions below.

runtime/src/accounts_index.rs Show resolved Hide resolved
// merge this in, mark as duplicate
duplicates.push((slot, k));
if current_slot_list.len() == 1 {
Some((current_slot_list, ref_count)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is actually for the None arm, but I cannot comment on it (line 1068)

When inserting/updating the disk, I remember you mentioning that we don't need anything on disk with ref counts greater than 1; is that the case here with this logic (where new_ref_count is always 1 on line 1071)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following. The none arm is for when we are inserting pubkey A but A does not exist on disk yet. So we get a None. Then, we insert it directly to disk with a ref count of 1 (or 0 if entry is cached). This is correct and what we would expect. A second attempt at inserting will leave disk unchanged and add the duplicate to duplicates. Later, when duplicates is iterated, we will load the single entry (and 1 refcount) from disk and merge it with the contents of duplicates, increasing the refcount on the in-memory entry to 2+. The eviction caching logic will prevent us from removing the 2+ refcount entries to disk depending on the tuning of that cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically I was trying to answer the question of "the refcount is always 1 here, right?". I didn't know/remember about the cached case, so refcount can be 0 too. Was seeing if an assert here made sense or not. I don't think anything needs to change; was endeavoring to understand more 😺

Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@codecov
Copy link

codecov bot commented Mar 22, 2023

Codecov Report

Merging #30736 (c0e0fd3) into master (4285cb2) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #30736   +/-   ##
=======================================
  Coverage    81.4%    81.4%           
=======================================
  Files         723      723           
  Lines      203533   203537    +4     
=======================================
+ Hits       165845   165876   +31     
+ Misses      37688    37661   -27     

@jeffwashington jeffwashington merged commit 9a1d5ea into solana-labs:master Mar 22, 2023
jeffwashington added a commit to jeffwashington/solana that referenced this pull request Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants