Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shares accounts hash cache data between full and incremental #33164

Merged

Conversation

brooksprumo
Copy link
Contributor

@brooksprumo brooksprumo commented Sep 6, 2023

Problem

We cache the data used to calculate the accounts hash. Currently, this data is duplicated for full and incremental accounts hash calculations, but doesn't need to be.

Summary of Changes

Use the same path to cache the data used by both full and incremental accounts hash calculations.

Please see #33164 (comment) for discussion on the implementation and its implications.

(Note: There are future PRs queued up to clean up the old directory names.)

Testing

Since none of the real clusters have the IAH feature gate enabled, I wanted to test that the accounts hash calculations were still correct. I spun up a local testnet with 5 validators and let it run for multiple days. It used the default snapshot intervals (full: 25,000 slots, incremental: 100 slots), and so it has performed approximately 2,350 incremental accounts hash calculations and 98 full accounts hash calculations. No issues so far.

It has also performed an epoch accounts hash, so that confirms via consensus that all the nodes are still doing the right thing.

@brooksprumo brooksprumo self-assigned this Sep 6, 2023
@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Merging #33164 (dcc5e3f) into master (f4816dc) will decrease coverage by 0.1%.
Report is 2 commits behind head on master.
The diff coverage is 86.9%.

@@            Coverage Diff            @@
##           master   #33164     +/-   ##
=========================================
- Coverage    82.1%    82.1%   -0.1%     
=========================================
  Files         785      785             
  Lines      211205   211208      +3     
=========================================
- Hits       173432   173410     -22     
- Misses      37773    37798     +25     

📢 Have feedback on the report? Share it here.

@brooksprumo brooksprumo added the work in progress This isn't quite right yet label Sep 6, 2023
@brooksprumo brooksprumo force-pushed the accounts-hash-cache/shared-data branch 2 times, most recently from a7c7f8a to a1801b4 Compare September 6, 2023 19:29
Comment on lines +208 to +210
if self.should_delete_old_cache_files_on_drop {
self.delete_old_cache_files();
}
Copy link
Contributor Author

@brooksprumo brooksprumo Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This impl will be correct, but it currently also won't clean up the non-full ranges that exist in the newest slots.

For example, here's what the accounts hash cache dir could look like:

1. -rw-r--r-- 1 sol users 45080 Sep  8 20:14 0.2500.0.65536.7195195454841191751
2. -rw-r--r-- 1 sol users   872 Sep  8 20:31 2500.4001.0.65536.16192472734143462858
3. -rw-r--r-- 1 sol users   584 Sep  8 20:32 4064.4101.0.65536.7196231288448241015
4. -rw-r--r-- 1 sol users  1016 Sep  8 20:34 4064.4201.0.65536.13132790460150755029
5. -rw-r--r-- 1 sol users  1016 Sep  8 20:36 4064.4301.0.65536.7276174198129441015
6. -rw-r--r-- 1 sol users  1016 Sep  8 20:38 4064.4401.0.65536.14404559913584167901
7. -rw-r--r-- 1 sol users  1016 Sep  8 20:39 4064.4501.0.65536.4542188866695766599
8. -rw-r--r-- 1 sol users  1016 Sep  8 20:41 4064.4601.0.65536.11919391694395303358
9. -rw-r--r-- 1 sol users  1016 Sep  8 20:43 4064.4701.0.65536.12791177168748020520

Lines 1 and 2 are full ranges, so these ranges will not change.

Lines 3-9 all have the same starting slot, but their ending slot continues to grow, since the range is not full yet. These are created as a result of incremental snapshots. Ideally lines 3-8 should have been deleted. Without this PR, they are indeed deleted.

So we do want to delete the files with non-full ranges. (At least the ones for the newer slots)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the current impl is not perfect, it is at least better than the status quo. Luckily, the incremental accounts hash feature gate is not enabled, so no real clusters have nodes with double the cache files.

Without this PR, IAH will duplicate all the files it needs, and requires the full accounts hash calculation to redo all the work that IAH did (to recreate the cache files, but for 'full' instead).

On pop-net, where the snapshot intervals are quite long, this ends up being a lot of duplicated work.

With this PR, we will have old files that won't get cleaned up until the next full accounts hash calculation runs. But we will save on redoing the work of creating the cache files.

I think this is a net win, as perf should be better, and the overall number of files is lower. Future PRs can cleanup these unused files sooner, if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing up the explanation.

yeah. keeping them util next full snapshot works fine for me.

@brooksprumo brooksprumo removed the work in progress This isn't quite right yet label Sep 11, 2023
@brooksprumo brooksprumo marked this pull request as ready for review September 11, 2023 13:21
@brooksprumo brooksprumo added the v1.16 PRs that should be backported to v1.16 label Sep 11, 2023
@brooksprumo brooksprumo requested review from jeffwashington and HaoranYi and removed request for jeffwashington September 11, 2023 14:29
Copy link
Contributor

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Comment on lines +208 to +210
if self.should_delete_old_cache_files_on_drop {
self.delete_old_cache_files();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing up the explanation.

yeah. keeping them util next full snapshot works fine for me.

@brooksprumo brooksprumo merged commit 6298c6c into solana-labs:master Sep 11, 2023
@brooksprumo brooksprumo deleted the accounts-hash-cache/shared-data branch September 11, 2023 19:55
mergify bot pushed a commit that referenced this pull request Sep 11, 2023
(cherry picked from commit 6298c6c)

# Conflicts:
#	runtime/src/accounts_db.rs
#	runtime/src/cache_hash_data.rs
t-nelson pushed a commit that referenced this pull request Sep 13, 2023
…backport of #33164) (#33212)

* Shares accounts hash cache data between full and incremental (#33164)

(cherry picked from commit 6298c6c)

# Conflicts:
#	runtime/src/accounts_db.rs
#	runtime/src/cache_hash_data.rs

* fix merge conflicts

---------

Co-authored-by: Brooks <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v1.16 PRs that should be backported to v1.16
Projects
Development

Successfully merging this pull request may close these issues.

2 participants