Avoid early clean and bad snapshot by ref-counting #8724
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #8724    +/-   ##
=========================================
  Coverage    79.9%    80.0%
=========================================
  Files         261      261
  Lines       57141    57238     +97
=========================================
+ Hits        45699    45795     +96
- Misses      11442    11443      +1
@ryoqun Good find. I will take a look.
    for slot in dead_slots.iter() {
        for store in storage.0.get(slot).unwrap().values() {
            for account in store.accounts.accounts(0) {
                index.unref_from_storage(&account.meta.pubkey);
            }
        }
    }
If there is no better approach to this problem, we can parallelize/async this code as a further improvement to this PR.
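A rough sketch of what that parallelization could look like with rayon. The `SimplifiedIndex` and `unref_dead_slots` names and types below are simplified stand-ins for illustration only, not the PR's actual `AccountsIndex`/`AccountsDB` code:

```rust
// Hypothetical sketch: parallelize the dead-slot unref scan with rayon.
use rayon::prelude::*;
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

type Pubkey = [u8; 32];

struct SimplifiedIndex {
    // pubkey -> outstanding storage ref count
    ref_counts: HashMap<Pubkey, AtomicU64>,
}

impl SimplifiedIndex {
    fn unref_from_storage(&self, pubkey: &Pubkey) {
        if let Some(count) = self.ref_counts.get(pubkey) {
            count.fetch_sub(1, Ordering::Relaxed);
        }
    }
}

fn unref_dead_slots(
    index: &SimplifiedIndex,
    dead_slots: &[u64],
    storage: &HashMap<u64, Vec<Vec<Pubkey>>>, // slot -> stores -> account pubkeys
) {
    // Each dead slot can be scanned independently, so the outer loop
    // parallelizes cleanly; the index only needs atomic decrements.
    dead_slots.par_iter().for_each(|slot| {
        if let Some(stores) = storage.get(slot) {
            for store in stores {
                for pubkey in store {
                    index.unref_from_storage(pubkey);
                }
            }
        }
    });
}
```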
Hmm. This is a good way to do it. I'm not super excited about the storage scan we have to do when cleaning up the slot. The other way I suppose would be to keep another index for each appendvec which is a hashmap of pubkey to ref count for that appendvec. I'm still thinking about it though.
Another option could be to not remove the fork entries from the accounts index until the append vec store is actually removed for that entry.
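For reference, a hypothetical sketch of the first alternative above, a per-appendvec map from pubkey to ref count. The `AppendVecIndex` type and its methods are made up for illustration and are not part of this PR:

```rust
// Sketch: each append vec carries its own pubkey -> ref-count map, so cleaning
// a slot only consults that per-store map instead of rescanning the storage.
use std::collections::HashMap;

type Pubkey = [u8; 32];

#[derive(Default)]
struct AppendVecIndex {
    // How many live index entries still point at this append vec, per pubkey.
    ref_counts: HashMap<Pubkey, u64>,
}

impl AppendVecIndex {
    fn add_ref(&mut self, pubkey: Pubkey) {
        *self.ref_counts.entry(pubkey).or_insert(0) += 1;
    }

    // Returns true once the last reference for this pubkey is gone.
    fn remove_ref(&mut self, pubkey: &Pubkey) -> bool {
        let now_zero = match self.ref_counts.get_mut(pubkey) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        if now_zero {
            self.ref_counts.remove(pubkey);
        }
        now_zero
    }
}
```

The memory concern raised in the next comment is visible here: every append vec would carry its own map keyed by 32-byte pubkeys.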
Yeah, me too...
I considered that impl, too. But I opted for the current PR's impl because having two index maps of 32-byte keys (which could be huge) could waste too much memory. I decided that without much thought, though. So your thoughts are very welcome!
I went against that approach mainly because keeping the number of fork entries small recently became desirable with #8436, which traverses many entries at each snapshot interval while cleaning for the next run. Also, there is a similar concern about lazy cleaning's fork-entry traversal inside regular index updates. As with my previous comment, I decided without much thought. I'm very open to your opinions!
I've tested this PR extensively locally and found that it actually fixes the bank hash mismatch error when cleaning a recent TdS snapshot with the new cleaning impl (not pushed to #8337).
@sakridge What are your thoughts? Do you need some bench info or anything else to decide which implementation strategy we should follow?
I think this is reasonable. I like that it seems like the work could be done async. The other methods would require more locking in the critical update path. Let's move forward with this. It would be nice to get an idea of what the performance is like though before we merge it.
Force-pushed from 9701385 to 1697ca5
I've done minor clean-ups, rebased on the current HEAD of origin/master, and measured the before and after as-is: grafana-testnet-monitor-edge-ryoqun-storage-gc-before.pdf. As guessed, performance is considerably affected. I'll try some perf improvements.
@sakridge Here is my understanding of the performance from the preceding benchmark results, in a similar manner to this and this:
@sakridge Also, for the record, creating the pdf is as easy as:
* Avoid early clean and bad snapshot by ref-counting
* Add measure
* Clean ups
* clean ups

(cherry picked from commit 952cd38)
    for (_slot_id, account_info) in account_infos {
    if *store_counts.get(&account_info.store_id).unwrap() != 0 {
    if *store_counts.get(&account_info.store_id).unwrap() == 0 {
        would_unref_count += 1;
Sadly, this counting was flawed from the start....
This doesn't align with this incrementing when there are multiple stores: https://github.com/solana-labs/solana/pull/8724/files#diff-ef68c28d9d63e66355c44904d9877825R125
Fixed in #12462
Problem
I've just found another corner case where a validator can create a bad snapshot.....
A validator can clean any rooted slot at any time unless the account's balance is 0 lamports. So, online processing of `clean_accounts` can have an incomplete view of the alive storages for a given pubkey. As a result, it could purge a shielding zero-lamport pubkey too early and create a bad snapshot. In that case, if restored from such a snapshot, the inconsistent AccountsIndex could incorrectly revive the 0-lamport account using the old state from old storage. For details, please see the accompanying unit test (it's huge..).
Anyway, this bug manifests more often if we enable off-line snapshot cleaning because the apparently-leaking storages are actually shielding such a bad account-view revival in some cases, ironically.
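The hazard can be illustrated with a tiny, self-contained simulation. This is not the accompanying unit test; the `storage` and `index` maps below are simplified stand-ins for the real AccountsDB structures:

```rust
use std::collections::HashMap;

type Pubkey = &'static str;
type Slot = u64;

fn main() {
    // storage: slot -> (pubkey, lamports) entries held in append vecs
    let mut storage: HashMap<Slot, Vec<(Pubkey, u64)>> = HashMap::new();
    storage.insert(1, vec![("P", 10)]); // old, non-zero state
    storage.insert(2, vec![("P", 0)]);  // newer zero-lamport "shielding" entry

    // index: pubkey -> latest (slot, lamports)
    let mut index: HashMap<Pubkey, (Slot, u64)> = HashMap::new();
    index.insert("P", (2, 0));

    // Premature clean: purging the zero-lamport entry while slot 1's storage
    // is still alive leaves the stale non-zero state reachable.
    index.remove("P");
    storage.remove(&2);

    // A snapshot restore that rebuilds the index from the surviving storages
    // now incorrectly revives P with 10 lamports.
    let mut rebuilt: HashMap<Pubkey, (Slot, u64)> = HashMap::new();
    for (slot, entries) in &storage {
        for (pubkey, lamports) in entries {
            rebuilt.insert(*pubkey, (*slot, *lamports));
        }
    }
    assert_eq!(rebuilt.get("P"), Some(&(1, 10))); // the bad revival
}
```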
Summary of Changes
Introduce a ref-count from storages to the index. Its counting lifetime is bounded so that it's unconditionally incremented at `AccountsIndex::insert()` and decremented at `clean_dead_slots`, respectively.
This is a bit costly (basically, temporal cost: +1 wlock per account update; spatial cost: 8 bytes per pubkey).
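A minimal sketch of that bookkeeping, assuming a simplified `StorageRefCounts` stand-in rather than the PR's actual `AccountsIndex` changes:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type Pubkey = [u8; 32];

#[derive(Default)]
struct StorageRefCounts {
    // Spatial cost: one u64 (8 bytes) per pubkey.
    counts: RwLock<HashMap<Pubkey, u64>>,
}

impl StorageRefCounts {
    // Incremented unconditionally when an account update is inserted into the
    // index (temporal cost: +1 write lock per account update).
    fn add_ref(&self, pubkey: &Pubkey) {
        let mut counts = self.counts.write().unwrap();
        *counts.entry(*pubkey).or_insert(0) += 1;
    }

    // Decremented when a dead slot's storages are purged, once per account
    // entry found in those storages.
    fn unref_from_storage(&self, pubkey: &Pubkey) {
        let mut counts = self.counts.write().unwrap();
        if let Some(count) = counts.get_mut(pubkey) {
            *count = count.saturating_sub(1);
        }
    }

    // Cleaning consults this to see whether any alive storage still holds an
    // entry for the pubkey before purging a zero-lamport account.
    fn ref_count(&self, pubkey: &Pubkey) -> u64 {
        *self.counts.read().unwrap().get(pubkey).unwrap_or(&0)
    }
}
```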
@sakridge This PR is still kind of a draft.. Is there a better solution? If none, I'll do the remaining clean-ups like the hideous scattered `.0`s and outdated names.
@mvines I haven't tested this, but it may be a possible cause of the previously reported snapshot verification error in the SLP cluster with 0.23, where the recently regressed snapshot verification bug shouldn't exist.
Part of #8168