
Avoid early clean and bad snapshot by ref-counting #8724

Merged
ryoqun merged 4 commits into master from no-storage-gc-based-early-clean on Mar 13, 2020

Conversation

ryoqun
Member

@ryoqun ryoqun commented Mar 9, 2020

Problem

I've just found another corner case where a validator can create a bad snapshot.

A validator can clean any rooted slot at any time unless the account's balance is 0 lamports. So the online processing of clean_accounts can have an incomplete view of the alive storages for a given pubkey, and it could purge a shielding zero-lamport pubkey too early and create a bad snapshot. If a node is restored from such a snapshot, the ill-formed AccountsIndex could incorrectly revive the 0-lamport account using the old state from an old storage.

For details, please see the accompanying unit test (it's huge...).

Anyway, this bug manifests more often if we enable off-line snapshot cleaning, because the apparently-leaking storages are, ironically, actually shielding against such bad account-view revivals in some cases.
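To make the failure sequence concrete, here is a self-contained toy sketch (hypothetical names and values, not the real AccountsDB types) of how purging the shielding zero-lamport entry too early lets stale state win back after a restore:

```rust
use std::collections::HashMap;

fn main() {
    // Toy model: each slot's storage is a map of pubkey -> lamports.
    let mut storages: HashMap<u64, HashMap<&str, u64>> = HashMap::new();
    storages.insert(10, HashMap::from([("X", 5)])); // old, non-zero state of X
    storages.insert(20, HashMap::from([("X", 0)])); // newer, zero-lamport state of X

    // Early clean: the shielding zero-lamport entry at slot 20 is purged while
    // slot 10's storage is still alive, because clean saw an incomplete view.
    storages.get_mut(&20).unwrap().remove("X");

    // Restore from snapshot: the index is rebuilt by scanning surviving storages,
    // keeping the newest entry per pubkey, so X is revived with stale state.
    let mut rebuilt_index: HashMap<&str, (u64, u64)> = HashMap::new(); // pubkey -> (slot, lamports)
    for (slot, accounts) in &storages {
        for (pubkey, lamports) in accounts {
            let entry = rebuilt_index.entry(*pubkey).or_insert((*slot, *lamports));
            if *slot > entry.0 {
                *entry = (*slot, *lamports);
            }
        }
    }
    assert_eq!(rebuilt_index.get("X"), Some(&(10, 5))); // the stale 5-lamport state wins
}
```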

Summary of Changes

Introduce a ref-count from storages to the index. Its lifetime is bounded so that it is unconditionally incremented at AccountsIndex::insert() and decremented at clean_dead_slots, respectively.

This is a bit costly (roughly: a temporal cost of +1 write lock per account update, and a spatial cost of 8 bytes per pubkey).
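Conceptually, the bookkeeping looks something like the sketch below. Only the method names insert and unref_from_storage mirror this PR's description and diff; ToyIndex, can_purge, and everything else are made up for illustration:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
struct ToyIndex {
    // The claimed ~8 extra bytes per pubkey: one counter piggybacked on the index.
    ref_counts: HashMap<[u8; 32], AtomicU64>,
}

impl ToyIndex {
    // Called on every account update, mirroring the "+1 wlock per account update" cost.
    fn insert(&mut self, pubkey: [u8; 32]) {
        self.ref_counts
            .entry(pubkey)
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(1, Ordering::Relaxed);
    }

    // Called while scanning a dead slot's storage during clean_dead_slots.
    fn unref_from_storage(&self, pubkey: &[u8; 32]) {
        if let Some(count) = self.ref_counts.get(pubkey) {
            count.fetch_sub(1, Ordering::Relaxed);
        }
    }

    // Hypothetical check: a zero-lamport entry is only purgeable once every alive
    // storage entry for this pubkey is accounted for by the cleaner.
    fn can_purge(&self, pubkey: &[u8; 32], entries_seen_by_clean: u64) -> bool {
        self.ref_counts
            .get(pubkey)
            .map(|c| c.load(Ordering::Relaxed) == entries_seen_by_clean)
            .unwrap_or(false)
    }
}

fn main() {
    let pk = [7u8; 32];
    let mut index = ToyIndex::default();
    index.insert(pk); // account stored in an old slot
    index.insert(pk); // newer zero-lamport update stored in another slot
    assert!(!index.can_purge(&pk, 1)); // clean only saw one entry: must not purge yet
    index.unref_from_storage(&pk); // the old slot's storage eventually dies
    assert!(index.can_purge(&pk, 1)); // counts line up; purging is now safe
}
```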

@sakridge This PR is still kind of a draft... Is there a better solution? If not, I'll do the remaining clean-ups, like the hideous scattered .0s and outdated names.

@mvines I haven't tested this, but it may be a possible cause of the previously reported snapshot verification error in the SLP cluster with 0.23, where the recently regressed snapshot verification bug shouldn't exist.

Part of #8168

@ryoqun ryoqun added the work in progress (This isn't quite right yet) label Mar 9, 2020
@ryoqun ryoqun requested a review from sakridge March 9, 2020 06:07
@codecov

codecov bot commented Mar 9, 2020

Codecov Report

Merging #8724 into master will increase coverage by 0.0%.
The diff coverage is 99.2%.

@@          Coverage Diff           @@
##           master   #8724   +/-   ##
======================================
  Coverage    79.9%   80.0%           
======================================
  Files         261     261           
  Lines       57141   57238   +97     
======================================
+ Hits        45699   45795   +96     
- Misses      11442   11443    +1     

@ryoqun ryoqun added the v1.0 label Mar 9, 2020
runtime/src/accounts_db.rs (review thread; outdated, resolved)
@sakridge
Member

sakridge commented Mar 9, 2020

@ryoqun Good find. I will take a look.

```rust
for slot in dead_slots.iter() {
    for store in storage.0.get(slot).unwrap().values() {
        for account in store.accounts.accounts(0) {
            index.unref_from_storage(&account.meta.pubkey);
        }
    }
}
```
Member Author

@ryoqun ryoqun Mar 9, 2020

If there is no better approach to this problem, we can parallelize/async this code as a further improvement to this PR.
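For reference, one rough shape such a parallel pass might take, reusing the names from the snippet above and assuming index and storage can be shared across threads and that unref_from_storage only needs &self (a sketch, not code from this PR):

```rust
use rayon::prelude::*;

// Hypothetical parallel variant of the loop above; not part of this change.
dead_slots.par_iter().for_each(|slot| {
    for store in storage.0.get(slot).unwrap().values() {
        for account in store.accounts.accounts(0) {
            index.unref_from_storage(&account.meta.pubkey);
        }
    }
});
```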

@sakridge
Member


Hmm. This is a good way to do it.

I'm not super excited about the storage scan we have to do when cleaning up the slot. The other way I suppose would be to keep another index for each appendvec which is a hashmap of pubkey to ref count for that appendvec. I'm still thinking about it though.

@sakridge
Member


Another option could be to not remove the fork entries from the accounts index until the append vec store is actually removed for that entry.
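Roughly, that earlier alternative would pair each AppendVec with its own ref-count map, something like this hypothetical sketch (not proposed code):

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32];

// Hypothetical per-AppendVec bookkeeping: when the store dies, its map is
// dropped wholesale, so no storage rescan is needed at clean time. The cost is
// a second pubkey-keyed map per AppendVec.
#[derive(Default)]
struct ToyAppendVec {
    ref_counts: HashMap<Pubkey, u64>,
}

impl ToyAppendVec {
    fn record_account(&mut self, pubkey: Pubkey) {
        *self.ref_counts.entry(pubkey).or_insert(0) += 1;
    }
}

fn main() {
    let mut store = ToyAppendVec::default();
    store.record_account([7u8; 32]);
    drop(store); // slot death: the counts disappear with the store
}
```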

@ryoqun
Member Author

ryoqun commented Mar 10, 2020

> @ryoqun Good find. I will take a look.
>
> Hmm. This is a good way to do it.
>
> I'm not super excited about the storage scan we have to do when cleaning up the slot.

Yeah, me neither...

> The other way I suppose would be to keep another index for each appendvec which is a hashmap of pubkey to ref count for that appendvec. I'm still thinking about it though.

I considered that implementation, too, but I opted for the current PR's approach because having two index maps keyed by 32-byte pubkeys (which could be huge) could waste too much memory. I decided that without much thought, though, so your thoughts are very welcome!
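For a sense of scale, here is a back-of-the-envelope comparison of the two layouts, using made-up account counts and per-entry map overheads purely for illustration:

```rust
// All numbers below are illustrative assumptions, not measurements.
const PUBKEY_BYTES: u64 = 32;
const REF_COUNT_BYTES: u64 = 8;
const ASSUMED_MAP_OVERHEAD_PER_ENTRY: u64 = 16; // hypothetical; varies by implementation

fn main() {
    let accounts: u64 = 10_000_000; // hypothetical number of live account entries
    let refcount_in_existing_index = accounts * REF_COUNT_BYTES;
    let extra_pubkey_keyed_maps =
        accounts * (PUBKEY_BYTES + REF_COUNT_BYTES + ASSUMED_MAP_OVERHEAD_PER_ENTRY);
    // ~80 MB for a u64 piggybacked on the existing index entries...
    println!("ref count in existing index: ~{} MB", refcount_in_existing_index / 1_000_000);
    // ...versus hundreds of MB once every entry pays for its 32-byte key again.
    println!("separate pubkey-keyed maps: >= ~{} MB", extra_pubkey_keyed_maps / 1_000_000);
}
```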

@ryoqun
Member Author

ryoqun commented Mar 10, 2020

> @ryoqun Good find. I will take a look.
>
> Hmm. This is a good way to do it.
> I'm not super excited about the storage scan we have to do when cleaning up the slot. The other way I suppose would be to keep another index for each appendvec which is a hashmap of pubkey to ref count for that appendvec. I'm still thinking about it though.
>
> Another option could be to not remove the fork entries from the accounts index until the append vec store is actually removed for that entry.

I decided against that approach, mainly because #8436 recently made it desirable to keep the number of fork entries small: it traverses many entries at each snapshot interval while cleaning for the next run. There is also a similar concern for lazy cleaning's fork-entry traversal inside regular index updates.

As with my previous comment, I decided this without much thought. I'm very open to your opinions!

@ryoqun
Member Author

ryoqun commented Mar 10, 2020

I've tested this PR extensively locally and found that it actually fixes the bank hash mismatch error seen when cleaning a recent TdS snapshot with the new cleaning implementation (not pushed to #8337).

@ryoqun
Member Author

ryoqun commented Mar 11, 2020

@sakridge What are your thoughts? Do you need any benchmark info or anything else to decide which implementation strategy we should follow?

@sakridge
Member

> @sakridge What are your thoughts? Do you need any benchmark info or anything else to decide which implementation strategy we should follow?

I think this is reasonable. I like that it seems like the work could be done async. The other methods would require more locking in the critical update path. Let's move forward with this. It would be nice to get an idea of what the performance is like though before we merge it.

@ryoqun ryoqun changed the title [wip] Avoid early clean and bad snapshot by ref-counting Avoid early clean and bad snapshot by ref-counting Mar 11, 2020
@ryoqun ryoqun force-pushed the no-storage-gc-based-early-clean branch from 9701385 to 1697ca5 on March 11, 2020 09:17
@ryoqun
Member Author

ryoqun commented Mar 11, 2020

I've done minor clean-ups, rebased on the current HEAD of origin/master, and measured the before and after as-is:

grafana-testnet-monitor-edge-ryoqun-storage-gc-before.pdf
grafana-testnet-monitor-edge-ryoqun-storage-gc-after-v0.pdf

As expected, performance is considerably affected.

I'll try some perf improvements.

@ryoqun
Member Author

ryoqun commented Mar 11, 2020

@sakridge My reading of the performance from the preceding benchmark results, analyzed in a similar manner to this and this:

  • Overall, the critical section of snapshot creation got slower (+~1 s, ~2x)
    • Max confirmation time increased (~1.3x for the 99th percentile, ~1.8x for the max) due to the more noticeable stoppage at each snapshot creation, which entails the largest reclaiming
  • However, overall TPS is unchanged (within ~±1%)
  • Lock contention is visible on the "Resource Usage" panel again, as in Do periodic inbound cleaning for rooted slots #8436.

@ryoqun ryoqun marked this pull request as ready for review March 11, 2020 18:17
@ryoqun
Member Author

ryoqun commented Mar 11, 2020

@sakridge Also, for the record, creating the PDFs is as easy as:

  1. ./colo.sh create -n 3 -c 1
  2. ./init-metrics.sh $(whoami)
  3. ./net.sh start
  4. wait 30 min
  5. ./net.sh stop
  6. Chromium: Print to PDF with A3 paper size, a custom scale of 25%, and background colors enabled

@ryoqun ryoqun merged commit 952cd38 into solana-labs:master Mar 13, 2020
mergify bot pushed a commit that referenced this pull request Mar 13, 2020
* Avoid early clean and bad snapshot by ref-counting

* Add measure

* Clean ups

* clean ups

(cherry picked from commit 952cd38)
```rust
for (_slot_id, account_info) in account_infos {
    if *store_counts.get(&account_info.store_id).unwrap() != 0 {
    if *store_counts.get(&account_info.store_id).unwrap() == 0 {
        would_unref_count += 1;
```
Member Author

Sadly, this counting was flawed from the start...

It doesn't align with the incrementing here when there are multiple stores: https://github.com/solana-labs/solana/pull/8724/files#diff-ef68c28d9d63e66355c44904d9877825R125

Fixed in #12462
