
Do periodic inbound cleaning for rooted slots #8436

Merged
ryoqun merged 16 commits into solana-labs:master on Mar 3, 2020

Conversation

ryoqun (Member) commented Feb 25, 2020

Problem

After much investigation, I've realized that neither just removing storage entries whose count is 0 nor tweaking the purge logic is enough.

First of all, the root cause of the snapshot bloat is that we leak StorageEntry/AppendVec instances too easily. Even just running solana ping causes that situation.

In this regard, much like the zero-lamport-account RAM-exhaustion DoS attack, this behavior can easily be used to mount a DoS with a rather small amount of SOL.

Summary of Changes

We were too lazy about GC-ing older storage entries. In particular, we do literally nothing for account states that are no longer touched.

So, introduce slightly eager compaction for non-zero-lamport accounts at snapshot generation intervals.

Background

This is one of two PRs toward #8168. This one is inbound compaction; the other (#8337) is outbound compaction (when loading a snapshot). To really fix the enlarged snapshots on TdS, we need to land both.

Part of #8168

@ryoqun ryoqun added the security label (Pull requests that address a security vulnerability) Feb 25, 2020
@ryoqun ryoqun changed the title from "Inbound compaction" to "Do periodic inbound compaction for rooted slots" Feb 25, 2020

ryoqun commented Feb 25, 2020

Well, I just noticed that when snapshot generation is disabled, this new compaction and the existing purge_zero_lamports are never run... This could exhaust system resources.

@ryoqun ryoqun requested a review from sakridge February 25, 2020 09:13
) {
let mut accounts_index = self.accounts_index.write().unwrap();
let mut all_pubkeys: HashSet<Pubkey> = HashSet::<Pubkey>::default();
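// Collect the pubkeys updated in every root that has not yet been cleaned (the delta since the last clean)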
for root in accounts_index.not_compacted_roots.iter() {
ryoqun (Member Author):

I think the whole purge_zero_lamport thing just needs to check the delta of newly-rooted slots since the last purge, inspired by #8405. Currently, only this new non-zero-lamport account compaction is checked in that manner.

We can move to delta-purge for online processing once we put the outbound compaction (#8337) in place. Would that be a good direction?
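For illustration, a minimal sketch of that delta idea, with hypothetical types and names rather than the real AccountsIndex API: only roots added since the last clean pass are handed to the purge.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the runtime's slot type.
type Slot = u64;

// Tracks roots added since the last clean pass, so a purge only has to
// look at the delta instead of every rooted slot.
#[derive(Default)]
struct RootTracker {
    uncleaned_roots: HashSet<Slot>,
}

impl RootTracker {
    // Called whenever a new slot becomes rooted.
    fn add_root(&mut self, slot: Slot) {
        self.uncleaned_roots.insert(slot);
    }

    // Called by the periodic clean: hands back the delta and resets it.
    fn take_uncleaned_roots(&mut self) -> HashSet<Slot> {
        std::mem::take(&mut self.uncleaned_roots)
    }
}

fn main() {
    let mut tracker = RootTracker::default();
    tracker.add_root(10);
    tracker.add_root(11);
    // Only slots 10 and 11 need to be inspected in this clean pass.
    assert_eq!(tracker.take_uncleaned_roots().len(), 2);
    // A subsequent pass with no new roots sees an empty delta.
    assert!(tracker.take_uncleaned_roots().is_empty());
}
```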

sakridge (Member):

Delta purge sounds good. It's a bit challenging because of the cascading situation: an account that is not part of the delta set could be preventing a purge, so you would have to know to go look and see whether that one could be purged as well.

ryoqun (Member Author):

Hmm, yeah, that's right. It looks harder than what I originally thought.

}
}
let mut reclaims = Vec::new();
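// For each collected pubkey, gather the older rooted states that can be reclaimed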
for pubkey in all_pubkeys {
ryoqun (Member Author):

I haven't measured how much this impacts the overall validator performance yet...

ryoqun (Member Author):

Basically, this incurs slot_vec write locks for the union of all accounts updated since the last check. Effectively, this means one more such lock for every account update.

sakridge (Member):

Yeah, it would be nice to see how it affects it. It seems the main cost is the accounts scan, though, so this might not be too much. I don't have a good idea of how many accounts would be purged this way, so maybe we should test it on a testnet.


ryoqun commented Feb 25, 2020

@sakridge Could you review this, please? I'm wondering whether or not there is a better solution, especially because of the rather high CPU cost of the current compaction implementation.


codecov bot commented Feb 25, 2020

Codecov Report

Merging #8436 into master will increase coverage by <.1%.
The diff coverage is 98.8%.

@@           Coverage Diff            @@
##           master   #8436     +/-   ##
========================================
+ Coverage      80%     80%   +<.1%     
========================================
  Files         256     256             
  Lines       55594   55728    +134     
========================================
+ Hits        44477   44607    +130     
- Misses      11117   11121      +4

runtime/src/accounts_db.rs (outdated review comment, resolved)

ryoqun commented Feb 26, 2020

@sakridge I've done some naming clarification and avoided introducing another unneeded term: compaction is gone. How does this PR look now? :)

// Purge zero lamport accounts for garbage collection purposes
// Reclaim older states of rooted non-zero lamport accounts as a general
// AccountsDB bloat mitigation and preprocess for better zero-lamport purging.
fn purge_old_normal_accounts(&self, purges: &HashMap<Pubkey, Vec<(Slot, AccountInfo)>>) {
sakridge (Member):

Can we make just one more rename for this function? Maybe something like clean_old_account_roots.

ryoqun (Member Author):

Nice catch!

Done: 6c3a696.

I kept the ..._accounts ending to avoid confusion with AccountsIndex::clean_dead_slot.

The former removes accounts and the latter removes actual slots. I think root as a noun connotes a (root) slot.

sakridge (Member) replied:

Looks pretty good. I had just one more rename; also, can we get an idea of the performance impact?


ryoqun commented Feb 26, 2020

@sakridge I'll work on it. :) My first testnet perf run. :)

sakridge previously approved these changes Feb 26, 2020

sakridge (Member) left a comment:

lgtm, thanks for the renames

@ryoqun ryoqun changed the title from "Do periodic inbound compaction for rooted slots" to "Do periodic inbound cleaning for rooted slots" Feb 26, 2020
@mergify mergify bot dismissed sakridge’s stale review February 27, 2020 09:11

Pull request has been modified.


ryoqun commented Feb 27, 2020

A somewhat disappointing yet interesting performance report, considering this is my first one. :)
grafana-testnet-monitor-edge-ryoqun-before.pdf
grafana-testnet-monitor-edge-ryoqun-after.pdf

Findings:

  • Overall, the critical section of snapshot creation got slower (+~1.8 s, ~3x).
    • Max confirmation time increased (1.5x for the 99th percentile, 2x for the max) due to a more noticeable stall at each snapshot creation.
  • Cleanup is working (there are ~15% fewer AppendVecs).
  • However, overall TPS increased (+~10%).
    • I guess this is due to the reduced working set.
  • Lock contention is visible on the "Resource Usage" panel.

Considering the above, I've introduced a cheap parallelization: 34ab324

@sakridge What do you think about these observations? Although the confirmation-time spikes got worse, I don't think that blocks this PR from being merged, considering we're more or less forced to address this kind of AccountsDB bloat for security reasons. Or are ops for production validators expected to run machines in a round-robin fashion instead of using a single computer? In that case an online clean isn't a must; we can always rely on the offline clean.
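Roughly the shape of that parallelization, as a sketch with simplified stand-in types rather than the real AccountsDB structures: the candidate pubkeys are processed in chunks so each rayon task batches its locking and allocation instead of paying per account.

```rust
use rayon::prelude::*;

// Simplified stand-ins for the real Pubkey and Slot types.
type Pubkey = [u8; 32];
type Slot = u64;

// Placeholder for the per-account work: inspect the account's slot list and
// return the older rooted states that can be reclaimed.
fn collect_reclaims_for(_pubkey: &Pubkey) -> Vec<Slot> {
    Vec::new()
}

// Process candidate pubkeys in chunks so each parallel task amortizes its
// per-batch work instead of doing it once per account.
fn clean_in_parallel(all_pubkeys: &[Pubkey]) -> Vec<Slot> {
    all_pubkeys
        .par_chunks(4096)
        .map(|chunk| {
            let mut reclaims = Vec::new();
            for pubkey in chunk {
                reclaims.extend(collect_reclaims_for(pubkey));
            }
            reclaims
        })
        .reduce(Vec::new, |mut a, mut b| {
            a.append(&mut b);
            a
        })
}
```

The chunk size here is arbitrary; in practice it would be tuned against measurements like the ones above.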


ryoqun commented Feb 27, 2020

Commentary is TBD (just the performance-report PDFs for now).

In short, no compelling optimization result was observed... ;)

grafana-testnet-monitor-edge-ryoqun-after-par-v1.pdf
grafana-testnet-monitor-edge-ryoqun-after-par-v2.pdf

sakridge (Member) replied:

1.8 s of latency is actually pretty scary; that's a huge hit. Can we push some things off async?

sakridge (Member) added:

Or maybe sample the number of accounts, so that we only clean up, say, 10% of them on each run.
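A rough illustration of that sampling idea (a hypothetical helper, not part of this PR): keep a cursor so each pass cleans only a fraction of the candidates, and successive passes eventually cover all of them.

```rust
// Simplified stand-in for the real Pubkey type.
type Pubkey = [u8; 32];

// Return roughly `fraction` of the candidates for this pass, advancing a
// cursor so repeated passes walk through the whole candidate set.
fn sample_for_this_pass(
    candidates: &[Pubkey],
    cursor: &mut usize,
    fraction: f64,
) -> Vec<Pubkey> {
    if candidates.is_empty() {
        return Vec::new();
    }
    let per_pass = ((candidates.len() as f64 * fraction).ceil() as usize).max(1);
    let start = *cursor % candidates.len();
    let sampled: Vec<Pubkey> = candidates
        .iter()
        .cycle()
        .skip(start)
        .take(per_pass.min(candidates.len()))
        .copied()
        .collect();
    *cursor = (start + sampled.len()) % candidates.len();
    sampled
}

fn main() {
    let candidates = vec![[0u8; 32]; 100];
    let mut cursor = 0;
    // With fraction = 0.1, each pass cleans 10 of the 100 candidates.
    assert_eq!(sample_for_this_pass(&candidates, &mut cursor, 0.1).len(), 10);
    assert_eq!(cursor, 10);
}
```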


ryoqun commented Feb 28, 2020

I see! OK, I'm learning about performance sensitivity in Solana... :)

Before we finally give up and consider async or sampling, I've done my next round of optimizations, with good results.

grafana-testnet-monitor-edge-ryoqun-after-par-v2.5.pdf (Introduced metrics)
grafana-testnet-monitor-edge-ryoqun-after-par-v3.pdf (Removed the storage scan step)

Compared to the base commit:

  • Overall, the critical section of snapshot creation got a bit slower (+~200 ms, ~1.2x).
    • Max confirmation time is only negligibly increased (+3% for the 99th percentile, +15% for the max) due to the somewhat increased stall at each snapshot creation.
  • Cleaning should be working, but it isn't observable (this is a bit odd).
  • Overall TPS is unchanged (within ±1%).
    • I guess this is because the count of AppendVecs didn't change.
  • Lock contention is no longer visible on the "Resource Usage" panel.

Finally, I think I've arrived at a compelling optimization result. Is the +200 ms latency increase still a concern? If so, I'll consider further optimization, sampling, or going async.

@ryoqun ryoqun requested a review from sakridge February 28, 2020 07:44
runtime/src/accounts_db.rs (two outdated review comments, resolved)
@mergify mergify bot dismissed sakridge’s stale review March 3, 2020 03:03

Pull request has been modified.

@ryoqun ryoqun merged commit d861033 into solana-labs:master Mar 3, 2020
mergify bot pushed a commit that referenced this pull request Mar 3, 2020
* Do periodic inbound compaction for rooted slots

* Add comment

* nits

* Consider not_compacted_roots in cleanup_dead_slot

* Renames in AccountsIndex

* Rename to reflect expansion of removed accounts

* Fix a comment

* rename

* Parallelize clean over AccountsIndex

* Some niceties

* Reduce locks and real chunked parallelism

* Measure each step for sampling opportunities

* Just noticed par iter is maybe lazy

* Replace storage scan with optimized index scan

* Various clean-ups

* Clear uncleared_roots even if no updates

(cherry picked from commit d861033)

# Conflicts:
#	ledger/src/snapshot_utils.rs
#	runtime/src/accounts_db.rs
#	runtime/src/accounts_index.rs
#	runtime/src/bank.rs
mergify bot pushed a commit that referenced this pull request Mar 3, 2020

(same commit list as above, cherry picked from commit d861033)
solana-grimes pushed a commit that referenced this pull request Mar 3, 2020
ryoqun added a commit that referenced this pull request Mar 3, 2020
(same commit list as above)
@ryoqun ryoqun added and then removed the automerge (Merge this Pull Request automatically once CI passes) label Mar 3, 2020
solana-grimes pushed a commit that referenced this pull request Mar 3, 2020

ryoqun commented Mar 3, 2020

Phew, I've finally finished all the back-porting work!

Labels
security Pull requests that address a security vulnerability
2 participants