
Do periodic inbound cleaning for rooted slots #8436

Merged
ryoqun merged 16 commits into solana-labs:master on Mar 3, 2020

Conversation

ryoqun (Member) commented Feb 25, 2020

Problem

After much investigation, I've realized that neither just removing storage entries whose count is 0 nor tweaking the purge logic is enough.

First of all, the root cause of the snapshot bloat is that we leak StorageEntry/AppendVec instances too easily. Even just running solana ping causes that situation.

In this regard, much like the zero-lamport-account RAM-exhaustion DoS attack, this behavior can easily be used to mount a DoS with a rather small amount of SOL.

Summary of Changes

We were too lazy about GC-ing older storage entries. In particular, we do literally nothing for account states that are no longer touched.

So, introduce slightly eager compaction for non-zero-lamport accounts at snapshot generation intervals.

Background

This is one of two PRs toward #8168. This one is inbound compaction; the other (#8337) is outbound compaction (when loading a snapshot). To really fix the enlarged snapshots on TdS, we need to land both.

Part of #8168

@ryoqun ryoqun added the security label (Pull requests that address a security vulnerability) Feb 25, 2020
@ryoqun ryoqun changed the title from "Inbound compaction" to "Do periodic inbound compaction for rooted slots" Feb 25, 2020

ryoqun commented Feb 25, 2020

Well, I just noticed that when snapshot generation is disabled, this new compaction and the existing purge_zero_lamports are never run... This could exhaust system resources.

@ryoqun ryoqun requested a review from sakridge February 25, 2020 09:13
) {
let mut accounts_index = self.accounts_index.write().unwrap();
let mut all_pubkeys: HashSet<Pubkey> = HashSet::<Pubkey>::default();
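// Collect the pubkeys updated in every root that has not yet been cleaned (the delta since the last clean)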
for root in accounts_index.not_compacted_roots.iter() {
ryoqun (Member Author):

I think the whole purge_zero_lamport thing just needs to check the delta of newly-rooted slots since the last purge, inspired by #8405. Currently, only this new non-zero-lamport account compaction is checked in that manner.

We can move to delta-purge for online processing once we put the outbound compaction (#8337) in place. Would that be a good direction?
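For illustration, a minimal sketch of that delta idea, with hypothetical types and names rather than the real AccountsIndex API: only roots added since the last clean pass are handed to the purge.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the runtime's slot type.
type Slot = u64;

// Tracks roots added since the last clean pass, so a purge only has to
// look at the delta instead of every rooted slot.
#[derive(Default)]
struct RootTracker {
    uncleaned_roots: HashSet<Slot>,
}

impl RootTracker {
    // Called whenever a new slot becomes rooted.
    fn add_root(&mut self, slot: Slot) {
        self.uncleaned_roots.insert(slot);
    }

    // Called by the periodic clean: hands back the delta and resets it.
    fn take_uncleaned_roots(&mut self) -> HashSet<Slot> {
        std::mem::take(&mut self.uncleaned_roots)
    }
}

fn main() {
    let mut tracker = RootTracker::default();
    tracker.add_root(10);
    tracker.add_root(11);
    // Only slots 10 and 11 need to be inspected in this clean pass.
    assert_eq!(tracker.take_uncleaned_roots().len(), 2);
    // A subsequent pass with no new roots sees an empty delta.
    assert!(tracker.take_uncleaned_roots().is_empty());
}
```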

sakridge (Member):

Delta purge sounds good. It's a bit challenging because of the cascading situation: an account that is not part of the delta set could be preventing a purge, so you would have to know to go look and see whether that one could be purged as well.

ryoqun (Member Author):

Hmm, yeah, that's right. It looks harder than what I originally thought.

}
}
let mut reclaims = Vec::new();
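// For each collected pubkey, gather the older rooted states that can be reclaimed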
for pubkey in all_pubkeys {
ryoqun (Member Author):

I haven't measured how much this impacts the overall validator performance yet...

ryoqun (Member Author):

Basically, this incurs slot_vec write locks for the union of all accounts updated since the last check. Effectively, this means one more such lock for every account update.

sakridge (Member):

Yeah, it would be nice to see how it affects it. It seems the main cost is the accounts scan, though, so this might not be too much. I don't have a good idea of how many accounts would be purged this way, so maybe we should test it on a testnet.


ryoqun commented Feb 25, 2020

@sakridge Could you review this, please? I'm wondering whether or not there is a better solution, especially because of the rather high CPU cost of the current compaction implementation.


codecov bot commented Feb 25, 2020

Codecov Report

Merging #8436 into master will increase coverage by <.1%.
The diff coverage is 98.8%.

@@           Coverage Diff            @@
##           master   #8436     +/-   ##
========================================
+ Coverage      80%     80%   +<.1%     
========================================
  Files         256     256             
  Lines       55594   55728    +134     
========================================
+ Hits        44477   44607    +130     
- Misses      11117   11121      +4

runtime/src/accounts_db.rs (outdated review comment, resolved)

ryoqun commented Feb 26, 2020

@sakridge I've done some naming clarification and avoided introducing another unneeded term: compaction is gone. How does this PR look now? :)

// Purge zero lamport accounts for garbage collection purposes
// Reclaim older states of rooted non-zero lamport accounts as a general
// AccountsDB bloat mitigation and preprocess for better zero-lamport purging.
fn purge_old_normal_accounts(&self, purges: &HashMap<Pubkey, Vec<(Slot, AccountInfo)>>) {
sakridge (Member):

Can we make just one more rename for this function? Maybe something like clean_old_account_roots.

ryoqun (Member Author):

Nice catch!

Done: 6c3a696.

I kept the ..._accounts ending to avoid confusion with AccountsIndex::clean_dead_slot.

The former removes accounts and the latter removes actual slots. I think root as a noun connotes a (root) slot.

sakridge (Member) replied:

Looks pretty good. I had just one more rename; also, can we get an idea of the performance impact?


ryoqun commented Feb 26, 2020

@sakridge I'll work on it. :) My first testnet perf run. :)

sakridge previously approved these changes Feb 26, 2020

sakridge (Member) left a comment:

lgtm, thanks for the renames

@ryoqun ryoqun changed the title from "Do periodic inbound compaction for rooted slots" to "Do periodic inbound cleaning for rooted slots" Feb 26, 2020
@mergify mergify bot dismissed sakridge’s stale review February 27, 2020 09:11

Pull request has been modified.


ryoqun commented Feb 27, 2020

A somewhat disappointing yet interesting performance report, considering this is my first one. :)
grafana-testnet-monitor-edge-ryoqun-before.pdf
grafana-testnet-monitor-edge-ryoqun-after.pdf

Findings:

  • Overall, the critical section of snapshot creation got slower (+~1.8 s, ~3x).
    • Max confirmation time increased (1.5x for the 99th percentile, 2x for the max) due to a more noticeable stall at each snapshot creation.
  • Cleanup is working (there are ~15% fewer AppendVecs).
  • However, overall TPS increased (+~10%).
    • I guess this is due to the reduced working set.
  • Lock contention is visible on the "Resource Usage" panel.

Considering the above, I've introduced a cheap parallelization: 34ab324

@sakridge What do you think about these observations? Although the confirmation-time spikes got worse, I don't think that blocks this PR from being merged, considering we're more or less forced to address this kind of AccountsDB bloat for security reasons. Or are ops for production validators expected to run machines in a round-robin fashion instead of using a single computer? In that case an online clean isn't a must; we can always rely on the offline clean.
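Roughly the shape of that parallelization, as a sketch with simplified stand-in types rather than the real AccountsDB structures: the candidate pubkeys are processed in chunks so each rayon task batches its locking and allocation instead of paying per account.

```rust
use rayon::prelude::*;

// Simplified stand-ins for the real Pubkey and Slot types.
type Pubkey = [u8; 32];
type Slot = u64;

// Placeholder for the per-account work: inspect the account's slot list and
// return the older rooted states that can be reclaimed.
fn collect_reclaims_for(_pubkey: &Pubkey) -> Vec<Slot> {
    Vec::new()
}

// Process candidate pubkeys in chunks so each parallel task amortizes its
// per-batch work instead of doing it once per account.
fn clean_in_parallel(all_pubkeys: &[Pubkey]) -> Vec<Slot> {
    all_pubkeys
        .par_chunks(4096)
        .map(|chunk| {
            let mut reclaims = Vec::new();
            for pubkey in chunk {
                reclaims.extend(collect_reclaims_for(pubkey));
            }
            reclaims
        })
        .reduce(Vec::new, |mut a, mut b| {
            a.append(&mut b);
            a
        })
}
```

The chunk size here is arbitrary; in practice it would be tuned against measurements like the ones above.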


ryoqun commented Feb 27, 2020

Commentary is TBD (just the performance-report PDFs for now).

In short, no compelling optimization result was observed... ;)

grafana-testnet-monitor-edge-ryoqun-after-par-v1.pdf
grafana-testnet-monitor-edge-ryoqun-after-par-v2.pdf

sakridge (Member) replied:

1.8 s of latency is actually pretty scary; that's a huge hit. Can we push some things off async?

sakridge (Member) added:

Or maybe sample the number of accounts, so that we only clean up, say, 10% of them on each run.
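A rough illustration of that sampling idea (a hypothetical helper, not part of this PR): keep a cursor so each pass cleans only a fraction of the candidates, and successive passes eventually cover all of them.

```rust
// Simplified stand-in for the real Pubkey type.
type Pubkey = [u8; 32];

// Return roughly `fraction` of the candidates for this pass, advancing a
// cursor so repeated passes walk through the whole candidate set.
fn sample_for_this_pass(
    candidates: &[Pubkey],
    cursor: &mut usize,
    fraction: f64,
) -> Vec<Pubkey> {
    if candidates.is_empty() {
        return Vec::new();
    }
    let per_pass = ((candidates.len() as f64 * fraction).ceil() as usize).max(1);
    let start = *cursor % candidates.len();
    let sampled: Vec<Pubkey> = candidates
        .iter()
        .cycle()
        .skip(start)
        .take(per_pass.min(candidates.len()))
        .copied()
        .collect();
    *cursor = (start + sampled.len()) % candidates.len();
    sampled
}

fn main() {
    let candidates = vec![[0u8; 32]; 100];
    let mut cursor = 0;
    // With fraction = 0.1, each pass cleans 10 of the 100 candidates.
    assert_eq!(sample_for_this_pass(&candidates, &mut cursor, 0.1).len(), 10);
    assert_eq!(cursor, 10);
}
```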


ryoqun commented Feb 28, 2020

I see! OK, I'm learning about performance sensitivity in Solana... :)

Before we finally give up and consider async or sampling, I've done my next round of optimizations, with good results.

grafana-testnet-monitor-edge-ryoqun-after-par-v2.5.pdf (Introduced metrics)
grafana-testnet-monitor-edge-ryoqun-after-par-v3.pdf (Removed the storage scan step)

Compared to the base commit:

  • Overall, the critical section of snapshot creation got a bit slower (+~200 ms, ~1.2x).
    • Max confirmation time is only negligibly increased (+3% for the 99th percentile, +15% for the max) due to the somewhat increased stall at each snapshot creation.
  • Cleaning should be working, but it isn't observable (this is a bit odd).
  • Overall TPS is unchanged (within ±1%).
    • I guess this is because the count of AppendVecs didn't change.
  • Lock contention is no longer visible on the "Resource Usage" panel.

Finally, I think I've arrived at a compelling optimization result. Is the +200 ms latency increase still a concern? If so, I'll consider further optimization, sampling, or going async.

@ryoqun ryoqun requested a review from sakridge February 28, 2020 07:44
runtime/src/accounts_db.rs (two outdated review comments, resolved)
@mergify mergify bot dismissed sakridge’s stale review March 3, 2020 03:03

Pull request has been modified.

@ryoqun ryoqun merged commit d861033 into solana-labs:master Mar 3, 2020
mergify bot pushed a commit that referenced this pull request Mar 3, 2020
* Do periodic inbound compaction for rooted slots

* Add comment

* nits

* Consider not_compacted_roots in cleanup_dead_slot

* Renames in AccountsIndex

* Rename to reflect expansion of removed accounts

* Fix a comment

* rename

* Parallelize clean over AccountsIndex

* Some niceties

* Reduce locks and real chunked parallelism

* Measure each step for sampling opportunities

* Just noticed par iter is maybe lazy

* Replace storage scan with optimized index scan

* Various clean-ups

* Clear uncleared_roots even if no updates

(cherry picked from commit d861033)

# Conflicts:
#	ledger/src/snapshot_utils.rs
#	runtime/src/accounts_db.rs
#	runtime/src/accounts_index.rs
#	runtime/src/bank.rs
mergify bot pushed a commit that referenced this pull request Mar 3, 2020

(same commit list as above, cherry picked from commit d861033)
solana-grimes pushed a commit that referenced this pull request Mar 3, 2020
ryoqun added a commit that referenced this pull request Mar 3, 2020
(same commit list as above)
@ryoqun ryoqun added and then removed the automerge (Merge this Pull Request automatically once CI passes) label Mar 3, 2020
solana-grimes pushed a commit that referenced this pull request Mar 3, 2020

ryoqun commented Mar 3, 2020

Phew, I've finally finished all the back-porting work!

Labels
security Pull requests that address a security vulnerability
2 participants