Do periodic inbound cleaning for rooted slots #8436
Conversation
Well, I just noticed that when snapshot generation is disabled, this new compaction and the existing purge_zero_lamports are never run... This could exhaust system resources.
runtime/src/accounts_db.rs
Outdated
) {
    let mut accounts_index = self.accounts_index.write().unwrap();
    let mut all_pubkeys: HashSet<Pubkey> = HashSet::<Pubkey>::default();
    for root in accounts_index.not_compacted_roots.iter() {
I think the whole purge_zero_lamport thing just needs to check the delta of newly-rooted slots since the last purge, inspired by #8405. Currently, only this new non-zero-lamport account compaction is checked in that manner.
We can move to delta-purge for online processing once we get the out-bound compaction (#8337) in place. Would that be a good direction?
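To make the delta idea concrete, here is a minimal sketch (not the actual patch): not_compacted_roots mirrors the field in the diff above, while updates_per_root is a hypothetical map standing in for the real AccountsIndex bookkeeping.

use std::collections::{HashMap, HashSet};

type Slot = u64;
type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

// Hypothetical, simplified view of the index: which pubkeys were updated in
// each newly rooted slot, plus the set of roots not yet cleaned.
struct IndexSketch {
    updates_per_root: HashMap<Slot, Vec<Pubkey>>,
    not_compacted_roots: HashSet<Slot>,
}

impl IndexSketch {
    // Collect only the pubkeys touched by roots added since the last clean,
    // instead of rescanning every account in the index.
    fn delta_pubkeys(&mut self) -> HashSet<Pubkey> {
        let mut all_pubkeys = HashSet::new();
        for root in self.not_compacted_roots.drain() {
            if let Some(pubkeys) = self.updates_per_root.get(&root) {
                all_pubkeys.extend(pubkeys.iter().copied());
            }
        }
        all_pubkeys
    }
}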
Delta purge sounds good. It's a bit challenging because of the cascading situation: an account which is not part of the delta set could be preventing a purge, so you would have to know to go look at whether that one could be purged as well.
Hmm. Yeah, that's right... It looks harder than what I originally thought.
runtime/src/accounts_db.rs
Outdated
    }
}
let mut reclaims = Vec::new();
for pubkey in all_pubkeys {
I haven't measured how much this impacts the overall validator performance yet...
Basically, this incurs slot_vec write locks for the union of all accounts updated since the last check. Effectively, this means one more such lock for every account update.
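As a rough illustration of that locking cost (a sketch only: slot_vec is the name from the discussion above, the rest is hypothetical):

use std::sync::RwLock;

type Slot = u64;
struct AccountInfo; // placeholder for the real per-store account metadata

// Hypothetical per-account index entry: the list of (slot, info) versions is
// guarded by its own RwLock (the "slot_vec" mentioned above).
struct AccountEntry {
    slot_vec: RwLock<Vec<(Slot, AccountInfo)>>,
}

// The periodic clean takes one extra write lock per updated account, on top
// of the write lock the normal store path already takes, and drops states
// older than the newest rooted version.
fn reclaim_old_states(entry: &AccountEntry, max_root: Slot) {
    let mut list = entry.slot_vec.write().unwrap();
    if let Some(newest_rooted) = list
        .iter()
        .map(|(slot, _)| *slot)
        .filter(|slot| *slot <= max_root)
        .max()
    {
        list.retain(|(slot, _)| *slot >= newest_rooted);
    }
}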
Yeah, it would be nice to see how it affects things. It seems the main cost is the accounts scan, though, so this might not be too much. I don't have a good idea of how many accounts would be purged this way, so maybe we should test it on a testnet.
@sakridge Could you review this, please? I'm wondering whether there is a better solution, especially given the rather high CPU cost of the current compaction implementation.
Codecov Report
@@            Coverage Diff             @@
##           master    #8436     +/-   ##
==========================================
+ Coverage      80%      80%     +<.1%
==========================================
  Files         256      256
  Lines       55594    55728      +134
==========================================
+ Hits        44477    44607      +130
- Misses      11117    11121        +4
@sakridge I've done some naming clarification and avoided introducing another unneeded term:
runtime/src/accounts_db.rs
Outdated
// Purge zero lamport accounts for garbage collection purposes
// Reclaim older states of rooted non-zero lamport accounts as a general
// AccountsDB bloat mitigation and preprocess for better zero-lamport purging.
fn purge_old_normal_accounts(&self, purges: &HashMap<Pubkey, Vec<(Slot, AccountInfo)>>) {
Can we make just one more rename for this function? Maybe something like clean_old_account_roots.
Nice catch!
Done: 6c3a696.
I persisted in ending the name with ..._accounts to avoid confusion with AccountsIndex::clean_dead_slot. The former removes accounts and the latter removes actual slots. I think root as a noun connotes a (root) slot.
Looks pretty good. I had just one more rename, and can we get an idea of the performance impact?
@sakridge I'll work on it. :) My first testnet perf thing. :)
lgtm, thanks for the renames
A somewhat disappointing yet interesting performance report, considering this is my first one. :) Findings:
Considering the above, I've introduced a cheap parallelization: 34ab324. @sakridge What do you think about these observations? Although confirmation time spikes got worse, I don't think this blocks this PR from being merged, considering we're somewhat forced to address some kind of AccountsDB bloat for security reasons. Or are ops of production validators expected to rotate machines in a round-robin fashion instead of just using a single computer? In that case, online clean isn't a must; we can always rely on offline clean.
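For illustration, a chunked parallel clean along these lines could look like the following sketch (hypothetical names, not the actual commit; rayon's par_chunks pays scheduling overhead per chunk rather than per pubkey):

use rayon::prelude::*;
use std::collections::HashMap;

type Slot = u64;
type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

// Hypothetical chunked parallel clean: split the candidate pubkeys into
// fixed-size chunks so each rayon task handles a whole chunk.
fn par_collect_reclaims(
    pubkeys: &[Pubkey],
    rooted_slots: &HashMap<Pubkey, Vec<Slot>>,
) -> Vec<(Slot, Pubkey)> {
    const CHUNK_SIZE: usize = 4096;
    pubkeys
        .par_chunks(CHUNK_SIZE)
        .flat_map(|chunk| {
            let mut reclaims = Vec::new();
            for pubkey in chunk {
                // Reclaim every rooted state except the newest one for this pubkey.
                if let Some(slots) = rooted_slots.get(pubkey) {
                    let mut slots = slots.clone();
                    slots.sort_unstable();
                    slots.pop(); // keep the newest rooted state
                    reclaims.extend(slots.into_iter().map(|slot| (slot, *pubkey)));
                }
            }
            reclaims
        })
        .collect()
}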
Commentary is TBD (just performance report PDFs, for now). In short, no compelling optimization result is observed... ;) grafana-testnet-monitor-edge-ryoqun-after-par-v1.pdf
1.8s latency is pretty scary actually; that's a huge hit. Can we push some things off async?
Or maybe sample the accounts so that we only clean up, say, 10% of them with each run.
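A sketch of what such sampling could look like (a hypothetical helper, not part of this PR):

// Hypothetical sampling helper: on each run, visit only roughly one
// `buckets`-th of the candidates, rotating the bucket so every account is
// eventually covered over `buckets` runs.
fn sampled_subset<T: Clone>(candidates: &[T], buckets: usize, run: usize) -> Vec<T> {
    let buckets = buckets.max(1); // e.g. 10 buckets ~= 10% per run
    candidates
        .iter()
        .enumerate()
        .filter(|(i, _)| i % buckets == run % buckets)
        .map(|(_, c)| c.clone())
        .collect()
}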
I see! OK, I'm learning about performance sensitivity in Solana... :) Before we finally give up and consider async or sampling, I've done my next round of optimizations, with good results. grafana-testnet-monitor-edge-ryoqun-after-par-v2.5.pdf (introduced metrics) Compared to the base commit:
Finally, I think I've arrived at a compelling optimization result. Is this +200ms latency increase still a concern? If so, I'll consider further optimization, sampling, or async.
Force-pushed from 11ddaa1 to 851e530 (compare).
* Do periodic inbound compaction for rooted slots
* Add comment
* nits
* Consider not_compacted_roots in cleanup_dead_slot
* Renames in AccountsIndex
* Rename to reflect expansion of removed accounts
* Fix a comment
* rename
* Parallelize clean over AccountsIndex
* Some niceties
* Reduce locks and real chunked parallelism
* Measure each step for sampling opportunities
* Just noticed par iter is maybe lazy
* Replace storage scan with optimized index scan
* Various clean-ups
* Clear uncleared_roots even if no updates

(cherry picked from commit d861033)

# Conflicts:
#   ledger/src/snapshot_utils.rs
#   runtime/src/accounts_db.rs
#   runtime/src/accounts_index.rs
#   runtime/src/bank.rs
Phew, I've finally finished all the back-porting work!
Problem
After much investigation, I've realized that neither removing storage entries whose count is 0 nor tweaking the purge logic is enough.
The root cause of snapshot bloat is that we leak StorageEntry/AppendVec entries far too easily in the first place. Even just running solana ping causes that situation. In this regard, much like the zero-lamport-account RAM-exhaustion DoS attack, this behavior can easily be used to mount a DoS with a rather small amount of SOL.
Summary of Changes
We were too lazy about GC'ing older storage entries. In particular, we did literally nothing for account states that are no longer being touched.
So, introduce a somewhat eager compaction for non-zero-lamport accounts at snapshot-generation intervals (a rough sketch of the driving logic follows below).
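A minimal sketch of that driving logic, with illustrative names only (the real change is wired into the existing snapshot-generation path):

type Slot = u64;

// Illustrative stand-in for AccountsDB; the real clean reclaims older rooted
// states of non-zero-lamport accounts and purges zero-lamport accounts.
struct AccountsDbSketch {
    cleans_run: std::cell::Cell<u64>,
}

impl AccountsDbSketch {
    fn clean_rooted_accounts(&self) {
        self.cleans_run.set(self.cleans_run.get() + 1);
    }
}

// Drive the clean at the snapshot-generation cadence. Note the caveat raised
// in the review: if snapshot generation is disabled, a trigger like this
// never fires, so the cleanup would need another driver in that configuration.
fn on_new_root(bank_slot: Slot, snapshot_interval_slots: Slot, db: &AccountsDbSketch) {
    if snapshot_interval_slots > 0 && bank_slot % snapshot_interval_slots == 0 {
        db.clean_rooted_accounts();
    }
}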
Background
This is one of two PRs towards #8168; this one is the inbound compaction. The other (#8337) is the outbound compaction (applied when loading a snapshot). To really fix the enlarged snapshots on TdS, we need to land both.
Part of #8168