Writing account state on every slot inefficient and reduces incremental snapshot efficiency #17178
Comments
Related to #17088 |
cc @ryoqun |
yeah, this is kind of a waste. but at the time it was an easy way to address the infinite append vec issue and to introduce stale accounts' rent collection and a periodic on-chain whole-state check... I think some slot-based reflink mechanism should alleviate the situation while combining older appendvecs. So, we'd keep appearing to rewrite the accounts so that they are included in the bank hash for on-chain verification. The trick is that the previously updated slot for a given account is deterministic across the cluster. So, we could even embed the slot pointer into delta snapshots. Or rather, could we just give up this on-chain state check altogether and rely completely on the background accounts hash check? I'd rather avoid that, for the sake of the blockchain's very existence. |
Yea. At the time it was the right thing. I think we are getting to the point where we need to re-think the compaction strategy though because of the accounts growth and need for incremental snapshots. |
I'll look into this. |
hi, i revisited this tricky problem again. let's recap some things before any call to establish a common understanding, considering the recent progress on incremental snapshots and disk-based indexes (great work and progress!). @brooksprumo @jeffwashington (cc: @sakridge )

problems(1): incremental snapshots are space-inefficient (@brooksprumo, how bad is it as of now? is there a target size for incremental snapshots on mainnet-beta? also, dumb question: how often should a full snapshot ideally be recreated?)

observations(a): the vast majority of accounts aren't touched by txes in an epoch (or more)

requirements(i): relieve the mentioned inefficiency problems

(wild) possibilities(x): tweak the on-chain rent protocol?

my current idea

the below is the latest solution i came up with: more concrete than the previous rough idea (#17178 (comment)), with some spice from @aeyakovenko 's deletion slot idea: https://discord.com/channels/428295358100013066/838890116386521088/874445426777022474

summary: introduce a full-snapshot-persisted, last-level overlay ordered account map for all stale (= not touched in the last 432k slots) rent-exempt accounts, and restrict eager rent collection to non-rent-exempt accounts only.

haha, i guess you cannot get it from the summary alone... the basic idea is to offload the costly operations from the hot path as much as possible. here are the details.

firstly, we introduce a new kind of data persistence alongside AppendVecs, which contains all of the stale rent-exempt accounts. this will be implemented as a very large single appendvec, accompanied by an ordered disk-based index for its accounts, keyed by pubkey. (let's call this stalevec.) it works as the last-level backing store for the AccountsDB. so at a high level, any read from AccountsDb consults the (now unordered) index for appendvecs and then the ordered index for the stalevec. naturally, the stalevec will be persisted to full snapshots (not incremental snapshots).

secondly, we tweak eager rent collection to scan only over not-rent-exempt accounts. this again means we need a third (small) ordered index for the not-rent-exempt accounts, for the much more narrowly scoped new eager rent collection. so, a dos against eager rent collection will force the attacker to burn a sizable amount of SOL.

also, we incorporate stale accounts' hashes into the bank hash by special handling, without actually re-writing them (to satisfy requirement ii). note that this can be calculated separately at the start of a new bank by a separate thread. (a bonus would be to include some epoch-specific seed in it.) thus, a new appendvec will only contain updated accounts and rent-collected accounts. in other words, the bare minimum; nice for incremental snapshots.

lastly, any load from the stalevec will cost a little more than one from an appendvec, to reflect the actual extra system resource usage.

misc:
questions and better ideas are greatly welcome. :) |
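For readers following along, here is a minimal sketch of the two-level read path and the stale-account demotion described in the comment above. It is an illustration under stated assumptions, not the AccountsDb implementation: `TwoLevelStore`, `Account`, `demote_stale`, and the in-memory `HashMap`/`BTreeMap` stand-ins for the disk-based indexes are all hypothetical names chosen for this example. Only the 432k-slot threshold and the lookup order (unordered index first, ordered stalevec index second) come from the comment.

```rust
use std::collections::{BTreeMap, HashMap};

/// 32-byte account address (stand-in for a real pubkey type).
type Pubkey = [u8; 32];
type Slot = u64;

/// Staleness threshold taken from the comment: 432k slots.
const STALE_AFTER_SLOTS: Slot = 432_000;

/// Hypothetical, simplified account record.
#[derive(Clone, Debug, PartialEq)]
struct Account {
    lamports: u64,
    last_written_slot: Slot,
    rent_exempt: bool,
}

/// Hypothetical two-level store: an unordered hot index plus an ordered stalevec.
#[derive(Default)]
struct TwoLevelStore {
    hot: HashMap<Pubkey, Account>,
    stalevec: BTreeMap<Pubkey, Account>,
}

impl TwoLevelStore {
    /// Reads consult the hot index first and fall back to the stalevec.
    fn load(&self, key: &Pubkey) -> Option<&Account> {
        self.hot.get(key).or_else(|| self.stalevec.get(key))
    }

    /// Writes always land in the hot index; any stale copy is superseded.
    fn store(&mut self, key: Pubkey, account: Account) {
        self.stalevec.remove(&key);
        self.hot.insert(key, account);
    }

    /// Moves rent-exempt accounts untouched for STALE_AFTER_SLOTS into the
    /// stalevec. Returns how many accounts were demoted.
    fn demote_stale(&mut self, current_slot: Slot) -> usize {
        let stale_keys: Vec<Pubkey> = self
            .hot
            .iter()
            .filter(|(_, a)| {
                a.rent_exempt
                    && current_slot.saturating_sub(a.last_written_slot) >= STALE_AFTER_SLOTS
            })
            .map(|(k, _)| *k)
            .collect();
        for key in &stale_keys {
            if let Some(account) = self.hot.remove(key) {
                self.stalevec.insert(*key, account);
            }
        }
        stale_keys.len()
    }
}
```

In this sketch non-rent-exempt accounts never leave the hot index, which mirrors the idea that eager rent collection would only need to scan that (small) set.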
The ordering requirement has a pretty minimal effect on perf, I think. The ordered range requirement does require binning by the high bits of the pubkey, which is not ideal against attacks. |
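To make "binning by high bits of pubkey" concrete, here is a small sketch. The function name `bin_from_pubkey` and the 24-bit cap are assumptions made for this example, not the index's actual API. Because the bin is derived from the leading bits, bins preserve pubkey order, but anyone who grinds keys sharing a prefix can aim them all at one bin, which is presumably the attack concern mentioned above.

```rust
/// Maps a 32-byte pubkey to a bin using its highest `bin_bits` bits.
/// In this sketch `bin_bits` must be between 1 and 24.
fn bin_from_pubkey(pubkey: &[u8; 32], bin_bits: u32) -> usize {
    assert!((1..=24).contains(&bin_bits));
    // Read the first three bytes as a big-endian 24-bit integer.
    let high = ((pubkey[0] as u32) << 16) | ((pubkey[1] as u32) << 8) | (pubkey[2] as u32);
    // Keep only the top `bin_bits` bits of that value.
    (high >> (24 - bin_bits)) as usize
}
```

With 8 bits there are 256 bins and the bin is simply the first byte of the key; more bits give finer bins while keeping the same ordering property.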
To follow up on ordering: |
@ryoqun Not dumb at all! Right now, the default interval for creating full snapshots is every 100,000 slots. The default interval for creating incremental snapshots is 100 slots. Incremental snapshots can range from very good, to fine. Refer to this dashboard (and set the duration to the past 7 days). Immediately after a full snapshot is taken, the incremental snapshots are very small, ~200 MB. On the opposite side, just before the next full snapshot is taken (i.e. when the incremental snapshot is at its largest), it's just under 3 GB. For comparison, the full snapshots are now almost 12 GB! Looking at the charts, the incremental snapshot size grows linearly between two full snapshots. I believe this is due to extra accounts being re-stored by rent collection when they otherwise didn't need to be (since they were unchanged). Avoiding those unnecessary re-stores could further improve the benefit of incremental snapshots. |
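As a rough back-of-the-envelope model of the linear growth described above, the sketch below interpolates between the two quoted endpoints. The function name is hypothetical, and the constants are approximations of the figures in the comment (~200 MB right after a full snapshot, treated here as 3,000 MB just before the next one, 100,000 slots between full snapshots); real sizes will differ.

```rust
/// Estimates incremental snapshot size in MB, assuming purely linear growth
/// between two full snapshots. All constants approximate the figures quoted
/// in the comment and are not measured values.
fn estimated_incremental_size_mb(slots_since_full: u64) -> f64 {
    const FULL_INTERVAL_SLOTS: f64 = 100_000.0;
    const MIN_MB: f64 = 200.0;
    const MAX_MB: f64 = 3_000.0;
    // Fraction of the full-snapshot interval that has elapsed, capped at 1.
    let progress = (slots_since_full as f64 / FULL_INTERVAL_SLOTS).min(1.0);
    MIN_MB + (MAX_MB - MIN_MB) * progress
}
```

Under these assumptions the incremental snapshot gains roughly 28 MB per 1,000 slots, and halfway through the interval it would be around 1.6 GB.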
@ryoqun Your idea is very interesting! I previously discussed this a bit first with @sakridge, and then later with @jeffwashington. With Stephen, he posed the idea of storing multiple slots within a single AppendVec. Jeff and I took this idea further and came up with AncientAppendVecs, for accounts that are ~2 (or more) epochs old. Your This led into thinking about flush, and how with the Accounts Cache, flush (and shrink) are the only two places that create new AppendVecs, and the contents are fully known. This could allow flush to create a single AppendVec that includes multiple slots. This has benefits for what Jeff is looking at (and I'll let him talk more about that). I'm trying to get an idea like this working, as a proof of concept. |
@aeyakovenko posted an idea on Discord today that could solve this:
This would help the validators, as now they'll be compensated for all accounts (instead of just rent-paying accounts, which are the minority). This would help the runtime by removing the need to run rent collection by scanning all the accounts every epoch. I.e. Some questions:
|
To follow along from my comment above, I was thinking about how to tie in lazy rent collection (i.e. no scanning of accounts) with incentives to find-and-close accounts that should be rent collected. There are parallels I see to how some DeFi programs that I've heard about do this (from what I remember). Specifically, accounts that should be liquidated/closed are handled via a transaction, not by the runtime. For example, let's say Account A should be closed. Other people (e.g. Account B) can send transactions to query that account and identify that it should be closed, then send another transaction to tell the program to close the account. Then Account B would get some portion of the reward for closing Account A. The remaining portion goes to the program (or runtime/validators in our case). Regular users are now incentivized to find and close accounts. These are regular transactions, so the economics of transactions already handles this. A new What remains is how much reward should be given to the user, and where those rewards come from. If accounts should only be closed-and-collected once they've reached a balance of 0 lamports, there would be no additional rewards to collect. If accounts can be closed below a threshold (i.e. rent exempt minimum), that could work, but traditionally those rewards have gone 100% to the validators; would the validators now get a smaller reward? Maybe creating new accounts should have a fixed upfront cost in addition to rent? This way, validators get the rewards when accounts are created, and users get the rewards when accounts are closed? Obviously still lots of details to nail down, but is this a feasible path to continue investigating? |
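The reward question above can be made concrete with a tiny sketch. Everything here is hypothetical: `split_close_reward`, `is_closable`, and the `closer_share_percent` parameter are invented for illustration, and no such split exists in the runtime as far as this thread states. It only shows the arithmetic of dividing a closed account's remaining lamports between the closing user and the validators.

```rust
/// Splits the lamports reclaimed from a closed account between the user who
/// sent the closing transaction and the validators.
/// `closer_share_percent` is a hypothetical tuning parameter (0..=100).
fn split_close_reward(reclaimed_lamports: u64, closer_share_percent: u8) -> (u64, u64) {
    assert!(closer_share_percent <= 100);
    // Widen to u128 so the multiplication cannot overflow.
    let closer = (reclaimed_lamports as u128 * closer_share_percent as u128 / 100) as u64;
    (closer, reclaimed_lamports - closer)
}

/// In this sketch an account may be closed once its balance falls below the
/// rent-exempt minimum (one of the thresholds discussed in the comment).
fn is_closable(lamports: u64, rent_exempt_minimum: u64) -> bool {
    lamports < rent_exempt_minimum
}
```

The two cases from the comment fall out directly: if accounts may only be closed at 0 lamports, both shares are 0 and there is nothing to pay the closer with; with a below-threshold rule, any nonzero closer share reduces what validators receive.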
A lot of progress has been made on this. It is making its way to completion. |
Problem
Some percentage of the accounts is rewritten on each slot.
solana/runtime/src/bank.rs
Line 3574 in d4ffd90
This limits the number of append-vecs, but it creates more write IO and makes incremental snapshots larger, because they include account updates that are not necessary.
Proposed Solution
Don't rewrite accounts on each slot. Instead, make it possible to combine accounts from multiple slots into a single append-vec.
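A minimal sketch of what combining cross-slot accounts could look like, assuming each slot's storage is just a list of accounts tagged with the slot that wrote them. `StoredAccount` and `combine_storages` are hypothetical names for this example and do not reflect the actual append-vec layout; the point is only that the combined storage keeps the newest version of each account without rewriting unchanged ones.

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32];
type Slot = u64;

/// Hypothetical, simplified stored account: the slot records when it was last written.
#[derive(Clone, Debug, PartialEq)]
struct StoredAccount {
    pubkey: Pubkey,
    lamports: u64,
    slot: Slot,
}

/// Combines several per-slot storages into one list, keeping only the newest
/// version of each account. Output is sorted by pubkey for determinism.
fn combine_storages(per_slot: &[Vec<StoredAccount>]) -> Vec<StoredAccount> {
    let mut newest: HashMap<Pubkey, StoredAccount> = HashMap::new();
    for storage in per_slot {
        for account in storage {
            // Insert when the key is unseen or this version was written in a later slot.
            let is_newer = newest
                .get(&account.pubkey)
                .map_or(true, |existing| account.slot > existing.slot);
            if is_newer {
                newest.insert(account.pubkey, account.clone());
            }
        }
    }
    let mut combined: Vec<StoredAccount> = newest.into_values().collect();
    combined.sort_by_key(|a| a.pubkey);
    combined
}
```

Each account keeps its original write slot in the combined output, so an account last written many slots ago stays where it is instead of being re-stored every time rent collection visits it.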