(Bank/Snapshot) Add prior_roots to bank fields. #23331
Looks like the deserialization is handled here in solana/runtime/src/serde_snapshot/newer.rs Lines 252 to 261 in 2207e49
And |
let mut test_prior_roots = vec![0, 1, 2];
bank2.prior_roots.append(&mut test_prior_roots);
Here I created a test prior_roots, and the field will be verified at line 255: assert!(bank2 == dbank);
I'm not sure this is the right spot for tests like this. Maybe another test can be added that explicitly says it tests forwards/backwards compatibility?
Basically I'm not sure what the right behavior is here. An old snapshot should be usable, but will require the Bank to reconstruct prior_roots (if that's needed, or is prior_roots always empty when loading from a snapshot?). A new snapshot won't work with an old validator, so that's not something I think we need to test for.
If I can pinpoint what I'm thinking about and trying to articulate, I'll make sure to reply. Thanks for bearing with me 😊
I imagine the abi hash test will fail. That is fun to learn about ;-) |
Looks like
I will dive deeper into it! |
I'll take a look at this PR in-depth soon; just wanted to share this PR of mine where I did just this (to add a new snapshot version): |
Thanks for the information, @brooksprumo. It is indeed more complicated than I thought. Quick question: other than test and util, do we always keep at most two versions of serde_snapshot, named older and newer? When we introduce a new one, do we always move the existing "newer" into "older", then start modifying the "newer"? And once the transition is done, do we purge the older one? If the answer to the first question is no, do we want to rename "older" to something like "verXXX"? |
Updated the PR description. Specifically, it adds the following:
|
This comment was marked as spam.
@yhchiang-sol Please do not merge based on Ashry00's approval. |
Historically (afaict), the max that we've kept around is two versions. Here are two (big) PRs for the history: PR #9980 and PR #10581. (If folks have older snapshots and need to boot them, then they'll need to run that older version of the validator (or other bin) anyway.) The naming was/is "older" and "newer", yes. In those PRs you'll see comments where initially the files/names were based on the version number, but that was changed since moving snapshot versions (default and supported) should happen relatively quickly (sorry for not getting the specific comment!). And yes, purging "older" happens once we say that older snapshot versions cannot be loaded by the cluster anymore. |
Can you also provide context for this PR? What is prompting the addition/use of prior roots? I'm also curious about the use of |
I think @jeffwashington would be the best person to answer this question :p |
For eliminating rewrites/ancient append vecs as well as a change @carllin is working on, we need the ability to remember slots within that last epoch that WERE roots even if they no longer contain any accounts. Clean and shrink cause us to remove outdated accounts from a slot's appendvec over time. If all accounts are removed from a slot (because they were updated later or became 0 lamport in that slot), then the appendvec is removed and the fact that a root existed at that slot is lost. We need to retain knowledge of slots (within the last epoch) that used to exist but are now gone/missing/ghosts. Bank is possibly not the most intuitive place to persist this. Somewhere in the snapshot, we need to persist this list of slots which now have no non-empty appendvecs but which WERE roots within the last epoch. |
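To make the bookkeeping described above concrete, here is a minimal sketch in Rust. It is not the data structure from this PR: `RootsSketch`, its fields, and its methods are all hypothetical names, and the epoch-window purge is an assumption about how "within the last epoch" might be enforced.

```rust
use std::collections::BTreeSet;

type Slot = u64;

/// Hypothetical tracker (assumption, not solana-runtime code).
#[derive(Default)]
struct RootsSketch {
    /// Roots that still have at least one non-empty append vec.
    roots_with_storage: BTreeSet<Slot>,
    /// Roots whose storage is gone but which must still be remembered.
    prior_roots: BTreeSet<Slot>,
}

impl RootsSketch {
    fn add_root(&mut self, slot: Slot) {
        self.roots_with_storage.insert(slot);
    }

    /// Called when clean/shrink drops the last append vec of `slot`.
    /// Instead of forgetting the slot, remember it as a prior root.
    fn storage_removed(&mut self, slot: Slot) {
        if self.roots_with_storage.remove(&slot) {
            self.prior_roots.insert(slot);
        }
    }

    /// Forget prior roots more than one epoch behind `current_root`.
    fn purge_older_than_epoch(&mut self, current_root: Slot, slots_per_epoch: u64) {
        let oldest_kept = current_root.saturating_sub(slots_per_epoch);
        // `split_off` returns every element >= `oldest_kept`.
        self.prior_roots = self.prior_roots.split_off(&oldest_kept);
    }

    fn was_root(&self, slot: Slot) -> bool {
        self.roots_with_storage.contains(&slot) || self.prior_roots.contains(&slot)
    }

    /// The flat list that would be written into a snapshot.
    fn persisted_prior_roots(&self) -> Vec<Slot> {
        self.prior_roots.iter().copied().collect()
    }
}
```

With this shape, a slot whose append vec disappears still answers `was_root(slot) == true` until it falls out of the epoch window, which is the property the comment above asks for.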
@jeffwashington Thanks! Now I remember you explaining this to me for eliminating rewrites. This might need to be in Bank, since the list of prior roots could change in each bank based on the transactions it processes; or at least I'm not sure how to remove it from Bank without just duplicating the same information somewhere else... @yhchiang-sol Can you add a doc comment ( Also, maybe this Thinking more, I'm wondering if the ordering for this PR (updating the snapshot version) should come after introducing/using |
We might need to distinguish what gets persisted vs what exists in memory. Remember that these are old roots. Meaning all contents have been roots. From that sense, every bank will have the same view on the shared data. Roots today are stored from accounts. The in-memory, runtime data structure will probably be a RootsTracker bit field. The persisted information looks like Vec<Slot>. We can tweak the exact data structures and who has what later. The real value in this PR to me is that we are working through what it takes to add info like this to the snapshot. If we can persist Vec<Slot>, we can save and reconstruct correctly. Perhaps we do even better. |
Updated the PR:
|
Updated abi hash. Included code comments for |
Codecov Report
@@ Coverage Diff @@
## master #23331 +/- ##
=========================================
+ Coverage 81.3% 81.4% +0.1%
=========================================
Files 572 576 +4
Lines 155876 156849 +973
=========================================
+ Hits 126815 127794 +979
+ Misses 29061 29055 -6 |
Could we add prior_roots by making them optional in the current snapshot format rather than adding an entirely new snapshot version? This could then be rolled out in two stages:
A new snapshot format entirely is a serious PITA to roll out. Trusted validators need to serve the old format until everybody updates to a software version that supports the new format, so lots of server ops coordination |
Yes. This sounds amazing. Can you please point @yhchiang-sol to an example of this? |
It looks like simply adding the solana/sdk/src/deserialize_utils.rs Line 7 in ddfd4f8
If a vector of length 0 for |
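The idea of an optional trailing field can be sketched as follows. This is an illustration using only the standard library, not the helper referenced above: `or_default_on_eof`, `FieldsSketch`, and the little-endian, length-prefixed layout are all assumptions made for the example.

```rust
use std::io::{self, Read};

type Slot = u64;

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Reads a length-prefixed vector of slots.
fn read_slots<R: Read>(r: &mut R) -> io::Result<Vec<Slot>> {
    let len = read_u64(r)?;
    let mut slots = Vec::new();
    for _ in 0..len {
        slots.push(read_u64(r)?);
    }
    Ok(slots)
}

/// Hypothetical helper: treat "ran out of bytes" as "field absent"
/// and fall back to the default; propagate every other error.
fn or_default_on_eof<T: Default>(result: io::Result<T>) -> io::Result<T> {
    match result {
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(T::default()),
        other => other,
    }
}

#[derive(Debug, PartialEq)]
struct FieldsSketch {
    slot: Slot,
    prior_roots: Vec<Slot>, // new, optional, must stay last
}

/// `include_prior_roots = false` mimics a writer that predates the field.
fn write_fields(f: &FieldsSketch, include_prior_roots: bool) -> Vec<u8> {
    let mut out = f.slot.to_le_bytes().to_vec();
    if include_prior_roots {
        out.extend_from_slice(&(f.prior_roots.len() as u64).to_le_bytes());
        for s in &f.prior_roots {
            out.extend_from_slice(&s.to_le_bytes());
        }
    }
    out
}

fn read_fields(bytes: &[u8]) -> io::Result<FieldsSketch> {
    let mut r = bytes;
    let slot = read_u64(&mut r)?; // required field: EOF is an error
    let prior_roots = or_default_on_eof(read_slots(&mut r))?; // optional field
    Ok(FieldsSketch { slot, prior_roots })
}
```

A reader built this way accepts both old and new streams without a version bump. One caveat of the sketch: a stream truncated in the middle of the optional field is also read as "absent", so a real implementation may want a stricter check.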
/// longer contain any accounts. Without remembering those prior roots,
/// accounts that we skip rewrites might have their rent collection time
/// point to the incorrect roots as their correct roots were removed.
pub(crate) prior_roots: Vec<Slot>,
@carllin is this a sufficient data structure to store what you were looking into regarding slashing and such?
I think it'll require (Slot, Hash) to be perfectly safe, but I wouldn't worry about that for now if that imposes a big burden.
Hash = bank hash at that slot?
Are you confident this is useful info for what you're doing? Do we keep around the bank hashes for past slots during the entire epoch? I can look. Just thought you may know.
I could just store default hashes at the moment or Option with None. So, they'd be present in the format, but not populated.
I think this is the piece we need to get into probably 1.9 asap so it gets propagated. |
Adding it to bank persistence doesn't work as expected because we serialize bank, then we serialize accounts db, so reading a bank field that doesn't exist causes errors in deserializing accounts db. Thankfully, 'prior_roots' belongs better in accounts_db ANYWAY, so we CAN add the field to the end of what we deserialize/serialize for accounts_db. While we're at it, I may try to figure out how to deal with the expectation that we know the hash of the accounts at that slot at the time we serialize. We would like to calculate the full accounts snapshot hash in the background-background, allowing the flush, clean, shrink loop to run more often in the background instead of being gated by calculating the hash... |
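The ordering problem described above can be reproduced with a toy stream. Everything here is an assumption for illustration: the two-section layout (one bank word followed by a length-prefixed accounts-db list), the `Words` cursor, and both reader functions are hypothetical and much simpler than the real snapshot format.

```rust
type Slot = u64;

/// Cursor over little-endian u64 words; returns None at end of input.
struct Words<'a> {
    bytes: &'a [u8],
}

impl<'a> Words<'a> {
    fn next_u64(&mut self) -> Option<u64> {
        if self.bytes.len() < 8 {
            return None;
        }
        let (head, rest) = self.bytes.split_at(8);
        self.bytes = rest;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(head);
        Some(u64::from_le_bytes(buf))
    }

    /// Reads a length-prefixed vector; None if the stream ends early.
    fn next_vec(&mut self) -> Option<Vec<u64>> {
        let len = self.next_u64()?;
        let mut v = Vec::new();
        for _ in 0..len {
            v.push(self.next_u64()?);
        }
        Some(v)
    }
}

#[derive(Debug, PartialEq)]
struct Loaded {
    slot: Slot,
    storages: Vec<Slot>,
    prior_roots: Vec<Slot>,
}

/// New field placed in the bank section, i.e. BEFORE accounts db.
/// On an old stream it swallows the accounts-db bytes instead.
fn read_with_bank_field(bytes: &[u8]) -> Option<Loaded> {
    let mut w = Words { bytes };
    let slot = w.next_u64()?;
    let prior_roots = w.next_vec()?;
    let storages = w.next_vec()?; // fails on old streams: nothing left
    Some(Loaded { slot, storages, prior_roots })
}

/// New field placed at the very END of the accounts-db section.
/// On an old stream the read hits end of input and defaults to empty.
fn read_with_trailing_field(bytes: &[u8]) -> Option<Loaded> {
    let mut w = Words { bytes };
    let slot = w.next_u64()?;
    let storages = w.next_vec()?;
    let prior_roots = w.next_vec().unwrap_or_default();
    Some(Loaded { slot, storages, prior_roots })
}

fn encode(words: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    for w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}
```

In this toy layout, an old stream such as `encode(&[42, 2, 100, 101])` fails under `read_with_bank_field` but loads under `read_with_trailing_field` with an empty `prior_roots`, which mirrors why the field has to go at the end of the last serialized section.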
|
Summary of Changes
Prior roots are defined as roots that are no longer roots in the current bank instance.
This PR adds prior_roots to bank fields to allow snapshots carrying prior roots information.
As the PR changes the snapshot format, it also includes the following changes:
newer snapshot becomes older.
older (i.e., the existing version).
newer, which contains the new prior_roots field.

Test Plan
Update test_bank_serialize_style to include a test prior_roots.
Verify the test will fail if I comment out the forwarding assignment for prior_roots.