-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bank_from_latest_snapshot_dir #31115
bank_from_latest_snapshot_dir #31115
Conversation
There is an open question about handling of incomplete bank snapshot directory caused by process crash, as discussed in #30859 (comment). The options are:
I incline to go with #2. Not sure if it should be in this PR, or yet another PR. |
runtime/src/snapshot_utils.rs
Outdated
let bank_snapshot = get_highest_bank_snapshot(&bank_snapshots_dir).ok_or_else(|| { | ||
SnapshotError::NoSnapshotSlotDir(bank_snapshots_dir.as_ref().to_path_buf()) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally from #30859 (comment)
If
bank_snapshot
here is not complete, thenbank_from_snapshot_dir()
below will fail, right? Would it be better to get the highest complete "bank snapshot post" instead?
Without a valid accounts hash (which only happens in post
), a real validator that boots from a pre
will likely not be able to generate incremental snapshot archives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boots from a pre will likely not be able to generate incremental snapshot archives.
want to understand this. Would that only be the case until after it creates the next full snapshot archive, or indefinitely?
@xiangzhu70
I'm not sure I understand the reasoning in the original PR on why we cannot use the highest POST snapshot here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, it's subtle.
Calculating the accounts hash is a required step in order to produce a snapshot. This is handled in AccountsHashVerifier. When AHV is processing an incremental snapshot package, currently it computes a full/complete accounts hash. Soon that will change to be an incremental accounts hash. To ensure these incremental snapshots are still verifiable, they also will include details from their base/full snapshot—including the base/full accounts hash.
So say we boot from a dir, at slot 125,060, and the next snapshot request is an incremental snapshot at slot 125,100. AHV will calculate an incremental accounts hash for slot 125,100 and then also ask for the full accounts hash from the base slot. Here's where there's a problem.
If we booted from a PRE bank snapshot, then it will not have calculated the accounts hash. So this incremental snapshot will ask for the full accounts hash from slot 125,060, which will be Hash::default()
effectively. The incremental snapshot archive will fail to verify if it is used to boot from later. And the same will be true for all future incremental snapshots until a new full snapshot is generated.
However, if we use the full snapshot archive at slot 125,000, we can get its accounts hash. But... we still need to figure out how to get the account storages files at the time of slot 125,000, otherwise the incremental accounts hash will be calculated based on slot 125,060, and then its incremental snapshot archive won't be pair-able with the full snapshot archive at slot 125,000.
These interactions are all quite complicated, and I think need to be tested out/resolved before we can surface the fastboot to the CLI as a supported flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR, should be OK to get the highest post bank snapshot. Further testing can happen independently.
Codecov Report
@@ Coverage Diff @@
## master #31115 +/- ##
=======================================
Coverage 81.5% 81.5%
=======================================
Files 729 729
Lines 206304 206361 +57
=======================================
+ Hits 168261 168329 +68
+ Misses 38043 38032 -11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think this needs to be resolved still:
#31115 (comment)
Here is the PR to remove incomplete snapshot dirs. #31131 |
runtime/src/snapshot_utils.rs
Outdated
let bank_snapshot = get_highest_bank_snapshot(&bank_snapshots_dir).ok_or_else(|| { | ||
SnapshotError::NoSnapshotSlotDir(bank_snapshots_dir.as_ref().to_path_buf()) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boots from a pre will likely not be able to generate incremental snapshot archives.
want to understand this. Would that only be the case until after it creates the next full snapshot archive, or indefinitely?
@xiangzhu70
I'm not sure I understand the reasoning in the original PR on why we cannot use the highest POST snapshot here?
It is now committed. |
Please check if there is any remaining issue on this PR. Thanks! |
runtime/src/snapshot_utils.rs
Outdated
let bank_snapshot = get_highest_bank_snapshot(&bank_snapshots_dir).ok_or_else(|| { | ||
SnapshotError::NoSnapshotSlotDir(bank_snapshots_dir.as_ref().to_path_buf()) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR, should be OK to get the highest post bank snapshot. Further testing can happen independently.
…pshot dir" This reverts commit 272bb9f.
33ba962
to
f7d4971
Compare
…om a snapshot dir"" This reverts commit 2e345d8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Great! Thanks! Once all the CI tests pass, I will merge it into the master. |
Sounds good. Yes, please always1 wait for CI to pass before merging. Footnotes
|
* bank_from_latest_snapshot_dir * Clear the account paths before constructing a bank from a snapshot dir * Clear the account paths before constructing a bank from a snapshot dir (solana-labs#31170) * use the highest POST snapshot dir to construct a bank * Use POST to fix test_bank_from_latest_snapshot_dir * Revert "Clear the account paths before constructing a bank from a snapshot dir" This reverts commit 272bb9f. * Remove the set_accounts_hash_for_tests call * Revert "Revert "Clear the account paths before constructing a bank from a snapshot dir"" This reverts commit 2e345d8.
Problem
Part of the project https://github.com/orgs/solana-labs/projects/33/ to boot from a snapshot directory.
This is to get the latest snapshot dir and boot from it.
It is split from #30859 as suggested in its review.
Summary of Changes
Add the function bank_from_latest_snapshot_dir
Add its test function
Fixes #