-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Construct the initial bank state from the snapshot/ and accounts file… #28745
Conversation
9df0000
to
55312c2
Compare
@xiangzhu70 Can you fill out the PR description some more please? Both the Problem section, to provide some context (can point to a GH issue as well), and the Summary of Changes section to get a high level of what has changed. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got through a decent chunk of this, but still need look over the meat of snapshot_utils
and bank_fork_utils
.
ledger/src/bank_forks_utils.rs
Outdated
@@ -199,11 +206,9 @@ fn bank_forks_from_snapshot( | |||
process::exit(1); | |||
} | |||
|
|||
let (deserialized_bank, full_snapshot_archive_info, incremental_snapshot_archive_info) = | |||
snapshot_utils::bank_from_latest_snapshot_archives( | |||
let file_ret = if snapshot_config.snapshot_from == SnapshotFrom::File { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these blocks can be cleaned up - it's odd to have several maps from Ok(good_ret) => Some(good_ret)
and makes it slightly more confusing as what's happening here, i.e. if SnapshotFrom::File try loading from files, fallback to loading archives, expect we have a valid bank at the end, regardless of path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the flow is:
if the snapshot_config.snapshot_from is not File, then the first stage result is none.
otherwise, try the file route, which can return valid result or none.
For both the above, if the result is none, do the archive way.
Not sure how I can express this better in Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do something like:
(snapshot_config.snapshot_from == SnapshotFrom::File).map(|| ...).map_err(|err| warn!(...) err).ok().unwrap_or(|| ...);
map_err
is kinda a hack here though, we'd really want inspect_err
which isn't stable yet, unfortunately
@@ -2049,6 +2587,22 @@ pub fn verify_snapshot_archive<P, Q, R>( | |||
assert!(crate::serde_snapshot::compare_two_serialized_banks(&p1, &p2).unwrap()); | |||
std::fs::remove_file(p1).unwrap(); | |||
std::fs::remove_file(p2).unwrap(); | |||
|
|||
// The new the status_cache file is inside the slot directory together with the snapshot file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move the file, won't that break it if we run this verify multiple times? Since it won't bethere the second time.
Could you post in the description how much faster loading via files is? I'd assume the speed up is pretty substantial. |
updated th Now I have updated the PR summary. Thanks! |
e3f1a59
to
8db5357
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a few concerns I commented, but overall we're getting close on this I think.
validator_config.snapshot_config.snapshot_from == Some(SnapshotFrom::File); | ||
|
||
if !snapshot_from_file { | ||
download_snapshots( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be out of scope for this PR. If we fail our load from files, we try from the existing snapshot archives, but we don't download new ones. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should consider checking the current snapshot slot. If it's too old, then we download and use the downloaded archive regardless of snapshot_from_file
?
3557e21
to
0329e89
Compare
…s, instead of the archives. Some additional checking is done to ensure the files are in a consistent state. If not, it falls back to using the archives.
7944a05
to
8da0667
Compare
Construct the initial bank state from the bank snapshot in the snapshot/, instead of unarchiving from an archive.
If not, it falls back to using the archives.
Problem
The old way is to from the snapshot archive. Decompressing and unarchiving takes 90s+. This is a big chunk
of the validator startup time.
This new way directly uses the snapshot/ directory, shorten the time to 5s.
Summary of Changes
Build the snapshot// directory to contain a complete state for a bank snapshot. Previously it contains only the bank snapshot file. Now it has the status_cache and the hard-links of all matching accounts/ appendvec files.
With the new argument --snapshot-from-files, when starting up, the validator finds the highest bank snapshot directory, and construct the bank from it. For precaution, if it fails, it will fall back to the original unarchiving route. It really should not fail. Once we have verified this as a solid route, we can remove this argument and make this route as the default.
Testing done:
On a devserver, run without --snapshot-from-file
snapshot untar took 92.4s
with it,
full snapshot deserialize took 5.5s
./target/release/solana-validator --identity ~/validator-keypair.json --vote-account ~/vote-account-keypair.json --rpc-port 8899 --entrypoint entrypoint.devnet.solana.com:8001 --limit-ledger-size --log ~/solana-validator.log --no-poh-speed-test --full-snapshot-interval-slots 200 --maximum-full-snapshots-to-retain 5 --snapshot-from-file
Fixes #