Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bank_from_latest_snapshot_dir #31115

Merged

Conversation

xiangzhu70
Copy link
Contributor

@xiangzhu70 xiangzhu70 commented Apr 7, 2023

Problem

Part of the project https://github.com/orgs/solana-labs/projects/33/ to boot from a snapshot directory.
This is to get the latest snapshot dir and boot from it.

It is split from #30859 as suggested in its review.

Summary of Changes

Add the function bank_from_latest_snapshot_dir
Add its test function

Fixes #

@xiangzhu70
Copy link
Contributor Author

There is an open question about handling of incomplete bank snapshot directory caused by process crash, as discussed in #30859 (comment).

The options are:

  1. Do checking in get_highest_bank_snapshot, and filter out incomplete directories;
  2. Clean up incomplete snapshot dir up-front, just like cleaning the account files, at the early booting time, to ensure the starting files are correct, and never need to be worried about.

I incline to go with #2. Not sure if it should be in this PR, or yet another PR.

Comment on lines 1732 to 1739
let bank_snapshot = get_highest_bank_snapshot(&bank_snapshots_dir).ok_or_else(|| {
SnapshotError::NoSnapshotSlotDir(bank_snapshots_dir.as_ref().to_path_buf())
})?;
Copy link
Contributor

@brooksprumo brooksprumo Apr 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally from #30859 (comment)

If bank_snapshot here is not complete, then bank_from_snapshot_dir() below will fail, right? Would it be better to get the highest complete "bank snapshot post" instead?

Without a valid accounts hash (which only happens in post), a real validator that boots from a pre will likely not be able to generate incremental snapshot archives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brooksprumo

boots from a pre will likely not be able to generate incremental snapshot archives.

want to understand this. Would that only be the case until after it creates the next full snapshot archive, or indefinitely?

@xiangzhu70
I'm not sure I understand the reasoning in the original PR on why we cannot use the highest POST snapshot here?

Copy link
Contributor

@brooksprumo brooksprumo Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, it's subtle.

Calculating the accounts hash is a required step in order to produce a snapshot. This is handled in AccountsHashVerifier. When AHV is processing an incremental snapshot package, currently it computes a full/complete accounts hash. Soon that will change to be an incremental accounts hash. To ensure these incremental snapshots are still verifiable, they also will include details from their base/full snapshot—including the base/full accounts hash.

So say we boot from a dir, at slot 125,060, and the next snapshot request is an incremental snapshot at slot 125,100. AHV will calculate an incremental accounts hash for slot 125,100 and then also ask for the full accounts hash from the base slot. Here's where there's a problem.

If we booted from a PRE bank snapshot, then it will not have calculated the accounts hash. So this incremental snapshot will ask for the full accounts hash from slot 125,060, which will be Hash::default() effectively. The incremental snapshot archive will fail to verify if it is used to boot from later. And the same will be true for all future incremental snapshots until a new full snapshot is generated.

However, if we use the full snapshot archive at slot 125,000, we can get its accounts hash. But... we still need to figure out how to get the account storages files at the time of slot 125,000, otherwise the incremental accounts hash will be calculated based on slot 125,060, and then its incremental snapshot archive won't be pair-able with the full snapshot archive at slot 125,000.

These interactions are all quite complicated, and I think need to be tested out/resolved before we can surface the fastboot to the CLI as a supported flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR, should be OK to get the highest post bank snapshot. Further testing can happen independently.

@codecov
Copy link

codecov bot commented Apr 7, 2023

Codecov Report

Merging #31115 (6cc110c) into master (1f67591) will increase coverage by 0.0%.
The diff coverage is 92.9%.

@@           Coverage Diff           @@
##           master   #31115   +/-   ##
=======================================
  Coverage    81.5%    81.5%           
=======================================
  Files         729      729           
  Lines      206304   206361   +57     
=======================================
+ Hits       168261   168329   +68     
+ Misses      38043    38032   -11     

@xiangzhu70 xiangzhu70 marked this pull request as ready for review April 8, 2023 00:59
Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think this needs to be resolved still:
#31115 (comment)

@xiangzhu70
Copy link
Contributor Author

xiangzhu70 commented Apr 10, 2023

Looks good. I think this needs to be resolved still: #31115 (comment)

Here is the PR to remove incomplete snapshot dirs. #31131

runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
Comment on lines 1732 to 1739
let bank_snapshot = get_highest_bank_snapshot(&bank_snapshots_dir).ok_or_else(|| {
SnapshotError::NoSnapshotSlotDir(bank_snapshots_dir.as_ref().to_path_buf())
})?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brooksprumo

boots from a pre will likely not be able to generate incremental snapshot archives.

want to understand this. Would that only be the case until after it creates the next full snapshot archive, or indefinitely?

@xiangzhu70
I'm not sure I understand the reasoning in the original PR on why we cannot use the highest POST snapshot here?

@xiangzhu70
Copy link
Contributor Author

Looks good. I think this needs to be resolved still: #31115 (comment)

Here is the PR to remove incomplete snapshot dirs. #31131

It is now committed.

@xiangzhu70
Copy link
Contributor Author

Please check if there is any remaining issue on this PR. Thanks!

runtime/src/snapshot_utils.rs Outdated Show resolved Hide resolved
Comment on lines 1732 to 1739
let bank_snapshot = get_highest_bank_snapshot(&bank_snapshots_dir).ok_or_else(|| {
SnapshotError::NoSnapshotSlotDir(bank_snapshots_dir.as_ref().to_path_buf())
})?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR, should be OK to get the highest post bank snapshot. Further testing can happen independently.

@xiangzhu70 xiangzhu70 force-pushed the bank_from_latest_snapshot_dir branch from 33ba962 to f7d4971 Compare April 13, 2023 17:48
Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@xiangzhu70
Copy link
Contributor Author

lgtm!

Great! Thanks! Once all the CI tests pass, I will merge it into the master.

@brooksprumo
Copy link
Contributor

Once all the CI tests pass, I will merge it into the master.

Sounds good. Yes, please always1 wait for CI to pass before merging.

Footnotes

  1. There are exceptions for this, like if the master build is broken and the PR is fixing it. In that case the CI is already broken, so merging early doesn't hurt anything.

@xiangzhu70 xiangzhu70 merged commit 9f53c42 into solana-labs:master Apr 13, 2023
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Apr 14, 2023
* bank_from_latest_snapshot_dir

* Clear the account paths before constructing a bank from a snapshot dir

* Clear the account paths before constructing a bank from a snapshot dir (solana-labs#31170)

* use the highest POST snapshot dir to construct a bank

* Use POST to fix test_bank_from_latest_snapshot_dir

* Revert "Clear the account paths before constructing a bank from a snapshot dir"

This reverts commit 272bb9f.

* Remove the set_accounts_hash_for_tests call

* Revert "Revert "Clear the account paths before constructing a bank from a snapshot dir""

This reverts commit 2e345d8.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants