Add checks when constructing a BankSnapshotInfo from a directory #30373
b4328b9
to
49620ee
Compare
For fastboot, we'll always need the complete accounts state in order to actually boot from that bank snapshot later. I'm not sure I understand the problem that's being solved here; |
In #30171 I follow the archiving path to get a full snapshot and an incremental snapshot. This is part of that PR. I thought it might be better that I split these parts into a small PR. |
runtime/src/snapshot_utils.rs
Outdated
@@ -162,31 +164,52 @@ impl BankSnapshotInfo { | |||
pub fn new_from_dir( | |||
bank_snapshots_dir: impl AsRef<Path>, | |||
slot: Slot, | |||
) -> Option<BankSnapshotInfo> { | |||
) -> Result<BankSnapshotInfo> { |
I like this being a Result much more than an Option!
runtime/src/snapshot_utils.rs
Outdated
snapshot_dir: bank_snapshot_dir, | ||
}); | ||
let status_cache_file = bank_snapshot_dir.join(SNAPSHOT_STATUS_CACHE_FILENAME); | ||
if fs::metadata(status_cache_file).is_err() { |
Should we be more explicit with our check here? The status_cache_file should specifically be a file, not just exist (potentially as a dir), right?
if fs::metadata(status_cache_file).is_err() { | |
if !status_cache_file.is_file() { |
fixed
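A minimal sketch of why the stricter check matters, using only the standard library. The helper names exists_loose and exists_as_file are hypothetical and are not part of snapshot_utils.rs; they only contrast the two conditions discussed above.

```rust
use std::fs;
use std::path::Path;

/// Loose check: succeeds as long as *something* exists at `path`,
/// including a directory. (Hypothetical helper for illustration.)
fn exists_loose(path: &Path) -> bool {
    fs::metadata(path).is_ok()
}

/// Strict check: succeeds only when `path` resolves to a regular file.
/// (Hypothetical helper for illustration.)
fn exists_as_file(path: &Path) -> bool {
    path.is_file()
}
```

If a directory happens to sit where the status cache file is expected, exists_loose returns true while exists_as_file returns false, so only the second form rejects that case.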
runtime/src/snapshot_utils.rs
Outdated
}); | ||
let status_cache_file = bank_snapshot_dir.join(SNAPSHOT_STATUS_CACHE_FILENAME); | ||
if fs::metadata(status_cache_file).is_err() { | ||
return Err(SnapshotError::MissingStatusCacheFile(bank_snapshot_dir)); |
Error should be the status cache file path I think:
return Err(SnapshotError::MissingStatusCacheFile(bank_snapshot_dir)); | |
return Err(SnapshotError::MissingStatusCacheFile(status_cache_file)); |
fixed
runtime/src/snapshot_utils.rs
Outdated
let version_str = snapshot_version_from_file(version_path)?; | ||
let snapshot_version = SnapshotVersion::from_str(version_str.as_str()) | ||
.or(Err(SnapshotError::InvalidVersion))?; |
Out of scope:
Seems like these are reasonable to replace the snapshot_version_from_file. Do we ever actually need the raw string except for (de)serialization?
Maybe we could combine the actions into a function SnapshotVersion::from_file().
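A rough sketch of what such a combined helper could look like. DemoVersion, DemoVersionError, the single variant, and the "1.2.0" string are all assumptions made for illustration; this is not the actual SnapshotVersion type or its error handling.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::str::FromStr;

/// Hypothetical stand-in for the real version type.
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum DemoVersion {
    V1_2_0,
}

/// Hypothetical error type: either the file could not be read,
/// or its content is not a known version string.
#[derive(Debug)]
enum DemoVersionError {
    Io(io::Error),
    Invalid(String),
}

impl FromStr for DemoVersion {
    type Err = DemoVersionError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "1.2.0" => Ok(DemoVersion::V1_2_0),
            other => Err(DemoVersionError::Invalid(other.to_string())),
        }
    }
}

impl DemoVersion {
    /// Reads the version file and parses it in one step, so callers
    /// never handle the raw string themselves.
    fn from_file(path: impl AsRef<Path>) -> Result<Self, DemoVersionError> {
        let raw = fs::read_to_string(path).map_err(DemoVersionError::Io)?;
        raw.trim().parse()
    }
}
```

With this shape, the read and the parse fail through one error type, and the raw string stays private to the helper.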
runtime/src/snapshot_utils.rs
Outdated
// There is a time window from the slot directory being created, and the content being completely | ||
// filled. Check the completion to avoid using a highest found slot directory with missing content. | ||
let completion_flag_file = bank_snapshot_dir.join(SNAPSHOT_STATE_COMPLETE_FILENAME); | ||
if fs::metadata(completion_flag_file).is_err() { |
Similarly here, I think we should just be as explicit as possible that completion_flag_file is a file:
if fs::metadata(completion_flag_file).is_err() { | |
if !completion_flag_file.is_file() { |
fixed.
runtime/src/snapshot_utils.rs
Outdated
let snapshot_type: Option<BankSnapshotType> = if bank_snapshot_pre_path.is_file() { | ||
Some(BankSnapshotType::Pre) | ||
} else if bank_snapshot_post_path.is_file() { | ||
Some(BankSnapshotType::Post) | ||
} else { | ||
None | ||
}; | ||
let snapshot_type = snapshot_type | ||
.ok_or_else(|| SnapshotError::MissingSnapshotFile(bank_snapshot_dir.clone()))?; |
Can make this a bit more direct:
let snapshot_type: Option<BankSnapshotType> = if bank_snapshot_pre_path.is_file() { | |
Some(BankSnapshotType::Pre) | |
} else if bank_snapshot_post_path.is_file() { | |
Some(BankSnapshotType::Post) | |
} else { | |
None | |
}; | |
let snapshot_type = snapshot_type | |
.ok_or_else(|| SnapshotError::MissingSnapshotFile(bank_snapshot_dir.clone()))?; | |
let snapshot_type = if bank_snapshot_pre_path.is_file() { | |
BankSnapshotType::Pre | |
} else if bank_snapshot_post_path.is_file() { | |
BankSnapshotType::Post | |
} else { | |
return Err(SnapshotError::MissingSnapshotFile(bank_snapshot_dir)); | |
}; |
good point. updated.
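For reference, a self-contained sketch of the early-return form suggested above. DemoSnapshotType, DemoError, and detect_type are hypothetical stand-ins, and the two file paths are passed in rather than derived, so nothing here assumes the real file naming.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the Pre/Post snapshot type.
#[derive(Debug, PartialEq, Eq)]
enum DemoSnapshotType {
    Pre,
    Post,
}

/// Hypothetical stand-in for the error returned when neither file exists.
#[derive(Debug, PartialEq, Eq)]
enum DemoError {
    MissingSnapshotFile(PathBuf),
}

/// Picks the snapshot type from whichever file exists, preferring Pre.
/// The `else` branch returns the error directly, so no intermediate
/// Option and no `ok_or_else` are needed.
fn detect_type(dir: &Path, pre: &Path, post: &Path) -> Result<DemoSnapshotType, DemoError> {
    let snapshot_type = if pre.is_file() {
        DemoSnapshotType::Pre
    } else if post.is_file() {
        DemoSnapshotType::Post
    } else {
        return Err(DemoError::MissingSnapshotFile(dir.to_path_buf()));
    };
    Ok(snapshot_type)
}
```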
runtime/src/snapshot_utils.rs
Outdated
/// as a valid one. A dir unpacked from an archive lacks these files. Fill them here to | ||
/// allow new_from_dir() checks to pass. These checks are not needed for unpacked dirs, | ||
/// but it is not clean to add another flag to new_from_dir() to skip them. | ||
fn fill_snapshot_meta_files_for_unchived_snapshot(unpack_dir: impl AsRef<Path>) -> Result<()> { |
(Note you have to fix calls to this function, so I wouldn't actually commit this suggestion.)
fn fill_snapshot_meta_files_for_unchived_snapshot(unpack_dir: impl AsRef<Path>) -> Result<()> { | |
fn fill_snapshot_meta_files_for_unarchived_snapshot(unpack_dir: impl AsRef<Path>) -> Result<()> { |
fixed
Discussed with @brooksprumo offline. He said, "An incremental snapshot is only possible as an archive. As a bank snapshot, I don't see this being a good idea because then this bank snapshot could not be used for fastboot." I'd like to double-check with @jeffwashington and @apfitzge, because I recall some mention of an incremental snapshot before. If we all agree that a bank snapshot directory is always a full snapshot, then I will remove all the incremental snapshot directory logic from #30171 and this PR.
"Full" and "incremental" are currently only terms in the context of snapshot archives. Without all the storages in the bank snapshot directory, it would not be usable for fastboot by itself. If a bank snapshot only had a subset of the storages (i.e. an incremental bank snapshot), where would the remaining storages come from? I think this would require pointing back to an older bank snapshot that had all the storages (i.e. a full bank snapshot). To support this model, we'd have to hold onto all those storages too. So worst case is a full duplication of all storages (100+ GB). It's probably not going to be that bad, but around 25,000 slots will likely touch lots of storages, and it would be important to see how much extra disk space would be required to support this. I don't think we should include that work in a v0 for getting fastboot working. |
@xiangzhu70 Can you update the PR description to describe the problem this PR is solving? |
Updated the problem section. Please check if it is OK now. Thanks! @brooksprumo |
It looks like this PR is doing three distinct things, and I think should be split up:
- Adding a new function to get the highest bank snapshot, regardless of Pre/Post
We can keep this new function, as it communicates intent correctly. One interesting note is that with the current implementation, all bank snapshots on the snapshots dir are Pre. The only time there is a Post is within the tempdir created by AccountsPackage. So get_highest_bank_snapshot_pre() will return what is needed in this case.
- Dealing with an issue for snapshot archives where some of these new files are missing but need to be there for BankSnapshotInfo::new_from_dir()
After we've unarchived a snapshot archive, we don't have a bank snapshot in the snapshots directory, so I don't understand why this is now required. Did that behavior change?
- Parsing a bank snapshot and extracting its BankSnapshotInfo
I like these changes! I think to make this cleaner, the new errors should go under their own new error enum type, and then a single new error goes in SnapshotError. Refer to VerifySlotDeltasError for an example of what I'm describing.
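A sketch of the nested-error layout described in the last point, written with the standard library only. The enum names follow the ones that appear in this PR's commit list (SnapshotNewFromDirError, NewFromDir); the variants, check_dir, and load are assumptions for illustration, and the hand-written From impl stands in for what a derive attribute such as #[from] would generate.

```rust
use std::path::{Path, PathBuf};

/// Detailed errors for building snapshot info from a directory.
/// The variants are illustrative assumptions.
#[derive(Debug, PartialEq, Eq)]
enum SnapshotNewFromDirError {
    MissingStatusCacheFile(PathBuf),
    IncompleteDir(PathBuf),
}

/// Top-level error: a single variant wraps the whole detailed enum.
#[derive(Debug, PartialEq, Eq)]
enum SnapshotError {
    NewFromDir(SnapshotNewFromDirError),
}

impl From<SnapshotNewFromDirError> for SnapshotError {
    fn from(err: SnapshotNewFromDirError) -> Self {
        SnapshotError::NewFromDir(err)
    }
}

/// Hypothetical directory check that reports the detailed error.
fn check_dir(
    dir: &Path,
    has_status_cache: bool,
    is_complete: bool,
) -> Result<(), SnapshotNewFromDirError> {
    if !has_status_cache {
        return Err(SnapshotNewFromDirError::MissingStatusCacheFile(
            dir.join("status_cache"),
        ));
    }
    if !is_complete {
        return Err(SnapshotNewFromDirError::IncompleteDir(dir.to_path_buf()));
    }
    Ok(())
}

/// Hypothetical caller that returns the top-level error.
fn load(dir: &Path, has_status_cache: bool, is_complete: bool) -> Result<(), SnapshotError> {
    // `?` converts the detailed error into SnapshotError through `From`.
    check_dir(dir, has_status_cache, is_complete)?;
    Ok(())
}
```

The benefit of this layout is that the top-level enum grows by one variant, while the directory-specific cases stay grouped in their own type.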
That is because the snapshot-from-dir case and the snapshot-from-archive case share the get_bank_snapshots() and new_from_dir() functions. For a BankSnapshotInfo to be returned after the directory checks in new_from_dir(), I have to make the unarchived snapshot directories contain the meta files, even though they are only used in the snapshot-from-dir case.
How should it be split up? Among the 3 issues, #2 is really just a result of the checks introduced in #3, so they should go together. #1 is just a wrapper function without much content. I can either remove it (and just use get_highest_bank_snapshot_pre instead, although that carries an implementation assumption) or move it back to PR #30171. Which way do you prefer? That said, the test function tests whether it can find the highest valid snapshot passing all the checks, so it is also a result of #3. In this regard it looks better for it to stay instead of being split out.
It is now done. git pushed. |
Can you link me to the code you're referring to, please? I'm not finding other code that calls |
738e76b
to
758aedc
Compare
test_bank_fields_from_snapshot |
Is it correct to say that calling |
But it is the existing code in the archive test that calls get_bank_snapshots to get back a Vec<BankSnapshotInfo>. So the code is already passing BankSnapshotInfo objects in the unarchiving operation. It is just the shared operation for getting snapshot info back from a directory, regardless of whether the directory is the operational snapshot directory or a directory unpacked from an archive. It looks fine to me. Could you clarify how you want it to be fixed?
Yes, it fixes test_bank_fields_from_snapshot.
I pulled down this branch to understand what is happening. Here's one of the call stacks that fails:
Since Inside So we're kind of repurposing the This means we have two use cases for the bank snapshots:
Since we have these two use cases, it makes sense to me to have two different functions for getting the bank snapshot within their respective contexts, returning each's pertinent information. If |
I think it depends on the perception. There are two possible perceptions.
If we follow the 1st perception, the function in the unarchiving context will need to avoid passing BankSnapshotInfo (since that struct is associated only with a snapshot in operation), but the logic for finding the directory and then the file in it would be duplicated. I think the 2nd perception is fine in theory, because we can think of a snapshot (a full state in a directory) as going in and out of an archive. The caveat is that we don't want to change the existing archive format, for compatibility reasons, so we are not really adding the full-state meta files into the archive; instead we add them back at unpacking time to make the result a full-state snapshot dir. I think the 2nd perception is better, despite the caveat. Anyway, in the long term, I believe the clean way would be to split the bank-from-archive flow into two separate pipeline stages:
At stage 1, just unpack the archive to files. Information goes between the archive and files. No in-memory data structure (BankSnapshotInfo etc.) is involved. That way, a bank will always come directly from a snapshot dir, never from an archive. There will be a clean separation of logic, and the code will be much simpler. If we go down that path, the code for constructing a bank directly from an archive will become obsolete, so the discussion above about how to perceive its logic will not matter.
I like this plan! And then we no longer will need a distinction between bank snapshot and snapshot archive; it becomes just snapshot and archive. Ok, I'm on board with option 2 now. I'll re-review with this plan in mind. |
Looking good!
c572e68
to
e259088
Compare
lgtm
…ana-labs#30373)
* Read snapshot directories and find the highest
* format check
* Fix test_get_bank_snapshots
* fix test_bank_fields_from_snapshot
* review changes, is_file etc
* removed incremental snapshot
* SnapshotNewFromDirError
* nit and comments issues
* NewFromDir(#[from] SnapshotNewFromDirError)
* change fill to create
* replace unwrap with map_err and ok_or_else
* Remove BankForks, fix the bank loop
Problem
Before constructing a bank from a snapshot, we need to read the snapshot directories and find the highest correct one with the necessary meta info files. Sometimes a directory may have partial information, not fully ready to be used for bank construction. Only the ones passing the checks will be selected.
Summary of Changes
Add the snapshot version to BankSnapshotInfo, filled from the version file.
Add checks to ensure only a correct directory is used to generate a BankSnapshotInfo.
Add a function to find the bank snapshot with the highest slot; it wraps get_bank_snapshots and selects the highest one.
Add the test function test_get_highest_bank_snapshot. It removes the meta files in the directories and checks whether the highest correct snapshot is returned.
This is one of the split PRs of #28745. It precedes #30171
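The selection logic described in the summary can be sketched roughly as follows. This is not the PR's implementation: highest_complete_slot_dir and the DEMO_COMPLETE_FLAG file name are hypothetical, and the only check performed is the presence of a completion flag file, whereas the real new_from_dir() performs several checks.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical name for the completion flag file (an assumption).
const DEMO_COMPLETE_FLAG: &str = "state_complete";

/// Scans `root` for subdirectories named after a slot number and returns
/// the highest slot whose directory contains the completion flag file.
/// Directories that fail the check are skipped rather than treated as errors.
fn highest_complete_slot_dir(root: &Path) -> Option<(u64, PathBuf)> {
    fs::read_dir(root)
        .ok()?
        .filter_map(|entry| entry.ok())
        .filter_map(|entry| {
            let path = entry.path();
            // Entries whose names are not valid slot numbers are ignored.
            let slot = path.file_name()?.to_str()?.parse::<u64>().ok()?;
            Some((slot, path))
        })
        .filter(|(_, path)| path.join(DEMO_COMPLETE_FLAG).is_file())
        .max_by_key(|(slot, _)| *slot)
}
```

A test in the spirit of test_get_highest_bank_snapshot would create several slot directories, delete the flag file from the highest one, and expect the next-highest slot to be returned.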