-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make background services aware of incremental snapshots #19401
Make background services aware of incremental snapshots #19401
Conversation
0c5eb27
to
9b5d5f9
Compare
AccountsBackgroundService now knows about incremental snapshots. It is now also in charge of deciding if an AccountsPackage is destined to be a SnapshotPackage or not (or just used by AccountsHashVerifier). !!! New behavior changes !!! Full snapshots (both bank and archive) **MUST** succeed. If there are failures, the functions will be retried. If they still fail, an assert will be triggered. This is required because of how the last full snapshot slot is calculated, which is used by AccountsBackgroundService when calling `clean_accounts()`.
9b5d5f9
to
6d24603
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot going on in this PR. I've tried to call out most important bits!
let block_height = snapshot_root_bank.block_height(); | ||
let snapshot_type = if block_height | ||
% self.snapshot_config.full_snapshot_archive_interval_slots | ||
== 0 | ||
{ | ||
Some(SnapshotType::FullSnapshot) | ||
} else if block_height | ||
% self | ||
.snapshot_config | ||
.incremental_snapshot_archive_interval_slots | ||
== 0 | ||
{ | ||
Some(SnapshotType::IncrementalSnapshot(last_full_snapshot_slot)) | ||
} else { | ||
None | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now AccountsBackgroundService is the one responsible for setting the snapshot type (previously this was in AccountsHashVerifier).
Note that the None
here is for AccountsPackages that will not be archived, which is possible if the incremental snapshot interval does not equal the accounts hash interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah using the block_height
is a lot simpler than trying to find the first slot that crossed the full snapshot boundary. Interesting bit here is the full snapshots will be correlated with skip rate, i.e. the higher the skip rate, the less often - in wallclock time - we will take full snapshots. I think this is fine for v1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! That is interesting. Good to note.
|
||
let snapshot_package = SnapshotPackage::from(accounts_package); | ||
let pending_snapshot_package = pending_snapshot_package.unwrap(); | ||
let _snapshot_config = snapshot_config.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be able to remove SnapshotConfig from AccountsHashVerifier entirely too. I'm thinking about taking some things out of AccountsPackage that live in SnapshotConfig, which I may then use here, so I'm going to leave the SnapshotConfig here for now.
assert_eq!(deserialized_bank.slot(), LAST_SLOT,); | ||
assert_eq!( | ||
deserialized_bank.slot(), | ||
EXPECTED_SLOT_FOR_LAST_SNAPSHOT_ARCHIVE | ||
deserialized_bank, | ||
**bank_forks | ||
.read() | ||
.unwrap() | ||
.get(deserialized_bank.slot()) | ||
.unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay! Now we check the banks are equal too!
runtime/src/snapshot_package.rs
Outdated
/// Snapshots come in two flavors, Full and Incremental. The IncrementalSnapshot has a Slot field, | ||
/// which is the incremental snapshot base slot. | ||
#[derive(Clone, Copy, Debug, Eq, PartialEq)] | ||
pub enum SnapshotType { | ||
FullSnapshot, | ||
IncrementalSnapshot, | ||
} | ||
|
||
impl SnapshotType { | ||
/// Get the string prefix of the snapshot type | ||
pub fn to_prefix(&self) -> &'static str { | ||
match self { | ||
SnapshotType::FullSnapshot => TMP_FULL_SNAPSHOT_PREFIX, | ||
SnapshotType::IncrementalSnapshot => TMP_INCREMENTAL_SNAPSHOT_PREFIX, | ||
} | ||
} | ||
IncrementalSnapshot(Slot), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By putting the Slot
in the IncrementalSnapshot
, I can always guarantee there is a last full snapshot if the snapshot type is Incremental. Otherwise I'd have asserts all over the place to ensure that was the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
SnapshotType::IncrementalSnapshot => TMP_INCREMENTAL_SNAPSHOT_PREFIX, | ||
} | ||
} | ||
IncrementalSnapshot(Slot), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While looking through the actual archive and unarchive functions more, I realized that the prefix didn't need to be unique between full and incremental snapshots. So that just makes this a bit simpler.
let snapshot_storages = snapshot_type.map_or_else(SnapshotStorages::default, |snapshot_type| { | ||
let incremental_snapshot_base_slot = match snapshot_type { | ||
SnapshotType::IncrementalSnapshot(incremental_snapshot_base_slot) => { | ||
Some(incremental_snapshot_base_slot) | ||
} | ||
_ => None, | ||
}; | ||
root_bank.get_snapshot_storages(incremental_snapshot_base_slot) | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the snapshot type, get the snapshot storages. In the None
case, that means the "snapshot" is not meant to be archived, and will only be used to send to AccountsHashVerifier to push the hash out to the cluster. In that case, we don't need to snapshot any of the storages even.
Codecov Report
@@ Coverage Diff @@
## master #19401 +/- ##
=========================================
- Coverage 82.8% 82.7% -0.2%
=========================================
Files 457 459 +2
Lines 130682 130630 -52
=========================================
- Hits 108289 108115 -174
- Misses 22393 22515 +122 |
pub hash: Hash, // temporarily here while we still have to calculate hash before serializing bank | ||
pub archive_format: ArchiveFormat, | ||
pub snapshot_version: SnapshotVersion, | ||
pub snapshot_archives_dir: PathBuf, | ||
pub expected_capitalization: u64, | ||
pub hash_for_testing: Option<Hash>, | ||
pub cluster_type: ClusterType, | ||
pub snapshot_type: Option<SnapshotType>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this optional? Shouldn't we always know the snapshot type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should hopefully give an answer:
#19401 (comment)
And some more context here too:
#19401 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. We are overloading the snapshot type for some other purpose -- maybe a separate flag for the Hash only. And when it is None, what is really the type? Is it a full or incremental? Does not matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I debated how far to take this... There's a lot of coupling and implied behavior between bank snapshots, snapshot archives, and the hashing... plus how those three interact with the background services... Ultimately I figured that to do it "right" would likely require redoing the whole background services from the ground up, and that would probably be out of scope for this task, so I tried to make it least bad :)
This snapshot_type
is used to decide which snapshot storages to get, when to take a snapshot/what time, and if it should be archived by SnapshotPackagerService.
// Hard link the snapshot into a tmpdir, to ensure its not removed prior to packaging. | ||
let snapshot_links = tempfile::Builder::new() | ||
.prefix(&format!("{}{}-", TMP_BANK_SNAPSHOT_PREFIX, bank.slot())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do full snapshot and incremental snapshot look any different now for the tmp dir entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. None of the different temp directories need different prefixes actually, but now we're only differentiating between bank snapshots and snapshot archives.
runtime/src/snapshot_package.rs
Outdated
/// Snapshots come in two flavors, Full and Incremental. The IncrementalSnapshot has a Slot field, | ||
/// which is the incremental snapshot base slot. | ||
#[derive(Clone, Copy, Debug, Eq, PartialEq)] | ||
pub enum SnapshotType { | ||
FullSnapshot, | ||
IncrementalSnapshot, | ||
} | ||
|
||
impl SnapshotType { | ||
/// Get the string prefix of the snapshot type | ||
pub fn to_prefix(&self) -> &'static str { | ||
match self { | ||
SnapshotType::FullSnapshot => TMP_FULL_SNAPSHOT_PREFIX, | ||
SnapshotType::IncrementalSnapshot => TMP_INCREMENTAL_SNAPSHOT_PREFIX, | ||
} | ||
} | ||
IncrementalSnapshot(Slot), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
pub hash: Hash, // temporarily here while we still have to calculate hash before serializing bank | ||
pub archive_format: ArchiveFormat, | ||
pub snapshot_version: SnapshotVersion, | ||
pub snapshot_archives_dir: PathBuf, | ||
pub expected_capitalization: u64, | ||
pub hash_for_testing: Option<Hash>, | ||
pub cluster_type: ClusterType, | ||
pub snapshot_type: Option<SnapshotType>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. We are overloading the snapshot type for some other purpose -- maybe a separate flag for the Hash only. And when it is None, what is really the type? Is it a full or incremental? Does not matter?
@carllin @lijunwangs Thanks for the reviews so far! I've fixed or commented on everything, so this PR should be ready for another round. Thanks in advance! |
…na-labs#19401)" This reverts commit 78341d7.
AccountsBackgroundService now knows about incremental snapshots. It is
now also in charge of deciding if an AccountsPackage is destined to be a
SnapshotPackage or not (or just used by AccountsHashVerifier).
!!! New behavior changes !!!
Full snapshots (both bank and archive) MUST succeed.
This is required because of how the last full snapshot slot is
calculated, which is used by AccountsBackgroundService when calling
clean_accounts()
.File system calls are now unwrapped and will result in a crash. As Trent told me:
Fixes #19167
Fixes #19168