Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download incremental snapshots during bootstrap #20696

Merged
merged 10 commits into from
Oct 21, 2021

Conversation

brooksprumo
Copy link
Contributor

@brooksprumo brooksprumo commented Oct 14, 2021

Problem

Bootstrapping does not know about incremental snapshots, so it will never download an incremental snapshot.

Summary of Changes

Add methods to rpc_bootstrap() to discover and download an incremental snapshot.

When incremental snapshots is enabled (--incremental-snapshots) and incremental snapshot fetch is not disabled (no --no-incremental-snapshot-fetch), then rpc_bootstrap will discover and download compatible incremental snapshots.

This means:

  • finding a full snapshot & incremental snapshot based on that full snapshot
  • filtering out only snapshots where the full snapshot is a multiple of our full snapshot interval (thus to maximize the chance we do not re-download a full snapshot)
  • filtering out only the snapshots with the highest full snapshot slot
  • filtering out only the snapshots with the highest incremental snapshot slot

Fixes #20225

Extra

This PR is quite large, and likely needs a lot of time to go over. I've split the implementations for with and without incremental snapshots into their own modules. This was so I could keep the exact same code for the "without" case intact, which will keep existing behavior a seamless possibility until more nodes have enabled incremental snapshots and this code has had more testing.

Please do ask questions. I'm also happy to jump on a call to go over anything. While implementing this I encountered a lot of "well what should happen in this case?" scenarios, and those likely need to be checked for correct assumptions.

Additionally, I've tried to preserve the existing behavior around retrying and timeouts. The new code path is quite different, so double checking all of that will be very valuable too. Thanks in advance!

@brooksprumo brooksprumo force-pushed the bootstrap branch 2 times, most recently from 35cccb2 to 0962313 Compare October 14, 2021 19:41
@codecov
Copy link

codecov bot commented Oct 14, 2021

Codecov Report

Merging #20696 (7b48cfe) into master (7baeb04) will decrease coverage by 0.1%.
The diff coverage is 41.3%.

@@            Coverage Diff            @@
##           master   #20696     +/-   ##
=========================================
- Coverage    81.9%    81.7%   -0.2%     
=========================================
  Files         496      496             
  Lines      138317   138952    +635     
=========================================
+ Hits       113319   113607    +288     
- Misses      24998    25345    +347     

@brooksprumo brooksprumo marked this pull request as ready for review October 14, 2021 22:03
@brooksprumo brooksprumo requested review from mvines and ryoqun October 14, 2021 22:03
@brooksprumo
Copy link
Contributor Author

@mvines @ryoqun Thanks in advance for reviewing! Two things:

  1. If there's someone else you think should review, please add them.

  2. Since the PR is so large, it'd probably be easier to split this one up. If the general direction of my implementation looks good, I can work on splitting this PR, so that the structural changes of the inner modules can be its own independent PR.

@mvines
Copy link
Member

mvines commented Oct 18, 2021

Does this PR contain the Draft PR #20730?
If so, please merge #20730 first and rebase. Since #20730 doesn't require review, getting it out of here will cause your reviewers to expend much less effort and increase the quality of review that you'll get here.

@brooksprumo
Copy link
Contributor Author

Does this PR contain the Draft PR #20730? If so, please merge #20730 first and rebase. Since #20730 doesn't require review, getting it out of here will cause your reviewers to expend much less effort and increase the quality of review that you'll get here.

@mvines Definitely agree! I was thinking of getting feedback on this design before making the other PRs live (there's actually four PRs with refactoring). I'm working with @jeffwashington to do these reviews, and then I'll rebase this one.

@mvines
Copy link
Member

mvines commented Oct 18, 2021

Got it. The design that you've written up in the PR description makes sense to me. Let's whittle this PR down to just the implementation of the design sans refactoring churn and I'll jump in on the review!

@brooksprumo
Copy link
Contributor Author

Got it. The design that you've written up in the PR description makes sense to me. Let's whittle this PR down to just the implementation of the design sans refactoring churn and I'll jump in on the review!

Sounds good! I'll move this back to Draft while I refactor.

Fixes solana-labs#20225

- Add fn to download incremental snapshots at bootstrap
  - requires `--incremental-snapshots` and _not_
    `--no-incremental-snapshot-fetch` to be set
@brooksprumo brooksprumo marked this pull request as ready for review October 20, 2021 01:28
@brooksprumo
Copy link
Contributor Author

@mvines @jeffwashington Ok, this PR is ready for review again! Jeff, I added you as well since you mentioned you'd be willing to help review (and thanks for reviewing the refactor PRs!).

Comment on lines 100 to 101
if bootstrap_config.incremental_snapshots == ConfigState::Enabled
&& bootstrap_config.incremental_snapshot_fetch == ConfigState::Enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we only check for incremental_snapshot_fetch here? That is, why would incremental_snapshots ever be false if incremental_snapshot_fetch is true.

Aside: I'm not a fan of ConfigState, I'd prefer a plain bool myself. But I see you/Jeff discussed that in another PR so 🙈

Copy link
Contributor Author

@brooksprumo brooksprumo Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could check for just incremental_snapshot_fetch here, yes. I'll make that change.

Re ConfigState: Well, that's 2-to-1 in favor of bool, so I'll change this to bool instead.

TODO:

  • Check only for incremental_snapshot_fetch
  • Use bool instead of ConfigState

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Right now, --incremental-snapshots defaults to false, so I figured that fetching incremental snapshots during bootstrap should also be false by default.

And since --incremental-snapshots is false by default, and --no-incremental-snapshot-fetch is also false by default, this would mean that if I only check for --no-incremental-snapshot-fetch, then this would allow the case where the user does not set --incremental-snapshots nor --no-incremental-snapshot-fetch, and they would end up trying to download an incremental snapshot.

I think that would be a bit confusing. Maybe it wouldn't work out right either, if the full/incremental snapshot intervals are not multiples of eachother either...

validator/src/bootstrap.rs Show resolved Hide resolved
validator/src/bootstrap.rs Outdated Show resolved Hide resolved
validator/src/bootstrap.rs Show resolved Hide resolved
validator/src/bootstrap.rs Show resolved Hide resolved
@brooksprumo brooksprumo requested a review from mvines October 20, 2021 19:42
/// - ContactInfo: the rpc peer
/// - (Slot, Hash): the base (i.e. full) snapshot hash
/// - Option<(Slot, Hash)>: the highest incremental snapshot hash, if one exists
#[allow(clippy::type_complexity)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wanna just wrap a struct around this return type? It'll avoid the clippy and then that comment turns into sweet compile-time verified code instead of an out of date comment in 6 months :)

Copy link
Contributor Author

@brooksprumo brooksprumo Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You bet I do! Hah, I was concerned I was overdoing it on the types, which is why I didn't do it in the first place.

TODO:

  • Make the return type into a struct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I figured it was trauma from that last type discussion! ⚔️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@mvines Can you take another look and see what you think? I put the majority of this into a struct called PeerSnapshotHash. Function names and args have also been updated to use this name instead of "incremental_snapshots", which was already overloaded a bunch.

validator/src/bootstrap.rs Outdated Show resolved Hide resolved
mvines
mvines previously approved these changes Oct 20, 2021
Copy link
Member

@mvines mvines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strictly better than the old version because tests exist!

Would like to the type-complexity clippy allows removed, and I think we have some discussion about the flags themselves. But that discussion doesn't need to block this PR from landing IMO

@mergify mergify bot dismissed mvines’s stale review October 21, 2021 00:48

Pull request has been modified.

mvines
mvines previously approved these changes Oct 21, 2021
Copy link
Member

@mvines mvines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

validator/src/bootstrap.rs Outdated Show resolved Hide resolved
validator/src/bootstrap.rs Outdated Show resolved Hide resolved
brooksprumo and others added 2 commits October 21, 2021 06:26
@mergify mergify bot dismissed mvines’s stale review October 21, 2021 11:26

Pull request has been modified.

@brooksprumo brooksprumo merged commit 4821b0a into solana-labs:master Oct 21, 2021
@brooksprumo brooksprumo deleted the bootstrap branch October 21, 2021 14:14
dankelleher pushed a commit to identity-com/solana that referenced this pull request Nov 24, 2021
frits-metalogix added a commit to identity-com/solana that referenced this pull request Nov 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Download incremental snapshots in bootstrap
2 participants