Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make --wait-for-supermajority require --expected-shred-version #192

Merged
merged 1 commit into from
Mar 12, 2024

Conversation

steviez
Copy link

@steviez steviez commented Mar 11, 2024

Problem

In cluster restart scenarios, an important step is scanning the Blockstore for blocks (shreds) that occur after the chosen restart slot with an incorrect shred version. Supposing the cluster is chosen to be restarted at slot R, the check ensures that any slots > R are purged from the Blockstore before the node enters the wait-for-supermajority loop. If a node skips this step, the node can encounter problems when that block is created again (post cluster restart)

This check only occurs if --wait-for-supermajority AND --expected-shred-version are set; however, --expected-... is currently optional when using --wait-...

Summary of Changes

Our restart instructions typically mention that one should specify --expected-... as well, but we should just enforce it at the CLI level to prevent mistakes / wasted time debuggging.

Forcing this will also enable us to run this sanity check:

agave/core/src/validator.rs

Lines 730 to 738 in 88f6a7a

if let Some(expected_shred_version) = config.expected_shred_version {
if expected_shred_version != node.info.shred_version() {
return Err(format!(
"shred version mismatch: expected {} found: {}",
expected_shred_version,
node.info.shred_version(),
));
}
}

In cluster restart scenarios, an important is scanning the Blockstore
for blocks (shreds) that occur after the chosen restart slot with an
incorrect shred version. This check ensures that any blocks that
occurred pre-cluster restart and after the chosen restart slot get
deleted. If a node skips this step, the node can encounter problems when
that block is created again, after the cluster has restarted.

This check only occurs if --wait-for-supermajority AND
--expected-shred-version are set; however, --expected-... is currently
optional when using --wait-...

Our restart instructions typically mention that one should specify
--expected-... as well, but we should just enforce it at the CLI level
to prevent mistakes / wasted time debuggging.
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.8%. Comparing base (e027a8b) to head (cec2f24).
Report is 18 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master     #192    +/-   ##
========================================
  Coverage    81.8%    81.8%            
========================================
  Files         838      838            
  Lines      226389   226526   +137     
========================================
+ Hits       185342   185518   +176     
+ Misses      41047    41008    -39     

@steviez steviez requested a review from t-nelson March 11, 2024 23:17
@steviez steviez merged commit 2ddb50d into anza-xyz:master Mar 12, 2024
35 checks passed
@steviez steviez deleted the wfs_required_esv branch March 12, 2024 06:27
codebender828 pushed a commit to codebender828/agave that referenced this pull request Oct 3, 2024
…xyz#192)

In cluster restart scenarios, an important step is scanning the
Blockstore for blocks that occur after the chosen restart slot with an
incorrect shred version. This check ensures that any blocks that
occurred pre-cluster restart and after the chosen restart slot get
deleted. If a node skips this step, the node can encounter problems when
that block is created again, after the cluster has restarted.

This check only occurs if --wait-for-supermajority AND
--expected-shred-version are set; however, --expected-... is currently
optional when using --wait-...

Our restart instructions typically mention that one should specify
--expected-... as well, but we should just enforce it at the CLI level
to prevent mistakes / wasted time debuggging.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants