Compute Switch Threshold #9218
Conversation
// 2) Not from before the current root as we can't determine if
// anything before the root was an ancestor of `last_vote` or not
if !last_vote_ancestors.contains(lockout_interval_start)
    && ancestors.contains_key(lockout_interval_start)
@aeyakovenko, I'm filtering out any branches from before the root, so those forks can't be included in the switching proofs, even though they may be locked out above our last vote. I don't think it's a huge issue, but I could be missing something...
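The filter above can be read as a standalone predicate; here is a minimal, self-contained sketch (the helper and `main` are hypothetical illustrations — only the two conditions mirror the diff):

```rust
use std::collections::{HashMap, HashSet};

// Illustrative predicate: an interval start is only usable in a switching
// proof if (1) it is NOT an ancestor of our last vote (a shared ancestor
// can't prove lockout on a conflicting fork), and (2) it is at or after
// the current root, i.e. we still have ancestry information for it.
fn usable_in_switching_proof(
    lockout_interval_start: u64,
    last_vote_ancestors: &HashSet<u64>,
    ancestors: &HashMap<u64, HashSet<u64>>,
) -> bool {
    !last_vote_ancestors.contains(&lockout_interval_start)
        && ancestors.contains_key(&lockout_interval_start)
}

fn main() {
    let last_vote_ancestors = HashSet::from([1u64, 2]);
    let mut ancestors: HashMap<u64, HashSet<u64>> = HashMap::new();
    ancestors.insert(3, HashSet::from([1u64])); // slot 3 is post-root
    // Slot 3: conflicting and post-root -> usable.
    assert!(usable_in_switching_proof(3, &last_vote_ancestors, &ancestors));
    // Slot 2: ancestor of our last vote -> not usable.
    assert!(!usable_in_switching_proof(2, &last_vote_ancestors, &ancestors));
    // Slot 0: pre-root, no ancestry info -> filtered out.
    assert!(!usable_in_switching_proof(0, &last_vote_ancestors, &ancestors));
    println!("ok");
}
```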
lockout_intervals
    .entry(vote.expiration_slot())
    .or_insert_with(|| vec![])
    .push((vote.slot, key));
Will add these keys to the replay_stage `all_pubkeys` set so they pull from the same reference pool, reducing memory usage.
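The "same reference pool" idea can be sketched with `Rc` interning — a hypothetical stand-in (the `intern` helper and `String` keys are illustrative, not the replay_stage API):

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Illustrative key interning: look each key up in a shared pool so every
// lockout-interval entry holds a cheap Rc clone instead of its own copy
// of the key bytes.
fn intern(pool: &mut HashSet<Rc<String>>, key: String) -> Rc<String> {
    if let Some(existing) = pool.get(&key) {
        Rc::clone(existing)
    } else {
        let rc = Rc::new(key);
        pool.insert(Rc::clone(&rc));
        rc
    }
}

fn main() {
    let mut pool: HashSet<Rc<String>> = HashSet::new();
    let a = intern(&mut pool, "validator-1".to_string());
    let b = intern(&mut pool, "validator-1".to_string());
    // Both handles share one allocation; the pool stores a single copy.
    assert!(Rc::ptr_eq(&a, &b));
    assert_eq!(pool.len(), 1);
    println!("ok");
}
```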
}
(locked_out_stake as f64 / total_stake as f64) > SWITCH_FORK_THRESHOLD
})
.unwrap_or(true)
This doesn't generate the proof yet, want to make sure these incremental changes don't break anything first
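The check in the snippet above is a plain stake-fraction comparison; a minimal sketch (the `0.38` value for `SWITCH_FORK_THRESHOLD` is an assumption for illustration — the real constant lives in the consensus code):

```rust
// Assumed value for illustration only.
const SWITCH_FORK_THRESHOLD: f64 = 0.38;

// Switching is allowed once the stake locked out on conflicting forks
// exceeds the threshold fraction of total stake.
fn switch_threshold_met(locked_out_stake: u64, total_stake: u64) -> bool {
    (locked_out_stake as f64 / total_stake as f64) > SWITCH_FORK_THRESHOLD
}

fn main() {
    // 39% of stake locked out clears a 38% threshold...
    assert!(switch_threshold_met(39, 100));
    // ...but exactly 38% does not (strict comparison).
    assert!(!switch_threshold_met(38, 100));
    println!("ok");
}
```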
Force-pushed from a6e9a44 to bfc974f
Codecov Report
@@            Coverage Diff            @@
##           master    #9218    +/-   ##
========================================
  Coverage    80.4%    80.4%
========================================
  Files         284      285      +1
  Lines       66235    66388    +153
========================================
+ Hits        53263    53420    +157
+ Misses      12972    12968      -4
Force-pushed from 08b4b28 to 1deaba9
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Looks pretty good. We really need to pull the consensus stuff out into a non-networked simulation environment.
Force-pushed from 6996b96 to b1e9996
Pull request has been modified.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This stale pull request has been automatically closed. Thank you for your contributions.
…tors/descendants map for consistency with BankForks and progress map
…sss invalid banks
@carllin do these tests verify that the network can recover from a 33 - 4 failure? Assuming the threshold is 4 |
@aeyakovenko, there's currently no test that sets up this exact scenario, but this one is pretty close: https://github.com/solana-labs/solana/blob/master/local-cluster/tests/local_cluster.rs#L376. If we were instead to toggle the test to kill the leader before the partition, that would probably test this case.
@carllin but once the partition recovers the nodes can come back. Ideally we have a local cluster test and a nightly partition test that induces this scenario.
@aeyakovenko even after the partition resolves, if > 25% are dead then each side of the partition may get stuck, as there's not enough stake on the other side of the partition to generate a switching proof. The only possible way out of this hole is if the smaller/less-staked partition itself sub-partitions/forks, and people on that side of the partition vote on their own fork (they would have to think their fork is the heaviest, which may not happen if they detect the major fork), allowing some of the validators on that side to generate a switching proof to switch. But this seems like a very unlikely escape. We can add a local cluster test; the nightly partition test can be an expansion of the existing nightly partition tests plus the ability to kill some of the nodes.
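A back-of-envelope check of this stake arithmetic (everything here is illustrative: the 0.38 threshold is an assumed value, and the partition fractions are made up to show the shape of the argument):

```rust
// Assumed threshold value for illustration only.
const SWITCH_FORK_THRESHOLD: f64 = 0.38;

// Best case, the stake available for a switching proof is everything
// except dead stake and the stake already voting on our own fork.
fn can_ever_switch(dead_fraction: f64, our_fork_fraction: f64) -> bool {
    (1.0 - dead_fraction - our_fork_fraction) > SWITCH_FORK_THRESHOLD
}

fn main() {
    // Roughly even partition with >25% of stake dead: at most
    // 1 - 0.26 - 0.37 = 0.37 of stake can conflict, which never clears
    // a 0.38 threshold, so this side stays stuck.
    assert!(!can_ever_switch(0.26, 0.37));
    // Once dead stake drops well under the margin, switching is possible.
    assert!(can_ever_switch(0.10, 0.37));
    println!("ok");
}
```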
@carllin i meant the network should recover as soon as we are under 25% dead |
Problem
Missing computation of switch threshold for optimistic confirmation
Summary of Changes
Refactor ReplayStage to not vote when the switching threshold fails, and instead reset to the heaviest descendant of the last vote (will factor this out into another PR)
Compute switching threshold
Fixes #
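Taken together, the pieces discussed above (the lockout-interval map, the conflicting-fork filter, and the stake-fraction check) can be sketched end to end. Everything below is an illustrative simplification, not the PR's code — the names, the map shape, the validator/stake stand-ins, and the 0.38 threshold are all assumptions:

```rust
use std::collections::{BTreeMap, HashSet};

// Assumed value for illustration only.
const SWITCH_FORK_THRESHOLD: f64 = 0.38;

// Simplified sketch: `lockout_intervals` maps expiration_slot ->
// (vote_slot, validator) pairs. A validator's stake counts toward the
// switching proof if its lockout interval is still active at
// `switch_slot` and its vote was not on our own fork's ancestry.
fn switch_proof_possible(
    switch_slot: u64,
    last_vote_ancestors: &HashSet<u64>,
    lockout_intervals: &BTreeMap<u64, Vec<(u64, &'static str)>>,
    stake_of: impl Fn(&str) -> u64,
    total_stake: u64,
) -> bool {
    let mut locked_out_stake = 0u64;
    let mut counted: HashSet<&str> = HashSet::new();
    // Intervals expiring before `switch_slot` no longer lock anyone out.
    for (_expiration, entries) in lockout_intervals.range(switch_slot..) {
        for &(vote_slot, validator) in entries {
            // Skip votes on our own fork; count each validator at most once.
            if !last_vote_ancestors.contains(&vote_slot) && counted.insert(validator) {
                locked_out_stake += stake_of(validator);
            }
        }
    }
    (locked_out_stake as f64 / total_stake as f64) > SWITCH_FORK_THRESHOLD
}

fn main() {
    let last_vote_ancestors = HashSet::from([1u64, 2]);
    let mut lockout_intervals: BTreeMap<u64, Vec<(u64, &'static str)>> = BTreeMap::new();
    lockout_intervals.entry(10).or_default().push((5, "a"));
    lockout_intervals.entry(12).or_default().push((6, "b"));
    lockout_intervals.entry(3).or_default().push((1, "c")); // already expired
    let stake_of = |v: &str| match v {
        "a" => 25,
        "b" => 20,
        _ => 55,
    };
    // Validators "a" (25) and "b" (20) are locked out on conflicting forks:
    // 45 of 100 stake clears the 0.38 threshold, so switching is allowed.
    assert!(switch_proof_possible(
        7,
        &last_vote_ancestors,
        &lockout_intervals,
        stake_of,
        100
    ));
    println!("ok");
}
```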