tower: when syncing from vote state, update last_vote #32944
Codecov Report
| | master | #32944 | +/- |
|---|---|---|---|
| Coverage | 82.0% | 82.0% | |
| Files | 784 | 784 | |
| Lines | 212512 | 212565 | +53 |
| Hits | 174313 | 174398 | +85 |
| Misses | 38199 | 38167 | -32 |
core/src/consensus.rs
Outdated
let result = process_vote_unfiltered(
    &mut self.vote_state,
    &vote.slots,
    &vote,
    &[(vote_slot, vote_hash)],
    epoch,
);
if result.is_err() {
maybe just bubble up the error from `process_vote_unchecked` and error log here?
core/src/consensus.rs
Outdated
    epoch,
);
if result.is_err() {
    warn!(
could be as strong as error!
        vote_slot, vote_hash, result
    );
}
self.update_last_vote_from_vote_state(vote_hash);
If we set this properly when we ingest a new on chain tower in `ReplayStage::compute_bank_stats()`, is this still necessary here?
We still need it because we are recording a new vote here. This takes care of the block we were doing previously:
new_vote.set_timestamp(self.maybe_timestamp(self.last_voted_slot().unwrap_or_default()));
self.last_vote = new_vote;
// If our local root is higher than the highest slot in `bank_vote_state` due to
// supermajority roots, then it's expected that the vote state will be empty.
not exactly clear to me how supermajority roots cause tower to be empty
Anything that artificially increases the root could trigger this. I think snapshot is the best example, but I think supermajority roots could also cause this:
solana/ledger/src/blockstore_processor.rs
Lines 1576 to 1580 in d90e158
let _ = bank_forks.write().unwrap().set_root(
    root,
    accounts_background_request_sender,
    None,
);
// In this case we use the root as our last vote. This root cannot be None, because
// `tower.vote_state.last_voted_slot()` is None only if `tower.vote_state.root_slot`
// is Some.
where is this logic guaranteed?
`None <= None`. If `tower.vote_state.last_voted_slot()` is `None`, this means `bank_vote_state.last_voted_slot()` must have originally been `Some` in order to pass this check:
if bank_vote_state.last_voted_slot() > tower.vote_state.last_voted_slot()
That means it must have been adjusted here:
bank_vote_state.votes.retain(|lockout| lockout.slot() > local_root);
For it to have been adjusted, it must have passed this check:
if let Some(local_root) = tower.vote_state.root_slot {
So it follows that if `tower.vote_state.last_voted_slot()` is `None`, then `tower.vote_state.root_slot` is `Some`.
ah yeah, i missed the logic where we adjust the root and filter out slots less than the root
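The argument in this thread can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyVoteState` and `sync_from_bank` are hypothetical names, votes are reduced to bare slot numbers, and the root adjustment is simplified from what the quoted snippets suggest. It relies on Rust's `Option` ordering, where `None` compares below every `Some`.

```rust
/// Toy stand-in for a vote state (hypothetical, not the real type):
/// an optional root plus voted slots in ascending order.
#[derive(Debug, Clone, Default)]
pub struct ToyVoteState {
    pub root_slot: Option<u64>,
    pub votes: Vec<u64>,
}

impl ToyVoteState {
    pub fn last_voted_slot(&self) -> Option<u64> {
        self.votes.last().copied()
    }
}

/// Adopts `bank` into `tower` if `bank` is newer.
/// Returns the slot to treat as the last vote, or `None` if nothing was adopted.
pub fn sync_from_bank(tower: &mut ToyVoteState, mut bank: ToyVoteState) -> Option<u64> {
    // `None > None` is false, so passing this check means `bank` has at least one vote.
    if bank.last_voted_slot() <= tower.last_voted_slot() {
        return None;
    }
    if let Some(local_root) = tower.root_slot {
        // Assumption: only adjust when the local root is ahead of the bank's root.
        let local_root_is_higher = bank
            .root_slot
            .map_or(true, |bank_root| local_root > bank_root);
        if local_root_is_higher {
            bank.root_slot = Some(local_root);
            // This can drop every vote, which is the only way to end up empty.
            bank.votes.retain(|slot| *slot > local_root);
        }
    }
    tower.root_slot = bank.root_slot;
    tower.votes = bank.votes;
    // If the votes are empty here, the `retain` above ran, so the root is `Some`.
    tower.last_voted_slot().or(tower.root_slot)
}
```

With a local root of 130 and on-chain votes for 124 and 126, every vote is filtered out and the function falls back to the root, which is the case the code comment describes.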
@@ -3074,6 +3074,37 @@ impl ReplayStage {

tower.vote_state.root_slot = bank_vote_state.root_slot;
tower.vote_state.votes = bank_vote_state.votes;

let last_voted_slot = tower.vote_state.last_voted_slot().unwrap_or(
doesn't this have to be `Some` because we checked `if bank_vote_state.last_voted_slot() > tower.vote_state.last_voted_slot()` above, so there must be at least some vote in `bank_vote_state.last_voted_slot()`
not necessarily, see here:
bank_vote_state
.votes
.retain(|lockout| lockout.slot() > local_root);
progress
    .get_hash(last_voted_slot)
    .expect("Must exist for us to have frozen descendant"),
just wanted to make sure root bank exists in bank forks even if we load from a snapshot, I think it does because of bank_from_snapshot_archives()
* tower: when syncing from vote state, update last_vote
* pr: bubble error through unchecked
(cherry picked from commit 329c6f1)
# Conflicts:
#   core/src/consensus.rs
#   programs/vote/src/vote_state/mod.rs

* tower: when syncing from vote state, update last_vote
* pr: bubble error through unchecked
(cherry picked from commit 329c6f1)
# Conflicts:
#   programs/vote/src/vote_state/mod.rs

…t of #32944) (#32960)
* tower: when syncing from vote state, update last_vote (#32944)
* tower: when syncing from vote state, update last_vote
* pr: bubble error through unchecked
(cherry picked from commit 329c6f1)
# Conflicts:
#   programs/vote/src/vote_state/mod.rs
* fix conflicts
* fix bad merge
Co-authored-by: Ashwin Sekar <[email protected]>

…t of #32944) (#32959)
* tower: when syncing from vote state, update last_vote (#32944)
* tower: when syncing from vote state, update last_vote
* pr: bubble error through unchecked
(cherry picked from commit 329c6f1)
# Conflicts:
#   core/src/consensus.rs
#   programs/vote/src/vote_state/mod.rs
* Fix conflicts
Co-authored-by: Ashwin Sekar <[email protected]>
Problem
On replay, if validators observe a newer vote state from a frozen bank, they will adopt that vote state in favor of their local tower. This is to avoid an outdated local tower from sending out slashable votes, or in case of large divergences, having to wait forever for lockout to expire.
However, when we adopt vote state, we do not update the tower's `last_vote` field. This causes problems in situations where our fork choice does not agree with the on chain vote state.
Consider the following:
Previously our validator had voted on slots `124` and `126`, which landed in block `127`. Currently, our validator has replayed only up to `115` and has most recently voted on `115`.
Next it receives the remaining blocks up to `127` to replay. Upon replay of `127` it realizes it has voted on slots `124` and `126` in the past, and updates the tower vote state to reflect that.
solana/core/src/replay_stage.rs
Lines 3029 to 3031 in 14d0759
Suppose that previously, our validator did not get a chance to observe the `120` fork, which is why it voted on `124`. However, in our current run, fork choice rates `123` as the best slot. When it comes time to select forks, we select `123` as the heaviest slot and `123` as the heaviest slot descended from our last vote:
solana/core/src/consensus/heaviest_subtree_fork_choice.rs
Lines 1015 to 1017 in 14d0759
Because we have not updated our last vote, we think that the heaviest slot descends from our last vote, and incorrectly return `SwitchForkDecision::SameFork`:
solana/core/src/consensus.rs
Lines 872 to 875 in 14d0759
At this point we vote on `123`. However, because we adopted the vote state from `127`, which contains votes for `124` and `126`, this vote will fail. Unfortunately we use `process_vote_unchecked`, which silently ignores this error. We think that the vote has succeeded, and construct our new `last_vote` from the tower. This `last_vote` is garbage: it contains slot `126`, but has a `vote_hash` for slot `123`.
solana/core/src/consensus.rs
Lines 569 to 578 in 14d0759
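The failure mode can be reproduced in a toy model. This is a minimal sketch, not the actual tower code: `ToyTower`, `ToyVote`, and `record_vote_ignoring_errors` are hypothetical names, votes are bare slot numbers, and the only validity rule assumed is that a new vote must be newer than the newest slot already in the tower.

```rust
/// Hypothetical stand-in for a vote: the voted slot plus the hash it claims.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyVote {
    pub slot: u64,
    pub hash: &'static str,
}

/// Hypothetical stand-in for the tower.
#[derive(Debug, Default)]
pub struct ToyTower {
    pub voted_slots: Vec<u64>,
    pub last_vote: Option<ToyVote>,
}

impl ToyTower {
    /// Assumed rule: a vote must be newer than the newest slot in the tower.
    fn process_vote(&mut self, slot: u64) -> Result<(), String> {
        match self.voted_slots.last() {
            Some(&last) if slot <= last => Err(format!(
                "vote for slot {} is too old, tower is already at {}",
                slot, last
            )),
            _ => {
                self.voted_slots.push(slot);
                Ok(())
            }
        }
    }

    /// Mirrors the buggy pattern: the result is discarded, then `last_vote`
    /// is rebuilt from the tower's newest slot and the caller's hash.
    pub fn record_vote_ignoring_errors(&mut self, slot: u64, hash: &'static str) {
        let _ = self.process_vote(slot); // error silently dropped
        let newest = self.voted_slots.last().copied().unwrap_or_default();
        self.last_vote = Some(ToyVote { slot: newest, hash });
    }
}
```

Starting from an adopted tower holding `[124, 126]`, recording a vote for `123` leaves the tower unchanged but produces a `last_vote` with slot `126` and the hash for `123`, which is the mismatched pair described above.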
Any further operations involving the tower + fork choice will now panic, as the last vote in the tower does not exist in fork choice (invalid `SlotHashKey`).
Summary of Changes
- When syncing from vote state, update `tower.last_vote` as well.
- The `tower.record_bank_vote...` variants should use `process_vote_unfiltered`. Although it is unlikely we can recover in case this vote fails, we can log the error to make debugging such situations possible.
Fixes #32880
This is the proper fix for #32894
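Both changes can be sketched in a toy model. Everything here is an assumption made for illustration: `ToyTower`, `adopt_vote_state`, and `record_vote` are hypothetical names, `eprintln!` stands in for the logging macro, and votes are bare slot numbers. One deliberate difference from the diff in this PR: the sketch leaves `last_vote` untouched when a vote fails, whereas the diff logs and then still calls `update_last_vote_from_vote_state`.

```rust
/// Hypothetical stand-in for a vote: the voted slot plus its hash.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyVote {
    pub slot: u64,
    pub hash: String,
}

/// Hypothetical stand-in for the tower.
#[derive(Debug, Default)]
pub struct ToyTower {
    pub voted_slots: Vec<u64>,
    pub last_vote: Option<ToyVote>,
}

impl ToyTower {
    /// Change 1: adopting on-chain slots also refreshes `last_vote`.
    /// `newest_hash` is the hash of the newest adopted slot.
    pub fn adopt_vote_state(&mut self, slots: Vec<u64>, newest_hash: &str) {
        self.last_vote = slots.last().map(|&slot| ToyVote {
            slot,
            hash: newest_hash.to_string(),
        });
        self.voted_slots = slots;
    }

    /// Assumed rule: a vote must be newer than the newest slot in the tower.
    fn process_vote(&mut self, slot: u64) -> Result<(), String> {
        match self.voted_slots.last() {
            Some(&last) if slot <= last => Err(format!(
                "vote for slot {} is too old, tower is already at {}",
                slot, last
            )),
            _ => {
                self.voted_slots.push(slot);
                Ok(())
            }
        }
    }

    /// Change 2: the error is logged and returned instead of being dropped.
    pub fn record_vote(&mut self, slot: u64, hash: &str) -> Result<(), String> {
        let result = self.process_vote(slot);
        match &result {
            Err(err) => eprintln!("error processing vote ({}, {}): {}", slot, hash, err),
            // Sketch-only choice: refresh `last_vote` on success only.
            Ok(()) => {
                self.last_vote = Some(ToyVote {
                    slot,
                    hash: hash.to_string(),
                })
            }
        }
        result
    }
}
```

After `adopt_vote_state(vec![124, 126], ...)`, the toy tower's `last_vote` already points at `126`, so a later attempt to record `123` returns an error rather than quietly producing a mismatched slot and hash.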