Consensus (bank hash) divergence between v1.14. and master (mnb) #32889
Comments
CC: @AshwinSekar @carllin @bji |
Using the functionality introduced with #32632, I have debug files that are accessible here. Diffing these files, I see that only one account differs between the two: Correct (from
Incorrect (from
The files contain the account data as well (base64 encoded); we can probably paste these base64 strings into some Rust code and decode them into vote accounts to see what is different. |
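As a rough illustration of that decoding step, here is a minimal std-only sketch. It assumes the debug files use the standard base64 alphabet; `decode_base64` and `first_difference` are hypothetical helper names, not functions from the Solana codebase. It only gets as far as raw bytes and the first differing offset. Turning those bytes into a vote state would still need the deserializer from the Solana crates, which is not shown here.

```rust
/// Decode standard-alphabet base64 (optional `=` padding) into bytes.
/// Returns `None` if a character outside the alphabet is found.
/// Hypothetical helper for illustration only.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len() / 4 * 3);
    let mut buffer: u32 = 0; // holds bits not yet emitted as a full byte
    let mut bits: u32 = 0; // number of valid bits currently in `buffer`
    for byte in input.bytes() {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break, // padding marks the end of the data
            b'\n' | b'\r' | b' ' => continue, // tolerate whitespace from copy/paste
            _ => return None,
        };
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// Offset of the first byte at which two buffers differ, if any.
/// A length mismatch counts as a difference at the shorter length.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    let common = a.len().min(b.len());
    (0..common)
        .find(|&i| a[i] != b[i])
        .or(if a.len() != b.len() { Some(common) } else { None })
}

fn main() {
    // Placeholder strings; the real inputs would be the two account data
    // strings copied from the debug files.
    let correct = decode_base64("aGVsbG8=").expect("valid base64");
    let incorrect = decode_base64("aGVsbG8h").expect("valid base64");
    println!("first differing offset: {:?}", first_difference(&correct, &incorrect));
}
```

Knowing the offset narrows down which field of the serialized account changed before doing a full deserialization.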
Thank you for taking these steps. I will decode the accounts and take a look. |
The difference is in credits accounting; the 9FGFC7MDkVU3igJzk9uavzaFFRdxuDNpgpFrUCf3pjgh version has 607035 credits for that vote account, and the GoZ4MYBB21YiiVCHWZHzKaMebtvk1sFWBqpzT3nZwk6s version has 607034 credits. |
I wonder - is it possible to get the most recent Vote or VoteStateUpdate instruction that preceded the vote account snapshots for the vote account Fzrwo4KAmX7eGfQ5emmAHXs1hqoaiFXeH69P8LJNce5H? Also, the fact that the divergence occurred after the VoteStateUpdate feature was enabled suggests to me that it's a difference in how credits are counted for VoteStateUpdate transactions. |
Also I wonder if this is something that only happens on transition between VoteState built from Vote tx, to one built using VoteStateUpdate. The log line with the consensus divergence occurred just after the start of epoch 491, when the VoteStateUpdate feature had just become enabled. |
Here is the transaction from
|
Assuming that only the Lockouts, Root, and Credits are of importance for diagnosing this, here is the account data: (It just confirms that the vote account tower state was updated from the contents of the VoteStateUpdate tx you referenced)
|
Need to see the vote account state prior to the VoteStateUpdate tx (and possibly the prior VoteStateUpdate tx as well, that put the vote account in that state). Some more info: that vote account had no transactions in epoch 491 before the VoteStateUpdate that steviez referenced in slot 212112050. Therefore this was definitely a mismatch after the first VoteStateUpdate for this vote account. |
I'm having a hard time finding any vote tx for this validator leading up to slot 212112050. Have gone all the way back to 212111960 (90 slots prior) without finding one yet. |
Yeah the validator was delinquent for over 16,000 slots. So the edge case here seems to be when a VoteStateUpdate replaced the entirety of a vote state with new slots. |
I think the error may be the change from this:
    // Count the number of slots at and before the new root within the current vote state lockouts. Start with 1
    // for the new root. The purpose of this is to know how many slots were rooted by this state update:
    // - The new root was rooted
    // - As were any slots that were in the current state but are not in the new state. The only slots which
    //   can be in this set are those oldest slots in the current vote state that are not present in the
    //   new vote state; these have been "popped off the back" of the tower and thus represent finalized slots
    let mut finalized_slot_count = 1_u64;
    if let Some(new_root) = new_root {
        for current_vote in &vote_state.votes {
            // Find the first vote in the current vote state for a slot greater
            // than the new proposed root
            if current_vote.slot() <= new_root {
                current_vote_state_index = current_vote_state_index
                    .checked_add(1)
                    .expect("`current_vote_state_index` is bounded by `MAX_LOCKOUT_HISTORY` when processing new root");
                if current_vote.slot() != new_root {
                    finalized_slot_count = finalized_slot_count
                        .checked_add(1)
                        .expect("`finalized_slot_count` is bounded by `MAX_LOCKOUT_HISTORY` when processing new root");
                }
                continue;
            }
            break;
        }
    }
To this:
    // Accumulate credits earned by newly rooted slots.
    let mut earned_credits = 0_u64;
    if let Some(new_root) = new_root {
        for current_vote in &vote_state.votes {
            // Find the first vote in the current vote state for a slot greater
            // than the new proposed root
            if current_vote.slot() <= new_root {
                earned_credits = earned_credits
                    .checked_add(vote_state.credits_for_vote_at_index(current_vote_state_index))
                    .expect("`earned_credits` does not overflow");
                current_vote_state_index = current_vote_state_index
                    .checked_add(1)
                    .expect("`current_vote_state_index` is bounded by `MAX_LOCKOUT_HISTORY` when processing new root");
                continue;
            }
            break;
        }
    }
The |
Still trying to work out the specifics of which situation causes a divergence in credits accounting before and after the change, but my struck-through text was wrong because if there is a new root, then a credit should be awarded. |
OK, I think I understand the problem. Indeed the existing code is flawed, and the timely_vote_credits change fixes it.
What is happening is that when a validator is delinquent for a long time, its first VoteStateUpdate after that long delinquency will include a new root, but that root is not a slot the validator ever voted on. The existing code (which came from the "one_credit_per_dequeue" feature) adds a credit just because there is a new root in the new vote state. This is incorrect: the new root was never voted on by the validator, so it should not earn a credit.
The timely_vote_credits feature, by contrast, correctly adds credits only for slots from the old vote state that were "popped off of the tower" by the new vote. It doesn't add a credit just because there is a new root in the new vote state (although it would have added the credit if the new root were a slot that was actually voted on in the prior vote state, which is not the case in the circumstances leading to this bug).
I have written a test case which demonstrates this flaw:
I think that the way to deal with this is to fix the existing code to not have this flaw, which will require a new feature; once that feature is enabled, the timely vote credits feature can be re-enabled. |
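To make the off-by-one concrete, here is a simplified model of the two counting rules quoted earlier in the thread. It is a sketch under stated assumptions, not the actual vote program code: the function names are hypothetical, the old tower is reduced to a sorted list of slot numbers, and every vote is assumed to be worth exactly one credit (the real `credits_for_vote_at_index` may return other values).

```rust
/// Older counting rule from the quoted snippet: start at 1 for the new root,
/// then add 1 for every old vote at a slot strictly below the new root.
/// Hypothetical simplification: `old_votes` is the old tower, oldest first.
fn credits_counting_root(old_votes: &[u64], new_root: u64) -> u64 {
    let mut finalized_slot_count = 1_u64;
    for &slot in old_votes {
        if slot > new_root {
            break; // remaining votes are above the new root
        }
        if slot != new_root {
            finalized_slot_count += 1;
        }
    }
    finalized_slot_count
}

/// Newer counting rule from the quoted snippet: start at 0 and add credits
/// only for old votes at or below the new root.
/// Assumption: each vote earns exactly one credit.
fn credits_counting_votes(old_votes: &[u64], new_root: u64) -> u64 {
    let mut earned_credits = 0_u64;
    for &slot in old_votes {
        if slot > new_root {
            break;
        }
        earned_credits += 1;
    }
    earned_credits
}

fn main() {
    let old_votes = [100, 101, 102];

    // New root is a slot the validator voted on: both rules agree.
    let voted_root = 101;
    println!(
        "root voted on:     {} vs {}",
        credits_counting_root(&old_votes, voted_root),
        credits_counting_votes(&old_votes, voted_root)
    );

    // Long delinquency: the new root is far past the old tower and was never
    // voted on. The older rule still counts it, so it is one credit higher.
    let unvoted_root = 16_200;
    println!(
        "root not voted on: {} vs {}",
        credits_counting_root(&old_votes, unvoted_root),
        credits_counting_votes(&old_votes, unvoted_root)
    );
}
```

In this model the two rules only disagree when the new root is absent from the old tower, and then by exactly one credit, which is consistent with the 607035 vs 607034 difference reported above.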
…oot slot which was not voted on. Fixes issue solana-labs#32889.
Closing this issue out.
|
Problem
A consensus divergence was observed on the canary nodes running the tip of master against mnb. Sample log:
Running git-bisect with solana-ledger-tool, I was able to determine that #31291 causes the divergence. With 35ec7bf,
With d26e3ff (the parent of 35ec7bf), I got the correct hash:
Proposed Solution