-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[leader election] Improve leader election for first 20 rounds #5683
Conversation
exclude_rounds is 20 by default, which means we don't consider last 20 rounds in leader reputation history - but we do so from the same epoch - we consider everyone is caught up with the previous epoch. genesis is epoch=0, round=0 first block is epoch=1 round=1, and that is a reconfig, that triggers epoch 2. because votes in block metadata are for previous round, they are empty in that round. So in that round - only "active" node is proposer. For first 20 rounds of epoch 2, that is the only history we have - and so proposer of epoch=1 round=1 will always be elected for the first 20 rounds. That can cause issues in tests, as they might be executed over very few rounds. So removing epoch=1,round=1 from being considered for history in epoch 2. We can make this change without backward compatible gating, as this only affects first window rounds (10 * num validators), and chain will continue successfully after that (and all active chains are pass the epoch 2)
let first_epoch_to_consider = std::cmp::max( | ||
1, | ||
if epoch_state.epoch == 1 { 1 } else { 2 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t quite understand why we want to consider epoch=1 round=1 when we are still in epoch=1?
i.e., why can’t we just change the min epoch to consider as 2 rather than 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do have multiple rounds in epoch 1, they are just short-circuted and removed after we move to epoch 2.
so we should have leader election work normally in epoch 1. we just don't want to take history from epoch 1 into epoch 2 leader election
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this expression is getting ugly, maybe this is simpler?
if epoch <= 2 {
use_history_from_previous_epoch_max_count = 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not the same. use_history_from_previous_epoch_max_count is 5 by default (in case we have multiple consecutive reconfigs/governance proposals. So we want first_epoch_to_consider to be 2 for epoch = 6 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm as long as after epoch 1 we have more than 20 rounds it should be good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you are saying, but even though expression is shorter, I find it much more confusing at what it does. I think with the comment, the code is understandable.
so I am going to land as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good findings!
let first_epoch_to_consider = std::cmp::max( | ||
1, | ||
if epoch_state.epoch == 1 { 1 } else { 2 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this expression is getting ugly, maybe this is simpler?
if epoch <= 2 {
use_history_from_previous_epoch_max_count = 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
…labs#5683) exclude_rounds is 20 by default, which means we don't consider last 20 rounds in leader reputation history - but we do so from the same epoch - we consider everyone is caught up with the previous epoch. genesis is epoch=0, round=0 first block is epoch=1 round=1, and that is a reconfig, that triggers epoch 2. because votes in block metadata are for previous round, they are empty in that round. So in that round - only "active" node is proposer. For first 20 rounds of epoch 2, that is the only history we have - and so proposer of epoch=1 round=1 will always be elected for the first 20 rounds. That can cause issues in tests, as they might be executed over very few rounds. So removing epoch=1,round=1 from being considered for history in epoch 2. We can make this change without backward compatible gating, as this only affects first window rounds (10 * num validators), and chain will continue successfully after that (and all active chains are pass the epoch 2)
exclude_rounds is 20 by default, which means we don't consider last 20 rounds in leader reputation history - but we do so from the same epoch - we consider everyone is caught up with the previous epoch.
genesis is epoch=0, round=0
first block is epoch=1 round=1, and that is a reconfig, that triggers epoch 2. because votes in block metadata are for previous round, they are empty in that round. So in that round - only "active" node is proposer.
For first 20 rounds of epoch 2, that is the only history we have - and so proposer of epoch=1 round=1 will always be elected for the first 20 rounds.
That can cause issues in tests, as they might be executed over very few rounds. So removing epoch=1,round=1 from being considered for history in epoch 2.
We can make this change without backward compatible gating, as this only affects first window rounds (10 * num validators), and chain will continue successfully after that (and all active chains are pass the epoch 2)
Description
Test Plan
used test_basic_fault_tolerance test, and confirmed before the change that leader doesn't change for first 20 rounds, and after this change it does.