-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix assertion for single-node cluster and periodic warning w/ seed hosts #89828
Conversation
The test CoordinatorTests.testDoesNotPerformElectionWhenRestartingFollower failed for a 2 nodes cluster after a follower node is restarted (the last accepted cluster state has 1 node but no periodic checker, which looks OK to me). This commit changes the assertion to check the existence of the periodic checker only when seed hosts have been configured. Closes elastic#89543
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the last accepted cluster state has 1 node but no periodic checker, which looks OK to me
Hi @tlrx , this is weird. If the node is becoming a leader, we have the following code at the end of becomeLeader()
:
if (applierState.nodes().size() > 1) {
cancelSingleNodeClusterChecker();
} else if (singleNodeClusterChecker == null) {
singleNodeClusterChecker = transportService.getThreadPool()
.scheduleWithFixedDelay(() -> { checkSingleNodeCluster(); }, this.singleNodeClusterSeedHostsCheckInterval, Names.SAME);
}
So I wonder why the invariant fails? It does not have to do with the settings (the settings are checked inside the checker function).
We may need to take a deeper look at the test failure to understand why it was null.
Hi @tlrx , may I actually take over the issue and make a new PR? I understood what's wrong and I can fix it. The problem is that I introduce the checker only when a node is becoming a leader in a single-node cluster, but there's also the case (as it's shown by the test) that it can become single-node cluster when a node leaves a two-node cluster. |
The invariant fails because the cluster consists of two nodes (cancelling the single node cluster checker) and later a follower node leaves the cluster and joins back. After reading again #88013 this morning I understand that we should also emit a warning in this situation. I only noticed yesterday that the failure was related to a test you wrote recently, otherwise I would have transferred the issue to you :) Feel free to close this one and open a better fix. |
Thanks @tlrx will takeover then and make another PR. Indeed, I should re-introduce the check if it goes from 2 nodes to 1 node again. |
The test CoordinatorTests.testDoesNotPerformElectionWhenRestartingFollower
failed for a 2 nodes cluster after a follower node is restarted (the last
accepted cluster state has 1 node but no periodic checker, which looks OK
to me).
This commit changes the assertion to check the existence of the periodic
checker only when seed hosts have been configured.
Closes #89543