Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assertion for single-node cluster and periodic warning w/ seed hosts #89828

Closed
wants to merge 1 commit into from

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Sep 6, 2022

The test CoordinatorTests.testDoesNotPerformElectionWhenRestartingFollower
failed for a 2 nodes cluster after a follower node is restarted (the last
accepted cluster state has 1 node but no periodic checker, which looks OK
to me).

This commit changes the assertion to check the existence of the periodic
checker only when seed hosts have been configured.

Closes #89543

The test CoordinatorTests.testDoesNotPerformElectionWhenRestartingFollower
failed for a 2 nodes cluster after a follower node is restarted (the last
accepted cluster state has 1 node but no periodic checker, which looks OK
to me).

This commit changes the assertion to check the existence of the periodic
checker only when seed hosts have been configured.

Closes elastic#89543
@tlrx tlrx added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.5.0 v8.4.2 labels Sep 6, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 6, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx tlrx requested a review from kingherc September 6, 2022 16:08
Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the last accepted cluster state has 1 node but no periodic checker, which looks OK to me

Hi @tlrx , this is weird. If the node is becoming a leader, we have the following code at the end of becomeLeader():

        if (applierState.nodes().size() > 1) {
            cancelSingleNodeClusterChecker();
        } else if (singleNodeClusterChecker == null) {
            singleNodeClusterChecker = transportService.getThreadPool()
                .scheduleWithFixedDelay(() -> { checkSingleNodeCluster(); }, this.singleNodeClusterSeedHostsCheckInterval, Names.SAME);
        }

So I wonder why the invariant fails? It does not have to do with the settings (the settings are checked inside the checker function).

We may need to take a deeper look at the test failure to understand why it was null.

@kingherc
Copy link
Contributor

kingherc commented Sep 6, 2022

Hi @tlrx , may I actually take over the issue and make a new PR? I understood what's wrong and I can fix it. The problem is that I introduce the checker only when a node is becoming a leader in a single-node cluster, but there's also the case (as it's shown by the test) that it can become single-node cluster when a node leaves a two-node cluster.

@tlrx
Copy link
Member Author

tlrx commented Sep 7, 2022

So I wonder why the invariant fails?

The invariant fails because the cluster consists of two nodes (cancelling the single node cluster checker) and later a follower node leaves the cluster and joins back. After reading again #88013 this morning I understand that we should also emit a warning in this situation.

I only noticed yesterday that the failure was related to a test you wrote recently, otherwise I would have transferred the issue to you :) Feel free to close this one and open a better fix.

@kingherc
Copy link
Contributor

kingherc commented Sep 7, 2022

Thanks @tlrx will takeover then and make another PR. Indeed, I should re-introduce the check if it goes from 2 nodes to 1 node again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v8.4.2 v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] CoordinatorTests testDoesNotPerformElectionWhenRestartingFollower failing
3 participants