
[BUG] Race in node-left and node-join can prevent node from joining the cluster indefinitely #4874

Closed
Bukhtawar opened this issue Oct 21, 2022 · 4 comments · Fixed by #15521
Labels
bug, Cluster Manager

Comments

@Bukhtawar
Collaborator

Describe the bug
A node in an OpenSearch cluster can fail for many reasons (health-check failure, lagging, etc.), which triggers a node-left cluster state update on the leader; applying that new (node-left) cluster state is what removes the node's connections. However, the data node may trigger a node-join quickly, before the leader has removed the connection as part of the cluster state apply, so the join reuses the existing connection to the leader. The leader then starts processing the node-join request, updates its followers, and schedules a follower checker for the newly joined node. But before the follower checker gets to this connection, the in-flight node-left cluster state apply cleans it up, so the follower checker finds the node not connected, the follower checks fail, and another node-left is triggered. The leader also never sends the node-join cluster state to the data node, since it believes the node is not connected; as a result the peer finder keeps sending join requests to the leader and the loop goes on indefinitely.
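For illustration, a minimal self-contained sketch of the interleaving (the connection map, node IDs, and class name below are simplified stand-ins, not OpenSearch APIs): the node-left apply removes the connection only after the node-join has already reused it, so the follower checker scheduled by the join finds the node disconnected.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class NodeLeftJoinRaceSketch {
    // Stand-in for the connection manager: nodeId -> connected flag.
    static final Map<String, Boolean> connectionManager = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        String nodeId = "data-node-1";
        connectionManager.put(nodeId, true); // connection still present when node-left is committed

        CountDownLatch joinAccepted = new CountDownLatch(1);

        // Thread A: leader applies the node-left cluster state; the disconnect
        // runs asynchronously, some time after the state was committed.
        Thread nodeLeftApply = new Thread(() -> {
            try {
                joinAccepted.await();             // the join has already reused the connection
                connectionManager.remove(nodeId); // late cleanup severs it anyway
            } catch (InterruptedException ignored) { }
        });

        // Thread B: leader processes node-join, reusing the still-present
        // connection, then schedules the follower checker.
        Thread nodeJoin = new Thread(() -> {
            boolean reusedConnection = connectionManager.getOrDefault(nodeId, false);
            System.out.println("node-join reused connection: " + reusedConnection);
            joinAccepted.countDown();
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            // Follower checker wakes up after the cleanup ran and sees "not connected".
            boolean connected = connectionManager.getOrDefault(nodeId, false);
            System.out.println("follower check sees connected: " + connected);
        });

        nodeLeftApply.start();
        nodeJoin.start();
        nodeLeftApply.join();
        nodeJoin.join();
    }
}

Running the sketch prints that the join reused the connection but the follower check later sees the node as disconnected, mirroring the logs below.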

Node join

[2022-10-18T17:00:04,691][INFO ][o.e.c.s.MasterService    ] [684c88cd4e749b366ab49c1f195504da] node-join[{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir} join existing leader

Follower checker scheduled after the node-join, which fails to find a connection

[2022-10-18T17:00:04,788][DEBUG][o.e.c.c.FollowersChecker ] [684c88cd4e749b366ab49c1f195504da] FollowerChecker{discoveryNode={5b7033ca454040458aab223e1090e5f1}{uqcjvUYfSwK-paAxpuwzGA}{dLZWtCQITqa1GwOFQIgelA}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} disconnected
NodeNotConnectedException[[5b7033ca454040458aab223e1090e5f1][172.xx.xx.xx:9300] Node not connected]
        at org.elasticsearch.transport.ClusterConnectionManager.getConnection(ClusterConnectionManager.java:189)
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:682)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:602)
        at org.elasticsearch.cluster.coordination.FollowersChecker$FollowerChecker.handleWakeUp(FollowersChecker.java:326)
        at org.elasticsearch.cluster.coordination.FollowersChecker$FollowerChecker.start(FollowersChecker.java:304)
        at org.elasticsearch.cluster.coordination.FollowersChecker.lambda$setCurrentNodes$3(FollowersChecker.java:155)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
        at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:735)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at org.elasticsearch.cluster.coordination.FollowersChecker.setCurrentNodes(FollowersChecker.java:148)
        at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1115)
        at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:288)
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:270)
        at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73)
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:155)
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:693)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
[2022-10-18T17:00:04,790][DEBUG][o.e.c.c.FollowersChecker ] [684c88cd4e749b366ab49c1f195504da] FollowerChecker{discoveryNode={7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} marking node as faulty

Data node gets added without the connection mapped (not verifiable through logs)

[2022-10-18T17:00:21,139][INFO ][o.e.c.s.ClusterApplierService] [684c88cd4e749b366ab49c1f195504da] added {{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir},{5b7033ca454040458aab223e1090e5f1}{uqcjvUYfSwK-paAxpuwzGA}{dLZWtCQITqa1GwOFQIgelA}

Data node gets removed

[2022-10-18T17:00:26,326][INFO ][o.e.c.s.MasterService    ] [684c88cd4e749b366ab49c1f195504da] node-left[{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir} reason: disconnected
[2022-10-18T17:00:56,363][INFO ][o.e.c.s.ClusterApplierService] [684c88cd4e749b366ab49c1f195504da] removed {{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir},

Expected behavior
Node joins and leaves shouldn't interfere with each other; transitions should happen cleanly without getting deadlocked.

Plugins
Please list all plugins currently enabled.

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

@Bukhtawar added the bug, untriaged, and distributed framework labels and removed the untriaged label on Oct 21, 2022
@muralikpbhat

muralikpbhat commented Oct 22, 2022

Interesting, is there no ordering guarantee in cluster manager processing (it processed node-join before node-left)?

@Bukhtawar
Collaborator Author

That's right @muralikpbhat, the node-left cluster state apply doesn't block on the disconnections completing, as they are done asynchronously:

public void disconnectFromNodesExcept(DiscoveryNodes discoveryNodes) {
    final List<Runnable> runnables = new ArrayList<>();
    synchronized (mutex) {
        final Set<DiscoveryNode> nodesToDisconnect = new HashSet<>(targetsByNode.keySet());
        for (final DiscoveryNode discoveryNode : discoveryNodes) {
            nodesToDisconnect.remove(discoveryNode);
        }
        for (final DiscoveryNode discoveryNode : nodesToDisconnect) {
            runnables.add(targetsByNode.get(discoveryNode).disconnect());
        }
    }
    runnables.forEach(Runnable::run);
}
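For illustration only, a hypothetical sketch of the opposite behaviour, where the apply path waits for all pending disconnects to finish before returning; this uses CompletableFuture as a simplified stand-in and is not the fix that was merged (see #15521).

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockingDisconnectSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService transportWorker = Executors.newSingleThreadExecutor();
        List<String> nodesToDisconnect = List.of("data-node-1", "data-node-2");

        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String nodeId : nodesToDisconnect) {
            // Each disconnect still runs asynchronously on the transport worker...
            pending.add(CompletableFuture.runAsync(
                () -> System.out.println("closed connection to " + nodeId),
                transportWorker));
        }
        // ...but the cluster-state apply does not complete until all of them have,
        // so a racing node-join cannot reuse a connection that is about to be torn down.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        System.out.println("node-left apply finished; safe to process node-join");

        transportWorker.shutdown();
        transportWorker.awaitTermination(5, TimeUnit.SECONDS);
    }
}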

@indrajohn7
Contributor

Looking into it.

@gbbafna
Collaborator

gbbafna commented Jan 26, 2024

@amkhar: Can we please find an owner for this issue?

Status: Done (fixed by #15521)