
[BUG] Race in node-left and node-join can prevent node from joining the cluster indefinitely #4874

Closed
Bukhtawar opened this issue Oct 21, 2022 · 4 comments · Fixed by #15521
Labels
bug, Cluster Manager

Comments

@Bukhtawar
Collaborator

Describe the bug
A node in an OpenSearch cluster can fail for many reasons (health-check failure, lagging, etc.), which triggers a node-left cluster state update on the leader; applying that new (node-left) cluster state is what removes the node's connections. However, the data node may trigger a node-join quickly, before the leader has removed the connection as part of the cluster state apply, so the join reuses the existing connection to the leader. The leader then starts processing the node-join request, updates its followers, and schedules a follower checker for the newly joined node. But before the follower checker gets to this connection, the in-flight node-left cluster state apply cleans it up, so the follower checker finds the node not connected, the follower checks fail, and another node-left is triggered. The leader also never sends the node-join cluster state to the data node, since it believes the node is not connected; as a result the peer finder keeps sending join requests to the leader and the loop goes on indefinitely.
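For illustration, a minimal self-contained sketch of the interleaving (the connection map, node IDs, and class name below are simplified stand-ins, not OpenSearch APIs): the node-left apply removes the connection only after the node-join has already reused it, so the follower checker scheduled by the join finds the node disconnected.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class NodeLeftJoinRaceSketch {
    // Stand-in for the connection manager: nodeId -> connected flag.
    static final Map<String, Boolean> connectionManager = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        String nodeId = "data-node-1";
        connectionManager.put(nodeId, true); // connection still present when node-left is committed

        CountDownLatch joinAccepted = new CountDownLatch(1);

        // Thread A: leader applies the node-left cluster state; the disconnect
        // runs asynchronously, some time after the state was committed.
        Thread nodeLeftApply = new Thread(() -> {
            try {
                joinAccepted.await();             // the join has already reused the connection
                connectionManager.remove(nodeId); // late cleanup severs it anyway
            } catch (InterruptedException ignored) { }
        });

        // Thread B: leader processes node-join, reusing the still-present
        // connection, then schedules the follower checker.
        Thread nodeJoin = new Thread(() -> {
            boolean reusedConnection = connectionManager.getOrDefault(nodeId, false);
            System.out.println("node-join reused connection: " + reusedConnection);
            joinAccepted.countDown();
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            // Follower checker wakes up after the cleanup ran and sees "not connected".
            boolean connected = connectionManager.getOrDefault(nodeId, false);
            System.out.println("follower check sees connected: " + connected);
        });

        nodeLeftApply.start();
        nodeJoin.start();
        nodeLeftApply.join();
        nodeJoin.join();
    }
}

Running the sketch prints that the join reused the connection but the follower check later sees the node as disconnected, mirroring the logs below.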

Node join

[2022-10-18T17:00:04,691][INFO ][o.e.c.s.MasterService    ] [684c88cd4e749b366ab49c1f195504da] node-join[{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir} join existing leader

Follower checker scheduled after the node-join, which fails to find a connection

[2022-10-18T17:00:04,788][DEBUG][o.e.c.c.FollowersChecker ] [684c88cd4e749b366ab49c1f195504da] FollowerChecker{discoveryNode={5b7033ca454040458aab223e1090e5f1}{uqcjvUYfSwK-paAxpuwzGA}{dLZWtCQITqa1GwOFQIgelA}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} disconnected
NodeNotConnectedException[[5b7033ca454040458aab223e1090e5f1][172.xx.xx.xx:9300] Node not connected]
        at org.elasticsearch.transport.ClusterConnectionManager.getConnection(ClusterConnectionManager.java:189)
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:682)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:602)
        at org.elasticsearch.cluster.coordination.FollowersChecker$FollowerChecker.handleWakeUp(FollowersChecker.java:326)
        at org.elasticsearch.cluster.coordination.FollowersChecker$FollowerChecker.start(FollowersChecker.java:304)
        at org.elasticsearch.cluster.coordination.FollowersChecker.lambda$setCurrentNodes$3(FollowersChecker.java:155)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
        at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:735)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at org.elasticsearch.cluster.coordination.FollowersChecker.setCurrentNodes(FollowersChecker.java:148)
        at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1115)
        at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:288)
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:270)
        at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73)
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:155)
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:693)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
[2022-10-18T17:00:04,790][DEBUG][o.e.c.c.FollowersChecker ] [684c88cd4e749b366ab49c1f195504da] FollowerChecker{discoveryNode={7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} marking node as faulty

Data node gets added without the connection mapped (not verifiable through logs)

[2022-10-18T17:00:21,139][INFO ][o.e.c.s.ClusterApplierService] [684c88cd4e749b366ab49c1f195504da] added {{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir},{5b7033ca454040458aab223e1090e5f1}{uqcjvUYfSwK-paAxpuwzGA}{dLZWtCQITqa1GwOFQIgelA}

Data node gets removed

[2022-10-18T17:00:26,326][INFO ][o.e.c.s.MasterService    ] [684c88cd4e749b366ab49c1f195504da] node-left[{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir} reason: disconnected
[2022-10-18T17:00:56,363][INFO ][o.e.c.s.ClusterApplierService] [684c88cd4e749b366ab49c1f195504da] removed {{7c1864abfe7e2a2b95a63b437464cfcf}{p12Bffu0Rf2KRQyoqFD_2A}{2Z1b2cXgTHauEVdfXMfspw}{172.xx.xx.xx}{172.xx.xx.xx:9300}{dir},

Expected behavior
Node joins and leaves shouldn't interfere with each other; transitions should happen cleanly without getting deadlocked.

Plugins
Please list all plugins currently enabled.

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

@Bukhtawar added the bug, untriaged, and distributed framework labels and removed the untriaged label on Oct 21, 2022
@muralikpbhat

muralikpbhat commented Oct 22, 2022

Interesting, is there no ordering guarantee in cluster manager processing (it processed node-join before node-left)?

@Bukhtawar
Collaborator Author

That's right @muralikpbhat, the node-left cluster state apply doesn't block on the disconnections completing, as they are done asynchronously:

public void disconnectFromNodesExcept(DiscoveryNodes discoveryNodes) {
    final List<Runnable> runnables = new ArrayList<>();
    synchronized (mutex) {
        final Set<DiscoveryNode> nodesToDisconnect = new HashSet<>(targetsByNode.keySet());
        for (final DiscoveryNode discoveryNode : discoveryNodes) {
            nodesToDisconnect.remove(discoveryNode);
        }
        for (final DiscoveryNode discoveryNode : nodesToDisconnect) {
            runnables.add(targetsByNode.get(discoveryNode).disconnect());
        }
    }
    runnables.forEach(Runnable::run);
}
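For illustration only, a hypothetical sketch of the opposite behaviour, where the apply path waits for all pending disconnects to finish before returning; this uses CompletableFuture as a simplified stand-in and is not the fix that was merged (see #15521).

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockingDisconnectSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService transportWorker = Executors.newSingleThreadExecutor();
        List<String> nodesToDisconnect = List.of("data-node-1", "data-node-2");

        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String nodeId : nodesToDisconnect) {
            // Each disconnect still runs asynchronously on the transport worker...
            pending.add(CompletableFuture.runAsync(
                () -> System.out.println("closed connection to " + nodeId),
                transportWorker));
        }
        // ...but the cluster-state apply does not complete until all of them have,
        // so a racing node-join cannot reuse a connection that is about to be torn down.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        System.out.println("node-left apply finished; safe to process node-join");

        transportWorker.shutdown();
        transportWorker.awaitTermination(5, TimeUnit.SECONDS);
    }
}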

@indrajohn7
Contributor

Looking into it.

@gbbafna
Collaborator

gbbafna commented Jan 26, 2024

@amkhar: Can we please find an owner for this issue?

Status: Done (fixed by #15521)