-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fail allocation of new primaries in empty cluster #43284
Conversation
Addition of test case that creates the scenario when there are no data nodes in Cluster and user tries for index Creation. Changing the status of primary shards that are unassigned to AllocationStatus.Deciders_NO when there are no data nodes helps in solving this issue
Pinging @elastic/es-distributed |
Hey! Any comments on the PR? |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Gaurav614 for taking this on. There's been a few changes in master
that mean your branch no longer compiles, but it will be simple enough to fix it up. I have left some small suggestions, including an idea for a slightly different test that covers a few more cases in a simpler way.
...ain/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
Show resolved
Hide resolved
...ain/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/health/NoDataNodesHealthTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/health/NoDataNodesHealthTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/health/NoDataNodesHealthTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/health/NoDataNodesHealthTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/health/NoDataNodesHealthTests.java
Outdated
Show resolved
Hide resolved
@DaveCTurner Thanks for reviewing the pull request. I am working on the fix to the comments. |
Hi @Gaurav614, just checking if you're still working on this and if you need any help? |
@DaveCTurner Thanks for your helping gesture. I am working on it. And apologies for the delay, as I was out due to some personal emergency. |
Hey @Gaurav614, just checking in again. Are you still working on this PR? |
Hey @DaveCTurner, I had pushed the new commits on 25th July. And requested for your review on it. Kindly review that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Gaurav614; the last message I received on this PR indicated there'd be more changes so I was waiting for them to land before taking a look. I left some more suggestions. Could you merge the latest master too?
@@ -141,6 +143,24 @@ public ShardAllocationDecision decideShardAllocation(final ShardRouting shard, f | |||
return new ShardAllocationDecision(allocateUnassignedDecision, moveDecision); | |||
} | |||
|
|||
private void failedAllocationOfNewPrimaries(RoutingAllocation allocation){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor change:
private void failedAllocationOfNewPrimaries(RoutingAllocation allocation){ | |
private void failAllocationOfNewPrimaries(RoutingAllocation allocation) { |
(change name to the imperative mood, and fix whitespace)
RoutingNodes routingNodes = allocation.routingNodes(); | ||
assert routingNodes.size() == 0 : routingNodes; | ||
RoutingNodes.UnassignedShards unassignedShards = routingNodes.unassigned(); | ||
RoutingNodes.UnassignedShards.UnassignedIterator unassignedIterator = unassignedShards.iterator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RoutingNodes.UnassignedShards.UnassignedIterator unassignedIterator = unassignedShards.iterator(); | |
final RoutingNodes.UnassignedShards.UnassignedIterator unassignedIterator = routingNodes.unassigned().iterator(); |
private void failedAllocationOfNewPrimaries(RoutingAllocation allocation){ | ||
RoutingNodes routingNodes = allocation.routingNodes(); | ||
assert routingNodes.size() == 0 : routingNodes; | ||
RoutingNodes.UnassignedShards unassignedShards = routingNodes.unassigned(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RoutingNodes.UnassignedShards unassignedShards = routingNodes.unassigned(); |
RoutingNodes.UnassignedShards unassignedShards = routingNodes.unassigned(); | ||
RoutingNodes.UnassignedShards.UnassignedIterator unassignedIterator = unassignedShards.iterator(); | ||
while (unassignedIterator.hasNext()) { | ||
ShardRouting shardRouting = unassignedIterator.next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ShardRouting shardRouting = unassignedIterator.next(); | |
final ShardRouting shardRouting = unassignedIterator.next(); |
RoutingNodes.UnassignedShards.UnassignedIterator unassignedIterator = unassignedShards.iterator(); | ||
while (unassignedIterator.hasNext()) { | ||
ShardRouting shardRouting = unassignedIterator.next(); | ||
UnassignedInfo unassignedInfo = shardRouting.unassignedInfo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnassignedInfo unassignedInfo = shardRouting.unassignedInfo(); | |
final UnassignedInfo unassignedInfo = shardRouting.unassignedInfo(); |
routingTable = RoutingTable.builder(routingTable).remove("test").build(); | ||
metaData = MetaData.builder(clusterState.metaData()).remove("test").build(); | ||
clusterState = ClusterState.builder(clusterState).routingTable(routingTable).metaData(metaData).build(); | ||
assertTrue(clusterState.nodes().getDataNodes().size() == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, prefer assertEquals
here.
assertTrue(clusterState.nodes().getDataNodes().size() == 0); | |
assertEquals(0, clusterState.nodes().getDataNodes().size()); |
.nodes(DiscoveryNodes.builder(clusterState.getNodes()) | ||
.remove(nodeName).build()) | ||
.build(); | ||
clusterState = createAllocationService().deassociateDeadNodes(clusterState, true, "reroute"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should pass the allocationService
into this method rather than creating a new one here. Also there's no need to mutate clusterState
:
clusterState = createAllocationService().deassociateDeadNodes(clusterState, true, "reroute"); | |
return allocationService.deassociateDeadNodes(clusterState, true, "reroute"); |
|
||
DiscoveryNodes.Builder nodeBuilder = DiscoveryNodes.builder(clusterState.getNodes()); | ||
if (isMaster) { | ||
nodeBuilder = nodeBuilder.add(newNode(nodeName, Collections.singleton(DiscoveryNode.Role.MASTER))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to mutate nodeBuilder
here:
nodeBuilder = nodeBuilder.add(newNode(nodeName, Collections.singleton(DiscoveryNode.Role.MASTER))); | |
nodeBuilder.add(newNode(nodeName, Collections.singleton(DiscoveryNode.Role.MASTER))); |
if (isMaster) { | ||
nodeBuilder = nodeBuilder.add(newNode(nodeName, Collections.singleton(DiscoveryNode.Role.MASTER))); | ||
} else { | ||
nodeBuilder = nodeBuilder.add(newNode(nodeName, Collections.singleton(DiscoveryNode.Role.DATA))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to mutate nodeBuilder
here:
nodeBuilder = nodeBuilder.add(newNode(nodeName, Collections.singleton(DiscoveryNode.Role.DATA))); | |
nodeBuilder.add(newNode(nodeName, Collections.singleton(DiscoveryNode.Role.DATA))); |
} else { | ||
nodeBuilder = nodeBuilder.add(newNode(nodeName, Collections.singleton(DiscoveryNode.Role.DATA))); | ||
} | ||
clusterState = ClusterState.builder(clusterState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to mutate clusterState
here:
clusterState = ClusterState.builder(clusterState) | |
return ClusterState.builder(clusterState).nodes(nodeBuilder).build(); |
server/src/test/java/org/elasticsearch/cluster/health/ClusterHealthAllocationTests.java
Show resolved
Hide resolved
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work @Gaurav614, thanks for the extra iterations to get this so polished. I merged master
, fixed a couple of whitespace issues, and triggered a CI run.
CI failures were apparently expected, see 3c57592. @elasticmachine test this please |
Today if you create an index in a cluster without any data nodes then it will report yellow health because it never attempts to assign any shards if there are no data nodes, so the new shards remain at `AllocationStatus.NO_ATTEMPT`. This commit moves the new primaries to `AllocationStatus.DECIDERS_NO` in this situation, causing the cluster health to move to red. Fixes #41073
@DaveCTurner Thanks a lot man for merging the changes !! |
Today if you create an index in a cluster without any data nodes then it will
report
yellow
health because it never attempts to assign any shards if thereare no data nodes, so the new shards remain at
AllocationStatus.NO_ATTEMPT
. This commitmoves the new primaries to
AllocationStatus.DECIDERS_NO
in this situation,causing the cluster health to move to
red
.Fixes #41073