Stabilizing org.opensearch.cluster.routing.MovePrimaryFirstTests.test… #2048
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,12 +12,14 @@ | |
import org.opensearch.action.admin.cluster.settings.ClusterUpdateSettingsRequest; | ||
import org.opensearch.cluster.ClusterStateListener; | ||
import org.opensearch.common.settings.Settings; | ||
import org.opensearch.common.unit.TimeValue; | ||
import org.opensearch.test.InternalTestCluster; | ||
import org.opensearch.test.OpenSearchIntegTestCase; | ||
|
||
import java.util.ArrayList; | ||
import java.util.Iterator; | ||
import java.util.List; | ||
import java.util.concurrent.CountDownLatch; | ||
import java.util.stream.Stream; | ||
|
||
import static org.opensearch.test.hamcrest.OpenSearchAssertions.assertAcked; | ||
|
||
|
@@ -84,19 +86,24 @@ public void testClusterGreenAfterPartialRelocation() throws InterruptedException | |
final ClusterStateListener listener = event -> { | ||
if (event.routingTableChanged()) { | ||
final RoutingNodes routingNodes = event.state().getRoutingNodes(); | ||
int startedz2n1 = 0; | ||
int startedz2n2 = 0; | ||
int startedCount = 0; | ||
List<ShardRouting> initz2n1 = new ArrayList<>(), initz2n2 = new ArrayList<>(); | ||
for (Iterator<RoutingNode> it = routingNodes.iterator(); it.hasNext();) { | ||
RoutingNode routingNode = it.next(); | ||
final String nodeName = routingNode.node().getName(); | ||
if (nodeName.equals(z2n1)) { | ||
startedz2n1 = routingNode.numberOfShardsWithState(ShardRoutingState.STARTED); | ||
startedCount += routingNode.numberOfShardsWithState(ShardRoutingState.STARTED); | ||
initz2n1 = routingNode.shardsWithState(ShardRoutingState.INITIALIZING); | ||
} else if (nodeName.equals(z2n2)) { | ||
startedz2n2 = routingNode.numberOfShardsWithState(ShardRoutingState.STARTED); | ||
startedCount += routingNode.numberOfShardsWithState(ShardRoutingState.STARTED); | ||
initz2n2 = routingNode.shardsWithState(ShardRoutingState.INITIALIZING); | ||
} | ||
} | ||
if (startedz2n1 >= primaryShardCount / 2 && startedz2n2 >= primaryShardCount / 2) { | ||
primaryMoveLatch.countDown(); | ||
if (!Stream.concat(initz2n1.stream(), initz2n2.stream()).anyMatch(s -> s.primary())) { | ||
// All primaries are relocated before 60% of overall shards are started on new nodes | ||
if (primaryShardCount <= startedCount && startedCount <= 6 * primaryShardCount / 5) { | ||
primaryMoveLatch.countDown(); | ||
} | ||
} | ||
} | ||
}; | ||
|
@@ -113,6 +120,6 @@ public void testClusterGreenAfterPartialRelocation() throws InterruptedException | |
internalCluster().stopRandomNode(InternalTestCluster.nameFilter(z1n1)); | ||
internalCluster().stopRandomNode(InternalTestCluster.nameFilter(z1n2)); | ||
Do we have to shut down nodes z2n1 and z2n2 as well here?
There are 4 nodes in the cluster. If we shut down all 4, the cluster will not be green. We want to shut down all excluded nodes (2 in this case) after 60% of the total shards have relocated to z2n1 and z2n2. Due to #1445, all primaries will have started within that 60%, so the cluster will eventually become green.
||
} catch (Exception e) {} | ||
ensureGreen(TimeValue.timeValueSeconds(60)); | ||
ensureGreen(); | ||
} | ||
} |
Maybe it's obvious, but it is not immediately clear to me why the 6 * primaryShardCount / 5 math is correct for calculating that 60% of shards are started on new nodes. Can you explain how this works?
The total number of shards is double the primary shard count (1 replica): 2 * primaryShardCount. Hence, 60% of the total shards is 3 * (total number of shards) / 5, which is the same as 6 * primaryShardCount / 5.
Got it. I suggest creating an intermediate variable just to make this more readable, like:
Sure, makes sense
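For readers following the thread, the 60% arithmetic can be checked with a small standalone sketch. This is not the code from the PR: the class name RelocationThreshold and the helpers totalShardCount, sixtyPercentOfTotal, and inWindow are hypothetical names chosen for illustration, and the sketch assumes exactly one replica per primary, as stated in the reply above.

```java
// Illustrative sketch only; names are hypothetical and not taken from the PR.
class RelocationThreshold {

    // Assumption from the thread: one replica per primary, so total = 2 * primaries.
    static int totalShardCount(int primaryShardCount) {
        return 2 * primaryShardCount;
    }

    // 60% of the total shard count, using integer arithmetic (3/5 of the total).
    static int sixtyPercentOfTotal(int primaryShardCount) {
        return 3 * totalShardCount(primaryShardCount) / 5;
    }

    // Mirrors the bounds check in the diff: at least as many started shards as
    // primaries, and at most 60% of all shards started on the new nodes.
    static boolean inWindow(int primaryShardCount, int startedCount) {
        return primaryShardCount <= startedCount
            && startedCount <= sixtyPercentOfTotal(primaryShardCount);
    }

    public static void main(String[] args) {
        // The intermediate-variable form and the inlined form agree for every count,
        // because 3 * (2 * p) is exactly 6 * p before the integer division by 5.
        for (int p = 1; p <= 1000; p++) {
            if (sixtyPercentOfTotal(p) != 6 * p / 5) {
                throw new AssertionError("mismatch for primaryShardCount=" + p);
            }
        }
        System.out.println(sixtyPercentOfTotal(10)); // prints 12
    }
}
```

With 10 primaries there are 20 shards in total, so the upper bound is 12 started shards; the latch condition in the diff would accept started counts from 10 through 12.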