Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch more tests to zen2 #36367

Merged
merged 8 commits into from
Dec 11, 2018
Merged

Conversation

andrershov
Copy link
Contributor

@andrershov andrershov commented Dec 7, 2018

  1. CCR tests work without any changes
  2. testDanglingIndices require changes the source code (added TODO).
  3. testIndexDeletionWhenNodeRejoins because it's using just two
    nodes, adding the node to exclusions is needed on restart.
  4. testCorruptTranslogTruncationOfReplica starts dedicated master
    one, because otherwise, the cluster does not form, if nodes are stopped
    and one node is started back.
  5. testResolvePath needs TEST cluster, because all nodes are stopped
    at the end of the test and it's not possible to perform checks needed
    by SUITE cluster.
  6. SnapshotDisruptionIT. Without changes, the test fails because Zen2
    retries snapshot creation as soon as network partition heals. This
    results into the race between creating snapshot and test cleanup logic
    (deleting index). Zen1 on the
    other hand, also schedules retry, but it takes some time after network
    partition heals, so cleanup logic executes latter and test passes. The
    check that snapshot is eventually created is added to
    the end of the test.

@andrershov andrershov requested a review from ywelsch December 7, 2018 14:25
@andrershov andrershov added the :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. label Dec 7, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@andrershov andrershov added the >test Issues or PRs that are addressing/adding tests label Dec 7, 2018
@@ -276,11 +276,18 @@ public void testTwoNodesSingleDoc() throws Exception {
}

public void testDanglingIndices() throws Exception {
/*TODO This test test does not work with Zen2, because once master node looses its cluster state during restart
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the analysis on this one. It still has Zen2 disabled though? Can you just add the TODO comment, but leave the test unchanged until we have a proper fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -328,14 +335,14 @@ public boolean clearData(String nodeName) {
*/
public void testIndexDeletionWhenNodeRejoins() throws Exception {
final String indexName = "test-index-del-on-node-rejoin-idx";
final int numNodes = 2;
// We need at least 3 nodes to make sure, that once one node is stopped, remaining nodes can elect a new master
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proper fix here is to add the down-scale / up-scale logic that's in InternalTestCluster#stopNodesAndClients to restartNode as well, just as we do with the autoManageMinMasterNodes scaling. The test should then not need any adaptation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -555,8 +546,10 @@ public Settings onNodeStopped(String nodeName) throws Exception {
}

public void testResolvePath() throws Exception {
internalCluster().startMasterOnlyNode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one can be fixed by moving the scope to ESIntegTestCase.Scope.TEST. As these tests are starting up nodes anyway, there's no need to keep the cluster per suite. Maybe that also fixes the testCorruptTranslogTruncationOfReplica

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4d9d141 this helps this test but does not help another test in this class (it has different root cause)

Andrey Ershov added 4 commits December 7, 2018 18:39
Without changes test fails, because Zen2 retries snapshot creation
as soon as network partition heals. This results into race between
creating snapshot and test cleanup logic (deleting index).
Zen1 on the other hand, also schedules retry, but it takes some time
after network partition heals, so cleanup logic executes latter and test
passes.
The check that snapshot is eventually created is added to the end of the
test
@andrershov
Copy link
Contributor Author

@ywelsch I've addressed your comments + I've also fixed SnapshotDisruptionIT (see PR description) with edfa48a. Can you please take a look?

nodeAndClient.recreateNode(newSettings, () -> rebuildUnicastHostFiles(emptyList()));
nodeAndClient.startNode();
if (activeDisruptionScheme != null) {
activeDisruptionScheme.applyToNode(nodeAndClient.name, this);
}
if (callback.validateClusterForming() || updateMinMaster) {

if (callback.validateClusterForming() && excludedNodeIds.isEmpty() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an ||, not an &&?

@andrershov andrershov merged commit 8b82170 into elastic:master Dec 11, 2018
jasontedor added a commit to dnhatn/elasticsearch that referenced this pull request Dec 11, 2018
* elastic/master: (36 commits)
  Add check for minimum required Docker version (elastic#36497)
  Minor search controller changes (elastic#36479)
  Add default methods to DocValueFormat (elastic#36480)
  Fix the mixed cluster REST test explain/11_basic_with_types.
  Modify `BigArrays` to take name of circuit breaker (elastic#36461)
  Move LoggedExec to minimumRuntime source set (elastic#36453)
  added 6.5.4 version
  Add test logging for elastic#35644
  Tests- added helper methods to ESRestTestCase for checking warnings (elastic#36443)
  SQL: move requests' parameters to requests JSON body (elastic#36149)
  [Zen2] Respect the no_master_block setting (elastic#36478)
  Require soft-deletes when access changes snapshot (elastic#36446)
  Switch more tests to zen2 (elastic#36367)
  [Painless] Add extensive tests for def to primitive casts (elastic#36455)
  Add setting to bypass Rollover action (elastic#36235)
  Try running CI against Zulu (elastic#36391)
  [DOCS] Reworked the shard allocation filtering info.  (elastic#36456)
  Log [initial_master_nodes] on formation failure (elastic#36466)
  converting ForbiddenPatternsTask to .java (elastic#36194)
  fixed typo
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test Issues or PRs that are addressing/adding tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants