
MixedClusterClientYamlTestSuiteIT failure due to trying to delete indices being snapshotted/creating indices that already exist #39721

Closed
gwbrown opened this issue Mar 5, 2019 · 6 comments
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI

Comments

gwbrown (Contributor) commented Mar 5, 2019

This hit a 6.7 intake build on one of my commits. I'm pretty sure it's not related to the changes in the commit, as they pertain mostly to Watcher.

CI Link: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+intake/306/console

A ton of tests in MixedClusterClientYamlTestSuiteIT failed; none reproduces locally. Sample reproduce line:

./gradlew :qa:mixed-cluster:v5.6.16#mixedClusterTestRunner \
  -Dtests.seed=AFFFC69E10895A69 \
  -Dtests.class=org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT \
  -Dtests.method="test {p0=cat.snapshots/10_basic/Test cat snapshots output}" \
  -Dtests.security.manager=true \
  -Dtests.locale=he-IL \
  -Dtests.timezone=America/Grenada \
  -Dcompiler.java=11 \
  -Druntime.java=8

There are two kinds of exceptions that keep popping up in the logs and look like they might be related.

One is a failure to delete some indices that are currently being snapshotted:

org.elasticsearch.client.ResponseException: method [DELETE], host [http://[::1]:33913], URI [*], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[node-0][127.0.0.1:44482][indices:admin/delete]"}],"type":"illegal_argument_exception","reason":"Cannot delete indices that are being snapshotted: [[index2/vK05jwCyTIiYeTcVLqKQpw], [index1/TvYH0C5FSHSG4l_PhEQMHA]]. Try again after snapshot finishes or cancel the currently running snapshot."},"status":400}
	at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:936)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233)
	at org.elasticsearch.test.rest.ESRestTestCase.wipeCluster(ESRestTestCase.java:455)
	at org.elasticsearch.test.rest.ESRestTestCase.cleanUpCluster(ESRestTestCase.java:273)
	at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.client.ResponseException: method [DELETE], host [http://[::1]:33913], URI [*], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[node-0][127.0.0.1:44482][indices:admin/delete]"}],"type":"illegal_argument_exception","reason":"Cannot delete indices that are being snapshotted: [[index2/vK05jwCyTIiYeTcVLqKQpw], [index1/TvYH0C5FSHSG4l_PhEQMHA]]. Try again after snapshot finishes or cancel the currently running snapshot."},"status":400}
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:552)
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:537)
	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
	... 1 more
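
For context, the delete is rejected because the cleanup in ESRestTestCase.wipeCluster runs while a snapshot started by an earlier test is still in flight. A minimal sketch of the order of operations the cleanup needs is below, using the low-level REST client; the repository and snapshot names are hypothetical, and this is not the actual ESRestTestCase code:

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class SnapshotAwareWipe {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200)).build()) {
            // Deleting an in-progress snapshot aborts it and releases the indices it holds...
            client.performRequest(new Request("DELETE", "/_snapshot/test_repo/test_snapshot"));
            // ...after which the wildcard index delete that wipeCluster issues can succeed.
            client.performRequest(new Request("DELETE", "/*"));
        }
    }
}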

And the other is an attempt to create an index that already exists, which happens to be one of the indices that couldn't be deleted because it was being snapshotted. This one is a bit harder to get a clean stack trace for, so here's a snippet of the returned JSON:

"error" : {
  1>         "root_cause" : [
  1>           {
  1>             "type" : "index_already_exists_exception",
  1>             "reason" : "index [index1/TvYH0C5FSHSG4l_PhEQMHA] already exists",
  1>             "index_uuid" : "TvYH0C5FSHSG4l_PhEQMHA",
  1>             "index" : "index1",
  1>             "stack_trace" : "[index1/TvYH0C5FSHSG4l_PhEQMHA] ResourceAlreadyExistsException[index [index1/TvYH0C5FSHSG4l_PhEQMHA] already exists]
  1> 	at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validateIndexName(MetaDataCreateIndexService.java:147)
  1> 	at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.validate(MetaDataCreateIndexService.java:512)
  1> 	at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService.access$000(MetaDataCreateIndexService.java:106)
  1> 	at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.execute(MetaDataCreateIndexService.java:239)
  1> 	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
  1> 	at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:634)
  1> 	at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:612)
  1> 	at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:571)
  1> 	at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263)
  1> 	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
  1> 	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576)
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247)
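
The YAML setup trips here because it recreates index1 unconditionally and the index survived the failed wipe above. Purely as an illustration (not the actual test harness code), a guarded create with the low-level REST client would look roughly like this; the real fix is still to clean up the running snapshot rather than to tolerate leftovers:

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class CreateIfMissing {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200)).build()) {
            // HEAD returns 404 if the index is absent; the client-side "ignore" parameter
            // keeps the client from turning that 404 into a ResponseException.
            Request exists = new Request("HEAD", "/index1");
            exists.addParameter("ignore", "404");
            Response response = client.performRequest(exists);
            if (response.getStatusLine().getStatusCode() == 404) {
                // Only create the index when it is genuinely missing.
                client.performRequest(new Request("PUT", "/index1"));
            }
        }
    }
}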

Logs, for posterity: consoleText.txt.zip

[edit: pasted the wrong second snippet the first time]

@gwbrown gwbrown added >test-failure Triaged test failures from CI :Delivery/Build Build or test infrastructure labels Mar 5, 2019
elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

gwbrown (Contributor, Author) commented Mar 5, 2019

I've tagged this as core/infra/build mostly because there's a ton of stuff in this test suite and I'm not sure whether this is a problem with one of the tests or with the test infrastructure.

@original-brownbear original-brownbear self-assigned this Mar 5, 2019
@original-brownbear original-brownbear added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Delivery/Build Build or test infrastructure labels Mar 5, 2019
elasticmachine (Collaborator)

Pinging @elastic/es-distributed

original-brownbear (Member) commented:

This is a snapshot BwC issue between 6.7.0 and 5.6.x resulting from #39550. I'll deal with it tomorrow.
Also linking #39662, which makes the logging for this failure a lot less painful.

original-brownbear (Member) commented:

I think I've tracked this down: we are sending the wrong snapshot shard status update message to 5.6 nodes from 6.7. Fixing now.
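
For readers unfamiliar with the wire-format handling: messages exchanged in a mixed cluster are serialized differently depending on the receiver's version, so the shard status update has to branch on the stream version. Below is a minimal, purely illustrative sketch of that pattern, assuming a made-up message class rather than the actual code the fix touches; the bug described above is what happens when the branch picks the wrong format for a 5.6.x master:

import org.elasticsearch.Version;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;

import java.io.IOException;

// Hypothetical message class; only the version-gated writeTo pattern is the point here.
public class ShardStatusUpdate implements Writeable {
    private final String snapshot;
    private final int shardId;
    private final String state;

    public ShardStatusUpdate(String snapshot, int shardId, String state) {
        this.snapshot = snapshot;
        this.shardId = shardId;
        this.state = state;
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeString(snapshot);
        out.writeVInt(shardId);
        if (out.getVersion().onOrAfter(Version.V_6_0_0)) {
            // Nodes on 6.x can read the richer status field...
            out.writeString(state);
        } else {
            // ...while a 5.6.x master expects the older shape, so write that instead.
            out.writeBoolean("SUCCESS".equals(state));
        }
    }
}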

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Mar 6, 2019
* We were sending the wrong snapshot shard status update format to 5.6 (but reading the correct version) so tests would fail with 5.6 masters but not with 5.6 nodes running against a 6.7 master
* Closes elastic#39721
original-brownbear added a commit that referenced this issue Mar 6, 2019
* Fix Snapshot BwC with Version 5.6.x

* We were sending the wrong snapshot shard status update format to 5.6 (but reading the correct version) so tests would fail with 5.6 masters but not with 5.6 nodes running against a 6.7 master
* Closes #39721
original-brownbear (Member) commented:

Closed via #39737
