ClusterDisruptionIT.testAckedIndexing test failure #70996

Closed
benwtrent opened this issue Mar 29, 2021 · 1 comment · Fixed by #74940
Labels
- :Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection.)
- Team:Distributed (Obsolete) (Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.)
- >test-failure (Triaged test failures from CI)

Comments

@benwtrent
Member

Build scan:
https://gradle-enterprise.elastic.co/s/b3uzruazspiyw
Repro line:
./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.discovery.ClusterDisruptionIT.testAckedIndexing" -Dtests.seed=13B4CBEBAEF4BA7 -Dtests.security.manager=true -Dtests.locale=en-IE -Dtests.timezone=CTT -Druntime.java=11
Reproduces locally?:
No, neither with the seed nor when running it tens of times locally.
Applicable branches:
master
Failure history:
It has failed on master just once in the last 30 days.
Failure excerpt:

org.elasticsearch.discovery.ClusterDisruptionIT > testAckedIndexing FAILED
    java.lang.AssertionError: shard [test][0] on node [node_t0] has pending operations:
     --> BulkShardRequest [[test][0]] containing [index {[test][5], source[{"f0":6096035452205472999}]}]
    	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:198)
    	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2873)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:894)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:337)
    	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:293)
    	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:61)
    	at org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:787)
    	at org.elasticsearch.transport.TransportService$3.sendRequest(TransportService.java:114)
    	at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:731)
    	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:645)
    	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:571)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.performAction(TransportReplicationAction.java:786)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.performLocalAction(TransportReplicationAction.java:757)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:744)
    	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction.runReroutePhase(TransportReplicationAction.java:178)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:173)
    	at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:83)
    	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:77)
    	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:53)
    	at org.elasticsearch.tasks.TaskManager.registerAndExecute(TaskManager.java:164)
    	at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:97)
    	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:500)
    	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    	at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:628)
    	at org.elasticsearch.action.bulk.TransportBulkAction.doInternalExecute(TransportBulkAction.java:237)
    	at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:158)
    	at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:88)
    	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:77)
    	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:53)
    	at org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction.doExecute(TransportSingleItemBulkWriteAction.java:41)
    	at org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction.doExecute(TransportSingleItemBulkWriteAction.java:25)
    	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:77)
    	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:53)
    	at org.elasticsearch.tasks.TaskManager.registerAndExecute(TaskManager.java:164)
    	at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:97)
    	at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:77)
    	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:375)
    	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:364)
    	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:34)
    	at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:55)
    	at org.elasticsearch.discovery.ClusterDisruptionIT.lambda$testAckedIndexing$3(ClusterDisruptionIT.java:160)
    	at java.base/java.lang.Thread.run(Thread.java:834)
        at __randomizedtesting.SeedInfo.seed([13B4CBEBAEF4BA7:8BFAF84DE692ADEC]:0)
        at org.elasticsearch.test.InternalTestCluster.lambda$assertNoPendingIndexOperations$12(InternalTestCluster.java:1197)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:958)
        at org.elasticsearch.test.InternalTestCluster.assertNoPendingIndexOperations(InternalTestCluster.java:1188)
        at org.elasticsearch.test.InternalTestCluster.beforeIndexDeletion(InternalTestCluster.java:1153)
        at org.elasticsearch.test.ESIntegTestCase.beforeIndexDeletion(ESIntegTestCase.java:576)
        at org.elasticsearch.discovery.AbstractDisruptionTestCase.beforeIndexDeletion(AbstractDisruptionTestCase.java:98)
        at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:546)
        at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2073)
        at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
        at java.base/java.lang.Thread.run(Thread.java:834)
@benwtrent added the >test-failure and :Distributed Coordination/Cluster Coordination labels Mar 29, 2021
@elasticmachine added the Team:Distributed (Obsolete) label Mar 29, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen henningandersen self-assigned this Mar 31, 2021
henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Jul 5, 2021
RetryableAction uses randomized and exponential back off. If unlucky,
the randomization would cause a series of very short waits, which would
double the bound every time, risking a subsequent very long wait. Now
randomize between [bound/2, bound[.

Closes elastic#70996
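
To make the change described in the commit message concrete, here is a minimal Java sketch of the two randomization schemes. It is not the RetryableAction source: the class name, method names, and the 50 ms starting bound are hypothetical, and the only facts taken from the commit message are that the bound doubles on every retry and that the delay is now drawn from [bound/2, bound[. The "before" method assumes the earlier delay could fall anywhere below the bound, which is what "a series of very short waits" suggests.

```java
import java.util.Random;

// Illustrative sketch only; names and constants are hypothetical,
// not taken from the Elasticsearch code base.
final class BackoffSketch {

    // Assumed earlier behaviour: delay drawn from [0, bound),
    // so any single wait can be arbitrarily short.
    static long delayBefore(Random random, long boundMillis) {
        return (long) (random.nextDouble() * boundMillis);
    }

    // Behaviour described by the commit: delay drawn from [bound/2, bound),
    // so every wait is at least half of the current bound.
    static long delayAfter(Random random, long boundMillis) {
        long lower = boundMillis / 2;
        return lower + (long) (random.nextDouble() * (boundMillis - lower));
    }

    public static void main(String[] args) {
        Random random = new Random();
        long bound = 50; // hypothetical initial bound in milliseconds
        long elapsedBefore = 0;
        long elapsedAfter = 0;
        for (int attempt = 1; attempt <= 8; attempt++) {
            elapsedBefore += delayBefore(random, bound);
            elapsedAfter += delayAfter(random, bound);
            System.out.printf(
                "attempt=%d bound=%dms elapsed(before)=%dms elapsed(after)=%dms%n",
                attempt, bound, elapsedBefore, elapsedAfter);
            bound *= 2; // exponential back off: the bound doubles on every retry
        }
    }
}
```

With the first scheme, an unlucky run can keep the elapsed time close to zero while the bound keeps doubling, so a later attempt may suddenly wait for a large fraction of a much larger bound. With the second scheme the elapsed time grows along with the bound, because no wait is shorter than bound/2.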
henningandersen added a commit that referenced this issue Aug 5, 2021
henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Aug 5, 2021
henningandersen added a commit that referenced this issue Aug 5, 2021
henningandersen added a commit that referenced this issue Aug 5, 2021