
[CI] SnapshotStressTestsIT testRandomActivities failing #99477

Closed
cbuescher opened this issue Sep 12, 2023 · 3 comments · Fixed by #100809
Labels
:Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs)
low-risk (An open issue or test failure that is a low risk to future releases)
Team:Distributed (Obsolete) (Meta label for distributed team, obsolete. Replaced by Distributed Indexing/Coordination.)
>test-failure (Triaged test failures from CI)

Comments

@cbuescher
Member

Build scan:
https://gradle-enterprise.elastic.co/s/2a3kxuit7ctay/tests/:server:internalClusterTest/org.elasticsearch.snapshots.SnapshotStressTestsIT/testRandomActivities

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.snapshots.SnapshotStressTestsIT.testRandomActivities" -Dtests.seed=818DDB1C47D6E681 -Dtests.locale=el-GR -Dtests.timezone=America/Iqaluit -Druntime.java=20

Applicable branches:
8.9

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.snapshots.SnapshotStressTestsIT&tests.test=testRandomActivities

Failure excerpt:

java.lang.AssertionError: failed to acquire all permits: [repo-0, snapshot-partial-11, snapshot-partial-10, index-0, index-1, index-2, index-3, index-4, node_s3, node_s2, node_s1, node_s0]

  at org.junit.Assert.fail(Assert.java:88)
  at org.elasticsearch.snapshots.SnapshotStressTestsIT$TrackedCluster.run(SnapshotStressTestsIT.java:342)
  at org.elasticsearch.snapshots.SnapshotStressTestsIT.testRandomActivities(SnapshotStressTestsIT.java:84)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:578)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1623)
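The assertion message lists every tracked resource of the run (the repository, two snapshots, five indices, four nodes). One plausible reading is that the stress test guards each resource with a permit and, at the end of the run, tries to reclaim all of them with a timeout, so an activity that never completes keeps its permit and the final check fails. The sketch below shows that general pattern, assuming one single-permit `java.util.concurrent.Semaphore` per resource; `PermitTracker` and its methods are hypothetical and are not the actual `SnapshotStressTestsIT` implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one single-permit semaphore per tracked resource.
public class PermitTracker {
    private final Map<String, Semaphore> permits = new LinkedHashMap<>();

    public void track(String resource) {
        permits.put(resource, new Semaphore(1));
    }

    // An activity holds a resource's permit while it is using that resource.
    public boolean tryUse(String resource) {
        return permits.get(resource).tryAcquire();
    }

    public void release(String resource) {
        permits.get(resource).release();
    }

    // At teardown, try to reclaim every permit and return the resources whose
    // permits are still held, e.g. by an activity that never completed.
    public List<String> acquireAll(long timeout, TimeUnit unit) {
        List<String> stillHeld = new ArrayList<>();
        for (Map.Entry<String, Semaphore> entry : permits.entrySet()) {
            boolean acquired = false;
            try {
                acquired = entry.getValue().tryAcquire(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
            if (!acquired) {
                stillHeld.add(entry.getKey());
            }
        }
        return stillHeld;
    }

    public static void main(String[] args) {
        PermitTracker tracker = new PermitTracker();
        tracker.track("repo-0");
        tracker.track("index-0");
        tracker.tryUse("index-0"); // simulate an activity that hangs while holding index-0
        List<String> stillHeld = tracker.acquireAll(10, TimeUnit.MILLISECONDS);
        if (!stillHeld.isEmpty()) {
            System.out.println("failed to acquire all permits: " + stillHeld); // prints [index-0]
        }
    }
}
```

In this sketch a single stuck activity is enough to fail the final check, which is why the underlying cause has to be found elsewhere, for example in an uncaught exception on a worker thread.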

@cbuescher cbuescher added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Sep 12, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 12, 2023
@cbuescher
Member Author

Apart from several thread leak errors that are probably caused by the underlying failure, this AssertionError also seems interesting:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=5682, name=elasticsearch[node_s1][snapshot][T#1], state=RUNNABLE, group=TGRP-SnapshotStressTestsIT]
Caused by: java.lang.AssertionError
	at __randomizedtesting.SeedInfo.seed([818DDB1C47D6E681]:0)
	at org.elasticsearch.repositories.SnapshotIndexCommit.indexCommit(SnapshotIndexCommit.java:60)
	at org.elasticsearch.snapshots.SnapshotShardsService.lambda$snapshot$3(SnapshotShardsService.java:394)
	at org.elasticsearch.action.ActionListener.run(ActionListener.java:382)
	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:364)
	at org.elasticsearch.snapshots.SnapshotShardsService.lambda$newShardSnapshotTask$2(SnapshotShardsService.java:291)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewSnapshots$1(SnapshotShardsService.java:259)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1623)

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 13, 2023
It is not valid to call `SnapshotIndexCommit#indexCommit()` if the
snapshot is aborted, so we must compute `shardStateId` before adding the
abort listener.

Closes elastic#99477
@DaveCTurner DaveCTurner added the low-risk An open issue or test failure that is a low risk to future releases label Oct 13, 2023
DaveCTurner added a commit that referenced this issue Oct 13, 2023
It is not valid to call `SnapshotIndexCommit#indexCommit()` if the
snapshot is aborted, so we must compute `shardStateId` before adding the
abort listener.

Closes #99477
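The commit message describes an ordering constraint: `SnapshotIndexCommit#indexCommit()` must not be called once the snapshot is aborted, so `shardStateId` has to be computed before the abort listener is added. The sketch below models that race with hypothetical stand-in classes (`CommitHandle`, `AbortableStatus`), not the real Elasticsearch types. It assumes that adding a listener to an already-aborted status runs it immediately and releases the commit, which makes the race deterministic, and it throws `IllegalStateException` instead of using a Java `assert` (the real failure above is an `AssertionError` at `SnapshotIndexCommit.java:60`) so the failure is visible without `-ea`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ordering problem described in the commit message.
// None of these classes are the real Elasticsearch types.
public class AbortOrderingSketch {

    // Stand-in for a snapshot's index commit: unusable once released.
    static final class CommitHandle {
        private boolean released = false;

        String indexCommit() {
            if (released) {
                throw new IllegalStateException("commit already released");
            }
            return "commit-generation-42";
        }

        void release() {
            released = true;
        }
    }

    // Stand-in for a shard snapshot status that can be aborted concurrently.
    static final class AbortableStatus {
        private boolean aborted = false;
        private final List<Runnable> abortListeners = new ArrayList<>();

        synchronized void addAbortListener(Runnable listener) {
            if (aborted) {
                listener.run(); // already aborted: the listener fires immediately
            } else {
                abortListeners.add(listener);
            }
        }

        synchronized void abort() {
            aborted = true;
            abortListeners.forEach(Runnable::run);
            abortListeners.clear();
        }
    }

    // Buggy order: register the listener first, then read the commit.
    static String buggy(CommitHandle commit, AbortableStatus status) {
        status.addAbortListener(commit::release);
        return "state-" + commit.indexCommit(); // throws if the abort already happened
    }

    // Fixed order: derive the state id while the commit is still valid.
    static String fixed(CommitHandle commit, AbortableStatus status) {
        String shardStateId = "state-" + commit.indexCommit();
        status.addAbortListener(commit::release);
        return shardStateId;
    }

    public static void main(String[] args) {
        AbortableStatus aborted = new AbortableStatus();
        aborted.abort(); // the snapshot is aborted before the shard task runs
        System.out.println(fixed(new CommitHandle(), aborted)); // prints state-commit-generation-42
        try {
            buggy(new CommitHandle(), aborted);
        } catch (IllegalStateException e) {
            System.out.println("buggy order: " + e.getMessage()); // prints buggy order: commit already released
        }
    }
}
```

Reading everything needed from the commit before registering the listener means the abort path can release the commit at any point afterwards without invalidating work that is still in progress.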
elasticsearchmachine pushed a commit that referenced this issue Oct 13, 2023
It is not valid to call `SnapshotIndexCommit#indexCommit()` if the
snapshot is aborted, so we must compute `shardStateId` before adding the
abort listener.

Closes #99477