[BUG] org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterRestoreGlobalMetadata is flaky #10749

Closed
reta opened this issue Oct 19, 2023 · 8 comments · Fixed by #11981
Labels: bug, Cluster Manager, flaky-test, Storage:Remote, untriaged

Comments

@reta
Collaborator

reta commented Oct 19, 2023

Describe the bug
The test case org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterRestoreGlobalMetadata is flaky:

java.lang.AssertionError: timed out waiting for green state
	at org.junit.Assert.fail(Assert.java:89)
	at org.opensearch.test.OpenSearchIntegTestCase.ensureColor(OpenSearchIntegTestCase.java:1012)
	at org.opensearch.test.OpenSearchIntegTestCase.ensureGreen(OpenSearchIntegTestCase.java:943)
	at org.opensearch.test.OpenSearchIntegTestCase.ensureGreen(OpenSearchIntegTestCase.java:932)
	at org.opensearch.remotestore.BaseRemoteStoreRestoreIT.verifyRestoredData(BaseRemoteStoreRestoreIT.java:63)
	at org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterRestoreGlobalMetadata(RemoteStoreClusterStateRestoreIT.java:249)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1623)
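
The trace above ends in a helper that waits for the cluster to reach green and fails once a timeout elapses. As a rough illustration of that wait-with-timeout pattern only, here is a minimal generic sketch. It is not the code in `OpenSearchIntegTestCase`; the class name `WaitSketch`, the method `waitFor`, and its parameters are hypothetical names chosen for this example.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, NOT OpenSearch code: polls a condition until it
// holds or a deadline passes, which is the general shape of a
// "timed out waiting for ..." test failure.
final class WaitSketch {

    // Returns true if the condition held before the timeout, false otherwise.
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // deadline passed without the condition holding
            }
            // Sleep for the poll interval, but never past the deadline.
            long sleepMillis = Math.max(1, Math.min(pollMillis, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 100 ms after start.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 100, 1000, 10);
        if (!ok) {
            throw new AssertionError("timed out waiting for condition");
        }
        System.out.println("condition reached");
    }
}
```

With a helper of this shape, a slow but eventually successful recovery and a recovery that never completes both surface as the same timeout, so the assertion message alone does not say which one happened.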

To Reproduce

./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterRestoreGlobalMetadata" -Dtests.seed=4FA936F087F8BF6B 

Expected behavior
The test must always pass.

Plugins
Standard

Screenshots
N/A

Host/Environment (please complete the following information):

  • CI

Additional context

@linuxpi
Collaborator

linuxpi commented Oct 30, 2023

We have fixed all flaky tests in org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT. Closing; please reopen if you see any failures.

linuxpi closed this as completed Oct 30, 2023
reta reopened this Nov 6, 2023
@reta
Collaborator Author

reta commented Nov 6, 2023

It failed again here: https://build.ci.opensearch.org/job/gradle-check/29586/testReport/junit/org.opensearch.remotestore/RemoteStoreClusterStateRestoreIT/testFullClusterRestoreGlobalMetadata/

java.lang.IllegalStateException: Error while downloading cluster metadata - manifest__9223372036854775806__9223372036854775776__P__9223370337568370205__1
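
The large numbers in that manifest file name all sit just below `Long.MAX_VALUE`, which suggests (this is an assumption, not something stated in this thread) that each numeric field is stored as `Long.MAX_VALUE - value`. The sketch below decodes the name under that assumption. The class `ManifestNameSketch` and its methods are hypothetical, and no meaning is assigned to the individual fields.

```java
import java.time.Instant;

// Hypothetical decoder, NOT OpenSearch code. Assumes the numeric fields of
// the manifest file name are stored as Long.MAX_VALUE minus the real value.
final class ManifestNameSketch {

    // Undo the assumed inversion.
    static long invert(String field) {
        return Long.MAX_VALUE - Long.parseLong(field);
    }

    // Returns the decoded 2nd, 3rd and 5th "__"-separated fields of the name.
    static long[] decode(String fileName) {
        String[] parts = fileName.split("__");
        if (parts.length != 6 || !parts[0].equals("manifest")) {
            throw new IllegalArgumentException("unexpected manifest name: " + fileName);
        }
        return new long[] { invert(parts[1]), invert(parts[2]), invert(parts[4]) };
    }

    public static void main(String[] args) {
        long[] d = decode(
            "manifest__9223372036854775806__9223372036854775776__P__9223370337568370205__1");
        System.out.println("second field decodes to " + d[0]);
        System.out.println("third field decodes to  " + d[1]);
        // Interpreting the fifth field as epoch milliseconds is a further assumption.
        System.out.println("fifth field decodes to  " + d[2] + " (" + Instant.ofEpochMilli(d[2]) + ")");
    }
}
```

Under this reading the second and third fields decode to 1 and 31, and the fifth decodes to 1699286405602, which as epoch milliseconds falls on 2023-11-06, the date of this comment.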

@mch2
Member

mch2 commented Nov 6, 2023

Got a failure in this test class on [testFullClusterRestoreMultipleIndices](https://build.ci.opensearch.org/job/gradle-check/29562/testReport/junit/org.opensearch.remotestore/RemoteStoreClusterStateRestoreIT/testFullClusterRestoreMultipleIndices/) - https://build.ci.opensearch.org/job/gradle-check/29562/

@peternied
Member

Another failure, impacted #11102, logs.

@bowenlan-amzn
Member

bowenlan-amzn commented Feb 10, 2024

Saw this test fail again: https://build.ci.opensearch.org/job/gradle-check/33624/testReport/
@linuxpi, could you take a look?

Noticed your fix is not on 2.12; backporting #12282.

@peternied
Member

@msfroh could you close when this has been backported to 2.12/2.x?

@bowenlan-amzn
Member

Closing, as the 2.12 backport PR has been merged.

6 participants