[BUG] org.opensearch.remotestore.RemoteStoreRestoreIT.testRTSRestoreWithRefreshedDataPrimaryReplicaDown is flaky #9408

Closed
sachinpkale opened this issue Aug 17, 2023 · 4 comments
Labels: bug (Something isn't working), flaky-test (Random test failure that succeeds on second run), Storage:Durability (Issues and PRs related to the durability framework), v2.10.0

Comments

@sachinpkale
Member

java.lang.AssertionError: timed out waiting for green state
	at __randomizedtesting.SeedInfo.seed([BFEADCEF559B8083:2872BB4C7EA40A62]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.opensearch.test.OpenSearchIntegTestCase.ensureColor(OpenSearchIntegTestCase.java:1013)
	at org.opensearch.test.OpenSearchIntegTestCase.ensureGreen(OpenSearchIntegTestCase.java:944)
	at org.opensearch.test.OpenSearchIntegTestCase.ensureGreen(OpenSearchIntegTestCase.java:933)
	at org.opensearch.remotestore.RemoteStoreRestoreIT.restoreAndVerify(RemoteStoreRestoreIT.java:172)
	at org.opensearch.remotestore.RemoteStoreRestoreIT.testRestoreFlowBothPrimaryReplicasDown(RemoteStoreRestoreIT.java:218)
	at org.opensearch.remotestore.RemoteStoreRestoreIT.testRTSRestoreWithRefreshedDataPrimaryReplicaDown(RemoteStoreRestoreIT.java:158)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)

org.opensearch.indices.recovery.RecoveryFailedException: [remote-store-test-idx-1][3]: Recovery failed on {node_s4}{EBAFjMZUR868mBrY09ac-w}{_jc6Y8ZwQUCatSiLOTuuaQ}{127.0.0.1}{127.0.0.1:64717}{d}{shard_indexing_pressure_enabled=true}
	at org.opensearch.index.shard.IndexShard.lambda$executeRecovery$31(IndexShard.java:3566) ~[main/:?]
	at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:88) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.index.shard.StoreRecovery.lambda$recoveryListener$8(StoreRecovery.java:503) ~[main/:?]
	at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:88) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:345) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.index.shard.StoreRecovery.recoverFromRemoteStore(StoreRecovery.java:128) ~[main/:?]
	at org.opensearch.index.shard.IndexShard.restoreFromRemoteStore(IndexShard.java:2644) ~[main/:?]
	at org.opensearch.index.shard.IndexShard.lambda$startRecovery$25(IndexShard.java:3461) ~[main/:?]
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) ~[main/:?]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[main/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1623) [?:?]
Caused by: org.opensearch.index.shard.IndexShardRecoveryException: Exception while recovering from remote store
	at org.opensearch.index.shard.StoreRecovery.recoverFromRemoteStore(StoreRecovery.java:551) ~[main/:?]
	at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromRemoteStore$1(StoreRecovery.java:130) ~[main/:?]
	at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:342) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	... 9 more
Caused by: java.nio.file.NoSuchFileException: /Users/kalsac/Codebase/OpenSearch/server/build/testrun/internalClusterTest/temp/org.opensearch.remotestore.RemoteStoreRestoreIT_BFEADCEF559B8083-001/tempDir-002/repos/dddtNMkvEO/KkAE4ha_T1SBi4EoHvJ_2g/3/translog/data/1/translog-10.ckp
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:261) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:379) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:431) ~[?:?]
	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422) ~[?:?]
	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:193) ~[lucene-test-framework-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:193) ~[lucene-test-framework-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:193) ~[lucene-test-framework-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newInputStream(HandleTrackingFS.java:94) ~[lucene-test-framework-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:193) ~[lucene-test-framework-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newInputStream(HandleTrackingFS.java:94) ~[lucene-test-framework-9.8.0-snapshot-4373c3b.jar:9.8.0-snapshot-4373c3b 4373c3b2612e54bc0c5b992d9423e83e6340fdd5 - 2023-07-24 17:45:44]
	at java.nio.file.Files.newInputStream(Files.java:159) ~[?:?]
	at org.opensearch.common.blobstore.fs.FsBlobContainer.readBlob(FsBlobContainer.java:170) ~[main/:?]
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.downloadBlob(BlobStoreTransferService.java:158) ~[main/:?]
	at org.opensearch.index.translog.transfer.TranslogTransferManager.downloadToFS(TranslogTransferManager.java:183) ~[main/:?]
	at org.opensearch.index.translog.transfer.TranslogTransferManager.downloadTranslog(TranslogTransferManager.java:169) ~[main/:?]
	at org.opensearch.index.translog.RemoteFsTranslog.download(RemoteFsTranslog.java:159) ~[main/:?]
	at org.opensearch.index.translog.RemoteFsTranslog.download(RemoteFsTranslog.java:142) ~[main/:?]
	at org.opensearch.index.shard.IndexShard.syncTranslogFilesFromRemoteTranslog(IndexShard.java:4653) ~[main/:?]
	at org.opensearch.index.shard.StoreRecovery.recoverFromRemoteStore(StoreRecovery.java:539) ~[main/:?]
	at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromRemoteStore$1(StoreRecovery.java:130) ~[main/:?]
	at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:342) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	... 9 more
@sachinpkale added the bug, untriaged, Storage:Durability, and v2.10.0 labels and removed the untriaged label on Aug 17, 2023
@tlfeng added the flaky-test label on Aug 22, 2023
@sachinpkale
Member Author

This is fixed with #8951

@reta
Collaborator

reta commented Oct 20, 2023

Not fixed and coming back: https://build.ci.opensearch.org/job/gradle-check/28595/

@linuxpi
Collaborator

linuxpi commented Jan 26, 2024

Not able to reproduce this in 2K iterations.

@linuxpi
Collaborator

linuxpi commented Feb 2, 2024

Closing this due to lack of occurrence and repro. Please reopen if you find any failures.

@linuxpi linuxpi closed this as completed Feb 2, 2024