[CI] SearchableSnapshotsIntegTests testCreateAndRestoreSearchableSnapshot failing #77831

Closed
ywangd opened this issue Sep 16, 2021 · 2 comments · Fixed by #80341
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@ywangd
Member

ywangd commented Sep 16, 2021

Build scan:
https://gradle-enterprise.elastic.co/s/ni7wnr7bqmz3k/tests/:x-pack:plugin:searchable-snapshots:internalClusterTest/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests/testCreateAndRestoreSearchableSnapshot

Reproduction line:
./gradlew ':x-pack:plugin:searchable-snapshots:internalClusterTest' --tests "org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot" -Dtests.seed=DF8F0E1B6CCE9D8B -Dtests.locale=ga -Dtests.timezone=Etc/GMT -Druntime.java=11

Applicable branches:
7.x

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests&tests.test=testCreateAndRestoreSearchableSnapshot

Failure excerpt:

java.lang.AssertionError: expected:<true> but was:<false>

  at __randomizedtesting.SeedInfo.seed([DF8F0E1B6CCE9D8B:55EC7960FF59CC38]:0)
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:834)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:144)
  at org.elasticsearch.xpack.searchablesnapshots.BaseSearchableSnapshotsIntegTestCase.assertShardFolders(BaseSearchableSnapshotsIntegTestCase.java:246)
  at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot(SearchableSnapshotsIntegTests.java:261)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:566)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
  at java.lang.Thread.run(Thread.java:829)

@ywangd ywangd added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Sep 16, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 16, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx tlrx self-assigned this Sep 16, 2021
@tlrx
Member

tlrx commented Nov 3, 2021

tlrx added a commit that referenced this issue Nov 5, 2021
The tests testCreateAndRestoreSearchableSnapshot and 
testCreateAndRestorePartialSearchableSnapshot both 
failed once when asserting the shard folders using 
assertShardFolders(index, true).

The failures occurred when the original index was first closed
(not deleted) and then mounted again under the same name, so
that it was restored as a searchable snapshot index on top of
the existing shard files. The SearchableSnapshotDirectory
implementation cleans up the shard files on disk
using SearchableSnapshotDirectory.cleanExistingRegularShardFiles(),
and the tests verify that the shard index folder is indeed deleted
from disk on all nodes, but they sometimes fail because the folder
is still present.

I wasn't able to reproduce the failure, but I think that closing the
original index and creating the .snapshot-blob-cache
index trigger some shard relocations that are cancelled by
the subsequent mount/restore, leaving some files on disk
that should be cleaned up, but maybe not immediately.

This commit changes the tests to use assertBusy() when
verifying the shard folders and also adds more logging
information in case waiting for
assertShardFolders(index, true) is not enough.

Closes #77831
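
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of what wrapping a folder check in a retry loop looks like. It is not the code from the commit: `AssertBusySketch`, its local `assertBusy` helper, `CheckedRunnable`, and `assertFolderDeleted` are all hypothetical stand-ins written for illustration, and the backoff values are assumptions. The idea is only that an assertion which may fail transiently (because file cleanup is asynchronous) is retried until it passes or a deadline expires.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

public class AssertBusySketch {

    /** A check that may throw; hypothetical stand-in for a test assertion body. */
    @FunctionalInterface
    public interface CheckedRunnable {
        void run() throws Exception;
    }

    /**
     * Retries the check until it stops throwing AssertionError or the deadline passes.
     * Illustrative only: the backoff (1 ms doubling up to 500 ms) is an assumption.
     */
    public static void assertBusy(CheckedRunnable check, long maxWait, TimeUnit unit) throws Exception {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            AssertionError last;
            try {
                check.run();
                return; // the assertion finally held
            } catch (AssertionError e) {
                last = e; // remember the failure and retry
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw last; // give up and surface the most recent failure
            }
            long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            Thread.sleep(Math.min(sleepMillis, remainingMillis));
            sleepMillis = Math.min(sleepMillis * 2, 500);
        }
    }

    /** Hypothetical stand-in for a "shard index folder must be gone" check. */
    public static void assertFolderDeleted(Path shardIndexFolder) {
        if (Files.exists(shardIndexFolder)) {
            throw new AssertionError("shard index folder still present: " + shardIndexFolder);
        }
    }

    public static void main(String[] args) throws Exception {
        Path folder = Files.createTempDirectory("shard-index");
        // Simulate cleanup that happens a little later than the test expects.
        Thread cleaner = new Thread(() -> {
            try {
                Thread.sleep(200);
                Files.deleteIfExists(folder);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        cleaner.start();
        // A one-shot assertion here could fail; the retrying version tolerates the delay.
        assertBusy(() -> assertFolderDeleted(folder), 10, TimeUnit.SECONDS);
        cleaner.join();
        System.out.println("folder deleted: " + !Files.exists(folder));
    }
}
```

A retry loop like this only hides the timing window. If the folder is never removed, the last assertion error is still raised once the deadline passes, which is where the extra logging mentioned in the commit message would help.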
tlrx added a commit to tlrx/elasticsearch that referenced this issue Nov 5, 2021
elasticsearchmachine pushed a commit that referenced this issue Nov 5, 2021
tlrx added a commit to tlrx/elasticsearch that referenced this issue Nov 5, 2021
elasticsearchmachine pushed a commit that referenced this issue Nov 8, 2021