Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SearchableSnapshotsBlobStoreCacheIntegTests.testBlobStoreCache #78616

Merged
merged 3 commits into from
Oct 7, 2021

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Oct 4, 2021

This test often fails on CI because it does not wait for the data node that hosts the .snapshot-blob-cache primary shard to process the cluster state update and triggers the clean up.

Closes #78512

@tlrx tlrx added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.16.0 labels Oct 4, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 4, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx tlrx requested review from henningandersen and arteam October 4, 2021 14:30
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, added a couple of comments though.

@@ -329,6 +330,10 @@ public Settings onNodeStopped(String nodeName) throws Exception {

logger.info("--> deleting indices, maintenance service should clean up snapshot blob cache index");
assertAcked(client().admin().indices().prepareDelete("restored-*"));
// waits for the data node that hosts the .snapshot-blob-cache primary shard
// to process the cluster state update and to trigger a clean up
ensureClusterStateConsistency();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this waits for anything and it seems sort of unrelated to this test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed, thanks.

@@ -338,7 +343,7 @@ public Settings onNodeStopped(String nodeName) throws Exception {
.get(),
0L
);
});
}, 30L, TimeUnit.SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this might help, but can we perhaps log numberOfCachedBlobs before this assert busy such that we can see if it was trying to delete entries but was simply slow or if it did not even start on it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the numberOfCachedBlobs in the log message before deleting all remaining mounted indices.

Copy link
Contributor

@arteam arteam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tlrx tlrx merged commit 9e27100 into elastic:master Oct 7, 2021
@tlrx tlrx deleted the fix-78512 branch October 7, 2021 10:09
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 7, 2021
…lastic#78616)

This test often fails on CI because it does not wait for the data 
node that hosts the .snapshot-blob-cache primary shard to 
process the cluster state update and triggers the clean up.

Closes elastic#78512
elasticsearchmachine pushed a commit that referenced this pull request Oct 7, 2021
…78616) (#78818)

This test often fails on CI because it does not wait for the data 
node that hosts the .snapshot-blob-cache primary shard to 
process the cluster state update and triggers the clean up.

Closes #78512
wjp719 added a commit to wjp719/elasticsearch that referenced this pull request Oct 7, 2021
…' into feature/data_stream_support_routing

* wjp/feature/data_stream_support_routing: (44 commits)
  Revert "Adjust /_cat/templates not to request all metadata (elastic#78812)"
  Allow indices lookup to be built lazily (elastic#78745)
  [DOCS] Document default security in alpha2 (elastic#78227)
  Add cluster applier stats (elastic#77552)
  Fix failing URLDecodeProcessorTests::testProcessor test (elastic#78690)
  Upgrade to lucene snapshot ba75dc5e6bf (elastic#78817)
  Adjust /_cat/templates not to request all metadata (elastic#78812)
  Simplify build plugin license handling (elastic#77009)
  Fix SearchableSnapshotsBlobStoreCacheIntegTests.testBlobStoreCache (elastic#78616)
  Improve Docker image caching and testing (elastic#78552)
  Load knn vectors format with mmapfs (elastic#78724)
  Fix date math zone test to use negative minutes (elastic#78796)
  Changing name of shards field in node/stats api to shard_stats (elastic#78531)
  [DOCS] Fix system index refs in restore tutorial (elastic#78582)
  Add previously removed settings back for 8.0 (elastic#78784)
  TSDB: Fix template name in test
  Add a system property to forcibly format everything (elastic#78768)
  Revert "Adding config so that some tests will break if over-the-wire encryption fails (elastic#78409)" (elastic#78787)
  Must date math test failure
  Adding config so that some tests will break if over-the-wire encryption fails (elastic#78409)
  ...
@tlrx tlrx mentioned this pull request Oct 25, 2021
elasticsearchmachine pushed a commit that referenced this pull request Oct 27, 2021
The attempt to fix this test in #78616 wasn't complete: we need to wait
for all mounted indices, not only the last restored one, before deleting
the indices. Otherwise one of the mounted index might still be
initializing and recreates the cached documents in
`.snapshot-blob-cache`  Closes #78993
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 27, 2021
The attempt to fix this test in elastic#78616 wasn't complete: we need to wait
for all mounted indices, not only the last restored one, before deleting
the indices. Otherwise one of the mounted index might still be
initializing and recreates the cached documents in
`.snapshot-blob-cache`  Closes elastic#78993
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 27, 2021
The attempt to fix this test in elastic#78616 wasn't complete: we need to wait
for all mounted indices, not only the last restored one, before deleting
the indices. Otherwise one of the mounted index might still be
initializing and recreates the cached documents in
`.snapshot-blob-cache`  Closes elastic#78993
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 27, 2021
The attempt to fix this test in elastic#78616 wasn't complete: we need to wait
for all mounted indices, not only the last restored one, before deleting
the indices. Otherwise one of the mounted index might still be
initializing and recreates the cached documents in
`.snapshot-blob-cache`  Closes elastic#78993
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 27, 2021
The attempt to fix this test in elastic#78616 wasn't complete: we need to wait
for all mounted indices, not only the last restored one, before deleting
the indices. Otherwise one of the mounted index might still be
initializing and recreates the cached documents in
`.snapshot-blob-cache`  Closes elastic#78993
elasticsearchmachine pushed a commit that referenced this pull request Oct 27, 2021
The attempt to fix this test in #78616 wasn't complete: we need to wait
for all mounted indices, not only the last restored one, before deleting
the indices. Otherwise one of the mounted index might still be
initializing and recreates the cached documents in
`.snapshot-blob-cache`  Closes #78993
elasticsearchmachine pushed a commit that referenced this pull request Oct 27, 2021
The attempt to fix this test in #78616 wasn't complete: we need to wait
for all mounted indices, not only the last restored one, before deleting
the indices. Otherwise one of the mounted index might still be
initializing and recreates the cached documents in
`.snapshot-blob-cache`  Closes #78993
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] SearchableSnapshotsBlobStoreCacheIntegTests testBlobStoreCache failing
5 participants