SearchableSnapshotsIntegTests testCreateAndRestoreSearchableSnapshot failure #66958

Closed
benwtrent opened this issue Jan 4, 2021 · 4 comments
Labels
:Distributed Coordination/Snapshot/Restore: Anything directly related to the `_snapshot/*` APIs
Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
>test-failure: Triaged test failures from CI

Comments

@benwtrent
Member

Build scan:
https://gradle-enterprise.elastic.co/s/zuplpddr4zy6k
Repro line:

gradlew ':x-pack:plugin:searchable-snapshots:internalClusterTest' --tests "org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot" -Dtests.seed=2CE33D36FC2EB66F -Dtests.security.manager=true -Dtests.locale=is -Dtests.timezone=America/Recife -Druntime.java=11

Reproduces locally?:
No, but this was a Windows build and I don't run Windows locally.
Applicable branches:
master
Failure excerpt:


org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests > testCreateAndRestoreSearchableSnapshot FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=908, name=elasticsearch[node_s1][generic][T#2], state=RUNNABLE, group=TGRP-SearchableSnapshotsIntegTests]
        at __randomizedtesting.SeedInfo.seed([2CE33D36FC2EB66F:A6804A4D6FB9E7DC]:0)

        Caused by:
        java.lang.AssertionError: shard eviction should be successful: [snapshotUUID=vHhGCWjPQtGXn6mwBTZOuw, snapshotIndexName=riiwtkwzmp, shardId=[riiwtkwzmp][2]]
            at __randomizedtesting.SeedInfo.seed([2CE33D36FC2EB66F]:0)
            at org.elasticsearch.xpack.searchablesnapshots.cache.CacheService.runIfShardMarkedAsEvictedInCache(CacheService.java:412)
            at org.elasticsearch.xpack.searchablesnapshots.cache.CacheService$1.doRun(CacheService.java:351)
            at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
            at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
            at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:680)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:834)
@benwtrent benwtrent added :Search/Search Search-related issues that do not fall into other categories >test-failure Triaged test failures from CI labels Jan 4, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jan 4, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@jtibshirani jtibshirani added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team labels Jan 4, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 4, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@javanna
Member

javanna commented Jan 13, 2021

tlrx added a commit that referenced this issue Jan 14, 2021
The searchable snapshot's cache service is notified when the cache files
of a specific shard must be evicted. The notifications are usually sent
from a cluster state applier thread that calls the
CacheService#markShardAsEvictedInCache method.

The markShardAsEvictedInCache method adds the shard to an internal set
of ShardEviction and submits the eviction of the shard to the generic
thread pool. Because nothing prevents the cache service (and the
persistent cache service) from being closed before all shard evictions
are processed, invalidating a cache file can fail and trip an assertion
(as happened in many recent test failures: #66958, #66730).

This commit changes the CacheService so that it now waits for the evictions
of shards to complete before closing the cache and persistent cache services.
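
The idea in this commit message can be illustrated with a minimal sketch. This is not the actual Elasticsearch change: `SimpleCacheService`, its fields, and the two-thread pool standing in for the generic thread pool are hypothetical names chosen for illustration. The sketch only shows the general pattern of tracking submitted evictions and waiting for them before the service is marked as closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class EvictionSketch {

    // Hypothetical stand-in for a cache service; NOT the Elasticsearch CacheService.
    static class SimpleCacheService {
        // Assumption: a small fixed pool plays the role of the generic thread pool.
        private final ExecutorService generic = Executors.newFixedThreadPool(2);
        // Kept simple for the sketch: completed futures are only dropped on close.
        private final List<Future<?>> pendingEvictions = new ArrayList<>();
        private final Set<String> evictedShards = ConcurrentHashMap.newKeySet();
        private volatile boolean closed = false;

        // Registers the eviction and hands the actual work to the pool.
        synchronized void markShardAsEvictedInCache(String shardId) {
            if (closed) {
                throw new IllegalStateException("cache service is closed");
            }
            pendingEvictions.add(generic.submit(() -> evict(shardId)));
        }

        // Runs on a pool thread; fails if the cache was closed underneath it.
        private void evict(String shardId) {
            if (closed) {
                throw new IllegalStateException("cache closed before eviction of " + shardId);
            }
            evictedShards.add(shardId);
        }

        // Waits for every submitted eviction before marking the cache as closed.
        synchronized void close() {
            if (closed) {
                return;
            }
            try {
                for (Future<?> eviction : pendingEvictions) {
                    try {
                        eviction.get();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting for evictions", e);
                    } catch (ExecutionException e) {
                        throw new IllegalStateException("eviction failed", e.getCause());
                    }
                }
            } finally {
                pendingEvictions.clear();
                closed = true;
                generic.shutdown();
            }
        }

        Set<String> evictedShards() {
            return evictedShards;
        }
    }

    public static void main(String[] args) {
        SimpleCacheService service = new SimpleCacheService();
        for (int i = 0; i < 3; i++) {
            service.markShardAsEvictedInCache("shard-" + i);
        }
        service.close(); // returns only after all three evictions have run
        System.out.println("evicted shards: " + service.evictedShards().size());
    }
}
```

Because `close()` and `markShardAsEvictedInCache` synchronize on the same monitor, no new eviction can be submitted while `close()` is waiting, and `closed` is only set after every pending eviction has finished, so `evict` never observes a closed cache in this sketch.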
tlrx added a commit to tlrx/elasticsearch that referenced this issue Jan 14, 2021
…c#67160)

tlrx added a commit to tlrx/elasticsearch that referenced this issue Jan 14, 2021
…c#67160)

tlrx added a commit that referenced this issue Jan 14, 2021
#67519)

tlrx added a commit that referenced this issue Jan 14, 2021
#67517)

@tlrx
Member

tlrx commented Jan 15, 2021

The shard eviction mechanism was improved in #67160 and the failing assertion was removed in #67265. This should now be fixed, so I'm closing this issue.

@tlrx tlrx closed this as completed Jan 15, 2021