
Introduce a mechanism to notify plugin before an index/shard folder is going to be deleted from disk #65926

Merged · 13 commits · Dec 10, 2020

Conversation

@tlrx (Member) commented Dec 7, 2020

This pull request introduces a new type of listener, IndexStorePlugin.IndexFoldersDeletionListener, that allows plugins to be notified when an index folder (or a shard folder) is about to be deleted from disk.

This is useful for plugins that need to take action before folders are deleted, such as searchable snapshots, which should evict the cache files contained in those folders.
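The mechanism can be sketched in plain Java as follows. Note that all the types below are simplified stand-ins for illustration only; the real interface lives in IndexStorePlugin and receives the index's Index, IndexSettings, and folder paths:

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IndexStorePlugin.IndexFoldersDeletionListener:
// implementations are notified before folders are removed from disk.
interface IndexFoldersDeletionListener {
    void beforeIndexFoldersDeleted(String index, Path[] indexPaths);
    void beforeShardFoldersDeleted(String index, int shardId, Path[] shardPaths);
}

// Simplified deleter: notifies all registered listeners, then deletes.
class FolderDeleter {
    private final List<IndexFoldersDeletionListener> listeners = new ArrayList<>();

    void addListener(IndexFoldersDeletionListener listener) {
        listeners.add(listener);
    }

    void deleteIndexFolders(String index, Path[] paths) {
        for (IndexFoldersDeletionListener l : listeners) {
            l.beforeIndexFoldersDeleted(index, paths); // e.g. evict cache files
        }
        // ... actual deletion of the folders would happen here ...
    }

    void deleteShardFolders(String index, int shardId, Path[] paths) {
        for (IndexFoldersDeletionListener l : listeners) {
            l.beforeShardFoldersDeleted(index, shardId, paths);
        }
        // ... actual deletion of the folders would happen here ...
    }
}
```

The key property, discussed throughout the review below, is that the listeners run strictly before any file is removed, so a plugin gets a last chance to act on the folder contents.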

@tlrx tlrx force-pushed the index-folders-deletion-listeners branch from a670145 to fe0fbbd Compare December 7, 2020 16:17
@tlrx tlrx marked this pull request as ready for review December 8, 2020 08:18
@tlrx tlrx added the :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. label Dec 8, 2020
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Dec 8, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx tlrx added >enhancement v7.11.0 v8.0.0 and removed Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Dec 8, 2020
@DaveCTurner (Contributor) left a comment


We also apparently delete a shard folder in ShardPath#deleteLeftoverShardDirectory. It looks like that's mostly there for legacy reasons but I think we can also hit this if there are multiple data paths involved and we end up with the same shard on more than one path (requires changing the paths and restarting a few times).


final AtomicBoolean listener = new AtomicBoolean();
final LockObtainFailedException exception = expectThrows(LockObtainFailedException.class, () ->
env.deleteShardDirectoryUnderLock(sLock, indexSettings, indexPaths -> listener.set(true)));
Contributor

Suggest assert false : indexPaths rather than listener.set(true); it might be useful to see the stack trace that led to this listener being called unexpectedly.

tlrx (Member Author)

Good suggestion, I pushed bd7aca5

* @param indexSettings settings for the index whose folders are going to be deleted
* @param indexPaths the paths of the folders that are going to be deleted
*/
default void beforeIndexFoldersDeleted(Index index, IndexSettings indexSettings, List<Path> indexPaths) {
Contributor

I think we don't need to supply a no-op default here; we can reasonably require implementations to implement both methods.

tlrx (Member Author)

I pushed 5925e5a

public void deleteShardDirectorySafe(
ShardId shardId,
IndexSettings indexSettings,
Consumer<List<Path>> listener
Contributor

All callers call List#of(Path[]) on the paths before invoking the listener; maybe we should make this a Consumer<Path[]> instead?

tlrx (Member Author)

Ok, I pushed 850d1a3 to use Path[] all over the place
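The change discussed in this thread could be sketched as follows. These are hypothetical simplified signatures, not the actual Elasticsearch code; since callers already hold a Path[], taking a Consumer<Path[]> avoids wrapping the array in a List at every call site:

```java
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;

class DeletionCallbacks {
    // Before: each caller had to wrap the array with List.of(paths).
    static void notifyWithList(Path[] paths, Consumer<List<Path>> listener) {
        listener.accept(List.of(paths)); // extra allocation per call site
    }

    // After: pass the array straight through to the listener.
    static void notifyWithArray(Path[] paths, Consumer<Path[]> listener) {
        listener.accept(paths);
    }
}
```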

tlrx (Member Author) commented Dec 9, 2020

> We also apparently delete a shard folder in ShardPath#deleteLeftoverShardDirectory. It looks like that's mostly there for legacy reasons but I think we can also hit this if there are multiple data paths involved and we end up with the same shard on more than one path (requires changing the paths and restarting a few times).

Thanks for catching this; I have no explanation for why I missed it. I updated the pull request to also call listeners in deleteLeftoverShardDirectory, but it took me quite some time to craft a test that exercises it (I had to move shard files around on disk, as there are verifications on node ids and index uuids before this code is executed).

This is ready for another review.

@tlrx tlrx requested a review from DaveCTurner December 9, 2020 12:32
@DaveCTurner (Contributor) left a comment


LGTM, two tiny optional nits.

});
}

public void testListenersInvokedWhenIndexHasLeftOverShard() throws Exception {
Contributor

Nice work, this looked to be pretty tricky to trigger 👍

SetOnce<Path[]> listener = new SetOnce<>();
ShardLockObtainFailedException ex = expectThrows(ShardLockObtainFailedException.class,
() -> env.deleteShardDirectorySafe(new ShardId(index, 0), idxSettings, listener::set));
assertNull(listener.get());
Contributor

Maybe assert false rather than listener::set?

tlrx (Member Author)

Oops, I missed it the last time. I pushed aea4e6d

SetOnce<Path[]> listener = new SetOnce<>();
ShardLockObtainFailedException ex = expectThrows(ShardLockObtainFailedException.class,
() -> env.deleteIndexDirectorySafe(index, randomIntBetween(0, 10), idxSettings, listener::set));
assertNull(listener.get());
Contributor

Maybe assert false rather than listener::set?

tlrx (Member Author)

I pushed aea4e6d

@tlrx tlrx merged commit 34cdfee into elastic:master Dec 10, 2020
tlrx (Member Author) commented Dec 10, 2020

Thanks a lot David!

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Dec 10, 2020
…s going to be deleted from disk (elastic#65926)

This commit introduces a new type of listener, IndexStorePlugin.IndexFoldersDeletionListener, that allows plugins to be notified when an index folder (or a shard folder) is about to be deleted from disk.

This is useful for plugins that need to take action before folders are deleted, such as searchable snapshots, which should evict the cache files contained in those folders.
tlrx added a commit that referenced this pull request Dec 10, 2020
…s going to be deleted from disk (#66158)

Backport of #65926 for 7.x
tlrx added a commit that referenced this pull request Dec 11, 2020
…66173)

This commit changes the SearchableSnapshotDirectory so that it does not evict all its cache files at closing time, but instead delegates this work to the CacheService.

This change is motivated by the fact that Lucene directories are closed as a consequence of applying a new cluster state, so the closing is executed on the cluster state applier thread, and we want to minimize disk IO operations on that thread (like deleting a lot of evicted cache files). It is also motivated by the future of the searchable snapshot cache, which should become persistent.

This change is built on top of the existing SearchableSnapshotIndexEventListener and a new SearchableSnapshotIndexFoldersDeletionListener (see #65926) that are used to detect when a searchable snapshot index (or searchable snapshot shard) is removed from a data node.

When that happens, the listeners notify the CacheService, which maintains an internal list of removed shards. This list is used to evict the cache files associated with these shards as soon as possible (but not on the cluster state applier thread), or right before the same searchable snapshot shard is built again on the same node.

In other situations, like opening or closing a searchable snapshot shard, the cache files are no longer evicted and can be reused.
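The deferred-eviction idea described in that commit message can be sketched as follows. All names here are hypothetical stand-ins, not the actual CacheService code; the point is that the deletion listener only records the removed shard, which is cheap and safe on the cluster state applier thread, while the actual cache-file deletion runs later on another thread:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the CacheService's internal list of removed shards.
class PendingEvictions {
    private final Set<String> removedShards = ConcurrentHashMap.newKeySet();

    // Called from the folders-deletion listener, on the cluster state
    // applier thread: just records the shard, no disk IO here.
    void markShardAsEvicted(String shardId) {
        removedShards.add(shardId);
    }

    // Called from a background thread, or right before the same shard is
    // built again on the same node.
    boolean processEviction(String shardId) {
        if (removedShards.remove(shardId)) {
            // ... delete the shard's cache files here ...
            return true;
        }
        return false; // nothing pending for this shard
    }
}
```

Processing the eviction before rebuilding the same shard guarantees stale cache files are gone by then, while keeping the applier thread free of disk IO.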
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Dec 14, 2020
…lastic#66173)
tlrx added a commit that referenced this pull request Dec 14, 2020
…66264)

Backport of #66173 for 7.11