Add Functionality to Consistently Read RepositoryData For CS Updates #55773
Conversation
Using optimistic locking, add the ability to run a repository state update task with a consistent view of the current repository data. Allows for a follow-up to remove the snapshot init state. Closes elastic#55702
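For orientation, a minimal sketch of the caller-facing pattern this introduces, assuming only the executeConsistentStateUpdate signature shown in the diff below; the listener variable and the body of execute are illustrative:

```java
// Sketch only: the repository loads RepositoryData, builds the task via the
// supplied factory, and runs it against a view of the repository data that is
// consistent with the cluster state the task executes on.
repositoriesService.repository(repositoryName).executeConsistentStateUpdate(
    repositoryData -> new ClusterStateUpdateTask() {
        @Override
        public ClusterState execute(ClusterState currentState) {
            // build the new cluster state from a RepositoryData view that is
            // guaranteed not to be stale relative to this task's execution
            return currentState;
        }

        @Override
        public void onFailure(String source, Exception e) {
            listener.onFailure(e); // 'listener' is illustrative
        }
    },
    listener::onFailure);
```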
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
    if (snapshotIds.isEmpty()) {
        listener.onResponse(null);
        return;
    return new ClusterStateUpdateTask() {
This is a little less ergonomic than I would've liked, but I found it easiest to produce a CS update task in the Repository API compared to other mechanisms, because the existing code is so heavily built on using state in CS update tasks to do logic between execute and clusterStateProcessed. Doing it this way, and in the repository, allows the follow-ups to be relatively isolated changes.
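For context, a minimal sketch of the pattern the comment refers to; the field name and types are illustrative, not taken from the diff:

```java
// Existing style in the snapshot code: state computed in execute() is stashed
// in a field and consumed in clusterStateProcessed() once the update is applied.
new ClusterStateUpdateTask() {
    private SnapshotsInProgress.Entry newEntry; // illustrative field

    @Override
    public ClusterState execute(ClusterState currentState) {
        // derive newEntry from currentState and return the updated state
        return currentState;
    }

    @Override
    public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
        // act on newEntry now that the new state has been published
    }

    @Override
    public void onFailure(String source, Exception e) {
        // surface the failure to the caller
    }
};
```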
    @Override
    public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
                                              Consumer<Exception> onFailure) {
        threadPool.generic().execute(new AbstractRunnable() {
This could technically be done quite a bit more efficiently by not forking off to the generic pool when we just get cached repository data, but I figured this is good enough. I can also back-port it to 7.x without trouble, because even in a mixed 7.x/6.x cluster a 7.x master node will always update the repo metadata when writing new repository data, so this simple solution avoids dealing with all the details of the org.elasticsearch.repositories.blobstore.BlobStoreRepository#bestEffortConsistency flag.
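A rough sketch of the deferred optimization, where cachedRepositoryDataIfFresh() and submitConsistentUpdateTask(...) are purely hypothetical helpers standing in for the real caching logic in BlobStoreRepository:

```java
// Hypothetical fast path: only fork to the generic pool when the repository
// data actually has to be loaded; otherwise reuse the cached copy directly.
final RepositoryData cached = cachedRepositoryDataIfFresh(); // hypothetical helper
if (cached != null) {
    submitConsistentUpdateTask(createUpdateTask, cached, onFailure); // hypothetical helper
} else {
    threadPool.generic().execute(new AbstractRunnable() {
        @Override
        protected void doRun() {
            getRepositoryData(ActionListener.wrap(
                repositoryData -> submitConsistentUpdateTask(createUpdateTask, repositoryData, onFailure),
                onFailure));
        }

        @Override
        public void onFailure(Exception e) {
            onFailure.accept(e);
        }
    });
}
```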
@@ -738,6 +739,63 @@ public void onFailure(Exception e) {
        assertEquals(0, snapshotInfo.failedShards());
    }

    public void testConcurrentDeletes() {
Added this test to cover the scenario of concurrent deletes causing the repository data that a delete is based on to be outdated, in addition to the coverage we already have for a concurrent snapshot + delete.
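Roughly the scenario being covered, sketched as an integration-style test; the helper names (createRepository, createFullSnapshot) are illustrative, and the committed test uses the deterministic snapshot resiliency test infrastructure rather than a live cluster:

```java
// Illustrative scenario only: two deletes race, so the second one's view of
// the RepositoryData (and its generation) would be stale without the
// consistent read added in this PR.
createRepository("test-repo", "fs");          // illustrative helper
createFullSnapshot("test-repo", "snap-1");    // illustrative helper
createFullSnapshot("test-repo", "snap-2");

final ActionFuture<AcknowledgedResponse> firstDelete =
    client().admin().cluster().prepareDeleteSnapshot("test-repo", "snap-1").execute();
final ActionFuture<AcknowledgedResponse> secondDelete =
    client().admin().cluster().prepareDeleteSnapshot("test-repo", "snap-2").execute();

// Both deletes must succeed even though they were submitted against
// overlapping repository generations.
assertAcked(firstDelete.actionGet());
assertAcked(secondDelete.actionGet());
```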
     * @param createUpdateTask function to supply cluster state update task
     * @param onFailure        error handler invoked on failure to get a consistent view of the current {@link RepositoryData}
     */
    void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask, Consumer<Exception> onFailure);
With more and more cluster-state-related logic leaking into the repository code, I think we should eventually refactor things such that the repository only deals with reading and writing data in some form (obviously lots of tricky details here around source-only repos, S3 waiting and whatnot) and completely delegate the responsibility for this kind of magic to either the SnapshotsService or some other component. But until then, this is basically the method we need to make life a lot less complicated in the SnapshotsService.
    @Override
    public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
                                              Consumer<Exception> onFailure) {
    }
Throw UnsupportedOperationException here?
Sure :)
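i.e., roughly (a sketch of what the stub turns into after this exchange):

```java
@Override
public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
                                          Consumer<Exception> onFailure) {
    throw new UnsupportedOperationException();
}
```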
    repositoriesService.repository(repositoryName).getRepositoryData(ActionListener.wrap(repositoryData ->
        deleteCompletedSnapshots(matchingSnapshotIds(repositoryData, snapshotNames, repositoryName),
            repositoryName, repositoryData.getGenId(), Priority.NORMAL, l), l::onFailure))));
    repositoriesService.repository(repositoryName).executeConsistentStateUpdate(repositoryData ->
Could repositoriesService.repository(repositoryName) throw an exception here?
Close to impossible, but I suppose there is a slim chance of concurrently removing the repository here, so I added a catch for that :)
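A sketch of the guard discussed here; the factory and listener names are illustrative, and the exact exception handling in the committed change may differ:

```java
// Guard against the repository being removed between the request arriving and
// the consistent state update being submitted. repository(...) throws
// RepositoryMissingException if the repository no longer exists.
final Repository repository;
try {
    repository = repositoriesService.repository(repositoryName);
} catch (RepositoryMissingException e) {
    listener.onFailure(e);
    return;
}
repository.executeConsistentStateUpdate(createDeleteStateUpdateTask, listener::onFailure); // names illustrative
```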
@@ -330,6 +330,60 @@ protected void doClose() {
        }
    }

    @Override
    public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
This is using ClusterStateUpdateTask in a weird way, directly calling some methods on it but ignoring other things (e.g. priority / listeners?).
Yea, this is a little less ergonomic than I'd like it to be. Practically speaking, though, the priority is not and won't be used anywhere (as in the delete, which runs at NORMAL, and all upcoming use-cases of this), so I decided to ignore it.
The listeners I think I got right though? Since clusterStatePublished is final and empty in ClusterStateUpdateTask, and we don't yet use onNoLongerMaster in the tasks passed in, I figured: why add dead code?
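For readers following the thread, a sketch of the delegation being discussed, at the point where priority was still being ignored; method bodies beyond what the diff shows are assumptions:

```java
// The outer task forwards only the callbacks that are actually used by the
// delegate; priority is not forwarded at this point in the review, and
// clusterStatePublished / onNoLongerMaster are left at their defaults.
final ClusterStateUpdateTask updateTask = createUpdateTask.apply(repositoryData);
clusterService.submitStateUpdateTask("consistent state update", new ClusterStateUpdateTask() {
    @Override
    public ClusterState execute(ClusterState currentState) throws Exception {
        return updateTask.execute(currentState);
    }

    @Override
    public void onFailure(String source, Exception e) {
        updateTask.onFailure(source, e);
    }

    @Override
    public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
        updateTask.clusterStateProcessed(source, oldState, newState);
    }
});
```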
    protected void doRun() {
        final RepositoryMetadata repositoryMetadataStart = metadata;
        getRepositoryData(ActionListener.wrap(repositoryData ->
            clusterService.submitStateUpdateTask("consistent state update", new ClusterStateUpdateTask() {
As we are just protecting against concurrent changes done on the same node, I wonder if we can avoid calling getRepositoryData here for every caller of executeConsistentStateUpdate, or do some sharing between multiple callers of this method. In particular, I'm worried about situations where it takes a long time for the normal-priority CS update task below to be executed, and where stuff might have been invalidated again and again (think e.g. about a misconfigured deletion policy). The task here also does not seem to support timeouts? (This makes me also notice that deleteCompletedSnapshots should probably use the timeout from the DeleteSnapshotRequest.)
"As we are just protecting against concurrent changes done on the same node"

Practically yes, but theoretically I think it could be across nodes. Technically we could have a delete along the lines of:
- Master reads repository data and then gets stuck for 30s or whatever
- New master takes over and completes a repository operation, then dies
- Master from step 1 is back ...

Not likely, but possible, and a scenario that SnapshotResiliencyTests could cover (they currently don't, but it might be worth adding).

"I'm worried about situations where it takes a long time for the normal-priority CS update task below to be executed, and where stuff might have been invalidated again and again (think e.g. about a misconfigured deletion policy)"

I'm not sure this is all that likely in practice, since we keep loading the RepositoryData from cache, and currently a delete does 4 state updates of which only two modify the repository metadata. So I would imagine that even if you somehow send a barrage of delete requests, those will mostly work out fine, just because it's so unlikely for these back-to-back repo metadata updates to go through that quickly (they will always require at least one additional update to the cluster state before). Not saying I'm against optimizing this (always looking for ways to not have to load too many RepositoryData instances on heap), but it seemed not strictly necessary for now (that's why I wrote https://github.com/elastic/elasticsearch/pull/55773/files#r415345654 ; we can probably already optimize this a lot by exploiting the way the cache works).

"The task here also does not seem to support timeouts? (This makes me also notice that deleteCompletedSnapshots should probably use the timeout from the DeleteSnapshotRequest)."

Right, we haven't ever used those timeouts in deletes I think ... maybe something to fix in a follow-up, since it'll require a few changes here and there because we currently don't pass the timeout from the request to the SnapshotsService?
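To make the optimistic-locking part of this thread concrete, here is a sketch of the check-and-retry idea; getRepoMetadata(...) is an assumed helper and the exact retry mechanics are not visible in the excerpt above:

```java
// Remember the repository metadata at read time; when the task executes,
// re-check that no other operation advanced the repository in the meantime.
// If it did, skip the delegate task and start over with fresh RepositoryData.
final RepositoryMetadata repositoryMetadataStart = metadata;
getRepositoryData(ActionListener.wrap(repositoryData -> {
    final ClusterStateUpdateTask updateTask = createUpdateTask.apply(repositoryData);
    clusterService.submitStateUpdateTask("consistent state update", new ClusterStateUpdateTask() {
        private boolean executedTask = false;

        @Override
        public ClusterState execute(ClusterState currentState) throws Exception {
            if (repositoryMetadataStart.equals(getRepoMetadata(currentState))) { // assumed helper
                executedTask = true;
                return updateTask.execute(currentState);
            }
            return currentState; // stale view: leave the state untouched and retry below
        }

        @Override
        public void onFailure(String source, Exception e) {
            if (executedTask) {
                updateTask.onFailure(source, e);
            } else {
                onFailure.accept(e);
            }
        }

        @Override
        public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
            if (executedTask) {
                updateTask.clusterStateProcessed(source, oldState, newState);
            } else {
                // repository advanced concurrently: retry the whole read + update
                executeConsistentStateUpdate(createUpdateTask, onFailure);
            }
        }
    });
}, onFailure));
```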
Opened #55798 for the timeout thing (was a small change after all ...)
All addressed I think. Let me know if you want to make it more efficient in this iteration already :)
Two more comments
    protected void doRun() {
        final RepositoryMetadata repositoryMetadataStart = metadata;
        getRepositoryData(ActionListener.wrap(repositoryData ->
            clusterService.submitStateUpdateTask("consistent state update", new ClusterStateUpdateTask() {
"consistent state update"
. Can you pass in a more descriptive state update message?
Also, can we add the timeout here now?
Sure, added the timeout and we're passing a source now; I also took the opportunity to pass the task priority along.
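Sketched, under the assumption that the source string becomes a parameter and that timeout() can be overridden on the wrapper task (the exact mechanics are not visible in this thread):

```java
// Forward source, priority and timeout from the delegate task instead of
// hard-coding "consistent state update" and the default priority.
final ClusterStateUpdateTask updateTask = createUpdateTask.apply(repositoryData);
clusterService.submitStateUpdateTask(source, new ClusterStateUpdateTask(updateTask.priority()) {
    @Override
    public TimeValue timeout() {
        return updateTask.timeout(); // assumed overridable here
    }

    @Override
    public ClusterState execute(ClusterState currentState) throws Exception {
        return updateTask.execute(currentState);
    }

    @Override
    public void onFailure(String source, Exception e) {
        updateTask.onFailure(source, e);
    }
});
```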
LGTM
Thanks Yannick!
Making use of elastic#55773 to simplify the snapshot state machine. 1. Deletes with no in-progress snapshot now add the delete entry to the cluster state right away instead of doing a second CS update after the first update was a NOOP. 2. If a bulk delete matches in-progress as well as completed snapshots, abort the in-progress snapshot and then move on to delete from the repository.
…lastic#55773) Using optimistic locking, add the ability to run a repository state update task with a consistent view of the current repository data. Allows for a follow-up to remove the snapshot INIT state.
With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine. This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations since all of their entries now contain a repository generation. With this change simple queuing up of snapshot operations can and will be added in a follow-up.
Using (sort of) optimistic locking, add the ability to run a repository state update task with a consistent view of the current repository data.
Fixes issues where deleting a snapshot is enqueued based on an outdated repository generation due to concurrent snapshot or delete operations.
Allows for some follow-ups:
- Removing the snapshot INIT state, because its only use was adding the snapshot to the cluster state so that no concurrent changes to the repository could occur before reading the repository data.
- There's a bunch of INIT state corner case handling in SnapshotsService that can all nicely go away in a follow-up using this change.

Closes #55702