Make snapshot deletes not block the repository during data blob deletes #86514
Conversation
@Override
public void onFailure(Exception e) {
    submitUnbatchedTask(
Pulled this branch out of removeSnapshotDeletionFromClusterState to make the code a little less confusing than it already is with the listenersHandler.
assertThat(Strings.toString(result), result.blobs(), equalTo(0L));
assertThat(Strings.toString(result), result.bytes(), equalTo(0L));
startCleaner();
if (result.bytes() > 0L || result.blobs() > 0L) {
I believe the only reason we did not see this earlier is timing. Previously, a delete was a hard barrier that would always clean up all dangling blobs from previous operations. Now, operations may create dangling blobs while a delete is running, and that delete won't pick those up. Theoretically, this failure mode was possible before the changes here as far as I can see.
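For illustration, the adjusted expectation can be modeled roughly like this in plain Java (`CleanupResult`, `Cleaner`, and `runCleanup` are hypothetical stand-ins, not the actual test code):

```java
// Minimal model of the adjusted expectation; names are hypothetical, not Elasticsearch test code.
public class CleanupConvergenceExample {

    public record CleanupResult(long blobs, long bytes) {}

    public interface Cleaner {
        CleanupResult runCleanup();
    }

    /**
     * A first cleanup pass may legitimately find blobs leaked by operations that ran while a
     * delete was in progress. After that pass, a second cleanup must find nothing left to delete.
     */
    public static void assertCleanupConverges(Cleaner cleaner) {
        CleanupResult first = cleaner.runCleanup();
        if (first.blobs() > 0L || first.bytes() > 0L) {
            CleanupResult second = cleaner.runCleanup();
            if (second.blobs() != 0L || second.bytes() != 0L) {
                throw new AssertionError("cleanup did not converge: " + second);
            }
        }
    }
}
```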
Hmm. What we're really trying to say here is that explicit repo cleanup is unnecessary because other operations also clean things up properly. With this change we're kind of saying the opposite. Can we make those failures clean up after themselves more robustly? If not could we at least do a delete before the no-op cleanup (after acquiring all the permits) instead of the first cleanup?
Making this more robust might be interesting, but I wonder if it's worth it in the real world. Technically it wouldn't be so hard to make a data node fire off a bunch of deletes for the stuff it just wrote in case it runs into an abort (a rough sketch of the idea follows below). I'm not sure the added complexity is worth it given how little data this tends to leak in practice, but it sure would make aborts cleaner in theory. I think I like the idea :) should I do that in a separate PR maybe?

> If not could we at least do a delete before the no-op cleanup (after acquiring all the permits) instead of the first cleanup?

It's tricky, right? We'd have to do a delete that touches all the indices that could have something leaked in them, which I guess (unless we're doing something really tricky) means running a delete that touches a set of snapshots such that all indices in the repo are touched. It wouldn't be endlessly hard to do, but is it really worth it, given that we have a bunch of focused tests around deletes and not leaking blobs elsewhere?
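For what it's worth, a rough sketch of that data-node-side abort cleanup could look like this (plain Java, entirely hypothetical names; not a proposal for the actual BlobStoreRepository code):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a shard upload tracks the blobs it has written so far and, if the
// snapshot is aborted, issues best-effort deletes for exactly those blobs.
public class AbortAwareShardUpload {

    public interface BlobStore {
        void write(String name, byte[] data);
        void deleteIgnoringErrors(Set<String> names);
    }

    private final BlobStore blobStore;
    private final Set<String> writtenBlobs = ConcurrentHashMap.newKeySet();

    public AbortAwareShardUpload(BlobStore blobStore) {
        this.blobStore = blobStore;
    }

    public void writeBlob(String name, byte[] data) {
        blobStore.write(name, data);
        writtenBlobs.add(name); // remember what we wrote in case we get aborted later
    }

    public void onAbort() {
        // Best effort: remove the blobs this upload created so an abort leaks nothing.
        blobStore.deleteIgnoringErrors(Set.copyOf(writtenBlobs));
    }
}
```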
@@ -270,7 +268,7 @@ public void testRepositoryConflict() throws Exception {
     final String snapshot1 = "test-snap1";
     client().admin().cluster().prepareCreateSnapshot(repo, snapshot1).setWaitForCompletion(true).get();
     String blockedNode = internalCluster().getMasterName();
-    ((MockRepository) internalCluster().getInstance(RepositoriesService.class, blockedNode).repository(repo)).blockOnDataFiles();
+    blockMasterOnWriteIndexFile(repo);
Blocking data files doesn't block CS updates any longer.
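For context, the blocking idea the updated test relies on can be sketched like this (a plain-Java stand-in with hypothetical names, not MockRepository's actual API): hold the writer only when it touches the root-level `index-N` metadata blob, since blocking data blobs no longer stalls the cluster state update the test wants to observe.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical test-fixture sketch, not MockRepository itself.
public class BlockingRepositoryStub {

    private final CountDownLatch blocker = new CountDownLatch(1);
    private volatile boolean blockOnIndexBlob = false;

    public void blockOnWriteIndexFile() {
        blockOnIndexBlob = true;
    }

    public void unblock() {
        blocker.countDown();
    }

    public void writeBlob(String blobName, byte[] data) throws InterruptedException {
        if (blockOnIndexBlob && blobName.startsWith("index-")) {
            blocker.await(); // hold the repository metadata write until the test releases it
        }
        // ... actually write the blob ...
    }
}
```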
Pinging @elastic/es-distributed (Team:Distributed)
Hi @original-brownbear, I've created a changelog YAML for you.
General approach seems ok to me (subject to the usual concerns about losing track of blobs and having to fall back to listing things). I left a few small comments.
@Override
public void onMetaUpdated(RepositoryData updatedRepoData) {
    removeSnapshotDeletionFromClusterState(
I worry that completely dropping the deletion from the cluster state here before completion will lead to (a higher probability of) shard-level blob leaks in situations like master failovers. As things stand today I think the new master would retry the cleanups of these shards, whereas with this change we have to do another delete that touches the same indices.
It's true that this leads to a little more leaking on the shard level, statistically speaking, but I'm not so worried about it. It's quite unlikely for an index to run into a partially failed delete and then not be touched again for a long time. The only scenario in which this would realistically happen is when a delete fails for the last regular backup of a shard/index that has moved to frozen and thus still has a snapshot around.
In that case, though, the contents of the frozen-tier backing snapshot are probably the same as those of the last backup anyway, so I'm not sure it matters much.
Master fail-overs are unlikely, and I can't really see a common case where we'd have a massive diff between snapshots in a shard and then never touch it again, so this seems not too important?
Also note that this brings us back to how things worked pre-7.6 in a sense (where we didn't track the repository generation in the CS and thus the second delete after fail-over wouldn't do anything), and in my recollection we hardly had any blobs leak past 7.4 back then (we did collect some data on Cloud about that).
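To make the callback split under discussion concrete, here is a minimal sketch of the two-phase shape (hypothetical names; the actual SnapshotDeleteListener interface in this PR may differ in detail):

```java
// Hypothetical two-phase listener: the delete leaves the cluster state as soon as the
// updated repository metadata (RepositoryData) has been written, while blob deletes continue async.
public interface TwoPhaseDeleteListener<R> {

    // Called once the updated repository metadata is safely written. At this point the
    // deletion entry can be removed from the cluster state and other operations may proceed.
    void onMetaUpdated(R updatedRepoData);

    // Called once all stale data blobs have actually been deleted (or the attempt failed).
    void onDone();

    void onFailure(Exception e);
}
```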
I share David's concerns, but on the other hand I think this change has benefits that outweigh the (apparently unlikely) inconvenience of leaking blobs, so I'm +1 on this as long as we keep an eye on it on Cloud.
LGTM
Thanks David and Tanguy!
…#88065) As a result of #86514 we now remove the snapshot delete from the cluster state before cleaning up stale shard-level blobs, so we have to block the master earlier, on deleting the root-level index-N blob, to get the desired cluster state in which the delete and the snapshot creation are both present at the same time. Closes #88061
For large snapshot clusters the final cleanup step that removes previous shard metadata might have to delete tens of thousands of blobs. This can take many seconds, during which the repository is needlessly blocked from running subsequent finalizations or deletes. Similar to how the final cleanup step for deletes was made async and taken out of the finalization loop in #86514, this PR makes the final cleanup step for finalization async as well.
Snapshot deletes need not block other operations beyond the updates to the repository metadata at the beginning of the delete operation. All subsequent blob deletes can safely run async and concurrently with other operations.
This is the simplest implementation of this change that I could find. It is not optimal, since concurrent deletes are not guarded against trying to delete the same blobs twice, but I believe this is not a big issue in practice.
For one, we batch overlapping deletes into single operations, so we will only try to redundantly delete blobs leaked by previous operations that are part of indices still referenced (which will generally be a very limited number of blobs) and indices that went out of scope. Indices that went out of scope are deleted by listing out blobs and deleting them in turn, which means we likely won't attempt all that many redundant deletes even if the same index were touched by concurrent delete operations, and even if we did, the additional memory use would be bounded.
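As a minimal, self-contained model of that flow (plain Java, with an executor standing in for the snapshot thread pool; none of these names are the real Elasticsearch APIs):

```java
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model: the repository is only "blocked" while the new metadata is written;
// the potentially slow data-blob deletes are forked off and run concurrently with later operations.
public class AsyncSnapshotDelete {

    public interface Repository {
        long writeUpdatedMetadata(Collection<String> snapshotsToDelete); // fast, must be serialized
        void deleteStaleBlobs(long newGeneration);                       // slow, safe to run async
    }

    private final Repository repository;
    private final ExecutorService snapshotPool = Executors.newFixedThreadPool(2);

    public AsyncSnapshotDelete(Repository repository) {
        this.repository = repository;
    }

    public void deleteSnapshots(Collection<String> snapshotsToDelete, Runnable onMetaUpdated, Runnable onDone) {
        // Step 1: update the repository metadata; once this returns, the delete no longer
        // needs to block other finalizations or deletes.
        long newGeneration = repository.writeUpdatedMetadata(snapshotsToDelete);
        onMetaUpdated.run();

        // Step 2: clean up the now-unreferenced data blobs in the background. Redundant
        // deletes of already-removed blobs are tolerated rather than coordinated away.
        snapshotPool.execute(() -> {
            repository.deleteStaleBlobs(newGeneration);
            onDone.run();
        });
    }
}
```

The real change is of course expressed in terms of BlobStoreRepository, cluster state updates, and listeners rather than a bare executor, but the ordering is the same: repository metadata first, blob deletes in the background.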