Add snapshots pending deletion in cluster state to delete snapshot once index is deleted #79156
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
I did an initial read of this and left some mostly minor comments. I will need more time to fully digest this.
```java
/**
 * Represents snapshots marked as to be deleted and pending deletion.
 */
public class SnapshotDeletionsInPending extends AbstractNamedDiffable<Custom> implements Custom {
```
I think `In` should be removed from the class name?
Sure, I pushed 579c21f
```java
public static final SnapshotDeletionsInPending EMPTY = new SnapshotDeletionsInPending(Collections.emptySortedSet());
public static final String TYPE = "snapshot_deletions_pending";

public static final int MAX_PENDING_DELETIONS = 500;
```
Do we need a setting for this?
I don't have a strong opinion about this, but it does not add much complexity and can help with edge cases, so I pushed d15ac48
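For illustration, such a setting could look roughly like the sketch below; the key name and properties are assumptions, not necessarily what d15ac48 adds:

```java
// Sketch only: a node-scope setting replacing the hard-coded constant; the key name is assumed.
public static final Setting<Integer> MAX_PENDING_DELETIONS_SETTING = Setting.intSetting(
    "cluster.snapshot.snapshot_deletions_pending.size",  // assumed key
    500,                                                 // default matching the previous constant
    Setting.Property.NodeScope
);
```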
```java
/**
 * A list of snapshots to delete, sorted by creation time
 */
private final SortedSet<Entry> entries;
```
I wonder if it would be simpler to have a list here? That would remove any dependency on clock differences between masters and seems slightly simpler and cheaper too.
I'd still want the timestamp on the entry though.
Yes, much simpler. I pushed 7f9c32c
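For illustration, the resulting structure could be as simple as the sketch below (field names assumed from the surrounding diff):

```java
// Sketch only: a plain list keeps the order in which deletions were requested, removing the
// comparator and any reliance on clock alignment between masters; each entry still carries
// its creation timestamp purely as information.
private final List<Entry> entries;   // previously: SortedSet<Entry> sorted by creation time
```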
```java
builder.append('[').append(entry.repositoryName).append('/').append(entry.repositoryUuid).append(']');
builder.append('[').append(entry.snapshotId).append(',').append(entry.creationTime).append(']');
builder.append('\n');
```
I am not sure I follow why there are two square brackets and a newline here? Can we format it as one entry `[repo/repouuid, snapshotid, creationtime]`? Also, I would prefer not to have the newline, but maybe there is precedent for having this?
Can we move the formatting of each `Entry` to `Entry.toString`?
Sorry, those were leftovers from debugging sessions. I pushed 88c0a0b
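For illustration, a minimal sketch of moving the per-entry formatting into `Entry#toString`, using the field names visible in the diff above and the single-bracket format suggested:

```java
// Sketch only: formats an entry as [repo/repouuid,snapshotid,creationtime] on a single line.
@Override
public String toString() {
    return "[" + repositoryName + '/' + repositoryUuid + ',' + snapshotId + ',' + creationTime + ']';
}
```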
```java
}

public Builder add(String repositoryName, String repositoryUuid, SnapshotId snapshotId, long creationTime) {
    ensureLimit();
```
I think we can enforce this in `build` alone, much like how we do in `IndexGraveyard`.
Yes, no need to enforce the limit every time an entry is added. I changed this as part of d15ac48
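For illustration, a minimal sketch of enforcing the limit once in `build()`, in the spirit of `IndexGraveyard`; the parameter and the drop-oldest policy are assumptions, not necessarily what d15ac48 does:

```java
// Sketch only: enforce the limit once when the builder is materialized instead of on every add().
public SnapshotDeletionsPending build(int maxPendingDeletions) {
    while (entries.size() > maxPendingDeletions) {
        entries.remove(0);   // assumes entries is a List in request order, oldest first
    }
    return new SnapshotDeletionsPending(List.copyOf(entries));
}
```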
```java
    final Set<Index> indicesToDelete,
    final Metadata metadata
) {
    if (indicesToDelete.isEmpty() == false) {
```
Will we ever get here without indices? And even so, this method seems to work just fine without this outer-level `if`?
Agreed, I pushed 9151c28
```java
for (SnapshotDeletionsInPending.Entry snapshot : snapshotDeletionsInPending.entries()) {
    final SnapshotId snapshotId = snapshot.getSnapshotId();

    // early add to avoid doing too much work on successive cluster state updates
```
Did you mean "concurrent cluster state updates"? Otherwise I wonder if we might as well just add it where you assign `triggered=true` instead.
There should not be concurrent cluster state updates here since this work is done on the cluster state applier thread, and only from there, so we can simplify this like you suggested: 12835bb
Thanks for your feedback @henningandersen! Let me know when you have more :)
I have done a second round of all the production changes and will await a third round before I look into tests too.
```java
);

/**
 * A list of snapshots to delete, sorted by creation time
```
A small precision improvement (timestamps may not be strictly in order):

```diff
- * A list of snapshots to delete, sorted by creation time
+ * A list of snapshots to delete, in the order deletions were requested.
```
OK, I pushed 77dfe00
```java
@Override
public Version getMinimalSupportedVersion() {
    return Version.CURRENT.minimumCompatibilityVersion();
```
I think this should be a fixed version, i.e., `Version.V_8_1_0`, to avoid streaming this to 8.0 and 7.16?
Right, I pushed 77dfe00
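For reference, a sketch of the pinned version discussed above:

```java
@Override
public Version getMinimalSupportedVersion() {
    return Version.V_8_1_0;   // never streamed to 8.0 or 7.16 nodes
}
```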
```java
boolean changed = false;
final List<Entry> updatedEntries = new ArrayList<>();
for (Entry entry : entries) {
    if (snapshotIds.contains(entry.snapshotId)) {
```
I wonder if we should build a set of `snapshotIds` to make this lookup a hash lookup rather than a linear scan? Just to avoid degenerate cases.
Makes sense, I changed this in 77dfe00.
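For illustration, a sketch of the hash-lookup variant, assuming the loop drops the entries whose snapshot id is in the incoming collection; this mainly matters for the degenerate cases with many pending deletions mentioned above:

```java
// Sketch only: copy the ids into a HashSet once so each containment check is O(1)
// instead of scanning the incoming collection for every pending entry.
final Set<SnapshotId> snapshotIdsSet = new HashSet<>(snapshotIds);
boolean changed = false;
final List<Entry> updatedEntries = new ArrayList<>();
for (Entry entry : entries) {
    if (snapshotIdsSet.contains(entry.snapshotId)) {
        changed = true;              // assumed: matching entries are dropped
    } else {
        updatedEntries.add(entry);
    }
}
```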
```java
final Iterator<Entry> iterator = entries.stream().iterator();
while (iterator.hasNext()) {
    if (prepend == false) {
        builder.append(',');
    }
    builder.append(iterator.next());
    prepend = false;
}
builder.append(']');
return builder.toString();
```
This is nearly identical output to `entries.toString()`, except for a space. I wonder if we can use that here instead. OK to leave as is if you prefer this variant (but then I wonder if we need this as a utility).
I used this to add extra information when debugging, but it's not needed anymore. I changed it to what you suggested as part of 77dfe00.
server/src/main/java/org/elasticsearch/cluster/SnapshotDeletionsPending.java
```java
}

private static Set<SnapshotId> listOfRestoreSources(final ClusterState state) {
    final Set<SnapshotId> snapshotIds = new HashSet<>();
```
nit: I wonder if this could not be a one-liner stream-map-collect? The 3 methods are sort of doing similar work but written with different styles.
I renamed the methods in 77dfe00 and used streams + filters to make the methods more similar.
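For illustration, a stream-based variant in the style discussed in the nit could look like the sketch below (the method name after the rename is assumed):

```java
// Sketch only: collect the source snapshot ids of all in-progress restores in one expression.
private static Set<SnapshotId> restoreSources(final ClusterState state) {
    final RestoreInProgress restores = state.custom(RestoreInProgress.TYPE);
    if (restores == null) {
        return Set.of();
    }
    return StreamSupport.stream(restores.spliterator(), false)
        .map(restore -> restore.snapshot().getSnapshotId())
        .collect(Collectors.toUnmodifiableSet());
}
```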
```java
 *
 * @param state the current {@link ClusterState}
 */
private void triggerSnapshotsPendingDeletions(final ClusterState state) {
```
Can we short circuit this processing in some cases, like when none of the criteria used here have changed:
- Pending deletes
- Restores in progress (includes clones)
- Deletions in progress

(perhaps more)?
If we have 100s in the pending list that we cannot delete, it could become a continuous tax on the master to process this for every cluster state update. It may not be bad enough to warrant a complex check, but it would be nice if we could do a simple short circuit catching many cases.
Ok, this suggestion is a very good one but it required a larger change than I expected (see 916689e).
I reworked this and introduced the `pendingDeletionsChanged(ClusterChangedEvent)` and `pendingDeletionsWithConflictsChanged(ClusterChangedEvent)` methods that check the cluster state update and return `true` if it is worth iterating over the pending snapshot deletions. These methods are based on sets of `SnapshotId` that are kept and updated locally on the master node. These sets are used to know if a pending snapshot deletion is waiting for a conflicting situation, for example a restore, to be completed.
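For illustration, a minimal sketch of such a short circuit, assuming the check simply compares the custom between the previous and current states (the actual methods in 916689e also track conflicting restores, clones and deletions):

```java
// Sketch only: skip the per-entry work when the pending deletions custom did not change
// between the previous and the current cluster state.
private boolean pendingDeletionsChanged(ClusterChangedEvent event) {
    final SnapshotDeletionsPending current = event.state().custom(SnapshotDeletionsPending.TYPE);
    final SnapshotDeletionsPending previous = event.previousState().custom(SnapshotDeletionsPending.TYPE);
    return Objects.equals(current, previous) == false;
}
```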
```java
shouldRetry = RepositoryData.MISSING_UUID.equals(repositoryUuid) == false;

} else if (e instanceof ConcurrentSnapshotExecutionException) {
    logger.debug(
```
I wonder if we should ever get here? Since we check for clones, in-progress snapshots and deletions before submitting it here, I think it will never throw this. AFAICS, we handle concurrent deletes silently?
If we think it should not happen, I would keep this but add `assert false` here.
Yes, we handle concurrent deletes silently, but I think we can still get there with concurrent repository clean-ups. I changed this code to handle all exceptions the same way (except the snapshot missing exception) in cf66c46.
```java
    snapshotId
);
} else if (e instanceof RepositoryMissingException) {
    logger.warn(
```
Warn or debug here is tricky. If this ever happens we would certainly want to log it. But if it happens, we would then be spamming the log, I think, with this message for every cluster state update (or at least often) and per snapshot id.
It ties in with how we handle repo deletions. If we only allow forcing those when there are pending deletes, we could perhaps log (and respond with) these snapshots at that time. Also, we could log the snapshots when adding to the pending deletions list if the repo is not even there when the index is deleted?
I think I am ok with this as is, provided that we address this in follow-ups.
> It ties in with how we handle repo deletions. If we only allow forcing those when there are pending deletes, we could perhaps log (and respond with) these snapshots at that time.

I think this is a good suggestion. I'd like to do this in a follow-up.

> We could log the snapshots when adding to the pending deletions list if the repo is not even there when the index is deleted.

I added a warning in `MetadataDeleteIndexService#updateSnapshotDeletionsPending()` for this.
```java
    continue;
}

// should we add some throttling to not always retry?
```
Yeah, that sort of makes sense. Perhaps we can even give up if the snapshot fails to be deleted for reasons other than a missing repo enough times over enough time.
IIUC, we should really only retry for "system" issues, i.e., no available connection to the repo, network issues, repo issues or similar, I think? Which would be the sort of things where some back-off will help and no back-off could be harmful.
I implemented a basic retry mechanism in cf66c46 that retries all exceptions at a constant interval (30s by default) over a given expiration time, after which the retries are stopped and the pending snapshot deletion is removed from the cluster state with a warning message.
This expiration delay is only evaluated after the pending deletion has already been triggered and failed. It is still possible for a pending deletion to stay in the cluster state for a long time if the deletion is blocked by a missing/read-only repository or a conflicting operation.
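For illustration, a rough sketch of such a retry policy; the interval and expiration values, fields and helper methods are assumptions, not the code in cf66c46:

```java
// Sketch only: retry at a fixed interval until an overall expiration is reached, then give up,
// log a warning and drop the pending deletion from the cluster state. Helpers are assumed.
private void onPendingDeletionFailure(SnapshotId snapshotId, long firstFailureTimeMillis, Exception e) {
    final TimeValue retryInterval = TimeValue.timeValueSeconds(30);   // assumed default
    final TimeValue retryExpiration = TimeValue.timeValueHours(12);   // assumed default
    if (threadPool.relativeTimeInMillis() - firstFailureTimeMillis >= retryExpiration.millis()) {
        logger.warn("giving up on pending deletion of snapshot [" + snapshotId + "]", e);
        removePendingDeletion(snapshotId);                            // assumed helper
        return;
    }
    threadPool.schedule(() -> retryPendingDeletion(snapshotId), retryInterval, ThreadPool.Names.GENERIC);
}
```

A constant interval keeps the mechanism simple; as noted above, back-off mainly matters for "system" issues such as repository connectivity problems.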
I wonder if this'd be simpler if we turn it around and instead introduce a call to … If we did that it'd mostly work ok except for the period where we've written an updated …
Note: This pull request is another attempt to delete searchable snapshots when the mounted index is deleted. It is extracted from #75565, but this one stores the information about the snapshot to delete in the cluster state as `SnapshotDeletionsInPending` custom objects, whereas the older one tried to store that information in the `RepositoryMetadata`.

In #74977 we introduced a new index setting `index.store.snapshot.delete_searchable_snapshot` that can be set when mounting a snapshot as an index to indicate that the snapshot should be deleted once the searchable snapshot index is deleted. That pull request added the index setting and the verifications around it. This pull request now adds the logic to detect that a searchable snapshot index with this setting is being deleted, and triggers the deletion of the backing snapshot.

In order to do this, when a searchable snapshot index is deleted we check whether the setting `index.store.snapshot.delete_searchable_snapshot` is set. If the index to be deleted is the last searchable snapshot index that uses the snapshot, then the snapshot information is added to the cluster state in a new `SnapshotDeletionsInPending` custom object. Once a snapshot is pending deletion it cannot be cloned, mounted or restored in the cluster.

Snapshots pending deletion are deleted by the `SnapshotsService`. On cluster state updates the `SnapshotsService` retrieves the list of snapshots to delete and triggers the deletion by executing an explicit snapshot delete request. Deletions of snapshots are executed per repository, and the service tries to prevent conflicting situations before triggering deletions.

There are situations where a snapshot pending deletion cannot be deleted, for example if a repository is updated to point to a different location or if a repository is set to read-only. In such cases the snapshot information is kept around in `SnapshotDeletionsInPending` so that the deletion can be retried in the future. A hard limit of 5,000 pending snapshots is implemented to avoid cluster state 💥.
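To make the index-deletion flow concrete, here is a minimal, hedged sketch of the kind of check described above; the exact setting keys for the snapshot metadata and the `isLastIndexUsingSnapshot` helper are assumptions for illustration, not the code from this pull request:

```java
// Sketch only: when a mounted index is deleted, record its backing snapshot as pending deletion
// if the index asked for it and no other searchable snapshot index still uses the snapshot.
private static void updateSnapshotDeletionsPending(
    SnapshotDeletionsInPending.Builder pendingDeletions,
    IndexMetadata indexMetadata,
    Metadata metadata,
    long currentTimeMillis
) {
    final Settings settings = indexMetadata.getSettings();
    if (settings.getAsBoolean("index.store.snapshot.delete_searchable_snapshot", false) == false) {
        return; // the snapshot must be kept when this mounted index is deleted
    }
    if (isLastIndexUsingSnapshot(indexMetadata, metadata) == false) {   // assumed helper
        return; // another searchable snapshot index still relies on the same snapshot
    }
    pendingDeletions.add(
        settings.get("index.store.snapshot.repository_name"),          // assumed setting keys
        settings.get("index.store.snapshot.repository_uuid"),
        new SnapshotId(settings.get("index.store.snapshot.snapshot_name"), settings.get("index.store.snapshot.snapshot_uuid")),
        currentTimeMillis
    );
}
```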