Add snapshots pending deletion in cluster state to delete snapshot once index is deleted #79156
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
I did an initial read of this and left some mostly minor comments. I will need more time to fully digest this.
```java
/**
 * Represents snapshots marked as to be deleted and pending deletion.
 */
public class SnapshotDeletionsInPending extends AbstractNamedDiffable<Custom> implements Custom {
```
I think `In` should be removed from the class name?
Sure, I pushed 579c21f
```java
public static final SnapshotDeletionsInPending EMPTY = new SnapshotDeletionsInPending(Collections.emptySortedSet());
public static final String TYPE = "snapshot_deletions_pending";

public static final int MAX_PENDING_DELETIONS = 500;
```
Do we need a setting for this?
I don't have a strong opinion about this, but it does not add much complexity and can help with edge cases, so I pushed d15ac48
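For illustration, such a setting could look roughly like the sketch below; the key name and properties are assumptions, not necessarily what d15ac48 adds:

```java
// Sketch only: a node-scope setting replacing the hard-coded constant; the key name is assumed.
public static final Setting<Integer> MAX_PENDING_DELETIONS_SETTING = Setting.intSetting(
    "cluster.snapshot.snapshot_deletions_pending.size",  // assumed key
    500,                                                 // default matching the previous constant
    Setting.Property.NodeScope
);
```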
```java
/**
 * A list of snapshots to delete, sorted by creation time
 */
private final SortedSet<Entry> entries;
```
I wonder if it would be simpler to have a list here? That would remove any dependency on clock differences between masters and seems slightly simpler and cheaper too.
I'd still want the timestamp on the entry though.
Yes, much simpler. I pushed 7f9c32c
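For illustration, the resulting structure could be as simple as the sketch below (field names assumed from the surrounding diff):

```java
// Sketch only: a plain list keeps the order in which deletions were requested, removing the
// comparator and any reliance on clock alignment between masters; each entry still carries
// its creation timestamp purely as information.
private final List<Entry> entries;   // previously: SortedSet<Entry> sorted by creation time
```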
```java
builder.append('[').append(entry.repositoryName).append('/').append(entry.repositoryUuid).append(']');
builder.append('[').append(entry.snapshotId).append(',').append(entry.creationTime).append(']');
builder.append('\n');
```
I am not sure I follow why there are two square brackets and a newline here? Can we format it as one entry `[repo/repouuid, snapshotid, creationtime]`? Also, I would prefer not to have the newline, but maybe there is precedent for having this?
Can we move the formatting of each `Entry` to `Entry.toString`?
Sorry, those were leftovers from debugging sessions. I pushed 88c0a0b
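For illustration, a minimal sketch of moving the per-entry formatting into `Entry#toString`, using the field names visible in the diff above and the single-bracket format suggested:

```java
// Sketch only: formats an entry as [repo/repouuid,snapshotid,creationtime] on a single line.
@Override
public String toString() {
    return "[" + repositoryName + '/' + repositoryUuid + ',' + snapshotId + ',' + creationTime + ']';
}
```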
```java
}

public Builder add(String repositoryName, String repositoryUuid, SnapshotId snapshotId, long creationTime) {
    ensureLimit();
```
I think we can enforce this in `build` alone, much like how we do in `IndexGraveyard`.
Yes, no need to enforce the limit every time an entry is added. I changed this as part of d15ac48
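For illustration, a minimal sketch of enforcing the limit once in `build()`, in the spirit of `IndexGraveyard`; the parameter and the drop-oldest policy are assumptions, not necessarily what d15ac48 does:

```java
// Sketch only: enforce the limit once when the builder is materialized instead of on every add().
public SnapshotDeletionsPending build(int maxPendingDeletions) {
    while (entries.size() > maxPendingDeletions) {
        entries.remove(0);   // assumes entries is a List in request order, oldest first
    }
    return new SnapshotDeletionsPending(List.copyOf(entries));
}
```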
```java
    final Set<Index> indicesToDelete,
    final Metadata metadata
) {
    if (indicesToDelete.isEmpty() == false) {
```
Will we ever get here without indices? And even so, this method seems to work just fine without this outer-level `if`?
Agreed, I pushed 9151c28
```java
for (SnapshotDeletionsInPending.Entry snapshot : snapshotDeletionsInPending.entries()) {
    final SnapshotId snapshotId = snapshot.getSnapshotId();

    // early add to avoid doing too much work on successive cluster state updates
```
Did you mean "concurrent cluster state updates"? Otherwise I wonder if we might as well just add it where you assign `triggered=true` instead.
There should not be concurrent cluster state updates here since this work is done on the cluster state applier thread, and only from there, so we can simplify this like you suggested: 12835bb
Thanks for your feedback @henningandersen! Let me know when you have more :)
I have done a second round of all the production changes and will await a third round before I look into tests too.
```java
);

/**
 * A list of snapshots to delete, sorted by creation time
```
A small precision improvement (timestamps may not be strictly in order):

```diff
- * A list of snapshots to delete, sorted by creation time
+ * A list of snapshots to delete, in the order deletions were requested.
```
OK, I pushed 77dfe00
```java
@Override
public Version getMinimalSupportedVersion() {
    return Version.CURRENT.minimumCompatibilityVersion();
```
I think this should be a fixed version, i.e., `Version.V_8_1_0`, to avoid streaming this to 8.0 and 7.16?
Right, I pushed 77dfe00
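For reference, a sketch of the pinned version discussed above:

```java
@Override
public Version getMinimalSupportedVersion() {
    return Version.V_8_1_0;   // never streamed to 8.0 or 7.16 nodes
}
```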
```java
boolean changed = false;
final List<Entry> updatedEntries = new ArrayList<>();
for (Entry entry : entries) {
    if (snapshotIds.contains(entry.snapshotId)) {
```
I wonder if we should build a set of `snapshotIds` to make this lookup a hash lookup rather than a linear scan? Just to avoid degenerate cases.
Makes sense, I changed this in 77dfe00.
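For illustration, a sketch of the hash-lookup variant, assuming the loop drops the entries whose snapshot id is in the incoming collection; this mainly matters for the degenerate cases with many pending deletions mentioned above:

```java
// Sketch only: copy the ids into a HashSet once so each containment check is O(1)
// instead of scanning the incoming collection for every pending entry.
final Set<SnapshotId> snapshotIdsSet = new HashSet<>(snapshotIds);
boolean changed = false;
final List<Entry> updatedEntries = new ArrayList<>();
for (Entry entry : entries) {
    if (snapshotIdsSet.contains(entry.snapshotId)) {
        changed = true;              // assumed: matching entries are dropped
    } else {
        updatedEntries.add(entry);
    }
}
```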
```java
final Iterator<Entry> iterator = entries.stream().iterator();
while (iterator.hasNext()) {
    if (prepend == false) {
        builder.append(',');
    }
    builder.append(iterator.next());
    prepend = false;
}
builder.append(']');
return builder.toString();
```
This is nearly identical output to `entries.toString()`, except for a space. I wonder if we can use that here instead. OK to leave as is if you prefer this variant (but then I wonder if we need this as a utility).
I used this to add extra information when debugging, but it's not needed anymore. I changed it to what you suggested as part of 77dfe00.
server/src/main/java/org/elasticsearch/cluster/SnapshotDeletionsPending.java
```java
}

private static Set<SnapshotId> listOfRestoreSources(final ClusterState state) {
    final Set<SnapshotId> snapshotIds = new HashSet<>();
```
nit: I wonder if this could not be a one-liner stream-map-collect? The 3 methods are sort of doing similar work but written with different styles.
I renamed the methods in 77dfe00 and used streams + filters to make the methods more similar.
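For illustration, a stream-based variant in the style discussed in the nit could look like the sketch below (the method name after the rename is assumed):

```java
// Sketch only: collect the source snapshot ids of all in-progress restores in one expression.
private static Set<SnapshotId> restoreSources(final ClusterState state) {
    final RestoreInProgress restores = state.custom(RestoreInProgress.TYPE);
    if (restores == null) {
        return Set.of();
    }
    return StreamSupport.stream(restores.spliterator(), false)
        .map(restore -> restore.snapshot().getSnapshotId())
        .collect(Collectors.toUnmodifiableSet());
}
```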
```java
 *
 * @param state the current {@link ClusterState}
 */
private void triggerSnapshotsPendingDeletions(final ClusterState state) {
```
Can we short circuit this processing in some cases, like when none of the criteria used here have changed:
- Pending deletes
- Restores in progress (includes clones)
- Deletions in progress

(perhaps more)?
If we have 100s in the pending list that we cannot delete, it could become a continuous tax on the master to process this for every cluster state update. It may not be bad enough to warrant a complex check, but it would be nice if we could do a simple short circuit catching many cases.
Ok, this suggestion is a very good one but it required a larger change than I expected (see 916689e).
I reworked this and introduced the `pendingDeletionsChanged(ClusterChangedEvent)` and `pendingDeletionsWithConflictsChanged(ClusterChangedEvent)` methods that check the cluster state update and return `true` if it is worth iterating over the pending snapshot deletions. These methods are based on sets of `SnapshotId` that are kept and updated locally on the master node. These sets are used to know if a pending snapshot deletion is waiting for a conflicting situation, for example a restore, to be completed.
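For illustration, a minimal sketch of such a short circuit, assuming the check simply compares the custom between the previous and current states (the actual methods in 916689e also track conflicting restores, clones and deletions):

```java
// Sketch only: skip the per-entry work when the pending deletions custom did not change
// between the previous and the current cluster state.
private boolean pendingDeletionsChanged(ClusterChangedEvent event) {
    final SnapshotDeletionsPending current = event.state().custom(SnapshotDeletionsPending.TYPE);
    final SnapshotDeletionsPending previous = event.previousState().custom(SnapshotDeletionsPending.TYPE);
    return Objects.equals(current, previous) == false;
}
```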
```java
shouldRetry = RepositoryData.MISSING_UUID.equals(repositoryUuid) == false;

} else if (e instanceof ConcurrentSnapshotExecutionException) {
    logger.debug(
```
I wonder if we should ever get here? Since we check for clones, in-progress snapshots and deletions before submitting it here, I think it will never throw this. AFAICS, we handle concurrent deletes silently?
If we think it should not happen, I would keep this but add `assert false` here.
Yes, we handle concurrent deletes silently, but I think we can still get there with concurrent repository clean-ups. I changed this code to handle all exceptions the same way (except the snapshot missing exception) in cf66c46.
```java
    snapshotId
);
} else if (e instanceof RepositoryMissingException) {
    logger.warn(
```
Warn or debug here is tricky. If this ever happens we would certainly want to log it. But if it happens, we would then be spamming the log, I think, with this message for every cluster state update (or at least often) and per snapshot id.
It ties in with how we handle repo deletions. If we only allow forcing those when there are pending deletes, we could perhaps log (and respond with) these snapshots at that time. Also, we could log the snapshots when adding to the pending deletions list if the repo is not even there when the index is deleted?
I think I am ok with this as is, provided that we address this in follow-ups.
> It ties in with how we handle repo deletions. If we only allow forcing those when there are pending deletes, we could perhaps log (and respond with) these snapshots at that time.

I think this is a good suggestion. I'd like to do this in a follow-up.

> We could log the snapshots when adding to the pending deletions list if the repo is not even there when the index is deleted.

I added a warning in `MetadataDeleteIndexService#updateSnapshotDeletionsPending()` for this.
```java
    continue;
}

// should we add some throttling to not always retry?
```
Yeah, that sort of makes sense. Perhaps we can even give up if the snapshot fails to be deleted for reasons other than a missing repo enough times over enough time.
IIUC, we should really only retry for "system" issues, i.e., no available connection to the repo, network issues, repo issues or similar, I think? Which would be the sort of things where some back-off will help and no back-off could be harmful.
I implemented a basic retry mechanism in cf66c46 that retries all exceptions at a constant interval (30s by default) over a given expiration time, after which the retries are stopped and the pending snapshot deletion is removed from the cluster state with a warning message.
This expiration delay is only evaluated after the pending deletion has already been triggered and failed. It is still possible for a pending deletion to stay in the cluster state for a long time if the deletion is blocked by a missing/read-only repository or a conflicting operation.
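For illustration, a rough sketch of such a retry policy; the interval and expiration values, fields and helper methods are assumptions, not the code in cf66c46:

```java
// Sketch only: retry at a fixed interval until an overall expiration is reached, then give up,
// log a warning and drop the pending deletion from the cluster state. Helpers are assumed.
private void onPendingDeletionFailure(SnapshotId snapshotId, long firstFailureTimeMillis, Exception e) {
    final TimeValue retryInterval = TimeValue.timeValueSeconds(30);   // assumed default
    final TimeValue retryExpiration = TimeValue.timeValueHours(12);   // assumed default
    if (threadPool.relativeTimeInMillis() - firstFailureTimeMillis >= retryExpiration.millis()) {
        logger.warn("giving up on pending deletion of snapshot [" + snapshotId + "]", e);
        removePendingDeletion(snapshotId);                            // assumed helper
        return;
    }
    threadPool.schedule(() -> retryPendingDeletion(snapshotId), retryInterval, ThreadPool.Names.GENERIC);
}
```

A constant interval keeps the mechanism simple; as noted above, back-off mainly matters for "system" issues such as repository connectivity problems.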
I wonder if this'd be simpler if we turn it around and instead introduce a call to … If we did that it'd mostly work ok except for the period where we've written an updated …
Note: This pull request is another attempt to delete searchable snapshots when the mounted index is deleted. It is extracted from #75565, but this one stores the information about the snapshot to delete in the cluster state as `SnapshotDeletionsInPending` custom objects, whereas the older one tried to store that information in the `RepositoryMetadata`.

In #74977 we introduced a new index setting `index.store.snapshot.delete_searchable_snapshot` that can be set when mounting a snapshot as an index to indicate that the snapshot should be deleted once the searchable snapshot index is deleted. That pull request added the index setting and the verifications around it. This pull request now adds the logic to detect that a searchable snapshot index with this setting is being deleted, and triggers the deletion of the backing snapshot.

In order to do this, when a searchable snapshot index is deleted we check whether the setting `index.store.snapshot.delete_searchable_snapshot` is set. If the index to be deleted is the last searchable snapshot index that uses the snapshot, then the snapshot information is added to the cluster state in a new `SnapshotDeletionsInPending` custom object. Once a snapshot is pending deletion it cannot be cloned, mounted or restored in the cluster.

Snapshots pending deletion are deleted by the `SnapshotsService`. On cluster state updates the `SnapshotsService` retrieves the list of snapshots to delete and triggers the deletion by executing an explicit snapshot delete request. Deletions of snapshots are executed per repository, and the service tries to prevent conflicting situations before triggering deletions.

There are situations where a snapshot pending deletion cannot be deleted, for example if a repository is updated to point to a different location or if a repository is set to read-only. In such cases the snapshot information is kept around in `SnapshotDeletionsInPending` so that the deletion can be retried in the future. A hard limit of 5,000 pending snapshots is implemented to avoid cluster state 💥.
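To make the index-deletion flow concrete, here is a minimal, hedged sketch of the kind of check described above; the exact setting keys for the snapshot metadata and the `isLastIndexUsingSnapshot` helper are assumptions for illustration, not the code from this pull request:

```java
// Sketch only: when a mounted index is deleted, record its backing snapshot as pending deletion
// if the index asked for it and no other searchable snapshot index still uses the snapshot.
private static void updateSnapshotDeletionsPending(
    SnapshotDeletionsInPending.Builder pendingDeletions,
    IndexMetadata indexMetadata,
    Metadata metadata,
    long currentTimeMillis
) {
    final Settings settings = indexMetadata.getSettings();
    if (settings.getAsBoolean("index.store.snapshot.delete_searchable_snapshot", false) == false) {
        return; // the snapshot must be kept when this mounted index is deleted
    }
    if (isLastIndexUsingSnapshot(indexMetadata, metadata) == false) {   // assumed helper
        return; // another searchable snapshot index still relies on the same snapshot
    }
    pendingDeletions.add(
        settings.get("index.store.snapshot.repository_name"),          // assumed setting keys
        settings.get("index.store.snapshot.repository_uuid"),
        new SnapshotId(settings.get("index.store.snapshot.snapshot_name"), settings.get("index.store.snapshot.snapshot_uuid")),
        currentTimeMillis
    );
}
```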