Delete backing snapshot when searchable snapshot index is deleted #75565
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
@elasticmachine run elasticsearch-ci/docs
@original-brownbear @DaveCTurner I know this is a big pull request and you are busy with other things, but I'd appreciate your feedback once you find the time to give it :)
Thanks @tlrx, sorry it's taken a while to get around to this. I left a few comments that I hope will help to simplify it a bit.
Do you think we should hide the snapshots-to-be-deleted from APIs like get-snapshots and get-snapshot-status?
continue; // other index is backed by a different snapshot, skip
}
final String otherRepositoryName = repositoryNameFromIndexSettings(currentState, otherSettings);
if (Objects.equals(repositoryName, otherRepositoryName) == false) {
I think this might break in some odd corner cases involving registering the same repository under multiple names (maybe without repository UUIDs). But do we need to check this? By this point we know the snapshot UUID matches, that should be enough to tell us not to delete it.
I agree and I removed this check in e1995d7
boolean changed = false;
for (Map.Entry<String, Set<SnapshotId>> snapshotToDelete : snapshotsToDelete.entrySet()) {
    RepositoryMetadata repository = repositories.repository(snapshotToDelete.getKey());
    if (repository != null) {
Can this be null?
I don't think it can be null, and actually we don't need to re-read the repository metadata here, so I pushed e1995d7.
private final Map<String, Integer> numberOfOnGoingSnapshotsToDeletes = new HashMap<>();

private volatile int maxConcurrentSnapshotsToDeletes;
private volatile TimeValue snapshotsToDeleteRetryInterval;
A time-based retry feels wrong to me. We retry when concurrent operations block our delete attempt, which is OK, but the blocking operations are removed by a cluster state update, so I think we should make some attempt to detect conflicting operations and retry when applying a cluster state update that appears to have no conflicts, rather than by using a timer.
Detecting conflicting situations and not triggering snapshot deletions sounds like a good idea, thanks. I refactored the deletion logic in 3e8dc65 to detect conflicts and do nothing, relying on subsequent cluster state updates to allow and trigger snapshot deletions.
Still, I went back and forth on the time-based retry. I agree this should not be the main path, but I think it will be useful in some situations, like snapshot deletions failing because of an expired repository or a network issue when accessing the repository. I made it an infinite retry with a 30s delay, but we could imagine some backoff policy or just a maximum number of retries before giving up.
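For illustration, a minimal sketch of what the cluster-state-driven approach could look like; the hasConflictingOperations and triggerSnapshotDeletions helpers are hypothetical names, not the PR's actual code:

// Sketch only: on each applied cluster state, re-check the pending deletes and
// only act when no conflicting snapshot operation is in flight for the repository.
@Override
public void applyClusterState(ClusterChangedEvent event) {
    if (event.localNodeMaster() == false) {
        return; // only the elected master triggers pending snapshot deletions
    }
    final RepositoriesMetadata repositories =
        event.state().metadata().custom(RepositoriesMetadata.TYPE, RepositoriesMetadata.EMPTY);
    for (RepositoryMetadata repository : repositories.repositories()) {
        if (repository.snapshotsToDelete().isEmpty()
            || hasConflictingOperations(event.state(), repository.name())) { // hypothetical helper
            continue; // a later cluster state update will retry once the conflict is gone
        }
        triggerSnapshotDeletions(repository.name(), repository.snapshotsToDelete()); // hypothetical helper
    }
}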
*/
private final Map<String, Integer> numberOfOnGoingSnapshotsToDeletes = new HashMap<>();

private volatile int maxConcurrentSnapshotsToDeletes;
We basically coalesce all snapshot deletions on a repository into a single operation (see e.g. the reusedExistingDelete flag), so I think it would be preferable, and simpler, not to have this kind of concurrency.
I agree, thanks for reminding me of this. I've refactored and removed this "concurrency".
try {
    logger.debug("[{}] triggering deletion of snapshot [{}]", repository, snapshotId);
    final PlainActionFuture<Void> future = PlainActionFuture.newFuture();
    deleteSnapshots(new DeleteSnapshotRequest(repository, snapshotId.getName()), future);
We're relying on the name of the snapshot here, but I don't think we protect against anyone deleting the snapshot manually and then creating a new one with the same name while this potentially-delayed action is in flight, so we might end up deleting the wrong snapshot here.
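One way to guard against that, sketched here with assumed surrounding names (repositoriesService, snapshotId, listener) rather than the PR's actual code, is to re-resolve the snapshot by UUID from the repository data just before issuing the delete:

// Sketch: verify the name we are about to delete still maps to the recorded UUID.
repositoriesService.repository(repositoryName).getRepositoryData(ActionListener.wrap(repositoryData -> {
    final boolean stillPresent = repositoryData.getSnapshotIds().stream()
        .anyMatch(s -> s.getUUID().equals(snapshotId.getUUID()) && s.getName().equals(snapshotId.getName()));
    if (stillPresent == false) {
        listener.onResponse(null); // deleted manually, or the name now belongs to a different snapshot
        return;
    }
    deleteSnapshots(new DeleteSnapshotRequest(repositoryName, snapshotId.getName()), listener);
}, listener::onFailure));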
logger.debug("[{}] triggering deletion of snapshot [{}]", repository, snapshotId); | ||
final PlainActionFuture<Void> future = PlainActionFuture.newFuture(); | ||
deleteSnapshots(new DeleteSnapshotRequest(repository, snapshotId.getName()), future); | ||
future.actionGet(); |
I think we don't need to block a thread to wait for completion here; we should be able to handle the response asynchronously.
++ I reworked this.
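For reference, a rough sketch of the non-blocking shape; the names are assumed and this is not the exact code that was pushed:

// Sketch: pass a listener instead of blocking on a PlainActionFuture.
logger.debug("[{}] triggering deletion of snapshot [{}]", repository, snapshotId);
deleteSnapshots(
    new DeleteSnapshotRequest(repository, snapshotId.getName()),
    ActionListener.wrap(
        ignored -> logger.debug("[{}] snapshot [{}] deleted", repository, snapshotId),
        e -> logger.warn(new ParameterizedMessage("[{}] failed to delete snapshot [{}]", repository, snapshotId), e)
    )
);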
/**
 * List of {@link org.elasticsearch.snapshots.SnapshotId} marked as "to delete"
 */
private final List<SnapshotId> snapshotsToDelete;
I just realised that if we track the pending deletes here then they'll get lost if someone does a put or delete on this repo, maybe just to adjust its settings or rename it or something. Maybe we should use a separate Metadata.Custom instead.
I thought I took care of propagating the pending deletes in case of a put on the repository, but actually you're right, there's a missing bit I think (see canUpdateInPlace(newRepositoryMetadata, existing) in RepositoriesService).
Putting the pending deletes in a separate Custom also makes sense, but we have to decide how to match the repository when it's re-created (by uuid?) and how long the pending deletes should be kept in the cluster state. I think it would be easier to keep the pending deletes where they are today and prevent the deletion of a repository if it still has pending deletes (like we prevent repo deletion if there are ongoing operations).
++ there's no easy answer. Blocking the removal of the repository would worry me a bit: if we introduced some awful bug that made deletes fail then we could be retrying forever without a way to fix it. I guess we'd also want to prevent marking a repo as readonly for similar reasons. Maybe we'd need a ?force parameter to the delete repo API as an escape hatch just in case.
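A minimal sketch of what such a guard could look like, with a hypothetical force flag that is not part of this PR:

// Sketch: refuse to unregister (or flip to readonly) a repository that still has
// snapshots pending deletion, unless a hypothetical ?force escape hatch is used.
static void ensureNoPendingSnapshotDeletions(RepositoryMetadata repository, boolean force) {
    if (force == false && repository.snapshotsToDelete().isEmpty() == false) {
        throw new IllegalStateException(
            "trying to modify or unregister repository ["
                + repository.name()
                + "] that still has ["
                + repository.snapshotsToDelete().size()
                + "] snapshots pending deletion"
        );
    }
}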
I refactored things a bit and I added some specific checks for read-only repositories so that we don't trigger snapshot deletions on such repositories. Pending deletes are still added to the repository metadata so that they'll be processed once the repository is writeable again.
Also, I did not implement the logic to prevent deleting a repository (or making it read-only) when it has pending deletes. I'd like to discuss and address this as a follow-up if that is OK with you.
@DaveCTurner Thanks for your feedback and sorry for the time it took me to apply your comments to the code. I've reimplemented the deletion logic, which should be better now but maybe not yet bulletproof.
I'm tempted not to hide the pending snapshot deletions in these APIs so that they reflect exactly what exists in the repository. Maybe we could add a runtime flag to the response to indicate pending deletions.
ping @DaveCTurner - sorry for the long-standing PR, but if you have the time to have another look I'd be grateful.
I have a number of minor comments, but the main ones are around the identity of the deletion and where to store it.
I wish we could rely on repository uuids, but I suppose we cannot due to bwc?
.custom(RepositoriesMetadata.TYPE, RepositoriesMetadata.EMPTY)
.repository(repositoryName);
if (repositoryMetadata != null && repositoryMetadata.snapshotsToDelete().contains(sourceSnapshotId)) {
    throw new ConcurrentSnapshotExecutionException(
++, I wonder if we want similar protection in TransportMountSearchableSnapshotAction?
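For reference, the mount-side check could mirror the clone-side one quoted above; this is only a sketch with assumed variable names (repositoryMetadata, repositoryName, snapshotId), not code from the PR:

// Sketch: reject mounting a snapshot that is already marked as pending deletion.
if (repositoryMetadata != null && repositoryMetadata.snapshotsToDelete().contains(snapshotId)) {
    throw new ConcurrentSnapshotExecutionException(
        repositoryName,
        snapshotId.getName(),
        "cannot mount snapshot that is marked as deleted"
    );
}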
@@ -217,7 +217,7 @@ public ClusterState execute(ClusterState currentState) {
    updatedMetadata = repositoryMetadata.withSettings(newRepositoryMetadata.settings());
} else {
    ensureRepositoryNotInUse(currentState, request.name());
-   updatedMetadata = newRepositoryMetadata;
+   updatedMetadata = newRepositoryMetadata.withSnapshotsToDelete(repositoryMetadata.snapshotsToDelete());
This could update the repo-reference to point to a completely different repo. I wonder if we should not register the repository uuid together with the snapshot to delete, and then remove the snapshots to delete if the uuid does not match when it is assigned later?
Also relates to David's comment in RepositoryMetadata; keeping the list of snapshots to delete separate from the repo-registration sort of makes sense I think.
final String suffix = getTestName().toLowerCase(Locale.ROOT);
final String repository = "repository-" + suffix;
createRepository(repository, FsRepository.TYPE, randomRepositorySettings());

final String index = "index-" + suffix;
createAndPopulateIndex(index, Settings.builder().put(INDEX_SOFT_DELETES_SETTING.getKey(), true));

final TotalHits totalHits = internalCluster().client().prepareSearch(index).setTrackTotalHits(true).get().getHits().getTotalHits();

final String snapshot = "snapshot-" + suffix;
createSnapshot(repository, snapshot, List.of(index));
assertAcked(client().admin().indices().prepareDelete(index));

final String mounted = mountSnapshot(repository, snapshot, index, deleteSnapshotIndexSettings(true));
assertHitCount(client().prepareSearch(mounted).setTrackTotalHits(true).get(), totalHits.value);
assertAcked(client().admin().indices().prepareDelete(mounted));
Looks like this first part is the same across several tests, perhaps we can refactor out a setup method?
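Something along these lines could be extracted; this is a sketch built from the quoted test, with an assumed helper name:

// Sketch: shared setup that creates a repository, snapshots an index, deletes it
// and mounts the snapshot, returning the name of the mounted index.
private String setUpMountedIndex(final Settings mountIndexSettings) throws Exception {
    final String suffix = getTestName().toLowerCase(Locale.ROOT);
    final String repository = "repository-" + suffix;
    createRepository(repository, FsRepository.TYPE, randomRepositorySettings());

    final String index = "index-" + suffix;
    createAndPopulateIndex(index, Settings.builder().put(INDEX_SOFT_DELETES_SETTING.getKey(), true));

    final String snapshot = "snapshot-" + suffix;
    createSnapshot(repository, snapshot, List.of(index));
    assertAcked(client().admin().indices().prepareDelete(index));

    return mountSnapshot(repository, snapshot, index, mountIndexSettings);
}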
return repository.name();
}
}
}
I wonder if we should return null in case there is a repo-uuid on the index but no such repo was found? This works differently from RepositoriesService.indexSettingsMatchRepositoryMetadata.
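Sketched out, the suggested behaviour could look like the following; the setting key constant and the uuid() accessor are assumptions based on the surrounding discussion, not verified against the PR:

// Sketch: if the index records a repository uuid, match strictly on it and return null
// when no registered repository carries that uuid, instead of falling back to the name.
final String repositoryUuid = indexSettings.get(SEARCHABLE_SNAPSHOTS_REPOSITORY_UUID_SETTING_KEY); // assumed key
if (Strings.hasLength(repositoryUuid)) {
    for (RepositoryMetadata repository : repositories.repositories()) {
        if (repositoryUuid.equals(repository.uuid())) {
            return repository.name();
        }
    }
    return null; // uuid present but unknown in this cluster state
}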
if (snapshotsToDelete.isEmpty() == false) {
    RepositoriesMetadata repositories = currentState.metadata().custom(RepositoriesMetadata.TYPE, RepositoriesMetadata.EMPTY);
    for (Map.Entry<String, Set<SnapshotId>> snapshotToDelete : snapshotsToDelete.entrySet()) {
        repositories = repositories.addSnapshotsToDelete(snapshotToDelete.getKey(), snapshotToDelete.getValue());
I think we risk getting an IllegalArgumentException from RepositoriesMetadata.withUpdate when the repo has been deleted while deleting the index.
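One way to avoid that, as a sketch over the quoted loop (the logger and surrounding structure are assumed, this is not the actual fix):

// Sketch: skip repositories that disappeared from the cluster state while the index
// was being deleted, rather than letting addSnapshotsToDelete(...) throw.
for (Map.Entry<String, Set<SnapshotId>> snapshotToDelete : snapshotsToDelete.entrySet()) {
    if (repositories.repository(snapshotToDelete.getKey()) == null) {
        logger.warn("repository [{}] not found, cannot mark snapshots {} as to delete",
            snapshotToDelete.getKey(), snapshotToDelete.getValue());
        continue;
    }
    repositories = repositories.addSnapshotsToDelete(snapshotToDelete.getKey(), snapshotToDelete.getValue());
}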
*/
private void deleteSnapshots(
    final Function<SnapshotId, String> mapping,
    final DeleteSnapshotRequest request,
I would prefer to let this method take the raw parameters rather than the request, since in the uuid case it is sort of bending the original purpose of DeleteSnapshotRequest.
),
e
);
} else if (e instanceof RepositoryMissingException) {
I wonder if we want to retry when/if the repo reappears? Ties in with the discussion on where the list of snapshots to delete lives.
snapshotId
);
} else {
logger.warn(
I am semi-torn on this. We log at warn, but retry every 30s. If we think we can recover via retries, perhaps we should be more silent (debug) about it?
logger.trace("snapshot to delete [{}] is being cloned, waiting for operation to complete", snapshotId); | ||
} else if (currentDeletions.contains(snapshotId)) { | ||
logger.trace("snapshot to delete [{}] is already queued", snapshotId); | ||
} else if (onGoingSnapshotsDeletions.add(snapshotId)) { |
nit: ongoingSnapshotDeletions?
assertThat(updatedRepos.repository("repo_name").snapshotsToDelete(), hasSize(1));
assertThat(updatedRepos.repository("repo_name").snapshotsToDelete(), hasItem(new SnapshotId("snap_name", "snap_uuid")));
Can this be:
assertThat(updatedRepos.repository("repo_name").snapshotsToDelete(), equalTo(List.of(new SnapshotId("snap_name", "snap_uuid"))));
Thanks @DaveCTurner and @henningandersen. I've opened #79156 to see how it looks when snapshots pending deletion are stored as a dedicated Metadata.Custom.
Hi @tlrx, I've created a changelog YAML for you.
Closing this one; a better implementation is in #79156.
In #74977 we introduced a new index setting,
index.store.snapshot.delete_searchable_snapshot
, that can be set when mounting a snapshot as an index to indicate that the snapshot should be deleted once the searchable snapshot index is deleted. That previous pull request added the index setting and the verifications around it. This pull request now adds the logic that detects when a searchable snapshot index with this setting is being deleted and triggers the deletion of the backing snapshot.
To do this, when a searchable snapshot index is deleted we check whether the setting
index.store.snapshot.delete_searchable_snapshot
is set. If the index being deleted is the last searchable snapshot index that uses the snapshot, the snapshot information is added to the RepositoryMetadata
in the cluster state, in a list of "snapshots to delete". Once a snapshot is marked as "to delete" and appears in the repository metadata, it cannot be cloned, mounted or restored in the cluster. Snapshots marked as "to delete" are deleted in the background by the
SnapshotsService
. On cluster state updates the SnapshotsService
retrieves the list of snapshots to delete and triggers the deletion by executing an explicit snapshot delete request (I tried to sneak into the snapshot state machine to directly add the snapshots as SnapshotDeletionsInProgress
, but this requires a consistent view of the repository that is not available at the time the snapshots are marked as "to delete"). Deletions of snapshots marked as "to delete" are executed per repository, with some limitations to avoid hundreds of snapshots being deleted concurrently.
This PR is split into separate commits that should be reviewable on their own.
Next steps will be to make use of this setting in ILM so that it does not require a specific "delete searchable snapshot" action but instead relies on the mechanism implemented here to clean up snapshots; and finally to prevent deletion of snapshots used by mounted indices (this will need BWC support).