Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce searchable snapshots index setting for cascade deletion of snapshots #74977

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Jul 6, 2021

Today there is no relationship between the lifecycle of a snapshot mounted as an index and the lifecycle of index itself. This lack of relationship makes it possible to delete a snapshot in a repository while the mounted index still exists, causing the shards to fail in the future.

On the other hand creating a snapshot that contains a single index to later mount it as a searchable snapshot index becomes more and more natural for users and ILM; but it comes with the risk to forget to delete the snapshot when the searchable snapshot index is deleted from the cluster, "leaking" snapshots in repositories.

We'd like to improve the situation and provide a way to automatically delete the snapshot when the mounted index is deleted. To opt in for this behaviour, a user has to enable a specific index setting when mounting the snapshot (the proposed name for the setting is index.store.snapshot.delete_searchable_snapshot). Elasticsearch then verifies that the snapshot to mount contains only 1 snapshotted index and that the snapshot is not used by another mounted index with a different value for the opt in setting, and mounts the snapshot with the new private index setting. This is the part implemented in this pull request.

In follow-up pull requests this index setting will be used when the last mounted index is deleted and removed from the cluster state in order to add the searchable snapshot id to a list of snapshots to delete in the repository metadata. Snapshots that are marked as "to delete" will not be able to be restored or cloned, and Elasticsearch will take care of deleting the snapshot as soon as possible. Then ILM will be changed to use this setting when mounting a snapshot as a cold or frozen index and delete_searchable_snapshot option in ILM will be removed. Finally, deleting a snapshot that is still used by mounted indices will be prevented.

Note to reviewers: in order to not break ILM and make this change reviewable I decided to split this into multiple PRs and to describe the path here. Let me know if you prefer a single big PR instead (that would look like this). Alternatively we can go progressively here too.

@tlrx tlrx added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 labels Jul 6, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 6, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tlrx I have some questions on the end goal in terms of behavior here :) I'm fine reviewing in chunks or all in one, whatever works for you. A 1k diff PR is fine too if chunking this into pieces adds overhead for you.

snapshotInfo.snapshotId().getName(),
String.format(
Locale.ROOT,
"cannot mount snapshot [%s/%s:%s] as index [%s] with the deletion of snapshot on index removal enabled "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this quite confusing. If we only have a single index in the snapshot and already have it mounted, wouldn't it make sense to forbid mounting it yet again regardless of settings? It seems like this situation is just wasting resources and you could just as well add an alias if you wanted to use that same snapshot via a different name?
Also, looking at this from a different perspective where we don't change anything in the above, wouldn't it make sense to simply force the setting to true for all mounts as soon as there is one mount with true. Then on index delete we can run the deletion only if there's no more mounts of the snapshot around (so only deleting the last mount deletes the snapshot) instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this quite confusing. If we only have a single index in the snapshot and already have it mounted, wouldn't it make sense to forbid mounting it yet again regardless of settings? It seems like this situation is just wasting resources and you could just as well add an alias if you wanted to use that same snapshot via a different name?

I didn't want to break the current behaviour but I guess that's something we can do if needed. Allowing to mount the same snapshot make it easier to transition an index from fully mounted to partially mounted. If we were to forbid the same single index snapshot to be mounted multiple times then it requires to clone the snapshot to transition from fully to partially, which is what I'd like ILM to do but I don't have a strong opinion for non-ILM use cases.

Then on index delete we can run the deletion only if there's no more mounts of the snapshot around (so only deleting the last mount deletes the snapshot) instead?

I had this idea as it would avoid ILM to clone the snapshot when moving to frozen, but I finally found it easier to reason about when only 1 index is linked to 1 snapshot. Again, no strong opinion so I'd like to hear what others think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposal of cloning before frozen has an issue related to snapshots of searchable snapshots. An example:

  1. Index I1 is mounted on cold tier using snapshot S1.
  2. ILM clones S1 into S2 and partially mounts I2 using S2
  3. ILM does alias swap and delete index I1, which also deletes S1.

Assume that shortly after, a disaster happens and there is a need to restore the cluster from snapshot. The last snapshot was taken before the steps above were taken. That snapshot will still contain index I1 mounted using snapshot S1, which no longer exists. Manual intervention will be necessary to fix this. It is much worse than at the end of the ILM lifecycle, since the data is supposed to still exist in the transition cold to frozen, whereas after the delete phase, the expectation is that the data is gone.

If we instead allow multiple mounts of the same single index snapshot, with the requirement that they must all have the flag set (or not), the transition from cold to frozen will not have this issue with snapshots of searchable snapshots. I am therefore in favor of going that route.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @henningandersen for finding the right argument 👍 I am going to revisit the code here to allow multiple mounts; that will also simplify ILM changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 76480bb to allow snapshots to be mounted multiple times but only with the same value for the new setting.

@original-brownbear original-brownbear self-requested a review July 8, 2021 14:07
@tlrx
Copy link
Member Author

tlrx commented Jul 8, 2021

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

false,
Setting.Property.IndexScope,
Setting.Property.PrivateIndex,
Setting.Property.NotCopyableOnResize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we shrink/split/clone searchable snapshots and what is the result (a regular index or a searchable snapshot)? Not really a comment to this PR, just a question 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect shrink & split actions to fail because if the number of shards is changed it won't match the number of shards in the snapshot. I'll take a look at this separately, we should prevent this I think. Clone should work but I'll verify this too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #75227 to prevent resize actions on searchable snapshot indices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloning is supported since #56595

public static final Setting<Boolean> DELETE_SEARCHABLE_SNAPSHOT_ON_INDEX_DELETION = Setting.boolSetting(
SEARCHABLE_SNAPSHOTS_DELETE_SNAPSHOT_ON_INDEX_DELETION,
false,
Setting.Property.IndexScope,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add "Final" too, in case the searchable snapshot is closed? Mostly a nit because it is private anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added Final too but I also implemented a check that prevents to change the value of this setting when restoring a searchable snapshots index (as Final does not prevent changes at restore time)

}

private static Settings deleteSnapshotIndexSettings(boolean value) {
return Settings.builder().put(SEARCHABLE_SNAPSHOTS_DELETE_SNAPSHOT_ON_INDEX_DELETION, value).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is having to use this setting in the API an interim state, with a follow-up PR adding a direct flag in the API?

Otherwise, I think the setting should not be private if the intention is that external users should set this flag using the mount call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is having to use this setting in the API an interim state, with a follow-up PR adding a direct flag in the API?

I planned to use the setting in the Mount API and not introducing a flag in the API.

I still think it can be private - in a follow up I'd like to always set this setting on indices mounted in 8.0+.

@@ -110,4 +117,130 @@ public void testRepositoryUsedBySearchableSnapshotCanBeUpdatedButNotUnregistered

assertAcked(clusterAdmin().prepareDeleteRepository(updatedRepositoryName));
}

public void testMountIndexWithDeletionOfSnapshotFailsIfNotSingleIndexSnapshot() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that demonstrates that restoring a searchable snapshot cannot be done if its delete_searchable_snapshot flag is in conflict with existing searchable snapshots?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I pushed more tests in ff0f834

@tlrx
Copy link
Member Author

tlrx commented Jul 12, 2021

@original-brownbear would you mind having a look at this change when you have time? Thanks a lot

tlrx added a commit that referenced this pull request Jul 13, 2021
Today if we try to shrink or to split a searchable snapshot 
index using the Resize API a new index will be created 
but can't be assigned, and even if it was assigned it won't 
work as the number of shards can't be changed and must 
always match the number of shards from the snapshot.

This commit adds some verification to prevent a snapshot 
backed indices to be resized and if an attempt is made, 
throw a better error message.

Note that cloning is supported since #56595 and in this 
change we make sure that it is only used to convert the 
searchable snapshot index back to a regular index.

Relates #74977 (comment)
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jul 13, 2021
Today if we try to shrink or to split a searchable snapshot
index using the Resize API a new index will be created
but can't be assigned, and even if it was assigned it won't
work as the number of shards can't be changed and must
always match the number of shards from the snapshot.

This commit adds some verification to prevent a snapshot
backed indices to be resized and if an attempt is made,
throw a better error message.

Note that cloning is supported since elastic#56595 and in this
change we make sure that it is only used to convert the
searchable snapshot index back to a regular index.

Relates elastic#74977 (comment)
Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Tanguy + sorry again for the delay here!

elasticsearchmachine pushed a commit that referenced this pull request Jul 13, 2021
)

Today if we try to shrink or to split a searchable snapshot
index using the Resize API a new index will be created
but can't be assigned, and even if it was assigned it won't
work as the number of shards can't be changed and must
always match the number of shards from the snapshot.

This commit adds some verification to prevent a snapshot
backed indices to be resized and if an attempt is made,
throw a better error message.

Note that cloning is supported since #56595 and in this
change we make sure that it is only used to convert the
searchable snapshot index back to a regular index.

Relates #74977 (comment)
@tlrx
Copy link
Member Author

tlrx commented Jul 15, 2021

Thanks Henning and Armin!

@tlrx tlrx merged commit 935a388 into elastic:master Jul 15, 2021
@tlrx tlrx deleted the add-setting-to-delete-searchable-snapshot-on-index-deletion branch July 15, 2021 10:22
masseyke pushed a commit to masseyke/elasticsearch that referenced this pull request Jul 16, 2021
…snapshots (elastic#74977)

Today there is no relationship between the lifecycle of a 
snapshot mounted as an index and the lifecycle of index 
itself. This lack of relationship makes it possible to delete 
a snapshot in a repository while the mounted index still 
exists, causing the shards to fail in the future.

On the other hand creating a snapshot that contains a 
single index to later mount it as a searchable snapshot 
index becomes more and more natural for users and 
ILM; but it comes with the risk to forget to delete the
 snapshot when the searchable snapshot index is 
deleted from the cluster, "leaking" snapshots in 
repositories.

We'd like to improve the situation and provide a way 
to automatically delete the snapshot when the mounted 
index is deleted. To opt in for this behaviour, a user has 
to enable a specific index setting when mounting the 
snapshot (the proposed name for the setting is 
index.store.snapshot.delete_searchable_snapshot). 

Elasticsearch then verifies that the snapshot to mount 
contains only 1 snapshotted index and that the snapshot 
is not used by another mounted index with a different 
value for the opt in setting, and mounts the snapshot 
with the new private index setting. 

This is the part implemented in this commit.

In follow-up pull requests this index setting will be used 
when the last mounted index is deleted and removed 
from the cluster state in order to add the searchable 
snapshot id to a list of snapshots to delete in the 
repository metadata. Snapshots that are marked as 
"to delete" will not be able to be restored or cloned, and 
Elasticsearch will take care of deleting the snapshot as 
soon as possible. Then ILM will be changed to use this 
setting when mounting a snapshot as a cold or frozen 
index and delete_searchable_snapshot option in ILM 
will be removed. Finally, deleting a snapshot that is 
still used by mounted indices will be prevented.
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request Jul 30, 2021
…snapshots (elastic#74977)

Today there is no relationship between the lifecycle of a 
snapshot mounted as an index and the lifecycle of index 
itself. This lack of relationship makes it possible to delete 
a snapshot in a repository while the mounted index still 
exists, causing the shards to fail in the future.

On the other hand creating a snapshot that contains a 
single index to later mount it as a searchable snapshot 
index becomes more and more natural for users and 
ILM; but it comes with the risk to forget to delete the
 snapshot when the searchable snapshot index is 
deleted from the cluster, "leaking" snapshots in 
repositories.

We'd like to improve the situation and provide a way 
to automatically delete the snapshot when the mounted 
index is deleted. To opt in for this behaviour, a user has 
to enable a specific index setting when mounting the 
snapshot (the proposed name for the setting is 
index.store.snapshot.delete_searchable_snapshot). 

Elasticsearch then verifies that the snapshot to mount 
contains only 1 snapshotted index and that the snapshot 
is not used by another mounted index with a different 
value for the opt in setting, and mounts the snapshot 
with the new private index setting. 

This is the part implemented in this commit.

In follow-up pull requests this index setting will be used 
when the last mounted index is deleted and removed 
from the cluster state in order to add the searchable 
snapshot id to a list of snapshots to delete in the 
repository metadata. Snapshots that are marked as 
"to delete" will not be able to be restored or cloned, and 
Elasticsearch will take care of deleting the snapshot as 
soon as possible. Then ILM will be changed to use this 
setting when mounting a snapshot as a cold or frozen 
index and delete_searchable_snapshot option in ILM 
will be removed. Finally, deleting a snapshot that is 
still used by mounted indices will be prevented.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants