Remove Redundant CS Update on Snapshot Finalization #55276
Conversation
This change folds the removal of the in-progress snapshot entry into setting the safe repository generation. Outside of removing an unnecessary cluster state update, this also has the advantage of removing a somewhat inconsistent cluster state where the safe repository generation points at `RepositoryData` that contains a finished snapshot while it is still in-progress in the cluster state, making it easier to reason about the state machine of upcoming concurrent snapshot operations.
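For illustration, a minimal sketch of the idea in plain Java (not the actual Elasticsearch types; `ClusterState`, its fields, and `finalizeSnapshot` are simplified stand-ins): finalization becomes a single cluster state transformation that moves the safe repository generation forward and drops the in-progress entry at the same time, so no intermediate state with a finished-but-still-in-progress snapshot is ever published.

```java
import java.util.function.UnaryOperator;

final class FinalizationSketch {

    // Simplified stand-in for the relevant bits of the cluster state.
    record ClusterState(long safeRepoGeneration, boolean snapshotInProgress) {}

    // One transformation that sets the new safe generation and removes the
    // in-progress snapshot entry, replacing the previous two-step update.
    static UnaryOperator<ClusterState> finalizeSnapshot(long newSafeGeneration) {
        return state -> new ClusterState(newSafeGeneration, false);
    }
}
```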
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
```diff
@@ -189,8 +189,7 @@ public void clusterChanged(ClusterChangedEvent event) {
             final RepositoriesMetadata repoMeta =
                 event.state().metadata().custom(RepositoriesMetadata.TYPE);
             final RepositoryMetadata metadata = repoMeta.repository("test-repo");
-            if (metadata.generation() == metadata.pendingGeneration()
-                && metadata.generation() > snapshotEntry.repositoryStateId()) {
+            if (metadata.pendingGeneration() > snapshotEntry.repositoryStateId()) {
```
The concrete situation that this test was covering is technically gone with this change, but I think it's reasonable to keep testing the very similar situation of master fail-over during the repository side of the finalization here now.
Jenkins run elasticsearch-ci/2 (some ML thing)
I've left one comment about naming. I think that technically this change is ok, but I wonder about the separation of SnapshotsService and BlobStoreRepository (which is only one implementation of Repository), where it becomes less and less clear where the responsibilities for each class lie.
```diff
@@ -1301,10 +1304,11 @@ public boolean isReadOnly() {
      * @param repositoryData RepositoryData to write
      * @param expectedGen    expected repository generation at the start of the operation
      * @param writeShardGens whether to write {@link ShardGenerations} to the new {@link RepositoryData} blob
+     * @param stateFilter    filter for the last cluster state update executed by this method
```
It's not a filter, but rather a transformation function. Perhaps adapt the naming?
Renamed :)
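For readers following along, a hedged sketch of what the renamed parameter amounts to (placeholder types and names, not the exact `BlobStoreRepository` signature): the caller supplies a cluster state transformation that is applied in the final update executed by `writeIndexGen`, e.g. removing the in-progress snapshot entry.

```java
import java.util.function.Function;

interface RepositoryDataWriterSketch {

    // Placeholder for org.elasticsearch.cluster.ClusterState.
    final class ClusterState {}

    // The last parameter is a transformation (not a filter/predicate) that is
    // applied to the cluster state in the final update publishing the new
    // repository generation.
    void writeIndexGen(Object repositoryData,
                       long expectedGen,
                       boolean writeShardGens,
                       Function<ClusterState, ClusterState> stateTransformer);
}
```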
+1 in theory. In practice, we're already exposing a lot of the specific internals of the blob store repository (e.g. shard generations are handled in snapshots-in-progress, repo generation, ...) so that ship seems to have sailed already. Also, practically speaking all our repos are eventually backed by a blob store repository for writing. Maybe it would make more sense to eventually adjust the repository interface to be more centered around just writing blobs and move all the state handling out of it completely. But for now I don't see a short way of enabling more concurrent operations without this kind of mixing of the code.
Jenkins run elasticsearch-ci/1 (unrelated reindex failure)
LGTM
Thanks Yannick!
Same as elastic#55276 but for snapshot deletes. This change folds the removal of the snapshot delete state entry into the safe generation step where possible. This means that for repositories that write shard generations, the time the snapshot delete entry stays in the cluster state is shortened considerably, reduced to the time it takes to update the repository metadata. In this case it is fully safe to run other snapshot operations after the metadata update. We cannot do this for repositories that do not write shard generations, so those go through a different path and still submit a separate state update task. Also, this PR fixes a problem with the cooldown period for S3 non-shard-generation repos introduced by elastic#55286: we cannot run the state update outright in the repository because we enforce the cooldown via the listener wrapping. I fixed this by folding the final state update into the listener in this case.
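A hedged sketch of the "fold the final state update into the listener" part (the `Listener` interface and all names below are simplified stand-ins, not the real `ActionListener` API): the state update is submitted only once the wrapped, possibly cooldown-delayed, repository operation has completed, so it cannot bypass the cooldown that the listener wrapping enforces.

```java
import java.util.function.Consumer;

final class CooldownListenerSketch {

    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    // Wraps the original listener so that the final cluster state update only
    // runs after the (possibly delayed) repository operation has finished.
    static <T> Listener<T> foldStateUpdateIntoListener(Listener<T> delegate,
                                                       Consumer<T> submitFinalStateUpdate) {
        return new Listener<>() {
            @Override
            public void onResponse(T result) {
                submitFinalStateUpdate.accept(result);
                delegate.onResponse(result);
            }

            @Override
            public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }
}
```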
We do the delete in three steps here:

1. Get the repository data and find the snapshot ids to delete.
2. Put the delete entry in the CS.
3. Run the actual delete.

The test `testConcurrentSnapshotCreateAndDeleteOther` was failing because between `1.` and `2.` a full snapshot completed, moving the repository generation ahead by 1, which we chose to fail on because we expect the repository generation from step `1.` to still be there in step `3.`. In the past, using the repository generation from step `1.` made sense as a safety measure because any rapid increase in repository generation could have spelled trouble on eventually consistent blob stores. Nowadays it is simply needless to fail here, and we can rely on the generation we read from the repository in step `3.` to avoid ever passing a broken repository generation to the repository when deleting. NOTE: This exception was always a possibility but became massively more likely due to improved/faster snapshot finalization via elastic#55276, so it only showed up now. Closes elastic#55702
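As a rough illustration of the fix (all types and names here are hypothetical stand-ins, not the real snapshot delete code): the repository data read in step 1 is only used to resolve the snapshot ids, while the actual delete re-reads the repository data and uses that generation, so a concurrent snapshot bumping the generation between the steps no longer fails the delete.

```java
import java.util.List;
import java.util.function.Supplier;

final class DeleteFlowSketch {

    record RepositoryData(long generation, List<String> snapshotIds) {}

    interface DeleteExecutor {
        void delete(String snapshotName, long safeGeneration);
    }

    static void deleteSnapshot(String snapshotName,
                               Supplier<RepositoryData> readRepositoryData,
                               DeleteExecutor executor) {
        // Step 1: read the repository data to resolve the snapshot to delete.
        RepositoryData observed = readRepositoryData.get();
        if (observed.snapshotIds().contains(snapshotName) == false) {
            throw new IllegalArgumentException("unknown snapshot " + snapshotName);
        }

        // Step 2: the delete entry would be added to the cluster state here (omitted).

        // Step 3: re-read the repository data and rely on *its* generation rather
        // than the one observed in step 1, so a snapshot that completed in the
        // meantime does not cause a spurious failure.
        RepositoryData current = readRepositoryData.get();
        executor.delete(snapshotName, current.generation());
    }
}
```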
Note: We can do the same for snapshot deletion where it is a much more interesting change because most of the work in the delete (deleting all unreferenced blobs) can happen after the new index-N is written and does not require the delete-in-progress entry in the cluster state (assuming the repo is using shard-generations). I just wanted to start with the snapshot finalization since it's much easier to review and has no BwC implications.