
Add Functionality to Consistently Read RepositoryData For CS Updates #55773

Merged

Conversation

original-brownbear
Member

@original-brownbear original-brownbear commented Apr 26, 2020

Using (sort of) optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Fixes issues where deleting a snapshot is enqueued based on an outdated repository generation due to concurrent snapshot or delete operations.

Allows for some follow-ups:

  1. Removing the snapshot INIT state because its only use was adding the snapshot to the cluster state so that no concurrent changes to the repository could occur before reading the repository data.
    • This also allows simplifying the abort state machine as well as master fail-over handling. We have a lot of complicated logic for handling INIT-state corner cases in SnapshotsService that can all nicely go away in a follow-up using this change.
  2. Allowing deletes of multiple snapshots at a time to include currently running snapshots.

Closes #55702
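To illustrate the new API (only executeConsistentStateUpdate, matchingSnapshotIds and RepositoryData#getGenId come from this PR; createDeleteStateUpdate and the surrounding names are hypothetical), a caller such as the snapshot delete path can now build its cluster state update task from a consistent view of the repository data:

// Sketch only, not the merged code: the function receives a RepositoryData that is
// guaranteed to match the repository generation the resulting task will run against.
repositoriesService.repository(repositoryName).executeConsistentStateUpdate(
    repositoryData -> createDeleteStateUpdate(               // hypothetical helper
        matchingSnapshotIds(repositoryData, snapshotNames, repositoryName),
        repositoryName,
        repositoryData.getGenId(),                            // generation this delete is based on
        listener),
    listener::onFailure);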

Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.

Closes elastic#55702
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

if (snapshotIds.isEmpty()) {
    listener.onResponse(null);
    return;
}
return new ClusterStateUpdateTask() {
Member Author

This is a little less ergonomic than I would have liked, but I found it easiest to produce a CS update task in the Repository API compared to other mechanisms, because the existing code is so heavily built on using state in CS update tasks to do logic between execute and clusterStateProcessed. So I figured doing it this way, and in the repository, allows the follow-ups to be relatively isolated changes.

@Override
public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
                                          Consumer<Exception> onFailure) {
    threadPool.generic().execute(new AbstractRunnable() {
Member Author

This could technically be done quite a bit more efficiently by not forking off to the generic pool when we just get cached repository data, but I figured this is good enough, and I can back-port it to 7.x without trouble: even in a mixed 7.x/6.x cluster, a 7.x master node will always update the repo metadata when writing new repository data, so this simple solution avoids dealing with all the details of the org.elasticsearch.repositories.blobstore.BlobStoreRepository#bestEffortConsistency flag.
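For readers following along, a condensed sketch of the optimistic check described here (not the exact merged code; it ignores the bestEffortConsistency details mentioned above, and getRepoMetadata is assumed to be a helper that resolves this repository's RepositoryMetadata from a cluster state):

@Override
public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
                                          Consumer<Exception> onFailure) {
    threadPool.generic().execute(new AbstractRunnable() {
        @Override
        protected void doRun() {
            // Remember the repository metadata (and thus generation) the task will be based on.
            final RepositoryMetadata repositoryMetadataStart = metadata;
            getRepositoryData(ActionListener.wrap(repositoryData -> {
                final ClusterStateUpdateTask updateTask = createUpdateTask.apply(repositoryData);
                clusterService.submitStateUpdateTask("consistent state update",
                    new ClusterStateUpdateTask(updateTask.priority()) {

                        private boolean executedTask = false;

                        @Override
                        public ClusterState execute(ClusterState currentState) throws Exception {
                            // Only run the task if the repository metadata is still the one we
                            // read the RepositoryData for; otherwise leave the state untouched.
                            if (repositoryMetadataStart.equals(getRepoMetadata(currentState))) {
                                executedTask = true;
                                return updateTask.execute(currentState);
                            }
                            return currentState;
                        }

                        @Override
                        public void onFailure(String source, Exception e) {
                            if (executedTask) {
                                updateTask.onFailure(source, e);
                            } else {
                                onFailure.accept(e);
                            }
                        }

                        @Override
                        public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
                            if (executedTask) {
                                updateTask.clusterStateProcessed(source, oldState, newState);
                            } else {
                                // The repository generation moved on concurrently:
                                // re-read the RepositoryData and retry.
                                executeConsistentStateUpdate(createUpdateTask, onFailure);
                            }
                        }
                    });
            }, onFailure));
        }

        @Override
        public void onFailure(Exception e) {
            onFailure.accept(e);
        }
    });
}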

@original-brownbear original-brownbear changed the title Add Functionality to Consistently Update RepositoryData Add Functionality to Consistently Read RepositoryData For CS Updates Apr 26, 2020
@@ -738,6 +739,63 @@ public void onFailure(Exception e) {
assertEquals(0, snapshotInfo.failedShards());
}

public void testConcurrentDeletes() {
Member Author

Added this test to cover the scenario where concurrent deletes cause the repository data that a delete is based on to become outdated, in addition to the existing coverage for concurrent snapshot + delete.
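The PR's actual test lives in the deterministic snapshot resiliency test framework; purely as an illustration of the scenario, an integration-test-style sketch (hypothetical names, assuming an ESIntegTestCase subclass with the usual static Hamcrest imports) might look like this:

public void testConcurrentDeletesSketch() throws Exception {
    client().admin().cluster().preparePutRepository("test-repo")
        .setType("fs")
        .setSettings(Settings.builder().put("location", randomRepoPath()))
        .get();
    client().admin().cluster().prepareCreateSnapshot("test-repo", "snap-1")
        .setWaitForCompletion(true).get();

    // Issue two deletes for the same snapshot without waiting in between, so the second
    // delete may initially be based on an outdated repository generation.
    ActionFuture<AcknowledgedResponse> firstDelete =
        client().admin().cluster().prepareDeleteSnapshot("test-repo", "snap-1").execute();
    ActionFuture<AcknowledgedResponse> secondDelete =
        client().admin().cluster().prepareDeleteSnapshot("test-repo", "snap-1").execute();

    // At least one delete must succeed; the other may fail (e.g. the snapshot was already
    // removed or another delete was still running), but neither may corrupt the repository.
    int successfulDeletes = 0;
    for (ActionFuture<AcknowledgedResponse> delete : List.of(firstDelete, secondDelete)) {
        try {
            delete.get();
            successfulDeletes++;
        } catch (Exception e) {
            // acceptable failure mode for the losing delete
        }
    }
    assertThat(successfulDeletes, greaterThanOrEqualTo(1));
}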

* @param createUpdateTask function to supply cluster state update task
* @param onFailure error handler invoked on failure to get a consistent view of the current {@link RepositoryData}
*/
void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask, Consumer<Exception> onFailure);
Member Author

With more and more cluster-state-related logic leaking into the repository code, I think we should eventually refactor things such that the repository only deals with reading and writing data in some form (obviously lots of tricky details here around source-only repos, S3 waiting, and whatnot) and completely delegate the responsibility for this kind of magic to either the SnapshotsService or some other component. But until then, this is basically the method we want in order to make life a lot less complicated in the SnapshotsService.

@Override
public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
                                          Consumer<Exception> onFailure) {
}
Contributor

throw UnsupportedOperation here?

Member Author

Sure :)
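For reference, a minimal sketch of what the suggested change could look like (assuming this no-op override lives in a repository implementation that never performs consistent state updates, e.g. a test-only repository):

@Override
public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
                                          Consumer<Exception> onFailure) {
    // This repository implementation does not support consistent cluster state updates.
    throw new UnsupportedOperationException("executeConsistentStateUpdate is not supported");
}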

repositoriesService.repository(repositoryName).getRepositoryData(ActionListener.wrap(repositoryData ->
    deleteCompletedSnapshots(matchingSnapshotIds(repositoryData, snapshotNames, repositoryName),
        repositoryName, repositoryData.getGenId(), Priority.NORMAL, l), l::onFailure))));
repositoriesService.repository(repositoryName).executeConsistentStateUpdate(repositoryData ->
Contributor

Could repositoriesService.repository(repositoryName) throw an exception here?

Member Author

Close to impossible, but I suppose there is a slim chance of the repository being removed concurrently here, so I added a catch for that :)
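A rough sketch of that guard (createDeleteStateUpdate is a hypothetical helper; RepositoryMissingException is what RepositoriesService#repository throws when the repository no longer exists):

final Repository repository;
try {
    repository = repositoriesService.repository(repositoryName);
} catch (RepositoryMissingException e) {
    // The repository was removed concurrently; fail the delete instead of letting the exception escape.
    listener.onFailure(e);
    return;
}
repository.executeConsistentStateUpdate(
    repositoryData -> createDeleteStateUpdate(repositoryData, snapshotNames, repositoryName, listener),
    listener::onFailure);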

@@ -330,6 +330,60 @@ protected void doClose() {
}
}

@Override
public void executeConsistentStateUpdate(Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
Contributor

This is using ClusterStateUpdateTask in a weird way, directly calling some methods on it, but ignoring other things (e.g. priority / listeners?)

Member Author

Yeah, this is a little less ergonomic than I'd like it to be. Practically speaking, though, the priority is not and won't be used anywhere (the delete runs at NORMAL, as will all upcoming use-cases of this), so I decided to ignore it.
The listeners I think I got correct, though? Since clusterStatePublished is final and empty in ClusterStateUpdateTask, and we don't yet use onNoLongerMaster in the tasks passed in, I figured: why add dead code?

protected void doRun() {
    final RepositoryMetadata repositoryMetadataStart = metadata;
    getRepositoryData(ActionListener.wrap(repositoryData ->
        clusterService.submitStateUpdateTask("consistent state update", new ClusterStateUpdateTask() {
Contributor

As we are just protecting against concurrent changes done on the same node, I wonder if we can avoid calling getRepositoryData here for every caller of executeConsistentStateUpdate, or do some sharing between multiple callers of this method. In particular, I'm worried about situations where it takes a long time for the normal-priority CS update task below to be executed, and where stuff might have been invalidated again and again (think e.g. about a misconfigured deletion policy). The task here also does not seem to support timeouts? (This makes me also notice that deleteCompletedSnapshots should probably use the timeout from the DeleteSnapshotRequest.)

Member Author

As we are just protecting against concurrent changes done on the same node

Practically yes; theoretically no, I think it could be across nodes. Technically we could have a delete along the lines of:

  1. Master reads repository data and then gets stuck for 30s or whatever
  2. New master takes over and completes a repository operation then dies
  3. Master from step 1 is back ...

Not likely, but possible, and a scenario that SnapshotResiliencyTests could cover (they currently don't, but it might be worth adding).

I'm worried about situations where it takes a long time for the normal-priority CS update task below to be executed, and where stuff might have been invalidated again and again (think e.g. about a misconfigured deletion policy)

I'm not sure this is all that likely in practice, since we keep loading the RepositoryData from cache and currently a delete does 4 state updates, of which only two modify the repository metadata. So I would imagine that, practically, even if you somehow send a barrage of delete requests, those will mostly work out fine, just because it's so unlikely for these back-to-back repo metadata updates to go through that quickly (they will always require at least one additional update to the cluster state in between). Not saying I'm against optimizing this (always looking for ways to not have to load too many RepositoryData instances on heap), but it seemed not strictly necessary for now (that's why I wrote https://github.com/elastic/elasticsearch/pull/55773/files#r415345654; we can probably already optimize this a lot by exploiting the way the cache works).

The task here also does not seem to support timeouts? (This makes me also notice that deleteCompletedSnapshots should probably use the timeout from the DeleteSnapshotRequest).

Right, I don't think we have ever used those timeouts in deletes ... maybe something to fix in a follow-up, since it'll require a few changes here and there because we currently don't pass the timeout from the request to the SnapshotsService?

Member Author

Opened #55798 for the timeout thing (was a small change after all ...)

Member Author
@original-brownbear original-brownbear left a comment

All addressed, I think. Let me know if you want to make it more efficient in this iteration already :)


Contributor
@ywelsch ywelsch left a comment

Two more comments

protected void doRun() {
    final RepositoryMetadata repositoryMetadataStart = metadata;
    getRepositoryData(ActionListener.wrap(repositoryData ->
        clusterService.submitStateUpdateTask("consistent state update", new ClusterStateUpdateTask() {
Contributor

"consistent state update". Can you pass in a more descriptive state update message?

Contributor

Also, can we add the timeout here now?

Member Author

Sure, added the timeout and we're passing a descriptive source now; I also took the opportunity to pass the task priority along.
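Roughly, the submit call after these changes looks something like the following (illustrative shape only, not the exact diff; the consistency check from the longer sketch earlier in the thread is elided here):

// Forward a descriptive source plus the caller task's priority and timeout to the master service.
clusterService.submitStateUpdateTask(
    "delete snapshot [" + repositoryName + ":" + snapshotName + "]",   // descriptive source (illustrative)
    new ClusterStateUpdateTask(updateTask.priority()) {
        @Override
        public TimeValue timeout() {
            return updateTask.timeout();                               // honor the request timeout
        }

        @Override
        public ClusterState execute(ClusterState currentState) throws Exception {
            return updateTask.execute(currentState);                   // consistency check elided in this sketch
        }

        @Override
        public void onFailure(String source, Exception e) {
            updateTask.onFailure(source, e);
        }
    });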

Contributor
@ywelsch ywelsch left a comment

LGTM

@original-brownbear
Member Author

Thanks Yannick!

@original-brownbear original-brownbear merged commit c1fca12 into elastic:master Apr 28, 2020
@original-brownbear original-brownbear deleted the add-lock-mechanism-to-repo branch April 28, 2020 11:54
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 30, 2020
Making use of elastic#55773 to simplify snapshot state machine.
1. Deletes with no in-progress snapshot now add the delete entry to the cluster state right away
instead of doing a second CS update after the first update was a NOOP.
2. If a bulk delete matches in-progress as well as completed snapshots, abort the in-progress snapshot
and then move on to delete from the repository.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 4, 2020
…lastic#55773)

Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.
original-brownbear added a commit that referenced this pull request May 4, 2020
…55773) (#56091)

Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.
original-brownbear added a commit that referenced this pull request May 4, 2020
Making use of #55773 to simplify snapshot state machine.
1. Deletes with no in-progress snapshot now add the delete entry to the cluster state right away
instead of doing a second CS update after the first update was a NOOP.
2. If a bulk delete matches in-progress as well as completed snapshots, abort the in-progress snapshot
and then move on to delete from the repository.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 4, 2020
Making use of elastic#55773 to simplify snapshot state machine.
1. Deletes with no in-progress snapshot now add the delete entry to the cluster state right away
instead of doing a second CS update after the first update was a NOOP.
2. If a bulk delete matches in-progress as well as completed snapshots, abort the in-progress snapshot
and then move on to delete from the repository.
original-brownbear added a commit that referenced this pull request May 4, 2020
Making use of #55773 to simplify snapshot state machine.
1. Deletes with no in-progress snapshot now add the delete entry to the cluster state right away
instead of doing a second CS update after the first update was a NOOP.
2. If a bulk delete matches in-progress as well as completed snapshots, abort the in-progress snapshot
and then move on to delete from the repository.
original-brownbear added a commit that referenced this pull request May 5, 2020
With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.

This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations, since all of their entries now contain a repository generation. With this change, simple queuing up of snapshot operations can and will be added in a follow-up.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 18, 2020
With elastic#55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.

This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations, since all of their entries now contain a repository generation. With this change, simple queuing up of snapshot operations can and will be added in a follow-up.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 12, 2020
With elastic#55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.

This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations, since all of their entries now contain a repository generation. With this change, simple queuing up of snapshot operations can and will be added in a follow-up.
original-brownbear added a commit that referenced this pull request Jul 13, 2020
With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.

This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations, since all of their entries now contain a repository generation. With this change, simple queuing up of snapshot operations can and will be added in a follow-up.
@original-brownbear original-brownbear restored the add-lock-mechanism-to-repo branch August 6, 2020 18:24
Development

Successfully merging this pull request may close these issues.

[CI] SnapshotResiliencyTests.testConcurrentSnapshotCreateAndDeleteOther failure
4 participants