
Making snapshot deletes distributed across data nodes. #39656

Closed
wants to merge 1 commit into from

Conversation

backslasht
Contributor

Unlike the snapshot creation operation, the snapshot deletion operation is executed
on the master node. This is acceptable for small to medium-sized clusters where the
delete operation completes in less than a minute. But for bigger clusters with a
few thousand primary shards, the deletion operation takes a considerable amount of
time, as there is only one thread deleting the snapshot data index by index,
shard by shard.

To reduce the time spent on deletion, this change parallelizes (distributes) the snapshot
deletes across all data nodes, similar to snapshot creation, but with no tight coupling
between a data node and the snapshot shards being deleted.

With this change, for a cluster with 3 dedicated masters, 20 data nodes, and 20,000
primary shards, the deletion took around 1 minute, whereas without this change it took
around 70 minutes.
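
For illustration only, here is a highly simplified sketch of the general idea using plain JDK executors rather than the Elasticsearch classes actually touched by this PR; the class name, the `deleteShardSnapshot` stand-in, and the round-robin assignment are made up for the example and are not the PR's implementation.

```java
// Hypothetical sketch: the master partitions the shard-level snapshot deletes across
// data nodes and each node works through its partition with a small local thread pool.
// NOT the actual implementation in this PR; names here are invented for illustration.
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DistributedDeleteSketch {

    /** Stand-in for "delete the snapshot data of one shard from the repository". */
    static void deleteShardSnapshot(String shardId) {
        // repository I/O would happen here
    }

    public static void main(String[] args) throws Exception {
        int dataNodes = 20;
        int threadsPerNode = 5;
        List<String> shardIds = IntStream.range(0, 20_000)
                .mapToObj(i -> "shard-" + i)
                .collect(Collectors.toList());

        // Master side: round-robin the shards across data nodes (no tight coupling
        // between a data node and the shards it happens to hold locally).
        Map<Integer, List<String>> assignment = IntStream.range(0, shardIds.size()).boxed()
                .collect(Collectors.groupingBy(i -> i % dataNodes,
                        Collectors.mapping(shardIds::get, Collectors.toList())));

        // Data-node side (simulated with one executor per node): each node deletes
        // its partition with a bounded number of threads.
        ExecutorService cluster = Executors.newFixedThreadPool(dataNodes);
        CountDownLatch done = new CountDownLatch(dataNodes);
        for (List<String> partition : assignment.values()) {
            cluster.submit(() -> {
                ExecutorService nodePool = Executors.newFixedThreadPool(threadsPerNode);
                partition.forEach(shard -> nodePool.submit(() -> deleteShardSnapshot(shard)));
                nodePool.shutdown();
                try {
                    nodePool.awaitTermination(1, TimeUnit.HOURS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.countDown();
            });
        }
        done.await();
        cluster.shutdown();
    }
}
```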

Contributor

@ywelsch ywelsch left a comment


The problem is that deletes are processed sequentially on the master. I don't think we should add a whole new distributed layer for this, but just look into doing some of these calls concurrently on the master.

@backslasht before working on such a large item, it's best to open an issue first to discuss the solution.

@ywelsch ywelsch added the :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Mar 4, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@backslasht
Contributor Author

backslasht commented Mar 5, 2019

@ywelsch - As you mentioned, I initially started with making the calls concurrent on the master. It does help a little, but the time taken is still quite high.

Let us take the following setup as an example.
Dedicated master available: Yes
Number of data nodes: 100
Number of primary shards: 10,000
Time taken to compute the list of shards = y seconds (say 10, for example)
Average time to delete one shard's snapshot = x seconds (say 5, for example)

  1. Deletion time with a single thread (as it exists today):
    -> y + 10000 * x

  2. Deletion time with parallelization on the master (5 threads) - **5x improvement over 1**:
    -> y + (10000 / 5) * x
    -> y + 2000 * x

  3. Deletion time with multiple threads across data nodes (proposed change) - **500x improvement over 1 and 100x improvement over 2**:
    -> y + ((10000 / 100) / 5) * x
    -> y + (100 / 5) * x
    -> y + 20 * x

So the proposed change makes the snapshot delete time a function of both the number of data nodes and the number of primary shards, similar to snapshot creation. The deletion time stays roughly constant as data nodes are added (along with a corresponding increase in shards).
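
As a quick sanity check, plugging the example numbers above (y = 10 s, x = 5 s, 10,000 shards, 100 data nodes, 5 threads) into the three formulas; this is purely illustrative arithmetic, not a measurement.

```java
// Back-of-the-envelope check of the three models above, using the example values
// from this comment. Real timings depend on the repository and network latency.
public class DeleteTimeEstimates {
    public static void main(String[] args) {
        double y = 10;          // seconds to compute the list of shards
        double x = 5;           // average seconds to delete one shard's snapshot data
        int shards = 10_000;
        int dataNodes = 100;
        int threads = 5;

        double singleThread = y + shards * x;                                   // 50,010 s (~13.9 h)
        double masterParallel = y + (shards / (double) threads) * x;            // 10,010 s (~2.8 h)
        double distributed = y + ((shards / (double) dataNodes) / threads) * x; // 110 s (~2 min)

        System.out.printf("single: %.0f s, master x5: %.0f s, distributed: %.0f s%n",
                singleThread, masterParallel, distributed);
    }
}
```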

@ywelsch
Contributor

ywelsch commented Mar 7, 2019

I initially started with making the calls concurrent on the master. It does help a little, but the time taken is still quite high ...

That's a bit of an odd comparison. It all depends on how many concurrent calls you globally do. Whether the calls are done from a single node or multiple nodes does not matter given that the requests are very small and latency seems to be the determining factor.

We had a team discussion on this implementation and think it's too complex, which is why I'm closing this PR. We still think this is a problem worth tackling, but we are looking at other ways of achieving it without building a full distributed task queue. Two options we're looking at are 1) parallelizing deletes on the master, possibly using async APIs, and 2) using repository-specific options (e.g. bulk deletions on S3).
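
As a rough illustration of option 1, here is a minimal sketch of fanning the per-shard repository calls out over a bounded thread pool on the master; this is plain JDK code with a hypothetical `deleteShardSnapshot` stand-in, not the implementation that eventually landed in Elasticsearch.

```java
// Minimal sketch of option 1: run the per-shard deletes concurrently on the master
// with a bounded pool, then wait for all of them before cleaning up top-level metadata.
// deleteShardSnapshot() is a hypothetical stand-in for the repository call.
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentMasterDeletes {

    static void deleteShardSnapshot(String shardId) {
        // blob-store I/O for one shard's snapshot files would go here
    }

    static void deleteAll(List<String> shardIds, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            // Submit one async delete per shard and block until every delete has finished.
            CompletableFuture.allOf(
                    shardIds.stream()
                            .map(id -> CompletableFuture.runAsync(() -> deleteShardSnapshot(id), pool))
                            .toArray(CompletableFuture[]::new)
            ).join();
        } finally {
            pool.shutdown();
        }
    }
}
```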

@ywelsch ywelsch closed this Mar 7, 2019
original-brownbear added a commit that referenced this pull request Apr 5, 2019
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete. 

* Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread pool
   * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable. 
* See #39656 (comment)
* Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore).
* By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
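
As a rough illustration of the "bulk deletes where possible" direction mentioned in the commit above, here is a small sketch of batching blob keys so each batch can be removed in one repository round trip (S3's multi-object delete accepts up to 1,000 keys per request); `BlobStore.bulkDelete` is a hypothetical hook for this example, not an actual Elasticsearch API.

```java
// Hypothetical batching sketch: group blob keys into fixed-size batches and issue one
// bulk-delete call per batch instead of one request per blob.
import java.util.ArrayList;
import java.util.List;

public class BulkDeleteBatching {

    interface BlobStore {
        void bulkDelete(List<String> blobKeys); // hypothetical single round-trip delete
    }

    static void deleteBlobs(BlobStore store, List<String> blobKeys, int batchSize) {
        for (int from = 0; from < blobKeys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, blobKeys.size());
            store.bulkDelete(new ArrayList<>(blobKeys.subList(from, to)));
        }
    }
}
```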
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 26, 2019
original-brownbear added a commit that referenced this pull request Apr 26, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Labels
:Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs), team-discuss