Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move all Snapshot Master Node Steps to SnapshotsService (#56365) #59373

Merged
merged 1 commit into from
Jul 12, 2020

Conversation

original-brownbear
Copy link
Member

This refactoring has three motivations:

  1. Separate all master node steps during snapshot operations from all data node steps in code.
  2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in SnapshotsService so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service).
    • This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons)
  3. Make the SnapshotsService less expensive on the CS thread for large snapshots

backport of #56365

This refactoring has three motivations:

1. Separate all master node steps during snapshot operations from all data node steps in code.
2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in `SnapshotsService` so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service).
    * This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons)
3. Make the `SnapshotsService` less expensive on the CS thread for large snapshots
@original-brownbear original-brownbear added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs backport labels Jul 12, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 12, 2020
@original-brownbear
Copy link
Member Author

Jenkins run elasticsearch-ci/2 (GCS port infra issue)

@original-brownbear original-brownbear merged commit 4833861 into elastic:7.x Jul 12, 2020
@original-brownbear original-brownbear deleted the 56365-7.x branch July 12, 2020 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants