Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send Raft snapshots between follower replicas #42491

Closed
nvanbenschoten opened this issue Nov 14, 2019 · 6 comments · Fixed by #83991
Closed

Send Raft snapshots between follower replicas #42491

nvanbenschoten opened this issue Nov 14, 2019 · 6 comments · Fixed by #83991
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Nov 14, 2019

Raft currently requires all snapshots to come from leader replicas. This generally works well and is straightforward to think about.

However, this does lead to restrictions in flexibility which would be beneficial for certain classes of replica movement. Specifically, this limits flexibility in cases where there is an asymmetry between the leader replica, an up-to-date follower replica, and a follower replica that requires a snapshot. In the cases where the follower replicas as closer together than the leader replica, it would be beneficial to be able to source a snapshot from the up-to-date follower replica. Here are three concrete cases where this would be important:

  1. replica movement within a region in a cross-region replication group.
  2. rehoming an entire replication group between regions in a cross-AZ replication group where the destination region already contains a learner replica.
  3. replica movement within a single host across disks.

The "follower snapshots" would allow the first two of these cases to avoid WAN traffic, opting for faster intra-region traffic. It would also be a general enough mechanism to allow for the third case to avoid traversing the network stack at all, opting for filesystem-level data movement instead.

This might make a good intern project sometime in the future.

Epic: CRDB-5354

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-replication Relating to Raft, consensus, and coordination. labels Nov 14, 2019
@luojiebin
Copy link

Hi, I would like to work on this issue. Should I write a RFC to describe my idea first? Or Should I just start off to write code?

@nvanbenschoten
Copy link
Member Author

Hi @luojiebin, thanks for the interest in Cockroach! I think an RFC is a good place to start off for this change. We'll want to make sure we're all on the same page before hitting the code. There are also going to be some delicate edge cases here that would be better to resolve ahead of time.

@luojiebin
Copy link

Hi @nvanbenschoten, I am sorry that I don't have enough time to work on this issue in the next 5 months for some reason. It's ok that if anyone else would like to work on this. I am really appreciate cockroachdb and hope I can help in the future.

@nvanbenschoten
Copy link
Member Author

That's not a problem @luojiebin. Thanks for keeping us in the loop about your progress. I wish you the best and we'd love to work with you again in the future!

@nvanbenschoten
Copy link
Member Author

@tbg why the change in labels here? This lives in the replication layer, doesn't it?

@tbg
Copy link
Member

tbg commented Sep 27, 2021

I'm not sure. The key change to make here is to determine who sends the snapshot. We've pulled these decisions into the distribution layer, so I was expecting ~zero work here at the replication layer. I'll put both for now.

@tbg tbg added the A-kv-replication Relating to Raft, consensus, and coordination. label Sep 27, 2021
@amygao9 amygao9 self-assigned this Feb 15, 2022
amygao9 added a commit to amygao9/cockroach that referenced this issue Mar 18, 2022
This commit adds a new rpc stream for sending raft message requests between
replicas which allows for delegating snapshots. Currently this patch implements
the leaseholder delegating to itself, but in future patches the leaseholder
will be able to delegate snapshot sending to follower replicas. A new message
request type of `DelegatedSnapshotRequest` includes a `SnapshotRequest` and the
replica descriptor of the new sender replica. This allows the leaseholder to
fill in some snapshot metadata before delegating to the new sender store to
generate the snapshot and transmit it to the recipient.

Fixes: cockroachdb#42491
Release note: None

Release justification:
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Dec 30, 2022
Fixes: cockroachdb#42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot.  If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the snapshot is attempted to be delegated through. If set to 0, then
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or return failure immediately. This
is useful to prevent a delegation request from spending a long time
waiting before it is sent.

Before the snapshot is sent from the follower checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which
can reduce WAN traffic for snapshot movement.
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Jan 27, 2023
Fixes: cockroachdb#42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot.  If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the snapshot is attempted to be delegated through. If set to 0, then
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or return failure immediately. This
is useful to prevent a delegation request from spending a long time
waiting before it is sent.

Before the snapshot is sent from the follower checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which
can reduce WAN traffic for snapshot movement.
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Jan 31, 2023
Fixes: cockroachdb#42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot.  If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the snapshot is attempted to be delegated through. If set to 0, then
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or return failure immediately. This
is useful to prevent a delegation request from spending a long time
waiting before it is sent.

Before the snapshot is sent from the follower checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which
can reduce WAN traffic for snapshot movement.
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Feb 1, 2023
Fixes: cockroachdb#42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot.  If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the snapshot is attempted to be delegated through. If set to 0, then
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or return failure immediately. This
is useful to prevent a delegation request from spending a long time
waiting before it is sent.

Before the snapshot is sent from the follower checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which
can reduce WAN traffic for snapshot movement.
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Feb 2, 2023
Fixes: cockroachdb#42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot.  If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the snapshot is attempted to be delegated through. If set to 0, then
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or return failure immediately. This
is useful to prevent a delegation request from spending a long time
waiting before it is sent.

Before the snapshot is sent from the follower checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which
can reduce WAN traffic for snapshot movement. If there is another
replica for this range with a closer locality than the delegate, the
leaseholder will attempt to have that delegate send the snapshot. This
is particularly useful in the case of a decommission of a node where
most snapshots are transferred to another replica in the same locality.
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Feb 2, 2023
Fixes: cockroachdb#42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot.  If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the snapshot is attempted to be delegated through. If set to 0, then
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or return failure immediately. This
is useful to prevent a delegation request from spending a long time
waiting before it is sent.

Before the snapshot is sent from the follower checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which
can reduce WAN traffic for snapshot movement.
craig bot pushed a commit that referenced this issue Feb 3, 2023
83991: kv: enable delegate snapshots r=nvanbenschoten a=andrewbaptist

kvserver: delegate snapshots to followers

Fixes: #42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot.  If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the snapshot is attempted to be delegated through. If set to 0, then
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or return failure immediately. This
is useful to prevent a delegation request from spending a long time
waiting before it is sent.

Before the snapshot is sent from the follower checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which can reduce WAN traffic for snapshot movement.

Co-authored-by: Andrew Baptist <[email protected]>
@craig craig bot closed this as completed in 760aedb Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants