Is your feature request related to a problem? Please describe.
Numerous customer problems have been related to snapshot delivery. They fall into several categories:
Slowdown of foreground traffic as a result of snapshots being sent too aggressively.
Snapshots not effectively using the resources on a mostly idle system.
If the replicate queue is busy transferring a lot of snapshots, it cannot meet its 10-minute timer goal. All (?) other operations on the replicate queue are fast, so snapshot transfer is the only one that causes this.
With naive snapshot delegation, this problem may get worse due to less visibility into the length of sender queues.
Describe the solution you'd like
Add an additional queue for snapshots to the replicate queue. Once we "decide" to send a snapshot, it goes into this queue and runs asynchronously, so from the perspective of the rest of the replicate queue it finishes instantly. Internally, snapshots would be prioritized differently than other replicate queue work and paced to minimize the impact on user throughput. During an operation like decommissioning, many replicas will be on the queue at once; because the queue is unbounded, it will not block other replicate queue work. Additionally, every 10 minutes the snapshots that are not yet done will be "re-added". This should not matter in practice, and it is necessary because they may also be added for a different reason (or change because of things like range splits/merges). Additions to the queue therefore need to be efficient.
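To make the "cheap re-add" requirement concrete, here is a minimal sketch of one way such a queue could be structured: a max-heap of pending snapshots plus a map keyed by range, so that re-adding a range that is already pending only updates its priority instead of creating a duplicate. All names here (`rangeID`, `pendingSnapshot`, `snapshotQueue`, `Add`, `Next`) are hypothetical and not taken from the CockroachDB code base. The sketch is not thread-safe and leaves out pacing entirely; it only illustrates the add/re-add/pop behavior.

```go
package main

import "container/heap"

// rangeID is a hypothetical stand-in for a range identifier.
type rangeID int64

// pendingSnapshot is one queued snapshot request (hypothetical type).
type pendingSnapshot struct {
	id       rangeID
	priority float64 // higher priority is sent first
	index    int     // position in the heap, kept current by Swap/Push
}

// snapshotHeap implements heap.Interface as a max-heap on priority.
type snapshotHeap []*pendingSnapshot

func (h snapshotHeap) Len() int           { return len(h) }
func (h snapshotHeap) Less(i, j int) bool { return h[i].priority > h[j].priority }
func (h snapshotHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}
func (h *snapshotHeap) Push(x interface{}) {
	p := x.(*pendingSnapshot)
	p.index = len(*h)
	*h = append(*h, p)
}
func (h *snapshotHeap) Pop() interface{} {
	old := *h
	n := len(old)
	p := old[n-1]
	old[n-1] = nil // drop the reference so it can be collected
	*h = old[:n-1]
	return p
}

// snapshotQueue is an unbounded priority queue with at most one
// entry per range.
type snapshotQueue struct {
	heap snapshotHeap
	byID map[rangeID]*pendingSnapshot
}

func newSnapshotQueue() *snapshotQueue {
	return &snapshotQueue{byID: make(map[rangeID]*pendingSnapshot)}
}

// Add enqueues a snapshot for the range. If the range is already
// pending, only its priority is updated. Either path is O(log n).
func (q *snapshotQueue) Add(id rangeID, priority float64) {
	if p, ok := q.byID[id]; ok {
		p.priority = priority
		heap.Fix(&q.heap, p.index)
		return
	}
	p := &pendingSnapshot{id: id, priority: priority}
	q.byID[id] = p
	heap.Push(&q.heap, p)
}

// Next removes and returns the highest-priority pending range.
// The second result is false when the queue is empty.
func (q *snapshotQueue) Next() (rangeID, bool) {
	if q.heap.Len() == 0 {
		return 0, false
	}
	p := heap.Pop(&q.heap).(*pendingSnapshot)
	delete(q.byID, p.id)
	return p.id, true
}

// Len reports how many distinct ranges are pending.
func (q *snapshotQueue) Len() int { return q.heap.Len() }
```

With this shape, the periodic 10-minute re-add becomes a call to `Add` per replica that either inserts a new entry or adjusts an existing one, so the queue length never exceeds the number of distinct ranges. A real implementation would also need locking, a way to drop entries that are invalidated by splits or merges, and a pacing mechanism around `Next`.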
Describe alternatives you've considered
There are a few alternatives to this proposal. One option is to push the queuing logic lower, either to the send snapshot command on the replica command, or possibly all the way to the receiver. If there were not already a replicate queue that handled almost all snapshot sending, this might be a better approach. Having two queues (one at the replicate layer and one at the send snapshot layer) may cause unnecessary queueing delays.
Additional context
This work depends on #82539, which is required to make almost all system-generated snapshots use the replicate queue for sending.
Additionally, there is a PR to add a delegate for senders. This does not directly impact this work, but once that is merged and enabled, it may make the current problems worse as there is less visibility into what the delegate is doing.
Other work that has been considered is giving users more knobs to control snapshots (#81953), but these would be complex for an end user to tune correctly.
There are two remaining tasks that do not go through the replicate queue.
Merge queue - Moving replicas to align them before a merge operation. This could/should change to use the replicate queue as well. This would be a little more complicated as the request comes externally and the merge is blocked waiting for this to occur. This likely does not have to be addressed by this change, but should be a follow-on.
SQL commands - both ALTER RANGE ... RELOCATE and ALTER INDEX/TABLE ... RELOCATE directly move the ranges. As an end user is waiting on these commands, it is appropriate to have these bypass any queues and execute directly.
There are additional notes here (likely all this will get merged into this note). https://cockroachlabs.atlassian.net/wiki/spaces/~6268113f52310b0068ffd245/pages/2619998431/Raft+Snapshots
Jira issue: CRDB-17665