
kvserver: provide a way for replicas to re-enter the quota pool #82403

tbg opened this issue Jun 3, 2022 · 3 comments
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments


tbg commented Jun 3, 2022

Describe the problem

Replicas can get "ignored" by the quota pool under certain conditions. This primarily happens when a node restarts, as then we want to avoid stalling foreground traffic until the follower has caught up. However, there is no guarantee that the follower will ever catch up.
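
For illustration, here is a minimal, self-contained Go sketch of the idea (not the actual kvserver code; the `Follower` type, the `RecentlyCreated` flag, and the `maxLag` threshold are all hypothetical): followers that look hopeless are excluded from the set whose acks gate quota release, which is why nothing forces them to catch up.

```go
// Hypothetical sketch: which followers does a quota pool wait for?
package main

import "fmt"

type Follower struct {
	ID              int
	Index           uint64 // highest log index this follower has acked
	RecentlyCreated bool   // e.g. node just restarted / awaiting snapshot
}

// minTrackedIndex returns the lowest acked index among followers the pool
// still tracks. Followers that are deemed hopeless are skipped, so quota is
// released without waiting for them -- and they can lag behind indefinitely.
func minTrackedIndex(commitIndex uint64, followers []Follower, maxLag uint64) uint64 {
	min := commitIndex
	for _, f := range followers {
		if f.RecentlyCreated || commitIndex-f.Index > maxLag {
			continue // ignored by the pool
		}
		if f.Index < min {
			min = f.Index
		}
	}
	return min
}

func main() {
	followers := []Follower{
		{ID: 1, Index: 100},
		{ID: 2, Index: 40, RecentlyCreated: true}, // restarted node, ignored
		{ID: 3, Index: 95},
	}
	// Prints 95: follower 2 does not hold up quota release.
	fmt.Println(minTrackedIndex(100, followers, 50))
}
```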

To Reproduce

I haven't actually tried this, but take a write-heavy workload with a few hot ranges, take down a node for 1-2 minutes, and then bring it back up in a state in which it is slightly underprovisioned for the workload. It should lag behind forever, and the quota pool will not help it catch up.

Expected behavior

Hard to formulate! There are different regimes. If a follower is behind and "hopelessly slow", foreground traffic shouldn't slow down in response to it (see #79215). But if it is only marginally slower (making "good progress"), perhaps only because it is a read-only satellite in a faraway region, we need to slowly bring it back into circulation: otherwise AOST reads on this replica will fail forever, and, if it is a voter, availability will remain compromised forever since the replica has to catch up before it can contribute to forward progress.

Additional context

I'm not sure we have struggled with this in practice, but it is a legitimate concern and becomes more important if, for #79215, we take an approach where the quota pool "temporarily" ignores followers that are overloaded (and stops sending appends to them). These nodes will "intentionally" fall behind, but nothing will ensure that they catch up once they have become healthy again.

Jira issue: CRDB-16355

Epic CRDB-39898

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-replication labels Jun 3, 2022

blathers-crl bot commented Jun 3, 2022

cc @cockroachdb/replication


tbg commented Jun 3, 2022

This is related to the following issue, which I was about to file separately but will simply put here:

Is your feature request related to a problem? Please describe.

In the context of #79215, we are thinking about overloaded stores and replicas that are unable to keep up with the incoming raft log due to IO overload. When such a situation occurs, we hope that in the long term the allocator can find a way to move load off that store and solve the problem. But this may not always be possible, and even when it is, it may take some time.

What happens in the meantime? We have three options:

  1. let it run its course, i.e. append to the store anyway, thus overloading it more over time.
  2. don't append to it.
  3. slow down the foreground traffic to match what the overloaded store can support (i.e. some version of distributed admission control).

We currently do some mix of 1) and 3) through the quota pool if the traffic is driven primarily through one range. There is a certain amount of quota available, and it is only returned once all followers have acked a proposal; if a follower slows down and falls further and further behind, quota eventually runs out, and we (choppily) wait for the follower to ack something, then append again, and so on. That looks a lot like a crude version of 3).
If there are many active ranges, no individual quota pool may ever deplete, so the behavior is closer to 1).

Option 1) causes all kinds of memory blowup, severely degrades the node, and in particular disrupts service on ranges for which the slow store holds the lease.
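
To make that mechanism concrete, here is a rough, self-contained sketch of a quota pool (not CockroachDB's implementation; the types and sizes are made up): writers acquire quota sized to their proposal, and quota is returned only once every tracked follower has acked up to that proposal's index.

```go
// Hypothetical sketch of the per-range quota-pool behavior described above.
package main

import (
	"fmt"
	"sync"
)

type proposal struct {
	index uint64
	size  int64
}

type quotaPool struct {
	mu        sync.Mutex
	cond      *sync.Cond
	available int64
	inflight  []proposal // ordered by index
}

func newQuotaPool(capacity int64) *quotaPool {
	qp := &quotaPool{available: capacity}
	qp.cond = sync.NewCond(&qp.mu)
	return qp
}

// acquire blocks until `size` bytes of quota are free; this is where
// foreground writes stall when a tracked follower lags.
func (qp *quotaPool) acquire(index uint64, size int64) {
	qp.mu.Lock()
	defer qp.mu.Unlock()
	for qp.available < size {
		qp.cond.Wait()
	}
	qp.available -= size
	qp.inflight = append(qp.inflight, proposal{index: index, size: size})
}

// followerAcked is called with the minimum acked index across all tracked
// followers; quota for fully-acked proposals is returned to the pool.
func (qp *quotaPool) followerAcked(minAckedIndex uint64) {
	qp.mu.Lock()
	defer qp.mu.Unlock()
	i := 0
	for ; i < len(qp.inflight) && qp.inflight[i].index <= minAckedIndex; i++ {
		qp.available += qp.inflight[i].size
	}
	qp.inflight = qp.inflight[i:]
	qp.cond.Broadcast()
}

func main() {
	qp := newQuotaPool(1 << 20) // 1 MiB of quota for this range
	qp.acquire(1, 512<<10)
	qp.acquire(2, 512<<10)
	// The pool is now empty; a third acquire would block until followers ack.
	qp.followerAcked(2) // all tracked followers caught up through index 2
	fmt.Println("quota released")
}
```

With many active ranges each holding its own pool, spread-out traffic may never deplete any single pool, which is the "closer to 1)" regime above.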

Whatever end state we work towards (see #79755), it will require determining, for a remote store, a signal for how much replication traffic it can absorb, so that the cluster can shape traffic to that store accordingly.

Describe the solution you'd like

Unclear, as it depends a lot on #79755 and the preliminary steps in #79215, which include a lot of experimentation. But one approach that has been discussed quite a bit, and that is roughly in line with how admission control for IO works locally, is to negotiate with the remote store's admission control system a throughput rate that the remote considers acceptable for the near future, and to shape the aggregate append traffic to that node according to that target throughput.
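
A hedged sketch of that shaping idea, assuming the remote store advertises a byte rate it can absorb (the `appendShaper` name and the token-bucket details are illustrative, not an existing CockroachDB API):

```go
// Hypothetical sketch: shape append traffic to a negotiated byte rate.
package main

import (
	"fmt"
	"time"
)

type appendShaper struct {
	bytesPerSec float64 // rate advertised by the remote store's admission control
	tokens      float64
	lastRefill  time.Time
}

func newAppendShaper(bytesPerSec float64) *appendShaper {
	return &appendShaper{bytesPerSec: bytesPerSec, tokens: bytesPerSec, lastRefill: time.Now()}
}

// waitFor blocks until `size` bytes of budget have accrued at the negotiated
// rate, then consumes them; burst is capped at roughly one second of budget.
func (s *appendShaper) waitFor(size float64) {
	for {
		now := time.Now()
		s.tokens += now.Sub(s.lastRefill).Seconds() * s.bytesPerSec
		if s.tokens > s.bytesPerSec {
			s.tokens = s.bytesPerSec
		}
		s.lastRefill = now
		if s.tokens >= size {
			s.tokens -= size
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
}

func main() {
	// Suppose the remote store advertised that it can absorb 4 MiB/s.
	shaper := newAppendShaper(4 << 20)
	start := time.Now()
	for i := 0; i < 3; i++ {
		shaper.waitFor(2 << 20) // each append carries 2 MiB of log entries
		fmt.Printf("append %d sent after %v\n", i, time.Since(start).Round(time.Millisecond))
	}
}
```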

This raises the question of what happens if the target throughput rate is "very low". In such a case, likely a good strategy is to treat the follower as unavailable (as long as quorum requirements allow this on a case-by-case basis).

Cut-offs like this are annoying.

Describe alternatives you've considered

Additional context

Somewhat related: #77035


tbg commented Jul 3, 2023

This issue is obsolete if we disable/remove the quota pool, x-ref #106063
