kvserver: provide a way for replicas to re-enter the quota pool #82403
cc @cockroachdb/replication
This is related to an issue I was about to file but will simply put here instead:

Is your feature request related to a problem? Please describe.

In the context of #79215, we are thinking about overloaded stores and replicas that are unable to keep up with the incoming raft log due to IO overload. When such a situation occurs, we hope that in the long term the allocator can find a way to move load off that store and solve the problem. But this may not always be possible, and even if it is, it may take some time. What happens in the meantime? We have three options:
We currently do some average of 1) and 3) through the quota pool if the traffic is driven primarily through one range. There is a certain amount of quota available and it is only returned when all followers have acked a proposal; if a follower slows down and falls further and further behind, quota will eventually run out, and we will (choppily) wait for the follower to ack something, then append again, etc; so it looks a lot like a dumb version of 3).
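For concreteness, here is a minimal, single-goroutine sketch of the quota-pool dynamic described above. The type and method names are made up for illustration and do not match the actual kvserver implementation:

```go
// Sketch of a proposal quota pool: a fixed byte budget that is only
// replenished once the slowest tracked follower has acked an entry.
package main

import "fmt"

type quotaPool struct {
	capacity  int64
	available int64
}

func newQuotaPool(capacity int64) *quotaPool {
	return &quotaPool{capacity: capacity, available: capacity}
}

// tryAcquire reports whether `size` bytes of quota are free; in the real
// pool the proposer would block here instead of failing.
func (q *quotaPool) tryAcquire(size int64) bool {
	if q.available < size {
		return false
	}
	q.available -= size
	return true
}

// release returns quota after the slowest tracked follower acks the entry.
func (q *quotaPool) release(size int64) {
	q.available += size
	if q.available > q.capacity {
		q.available = q.capacity
	}
}

func main() {
	q := newQuotaPool(1 << 20) // 1 MiB budget for the example

	// A follower that never acks means release() is never called, so the
	// proposer stalls after a handful of 256 KiB proposals: the "choppy"
	// behavior described above.
	for i := 0; ; i++ {
		if !q.tryAcquire(256 << 10) {
			fmt.Printf("proposal %d stalls waiting on the slow follower\n", i)
			return
		}
		fmt.Printf("proposal %d admitted\n", i)
	}
}
```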
Whatever end state we work towards (see #79755), it will require determining, for a remote store, a signal indicating how much replication traffic it can take in, so that the cluster can shape traffic to that store accordingly.

Describe the solution you'd like

Unclear, as it depends a lot on #79755 and the preliminary steps in #79215, which include a lot of experimentation. But one approach that has been discussed quite a bit, and is roughly in line with how admission control for IO works locally, is to negotiate with the remote store's admission control system a throughput rate that the remote considers acceptable (for the near future) and to shape aggregate append traffic to that node according to that target throughput (see the sketch below). This raises the question of what happens if the target throughput rate is "very low". In such a case, a good strategy is likely to treat the follower as unavailable (as long as quorum requirements allow this, on a case-by-case basis). Cut-offs like this are annoying.

Describe alternatives you've considered

Additional context

Somewhat related: #77035
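To make the throughput-negotiation idea a bit more concrete, here is a rough sketch of the sender-side shaping using a plain token bucket (golang.org/x/time/rate). The negotiation RPC and all names below are hypothetical; the real design would live in admission control and the raft transport:

```go
// Sketch: shape per-store append traffic to a byte rate advertised by the
// remote store's admission control.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// storeSender shapes outgoing raft append traffic to one remote store.
type storeSender struct {
	limiter *rate.Limiter // tokens are bytes of append traffic
}

// newStoreSender starts at a target byte rate reported by the remote store.
func newStoreSender(targetBytesPerSec float64) *storeSender {
	return &storeSender{
		limiter: rate.NewLimiter(rate.Limit(targetBytesPerSec), int(targetBytesPerSec)),
	}
}

// updateTarget would be called when the remote store advertises a new
// acceptable rate (e.g. once it has compacted down its backlog).
func (s *storeSender) updateTarget(targetBytesPerSec float64) {
	s.limiter.SetLimit(rate.Limit(targetBytesPerSec))
}

// sendAppend blocks until the shaped rate allows `size` more bytes. If the
// target rate is "very low", callers would instead treat the follower as
// unavailable (quorum permitting), per the discussion above.
func (s *storeSender) sendAppend(ctx context.Context, size int) error {
	if err := s.limiter.WaitN(ctx, size); err != nil {
		return err
	}
	fmt.Printf("%s: sent %d bytes of raft appends\n",
		time.Now().Format("15:04:05.000"), size)
	return nil
}

func main() {
	s := newStoreSender(512 << 10) // remote store says 512 KiB/s is acceptable
	ctx := context.Background()
	for i := 0; i < 4; i++ {
		_ = s.sendAppend(ctx, 256<<10) // paced to the target rate after the initial burst
	}
	s.updateTarget(256 << 10) // remote store lowers its advertised rate
}
```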
This issue is obsolete if we disable/remove the quota pool, x-ref #106063
Describe the problem
Replicas can get "ignored" by the quota pool under certain conditions. This primarily happens when a node restarts, as then we want to avoid stalling foreground traffic until the follower has caught up. However, there is no guarantee that the follower will ever catch up.
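As a hedged illustration of how a follower ends up ignored (names and the exact condition are invented; the actual kvserver logic differs), think of the quota pool as only waiting on followers that have been active recently:

```go
// Sketch: the quota pool drops followers that have gone silent (restart,
// partition) from its tracked set so they don't stall foreground traffic.
// The problem described above is that nothing later forces them back in.
package main

import (
	"fmt"
	"time"
)

type followerState struct {
	storeID      int
	lastActivity time.Time // last time we heard from this follower
	matchIndex   uint64    // highest log index known to be replicated to it
}

// trackedFollowers returns the followers whose acks the quota pool waits for.
func trackedFollowers(followers []followerState, now time.Time, inactivity time.Duration) []followerState {
	var tracked []followerState
	for _, f := range followers {
		if now.Sub(f.lastActivity) > inactivity {
			continue // ignored by the quota pool from here on
		}
		tracked = append(tracked, f)
	}
	return tracked
}

func main() {
	now := time.Now()
	followers := []followerState{
		{storeID: 1, lastActivity: now, matchIndex: 1000},
		{storeID: 2, lastActivity: now.Add(-2 * time.Minute), matchIndex: 400}, // restarted node
	}
	for _, f := range trackedFollowers(followers, now, 10*time.Second) {
		fmt.Printf("quota pool waits on store %d (match index %d)\n", f.storeID, f.matchIndex)
	}
}
```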
To Reproduce
I haven't actually done this, but if you take a write-heavy workload with a few hot ranges, take down a node for 1-2 minutes, and then bring it back up in a state in which it is slightly underprovisioned for the workload, it should lag behind forever, and the quota pool will not help it catch up.
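A back-of-the-envelope sketch (not an actual repro script, and the rates are made-up numbers) of why such a follower never catches up: the outage creates a backlog, and a small apply-rate deficit means the backlog only grows once the quota pool stops throttling on its behalf.

```go
// Sketch: backlog growth for a slightly underprovisioned follower after a
// 90-second outage, assuming hypothetical append/apply rates.
package main

import "fmt"

func main() {
	const (
		appendMBps = 10.0 // workload's raft append rate to the range
		applyMBps  = 9.0  // what the underprovisioned follower can absorb
		outageSec  = 90   // node was down for 1-2 minutes
	)

	backlogMB := appendMBps * outageSec // log accumulated while down
	for t := 0; t <= 600; t += 120 {
		fmt.Printf("t=%3ds backlog=%6.0f MB\n", t, backlogMB)
		backlogMB += (appendMBps - applyMBps) * 120 // the deficit keeps growing
	}
}
```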
Expected behavior
Hard to formulate! There are different regimes. If a follower is behind and is "hopelessly slow", foreground traffic shouldn't slow down in response to it (see #79215). But if it's only marginally slower (making "good progress"), and perhaps slower only because it is a read-only satellite in a faraway region, etc, we need to slowly bring it back into circulation or AOST reads on this replica will fail forever (and, if it's a voter, availability will remain compromised forever since the replica has to catch up before it can make forward progress).
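One possible shape of a fix, sketched with hypothetical names and a hypothetical lag threshold: once an ignored follower's match index is close enough to the leader's last index (i.e. it is making good progress), it re-enters the set of followers the quota pool waits for, while a "hopelessly slow" follower stays ignored.

```go
// Sketch: re-admit ignored followers to the quota pool once their lag drops
// below a catch-up threshold.
package main

import "fmt"

type follower struct {
	storeID    int
	matchIndex uint64
	ignored    bool // currently excluded from the quota pool
}

// maybeReadmit flips ignored followers back into the tracked set once their
// lag drops below catchupThreshold entries.
func maybeReadmit(followers []follower, leaderLastIndex, catchupThreshold uint64) {
	for i := range followers {
		f := &followers[i]
		if !f.ignored {
			continue
		}
		if leaderLastIndex-f.matchIndex <= catchupThreshold {
			f.ignored = false // re-enter the quota pool
			fmt.Printf("store %d re-admitted at lag %d\n", f.storeID, leaderLastIndex-f.matchIndex)
		}
	}
}

func main() {
	followers := []follower{
		{storeID: 1, matchIndex: 10000},               // healthy, tracked
		{storeID: 2, matchIndex: 9990, ignored: true}, // nearly caught up
		{storeID: 3, matchIndex: 2000, ignored: true}, // hopelessly behind
	}
	maybeReadmit(followers, 10000, 100)
	for _, f := range followers {
		fmt.Printf("store %d ignored=%v\n", f.storeID, f.ignored)
	}
}
```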
Additional data / screenshots
Environment:
Additional context
I'm not sure we have struggled with this in practice, but it is a legitimate concern, and it becomes more important if, for #79215, we take an approach where the quota pool "temporarily" ignores followers that are overloaded (and stops sending appends to them). These nodes will "intentionally" fall behind, but nothing will ensure that they catch up once they have become healthy again.
Jira issue: CRDB-16355
Epic CRDB-39898