admission: graceful degradation #82114
Comments
I'm being nit-picky about the language here, but I don't know that "uniformly" is a good goal, depending on the definition of "uniformly". Consider, in particular, long-running, multi-statement transactions. Imagine you want to throttle 5% of your requests. If you do that uniformly at the KV layer, and each transaction involves 10 underlying KV requests, then roughly 40% of your transactions will experience some throttling. You move up the latency curve much more quickly when issuing a series of requests, each of which is exposed to independent throttling decisions. The discussion in https://www.cs.columbia.edu/~ruigu/papers/socc18-final100.pdf is pretty good. It also seems to me that this issue should mention, at least on some level, #71882.
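(For reference, the 40% figure follows from independent per-request throttling decisions: with throttle probability p = 0.05 per KV request and n = 10 KV requests per transaction, the chance that a transaction hits at least one throttle is 1 − (1 − p)^n = 1 − 0.95^10 ≈ 0.40.)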
#82440 reduces the token allocation interval to 250ms. I agree that a smaller interval like 1ms would have less latency impact. A simple example is a case where 1000 tokens are available every 1s, and there is a uniform arrival rate of 1000 requests/s of high-priority requests and 2000 requests/s of low-priority requests. If 1000 tokens are given out in a burst every second, the eventual steady state is 1000 waiting high-priority requests at every 1s tick, which have been waiting for a mean duration of 500ms. If 1 token is given out every 1ms, the eventual steady state is 1 waiting high-priority request at every 1ms tick, which has been waiting for a mean of 0.5ms. In both cases no low-priority requests are eventually admitted, but the latency impact of the latter is minimal.
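To make that concrete, here is a minimal discrete-time model of the example above. It is a sketch of the queueing arithmetic only; the 1ms resolution, the arrival pattern, and the `simulate` helper are illustrative assumptions, not CockroachDB's admission-control code.

```go
// Model: 1000 tokens/s total, uniform arrivals of 1000 high-pri and
// 2000 low-pri requests per second, comparing coarse vs. fine token
// grant intervals.
package main

import "fmt"

// simulate returns the mean queueing delay in ms of admitted
// high-priority requests over `seconds` of simulated time, at 1ms
// resolution.
func simulate(grantIntervalMs, seconds int) float64 {
	const tokensPerSec = 1000
	grantSize := tokensPerSec * grantIntervalMs / 1000

	var hiQueue []int // arrival times (ms) of waiting high-pri requests
	loQueue := 0      // count of waiting low-pri requests
	totalWaitMs, admitted := 0, 0

	for now := 0; now < seconds*1000; now++ {
		hiQueue = append(hiQueue, now) // 1000 high-pri arrivals/s => 1 per ms
		loQueue += 2                   // 2000 low-pri arrivals/s => 2 per ms

		if now%grantIntervalMs == 0 {
			tokens := grantSize
			// High priority drains first.
			for tokens > 0 && len(hiQueue) > 0 {
				totalWaitMs += now - hiQueue[0]
				hiQueue = hiQueue[1:]
				admitted++
				tokens--
			}
			// Leftover tokens (startup only) go to low priority.
			for tokens > 0 && loQueue > 0 {
				loQueue--
				tokens--
			}
		}
	}
	return float64(totalWaitMs) / float64(admitted)
}

func main() {
	fmt.Printf("1s grants:  mean high-pri wait %.1fms\n", simulate(1000, 10))
	fmt.Printf("1ms grants: mean high-pri wait %.1fms\n", simulate(1, 10))
}
```

Running it reproduces the asymmetry described above: the 1s grant interval yields a mean high-priority wait near 500ms, while the 1ms interval keeps it near zero, even though both admit the same aggregate throughput and starve low-priority work equally.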
I've pulled out the smoothing of token allocation into its own issue, #91509.
#91519 has a clearer specification of what we desire from the ioLoadListener. Closing this issue. |
Is your feature request related to a problem? Please describe.
Today, the admission control subsystem protects a node "abruptly and severely". For IO overload, for example, no throttling occurs while the L0 file count or sublevel count slowly creeps toward its threshold (1000 and 20, respectively). Once the threshold is crossed, admission control becomes active and throttles, and it does so in a way that doesn't necessarily minimize variance across requests: IO tokens are handed out at 1s intervals, so some requests may not be throttled at all while others wait for seconds at a time.
Describe the solution you'd like
CockroachDB should degrade gracefully: throttling should be introduced gently as the thresholds are approached, and should apply uniformly across requests.
For example, for a kv0 workload in which concurrency is ramped up over time, we would hope to see the p99 admission-throttling latency (over an interval), as a function of concurrency (which corresponds 1:1 to targeted throughput), form a smooth upward curve rather than a choppy step function. A sketch of one possible shape for such gradual throttling follows.
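Here is a minimal sketch of a token policy that degrades gradually instead of stepping. The `tokenFraction` name, the 50%-of-threshold ramp start, the 10% floor, and the linear shape are all hypothetical choices for illustration, not the ioLoadListener's actual policy.

```go
// Hypothetical "gentle" throttling shape: rather than leaving requests
// unthrottled until the sublevel count hits the threshold (20) and then
// clamping hard, scale the admission rate down smoothly once load passes
// an assumed ramp-start point.
package main

import "fmt"

// tokenFraction returns the fraction of the unthrottled IO token rate to
// hand out, given the current L0 sublevel count and the overload threshold.
func tokenFraction(sublevels, threshold int) float64 {
	rampStart := threshold / 2 // assumed: begin degrading at 50% of threshold
	switch {
	case sublevels <= rampStart:
		return 1.0 // healthy: no throttling
	case sublevels >= threshold:
		return 0.1 // overloaded: clamp to a small trickle
	default:
		// Linear ramp from 1.0 down to 0.1 between rampStart and threshold.
		frac := float64(sublevels-rampStart) / float64(threshold-rampStart)
		return 1.0 - 0.9*frac
	}
}

func main() {
	for _, n := range []int{5, 10, 12, 15, 18, 20, 25} {
		fmt.Printf("sublevels=%2d -> %3.0f%% of tokens\n", n, 100*tokenFraction(n, 20))
	}
}
```

Any monotone ramp (linear, piecewise, exponential) would do; the key property is that the admission rate is a continuous function of the overload signal rather than a step, so the p99 curve degrades smoothly as load rises.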
Describe alternatives you've considered
Additional context
Related to #79215
Related to #81834
Jira issue: CRDB-16217