Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

admission: graceful degradation #82114

Closed
tbg opened this issue May 31, 2022 · 5 comments
Closed

admission: graceful degradation #82114

tbg opened this issue May 31, 2022 · 5 comments
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-postmortem Originated from a Postmortem action item. T-kv KV Team

Comments

@tbg
Copy link
Member

tbg commented May 31, 2022

Is your feature request related to a problem? Please describe.

Today, the admission control subsystem protects a node "abruptly and severely". For example, for IO overload, as the L0 file or sublevel count slowly creeps towards the threshold (1000 and 20, respectively), no throttling will occur. As the threshold is crossed, admission control becomes active and throttles (and it does so in a way that doesn't necessarily minimize variance across requests, for example IO tokens are handed out at 1s intervals, so some requests may not be throttled at all and others for seconds at a time).

Describe the solution you'd like

CockroachDB should degrade gracefully: throttling should be introduced gently, as the thresholds are reached, and should throttle requests uniformly.

For example, for a kv0 workload in which the concurrency is ramped up over time, what we would hope to see is that p99s (of admission throttling over an interval) as a function of the concurrency (which is in 1:1 correspondence to targeted throughput) forms a smooth upward curve as opposed to a choppy step function.

Describe alternatives you've considered

Additional context

Related to #79215
Related to #81834

Jira issue: CRDB-16217

@blathers-crl

This comment was marked as resolved.

@blathers-crl blathers-crl bot added T-kv KV Team C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels May 31, 2022
@ajwerner
Copy link
Contributor

and should throttle requests uniformly.

I'm being nit-picky about the language here. I don't know that "uniformly" is a good goal, depending on the definition of "uniformly". Consider, in particular, long-running, multi-statement transactions. Imagine you want to be throttling 5% of your requests. If you do that uniformly at the KV layer, and each transaction involved 10 underlying kv requests, then 40% of your transactions will experience some throttling. You move up the latency curve much more quickly when issuing a series of requests each of which is exposed to independent throttling decisions. The discussion in https://www.cs.columbia.edu/~ruigu/papers/socc18-final100.pdf is pretty good.

It also seems to me that this issue should mention, at least on some level, #71882.

@sumeerbhola
Copy link
Collaborator

#82440 reduces the token allocation interval to 250ms.

I agree that a smaller interval like 1ms would have less latency impact. A simple example, is a case where 1000 tokens are available every 1s, and there is a uniform arrival rate of 1000 requests/s of high-priority requests and 2000 request/s of low-priority request. If 1000 tokens are given out in a burst every second, the eventual steady state will be 1000 waiting high-priority requests at every 1s tick, which have been waiting for a mean duration of 500ms. If 1 token is given out every 1ms, then the eventual steady state is 1 waiting high-priority request at every 1ms tick, which has been waiting for a mean of 0.5ms. In both cases, eventually no low-priority requests are admitted, but the latency impact of the latter is minimal.

@sumeerbhola
Copy link
Collaborator

I've pulled out the smoothing of token allocation into its own issue #91509

@sumeerbhola
Copy link
Collaborator

#91519 has a clearer specification of what we desire from the ioLoadListener.

Closing this issue.

@sumeerbhola sumeerbhola closed this as not planned Won't fix, can't repro, duplicate, stale Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-postmortem Originated from a Postmortem action item. T-kv KV Team
Projects
None yet
Development

No branches or pull requests

4 participants