admission: epoch based LIFO to prevent throughput collapse
The epoch-LIFO scheme monitors the queueing delay for each (tenant, priority) pair and switches between FIFO and LIFO queueing based on the maximum observed delay. Lower percentile latency can be reduced under LIFO, at the expense of increasing higher percentile latency. This behavior can help when it is important to finish some transactions in a timely manner, for scenarios which have external deadlines. Under FIFO, one could experience throughput collapse in the presence of such deadlines and an open loop workload, since when the first work item for a transaction reaches the front of the queue, the transaction is close to exceeding its deadline.

The epoch aspect of this scheme relies on clock synchronization (which we have in CockroachDB deployments) and the expectation that transaction/query deadlines will be significantly higher than execution time under low load.

A standard LIFO scheme suffers from a severe problem when a single user transaction can result in multiple units of lower-level work that get distributed to many nodes, and work execution can result in new work being submitted for admission: the later work for a transaction may no longer be the latest seen by the system (since "latest" is defined based on transaction start time), so it will not be preferred. This means LIFO would do some work items from each transaction and starve the remaining work, so nothing would complete. This can be as bad as or worse than FIFO, which at least prefers the same transactions until they are complete (both FIFO and LIFO use the transaction start time, not the individual work arrival time).

Consider a case where transaction deadlines are 1s (note this may not necessarily be an actual deadline, and could be a time duration after which the user impact is extremely negative), and typical transaction execution times (under low load) are 10ms. A 100ms epoch will increase transaction latency to at most 100ms + 5ms + 10ms, since execution will not start until the epoch of the transaction's start time is closed (5ms is the grace period before we "close" an epoch). At that time, due to clock synchronization, all nodes will start executing that epoch and will implicitly have the same set of competing transactions, which are ordered in the same manner. This set of competing transactions will stay unchanged until the next epoch close. And by the time the next epoch closes and the current epoch's transactions are deprioritized, 100ms will have elapsed, which is enough time for most of the admitted transactions to have finished all their work. The clock synchronization expected here is stronger than the default 500ms value of --max-offset, but that value is deliberately set to be extremely conservative to avoid stale reads, while the use here has no effect on correctness.

Note that LIFO queueing will only happen at bottleneck nodes, and is decided on a (tenant, priority) basis. So if there is even a single bottleneck node for a (tenant, priority), the above delay will occur. When the epoch closes at the bottleneck node, the creation time for this transaction will be sufficiently in the past, so the non-bottleneck nodes (using FIFO) will prioritize it over recent transactions. There is a queue ordering inversion in that the non-bottleneck nodes are ordering in the opposite way for such closed epochs, but since they are not bottlenecked, the queueing delay should be minimal.

Preliminary experiments with kv50/enc=false/nodes=1/conc=8192 are promising in reducing p50 and p75 latency.

Release note (ops change): The admission.epoch_lifo.enabled cluster setting, disabled by default, enables the use of epoch-LIFO adaptive queueing behavior in admission control.
1 parent d10188f commit 7f4bf73
Showing 11 changed files with 1,226 additions and 96 deletions.
@@ -0,0 +1,231 @@
init
----

# One request at priority=-128 sees high latency. Requests at priority 0, 127
# do not see high latency. So FIFO priority is set >= -127.
request-received priority=127
----
lowest-priority: 127

update priority=127 delay-millis=10
----
lowest-priority: 127 (pri: 127, delay-millis: 10, admitted: 1)

request-received priority=-128
----
lowest-priority: -128 (pri: 127, delay-millis: 10, admitted: 1)

update priority=-128 delay-millis=106
----
lowest-priority: -128 (pri: -128, delay-millis: 106, admitted: 1) (pri: 127, delay-millis: 10, admitted: 1)

request-received priority=0
----
lowest-priority: -128 (pri: -128, delay-millis: 106, admitted: 1) (pri: 127, delay-millis: 10, admitted: 1)

update priority=0 delay-millis=20
----
lowest-priority: -128 (pri: -128, delay-millis: 106, admitted: 1) (pri: 0, delay-millis: 20, admitted: 1) (pri: 127, delay-millis: 10, admitted: 1)

get-threshold
----
threshold: -127

# The latency seen by priority=-128 decreases but not below the threshold
# needed to return to FIFO. So FIFO priority continues to be >= -127.
request-received priority=-128
----
lowest-priority: -128

update priority=-128 delay-millis=11
----
lowest-priority: -128 (pri: -128, delay-millis: 11, admitted: 1)

get-threshold
----
threshold: -127

# The latency seen by priority=-128 is low enough to return to FIFO.
request-received priority=-128
----
lowest-priority: -128

update priority=-128 delay-millis=10
----
lowest-priority: -128 (pri: -128, delay-millis: 10, admitted: 1)

get-threshold
----
threshold: -128

# Priority=127 sees high latency. FIFO priority is now >= 128.
request-received priority=127
----
lowest-priority: 127

update priority=127 delay-millis=106
----
lowest-priority: 127 (pri: 127, delay-millis: 106, admitted: 1)

get-threshold
----
threshold: 128

# Both priority 24 and 127 see high latency. FIFO priority stays at >=128.
request-received priority=127
----
lowest-priority: 127

update priority=127 delay-millis=106
----
lowest-priority: 127 (pri: 127, delay-millis: 106, admitted: 1)

request-received priority=24
----
lowest-priority: 24 (pri: 127, delay-millis: 106, admitted: 1)

update priority=24 delay-millis=107
----
lowest-priority: 24 (pri: 24, delay-millis: 107, admitted: 1) (pri: 127, delay-millis: 106, admitted: 1)

get-threshold
----
threshold: 128

# Priority -5 and 20 see high latency. There are no requests at any other
# priority. The FIFO priority threshold reduces to >= 21.
request-received priority=20
----
lowest-priority: 20

update priority=20 delay-millis=111
----
lowest-priority: 20 (pri: 20, delay-millis: 111, admitted: 1)

request-received priority=-5
----
lowest-priority: -5 (pri: 20, delay-millis: 111, admitted: 1)

update priority=-5 delay-millis=110
----
lowest-priority: -5 (pri: -5, delay-millis: 110, admitted: 1) (pri: 20, delay-millis: 111, admitted: 1)

get-threshold
----
threshold: 21

# Priority 0 is LIFO and sees latency that is not low enough to return it to
# FIFO. The FIFO priority threshold reduces to >= 1.
request-received priority=0
----
lowest-priority: 0

update priority=0 delay-millis=11
----
lowest-priority: 0 (pri: 0, delay-millis: 11, admitted: 1)

get-threshold
----
threshold: 1

# Priority -128 is LIFO and sees latency that is not low enough to return it
# to FIFO. The FIFO priority threshold reduces to >= -127.
request-received priority=-128
----
lowest-priority: -128

update priority=-128 delay-millis=11
----
lowest-priority: -128 (pri: -128, delay-millis: 11, admitted: 1)

get-threshold
----
threshold: -127

# Priority -128 is LIFO and sees very low latency and switches back to FIFO.
request-received priority=-128
----
lowest-priority: -128

update priority=-128 delay-millis=9
----
lowest-priority: -128 (pri: -128, delay-millis: 9, admitted: 1)

get-threshold
----
threshold: -128

# Priority 0 is FIFO and sees a canceled request that does not meet the
# latency threshold to switch to LIFO. It stays as FIFO.
request-received priority=0
----
lowest-priority: 0

update priority=0 delay-millis=20 canceled=true
----
lowest-priority: 0 (pri: 0, delay-millis: 20, admitted: 0)

get-threshold
----
threshold: -128

# Priority 0 is FIFO and sees a canceled request with very high latency, so
# switched to LIFO.
request-received priority=0
----
lowest-priority: 0

update priority=0 delay-millis=120 canceled=true
----
lowest-priority: 0 (pri: 0, delay-millis: 120, admitted: 0)

get-threshold
----
threshold: 1

# Priority 0 receives a request, but nothing exits admission control, so it
# stays as LIFO.
request-received priority=0
----
lowest-priority: 0

get-threshold
----
threshold: 1

# Priority 10 sees a request with low latency. Priority 0 has a request that
# does not exit admission control. Priority 0 stays as LIFO.
request-received priority=10
----
lowest-priority: 10

update priority=10 delay-millis=5
----
lowest-priority: 10 (pri: 10, delay-millis: 5, admitted: 1)

request-received priority=0
----
lowest-priority: 0 (pri: 10, delay-millis: 5, admitted: 1)

get-threshold
----
threshold: 1

# Priority -10 sees a request with low enough latency to switch back to FIFO.
# Priority 0 has a request that does not exit admission control. Because of
# the observation at priority=-10 we switch everything back to FIFO.
request-received priority=-10
----
lowest-priority: -10

update priority=-10 delay-millis=5
----
lowest-priority: -10 (pri: -10, delay-millis: 5, admitted: 1)

request-received priority=0
----
lowest-priority: -10 (pri: -10, delay-millis: 5, admitted: 1)

get-threshold
----
threshold: -128