kv: Scans with limit acquire excessively large latches #9521
That query seems to be an artifact of …

Since we already have a pagination mechanism via resume_span, could we …

(sent by email on Sat, Sep 24, 2016, in reply to Peter Mattis)
@petermattis Yes, this query is not a part of the load …

@tamird Ah, that's a nice idea. I think there's some concern on the SQL side about what happens if we serve partial results when the client didn't ask for it, but in general returning partial results instead of blocking in the command queue seems like a good idea.
@tamird this pagination trick would help with this scan command not waiting for others before it, but wouldn't help with the scan blocking other commands that come after it (at least in the case where the command queue is empty to begin with), right?

I like the idea of attempting an inconsistent read first to get a clue about the required span. @bdarnell, it seems to me that double execution can be avoided easily in the uncontended case if the command queue (or the replica otherwise) maintained sequence numbers for incoming commands (or the spans of recent commands), so that it could assert that no overlapping command went through while we were doing the inconsistent read. Or, actually, can't the timestamp cache (which seems to contain writes too) serve this purpose?
@andreimatei I think that's an orthogonal problem. In the case we're discussing here, the …

That said, I agree that we have a general granularity problem in the command queue (scans shouldn't hold up commands in their "wake"), but it doesn't seem immediately relevant here.
We have this granularity problem on various levels. @vivekmenezes touches on something similar in #9419, but in the realm of intent resolution, where the problem is pretty dramatic given that the current code churns through a lot of data for nothing. With the above query, we'll "always" end up waiting on at least a few Raft commands, which is also pretty drastic. I like @bdarnell's suggestion: we execute the read, check for …
Umm, can we just call …
You're right, the read would need to be done while holding the lock. That is probably too expensive, so maybe we're going to have to be real smart™ right from the get-go, unfortunately.
#13349 truncates the span added for scans when possible. |
Fixes cockroachdb#9521. Supersedes cockroachdb#31904.

SQL has a tendency to create scans which cover a range's entire key span, though looking only to return a finite number of results. These requests end up blocking on writes that are holding latches over keys that the scan will not actually touch. In reality, when there is a scan with a limit, the actual affected key span ends up being a small subset of the range's key span.

This change creates a new "optimistic evaluation" path for read-only requests. When evaluating optimistically, the batch will sequence itself with the latch manager, but will not wait to acquire all of its latches. Instead, it begins evaluating immediately and verifies after the fact that it would not have needed to wait on any latch acquisitions. When performing this verification, it uses knowledge about the limited set of keys that the scan actually looked at. If there are no conflicts, the scan succeeds. If there are conflicts, the request waits for all of its latch acquisition attempts to finish and re-evaluates.

This PR replaces cockroachdb#31904. The major difference between the two is that this PR exploits the structure of the latch manager to efficiently perform optimistic latch acquisition and after-the-fact verification of conflicts. Doing this requires keeping no extra state because it uses the immutable snapshots that the latch manager now captures during sequencing. The other major difference is that this PR does not release latches after a failed optimistic evaluation.

NOTE: a prevalent theory about the pathological case with this behavior was that overestimated read latches would serialize with write latches, causing all requests on a range to serialize. I wasn't seeing this in practice. It turns out that the "timestamp awareness" in the latch manager should avoid this behavior in most cases, because later writes will have higher timestamps than earlier reads, so the latch manager won't consider them to interfere. Still, large clusters with a high amount of clock skew could see a bounded variant of this situation.

### Benchmark Results

```
name                                    old ops/sec  new ops/sec  delta
kv95/cores=16/nodes=3/splits=3          51.9k ± 0%   51.7k ± 1%   ~        (p=0.400 n=3+3)
kvS70-L1/cores=16/nodes=3/splits=3      24.1k ± 4%   27.7k ± 1%   +14.75%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      24.5k ± 1%   27.5k ± 1%   +12.08%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   16.0k ± 1%   16.6k ± 2%   +3.79%   (p=0.100 n=3+3)

name                                    old p50(ms)  new p50(ms)  delta
kv95/cores=16/nodes=3/splits=3          0.70 ± 0%    0.70 ± 0%    ~        (all equal)
kvS70-L1/cores=16/nodes=3/splits=3      1.07 ± 6%    0.90 ± 0%    -15.62%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      1.10 ± 0%    0.90 ± 0%    -18.18%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   1.80 ± 0%    1.67 ± 4%    -7.41%   (p=0.100 n=3+3)

name                                    old p99(ms)  new p99(ms)  delta
kv95/cores=16/nodes=3/splits=3          1.80 ± 0%    1.80 ± 0%    ~        (all equal)
kvS70-L1/cores=16/nodes=3/splits=3      5.77 ±32%    4.70 ± 0%    ~        (p=0.400 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      5.00 ± 0%    4.70 ± 0%    -6.00%   (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   6.90 ± 3%    7.33 ± 8%    ~        (p=0.400 n=3+3)
```

_S&lt;num&gt; = --span-percent=&lt;num&gt;, L&lt;num&gt; = --span-limit=&lt;num&gt;_

Release note (performance improvement): improved performance on workloads which mix OLAP queries with inserts and updates.
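The optimistic path described above can be modeled in a few lines of Go. This is a toy sketch, not the actual `spanlatch.Manager` API: the `Span` type, `optimisticScan`, and the slice standing in for a latch-manager snapshot are all illustrative.

```go
package main

import "fmt"

// Span is a simple [Key, EndKey) interval over string keys.
type Span struct{ Key, EndKey string }

func overlaps(a, b Span) bool { return a.Key < b.EndKey && b.Key < a.EndKey }

// optimisticScan models the flow described above: begin evaluating
// immediately without waiting for latches, then verify the keys the
// scan actually touched against a snapshot of conflicting write
// latches. All names here are hypothetical.
func optimisticScan(writeLatches []Span, declared Span, eval func(Span) Span) (touched Span, ok bool) {
	touched = eval(declared) // evaluate without waiting on latches
	for _, w := range writeLatches {
		if overlaps(touched, w) {
			// A write latch overlaps what we actually read: the request
			// must wait for its latch acquisitions and re-evaluate.
			return touched, false
		}
	}
	return touched, true // no conflict with the keys actually read
}

func main() {
	// A limited scan declares [a, z) but only reads [a, c); a write
	// latch on [m, n) does not conflict with what was actually read.
	touched, ok := optimisticScan([]Span{{"m", "n"}}, Span{"a", "z"},
		func(Span) Span { return Span{"a", "c"} })
	fmt.Println(touched.Key, touched.EndKey, ok)
}
```

The key property this models is that correctness is judged against the narrowed, actually-read span, while the declared span only determines what a pessimistic retry would wait on.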
This optimistic checking happens after the evaluation. The evaluation will already discover any conflicting intents, so the subsequent checking of the lock table is primarily to find conflicting unreplicated locks. This structure should be easy to extend to optimistic latching.

Fixes cockroachdb#49973
Informs cockroachdb#9521

Release note: None
58670: concurrency,kvserver: limited scans optimistically check for locks r=sumeerbhola a=sumeerbhola

This optimistic checking happens after the evaluation. The evaluation will already discover any conflicting intents, so the subsequent checking of the lock table is primarily to find conflicting unreplicated locks. This structure should be easy to extend to optimistic latching.

Benchmark numbers from the new kv roachtest that does SFU to introduce false contention:

```
name                                            old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  5.68k ± 0%   9.96k ± 1%   +75.46%  (p=0.000 n=8+9)

name                                            old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  13.1 ± 0%    5.5 ± 0%     -58.02%  (p=0.000 n=8+8)

name                                            old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  18.9 ± 0%    16.8 ± 0%    -11.11%  (p=0.001 n=8+9)

name                                            old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  22.0 ± 0%    25.6 ± 2%    +16.57%  (p=0.000 n=8+9)
```

Microbenchmark numbers for the OptimisticEval microbenchmark, where the real-contention=true case typically causes optimistic evaluation to retry:

```
name                                     old time/op    new time/op    delta
OptimisticEval/real-contention=false-16  5.76ms ± 5%    0.03ms ± 2%    -99.49%  (p=0.000 n=8+10)
OptimisticEval/real-contention=true-16   5.75ms ± 4%    5.63ms ± 5%    ~        (p=0.393 n=10+10)

name                                     old alloc/op   new alloc/op   delta
OptimisticEval/real-contention=false-16  34.3kB ±24%    9.9kB ± 2%     -71.29%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16   33.0kB ±20%    35.7kB ± 9%    ~        (p=0.065 n=8+8)

name                                     old allocs/op  new allocs/op  delta
OptimisticEval/real-contention=false-16  273 ±19%       83 ± 0%        -69.56%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16   268 ± 2%       308 ± 2%       +14.90%  (p=0.000 n=8+8)
```

Fixes #49973
Informs #9521

Release note (performance improvement): A limited scan now checks for conflicting locks in an optimistic manner, which means it will not conflict with locks (typically unreplicated locks) that were held in the scan's full spans, but were not in the spans that were scanned until the limit was reached. This behavior can be turned off by changing the value of the cluster setting kv.concurrency.optimistic_eval_limited_scans.enabled to false.

Co-authored-by: sumeerbhola <[email protected]>
Latches for the full spans get inserted up front in the spanlatch.Manager, and the conflict checking happens after evaluation, only over the spans that were actually read. If there is a conflict, the existing inserted latches are waited on, and execution switches to pessimistic latching and locking. The existing cluster setting for optimistic locking is used to gate this behavior.

Fixes cockroachdb#9521

Release note (performance improvement): A limited scan now checks for conflicting latches in an optimistic manner, which means it will not conflict with latches that were held in the scan's full spans, but were not in the spans that were scanned until the limit was reached. This behavior can be turned off (along with optimistic locking) by changing the value of the cluster setting kv.concurrency.optimistic_eval_limited_scans.enabled to false.
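The optimistic-then-pessimistic control flow above can be sketched as follows. Only the cluster setting name comes from the text; `executeLimitedScan` and the injected closures are hypothetical stand-ins for evaluation and latch management.

```go
package main

import "fmt"

// executeLimitedScan models the gating and fallback described above:
// under the setting, try optimistic evaluation first; if the
// after-the-fact check finds a conflict, wait on the latches that were
// already inserted for the full spans and re-run pessimistically.
func executeLimitedScan(
	optimisticEvalEnabled bool, // kv.concurrency.optimistic_eval_limited_scans.enabled
	evalOptimistic func() (conflict bool),
	waitOnLatches func(),
	evalPessimistic func(),
) string {
	if optimisticEvalEnabled {
		if !evalOptimistic() {
			// No conflicting latch/lock in the spans actually read.
			return "optimistic"
		}
		// Conflict found after the fact: the full-span latches are
		// already in the spanlatch.Manager, so just wait on them.
		waitOnLatches()
	}
	evalPessimistic()
	return "pessimistic"
}

func main() {
	res := executeLimitedScan(true,
		func() bool { return false }, // no real contention
		func() {}, func() {})
	fmt.Println(res)
}
```

Note the asymmetry this models: a failed optimistic attempt costs a second evaluation, which is why the real-contention=true benchmarks below show a modest regression while the false-contention cases improve dramatically.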
66059: concurrency,kvserver: limited scans do optimistic latching r=sumeerbhola a=sumeerbhola

Latches for the full spans get inserted up front in the spanlatch.Manager, and the conflict checking happens after evaluation, only over the spans that were actually read. If there is a conflict, the existing inserted latches are waited on, and execution switches to pessimistic latching and locking. The existing cluster setting for optimistic locking is used to gate this behavior.

Numbers for the new OptimisticEval benchmark show improvement when real-contention=false, compared to master, which only had optimistic locking. There is a modest slowdown in the real-contention=true case since every optimistic read has to try twice. The benchmark has concurrent writes of two different kinds: latches represents 1PC writers, so no intents, while latches-and-locks represents writers that will leave intents. For the latter, when the optimistic read finds an intent during evaluation it cannot necessarily add it as a discovered lock, if there is also a conflicting latch (since it could be racing with intent resolution). This case is rare in these benchmarks (latches-and-locks/real-contention=true): 13% found an intent and had a conflicting latch when num-writers=1, and < 1% for the same case when num-writers=4. The remainder (the common case) was to find a conflict when looking for conflicting latches/locks using CheckOptimisticNoConflicts.

```
name                                                                     old time/op    new time/op    delta
OptimisticEvalForLocks/real-contention=false-16                          28.2µs ± 4%    28.5µs ± 5%    ~        (p=0.393 n=10+10)
OptimisticEvalForLocks/real-contention=true-16                           5.52ms ± 1%    5.52ms ± 1%    ~        (p=0.796 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16  220µs ±62%     89µs ±38%      -59.37%  (p=0.000 n=10+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16  213µs ±65%     155µs ±82%     -27.33%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16   1.33ms ± 3%    1.27ms ±16%    ~        (p=0.829 n=8+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16   2.02ms ±10%    2.25ms ± 9%    +11.31%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16            175µs ± 2%     45µs ± 5%      -74.05%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16            613µs ± 2%     44µs ± 3%      -92.74%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16             181µs ± 4%     179µs ± 5%     ~        (p=0.315 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16             624µs ± 3%     620µs ± 3%     ~        (p=0.247 n=10+10)

name                                                                     old alloc/op   new alloc/op   delta
OptimisticEvalForLocks/real-contention=false-16                          8.40kB ± 7%    8.33kB ± 3%    ~        (p=1.000 n=10+8)
OptimisticEvalForLocks/real-contention=true-16                           31.8kB ± 7%    32.6kB ± 9%    ~        (p=0.382 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16  23.9kB ±21%    17.8kB ±25%    -25.55%  (p=0.003 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16  24.1kB ±20%    19.4kB ±22%    -19.56%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16   104kB ± 1%     101kB ± 1%     -2.89%   (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16   197kB ± 6%     217kB ±11%     +10.19%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16            29.9kB ± 0%    13.4kB ± 1%    -55.07%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16            94.9kB ± 0%    14.5kB ± 1%    -84.74%  (p=0.000 n=9+8)
OptimisticEval/latches/real-contention=true/num-writers=1-16             29.9kB ± 0%    31.3kB ± 0%    +4.59%   (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16             94.8kB ± 0%    96.2kB ± 0%    +1.48%   (p=0.000 n=10+10)

name                                                                     old allocs/op  new allocs/op  delta
OptimisticEvalForLocks/real-contention=false-16                          68.6 ± 1%      69.6 ± 2%      +1.52%   (p=0.005 n=9+10)
OptimisticEvalForLocks/real-contention=true-16                           271 ± 2%       272 ± 2%       ~        (p=0.336 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16  149 ±17%       117 ±18%       -21.00%  (p=0.002 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16  151 ±16%       126 ±18%       -16.31%  (p=0.013 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16   610 ± 0%       559 ± 1%       -8.32%   (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16   1.12k ± 5%     1.19k ±12%     ~        (p=0.053 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16            162 ± 0%       91 ± 0%        -43.83%  (p=0.001 n=8+9)
OptimisticEval/latches/real-contention=false/num-writers=4-16            445 ± 0%       96 ± 0%        -78.44%  (p=0.000 n=9+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16             163 ± 0%       184 ± 0%       +13.07%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16             445 ± 0%       467 ± 0%       +4.94%   (p=0.000 n=10+10)
```

Fixes #9521

Release note (performance improvement): A limited scan now checks for conflicting latches in an optimistic manner, which means it will not conflict with latches that were held in the scan's full spans, but were not in the spans that were scanned until the limit was reached. This behavior can be turned off (along with optimistic locking) by changing the value of the cluster setting kv.concurrency.optimistic_eval_limited_scans.enabled to false.

Co-authored-by: sumeerbhola <[email protected]>
A query like

```sql
SELECT * FROM t WHERE indexed_column > $1 LIMIT 1
```

has a very large key span (from key $1 to the end of the table or the end of the range, whichever comes first), but really only depends on a small amount of data (from $1 to the first key greater than that value). The command queue only sees the former, so this query must wait behind any updates to any other rows in the table, not just the rows that it will eventually return.

We could minimize this contention at the expense of throughput as follows. For read-only commands with (small) limits, execute the command first, before putting it in the command queue. If it reaches its limit, narrow the span based on the keys that were actually touched. Put it in the command queue under the narrowed span. After waiting on the command queue, execute it again. If it doesn't hit the limit while staying inside the narrowed span, something has changed out from under us and we have to re-queue with a broader span.
There is probably a way to be clever and avoid the double execution in the common case, e.g. if the command queue allows the narrowed span to execute immediately we can use the results of the first execution.
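The narrowing step in the proposal above can be sketched as follows. This is a toy model over an in-memory sorted key slice; `evaluateLimitedScan` and the `Span` type are hypothetical, not the command queue's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a simple [Key, EndKey) interval over string keys.
type Span struct{ Key, EndKey string }

// evaluateLimitedScan runs the scan once and, if the limit is reached,
// shrinks the span to the keys actually touched, so the command queue
// would see a much smaller span on the second pass.
func evaluateLimitedScan(data []string, span Span, limit int) (results []string, narrowed Span) {
	sort.Strings(data) // scan in key order
	for _, k := range data {
		if k >= span.Key && k < span.EndKey {
			results = append(results, k)
			if len(results) == limit {
				break
			}
		}
	}
	narrowed = span
	if len(results) == limit {
		// The scan stopped early, so it only depends on keys up to and
		// including the last one returned; "\x00" yields the next
		// possible byte-string key after it.
		narrowed.EndKey = results[len(results)-1] + "\x00"
	}
	return results, narrowed
}

func main() {
	res, narrowed := evaluateLimitedScan([]string{"a", "b", "c", "d", "e"}, Span{"a", "z"}, 2)
	fmt.Println(res, narrowed.Key, narrowed.EndKey)
}
```

If the limit is never reached, the span stays wide, which corresponds to the case where the second execution must still wait under the original span.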
This is a major cause of slowness for photos (#9247). For example, this trace spends 100ms in the command queue on its first scan. There is probably a related problem with the timestamp cache, but I haven't confirmed its existence or impact yet.