kv: Scans with limit acquire excessively large latches #9521
That query seems to be an artifact of …

Since we already have a pagination mechanism via resume_span, could we …

(sent by email on Sat, Sep 24, 2016, in reply to Peter Mattis)
@petermattis Yes, this query is not a part of the load …

@tamird Ah, that's a nice idea. I think there's some concern on the SQL side about what happens if we serve partial results when the client didn't ask for it, but in general returning partial results instead of blocking in the command queue seems like a good idea.
@tamird this pagination trick would help with this scan command not waiting for others before it, but wouldn't help with the scan blocking other commands that come after it (at least in the case where the command queue is empty to begin with), right?

I like the idea of attempting an inconsistent read first to get a clue about the required span. @bdarnell, it seems to me that double execution can be avoided easily in the uncontended case if the command queue (or the replica otherwise) maintained sequence numbers for incoming commands (or the spans of recent commands), so that it could assert that no overlapping command went through while we were doing the inconsistent read. Or, actually, can't the timestamp cache (which seems to contain writes too) serve this purpose?
@andreimatei I think that's an orthogonal problem. In the case we're discussing here, the …

That said, I agree that we have a general granularity problem in the command queue (scans shouldn't hold up commands in their "wake"), but it doesn't seem immediately relevant here.
We have this granularity problem on various levels. @vivekmenezes touches on something similar in #9419, but in the realm of intent resolution, where the problem is pretty dramatic given that the current code churns through a lot of data for nothing. With the above query, we'll "always" end up waiting on at least a few Raft commands, which is also pretty drastic. I like @bdarnell's suggestion: we execute the read, check for …
Umm, can we just call …
You're right, the read would need to be done while holding the lock. That is probably too expensive, so maybe we're going to have to be real smart™ right from the get-go, unfortunately.
#13349 truncates the span added for scans when possible. |
Fixes cockroachdb#9521. Supersedes cockroachdb#31904.

SQL has a tendency to create scans which cover a range's entire key span, though looking only to return a finite number of results. These requests end up blocking on writes that are holding latches over keys that the scan will not actually touch. In reality, when there is a scan with a limit, the actual affected key span ends up being a small subset of the range's key span.

This change creates a new "optimistic evaluation" path for read-only requests. When evaluating optimistically, the batch will sequence itself with the latch manager, but will not wait to acquire all of its latches. Instead, it begins evaluating immediately and verifies after the fact that it would not have needed to wait on any latch acquisitions. When performing this verification, it uses knowledge about the limited set of keys that the scan actually looked at. If there are no conflicts, the scan succeeds. If there are conflicts, the request waits for all of its latch acquisition attempts to finish and re-evaluates.

This PR replaces cockroachdb#31904. The major difference between the two is that this PR exploits the structure of the latch manager to efficiently perform optimistic latch acquisition and after-the-fact verification of conflicts. Doing this requires keeping no extra state because it uses the immutable snapshots that the latch manager now captures during sequencing. The other major difference is that this PR does not release latches after a failed optimistic evaluation.

NOTE: a prevalent theory about the pathological case with this behavior was that overestimated read latches would serialize with write latches, causing all requests on a range to serialize. I wasn't seeing this in practice. It turns out that the "timestamp awareness" in the latch manager should avoid this behavior in most cases, because later writes will have higher timestamps than earlier reads, so the latch manager won't consider them to interfere. Still, large clusters with a high amount of clock skew could see a bounded variant of this situation.

### Benchmark Results

```
name                                    old ops/sec  new ops/sec  delta
kv95/cores=16/nodes=3/splits=3          51.9k ± 0%   51.7k ± 1%   ~        (p=0.400 n=3+3)
kvS70-L1/cores=16/nodes=3/splits=3      24.1k ± 4%   27.7k ± 1%   +14.75%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      24.5k ± 1%   27.5k ± 1%   +12.08%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   16.0k ± 1%   16.6k ± 2%   +3.79%   (p=0.100 n=3+3)

name                                    old p50(ms)  new p50(ms)  delta
kv95/cores=16/nodes=3/splits=3          0.70 ± 0%    0.70 ± 0%    ~        (all equal)
kvS70-L1/cores=16/nodes=3/splits=3      1.07 ± 6%    0.90 ± 0%    -15.62%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      1.10 ± 0%    0.90 ± 0%    -18.18%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   1.80 ± 0%    1.67 ± 4%    -7.41%   (p=0.100 n=3+3)

name                                    old p99(ms)  new p99(ms)  delta
kv95/cores=16/nodes=3/splits=3          1.80 ± 0%    1.80 ± 0%    ~        (all equal)
kvS70-L1/cores=16/nodes=3/splits=3      5.77 ±32%    4.70 ± 0%    ~        (p=0.400 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      5.00 ± 0%    4.70 ± 0%    -6.00%   (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   6.90 ± 3%    7.33 ± 8%    ~        (p=0.400 n=3+3)
```

_S&lt;num&gt; = --span-percent=&lt;num&gt;, L&lt;num&gt; = --span-limit=&lt;num&gt;_

Release note (performance improvement): improved performance on workloads which mix OLAP queries with inserts and updates.
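The optimistic path described above can be modeled in a few lines of Go. This is a toy sketch, not the actual `spanlatch.Manager` API: the `Span` type, `optimisticScan`, and the slice standing in for a latch-manager snapshot are all illustrative.

```go
package main

import "fmt"

// Span is a simple [Key, EndKey) interval over string keys.
type Span struct{ Key, EndKey string }

func overlaps(a, b Span) bool { return a.Key < b.EndKey && b.Key < a.EndKey }

// optimisticScan models the flow described above: begin evaluating
// immediately without waiting for latches, then verify the keys the
// scan actually touched against a snapshot of conflicting write
// latches. All names here are hypothetical.
func optimisticScan(writeLatches []Span, declared Span, eval func(Span) Span) (touched Span, ok bool) {
	touched = eval(declared) // evaluate without waiting on latches
	for _, w := range writeLatches {
		if overlaps(touched, w) {
			// A write latch overlaps what we actually read: the request
			// must wait for its latch acquisitions and re-evaluate.
			return touched, false
		}
	}
	return touched, true // no conflict with the keys actually read
}

func main() {
	// A limited scan declares [a, z) but only reads [a, c); a write
	// latch on [m, n) does not conflict with what was actually read.
	touched, ok := optimisticScan([]Span{{"m", "n"}}, Span{"a", "z"},
		func(Span) Span { return Span{"a", "c"} })
	fmt.Println(touched.Key, touched.EndKey, ok)
}
```

The key property this models is that correctness is judged against the narrowed, actually-read span, while the declared span only determines what a pessimistic retry would wait on.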
This optimistic checking happens after the evaluation. The evaluation will already discover any conflicting intents, so the subsequent checking of the lock table is primarily to find conflicting unreplicated locks. This structure should be easy to extend to optimistic latching.

Fixes cockroachdb#49973
Informs cockroachdb#9521

Release note: None
58670: concurrency,kvserver: limited scans optimistically check for locks r=sumeerbhola a=sumeerbhola

This optimistic checking happens after the evaluation. The evaluation will already discover any conflicting intents, so the subsequent checking of the lock table is primarily to find conflicting unreplicated locks. This structure should be easy to extend to optimistic latching.

Benchmark numbers from the new kv roachtest that does SFU to introduce false contention:

```
name                                            old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  5.68k ± 0%   9.96k ± 1%   +75.46%  (p=0.000 n=8+9)

name                                            old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  13.1 ± 0%    5.5 ± 0%     -58.02%  (p=0.000 n=8+8)

name                                            old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  18.9 ± 0%    16.8 ± 0%    -11.11%  (p=0.001 n=8+9)

name                                            old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  22.0 ± 0%    25.6 ± 2%    +16.57%  (p=0.000 n=8+9)
```

Microbenchmark numbers for the OptimisticEval microbenchmark, where the real-contention=true case typically causes optimistic evaluation to retry:

```
name                                     old time/op    new time/op    delta
OptimisticEval/real-contention=false-16  5.76ms ± 5%    0.03ms ± 2%    -99.49%  (p=0.000 n=8+10)
OptimisticEval/real-contention=true-16   5.75ms ± 4%    5.63ms ± 5%    ~        (p=0.393 n=10+10)

name                                     old alloc/op   new alloc/op   delta
OptimisticEval/real-contention=false-16  34.3kB ±24%    9.9kB ± 2%     -71.29%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16   33.0kB ±20%    35.7kB ± 9%    ~        (p=0.065 n=8+8)

name                                     old allocs/op  new allocs/op  delta
OptimisticEval/real-contention=false-16  273 ±19%       83 ± 0%        -69.56%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16   268 ± 2%       308 ± 2%       +14.90%  (p=0.000 n=8+8)
```

Fixes #49973
Informs #9521

Release note (performance improvement): A limited scan now checks for conflicting locks in an optimistic manner, which means it will not conflict with locks (typically unreplicated locks) that were held in the scan's full spans, but were not in the spans that were scanned until the limit was reached. This behavior can be turned off by changing the value of the cluster setting kv.concurrency.optimistic_eval_limited_scans.enabled to false.

Co-authored-by: sumeerbhola <[email protected]>
Latches for the full spans get inserted up front in the spanlatch.Manager, and the conflict checking happens after evaluation, only over the spans that were actually read. If there is a conflict, the existing inserted latches are waited on, and execution switches to pessimistic latching and locking. The existing cluster setting for optimistic locking is used to gate this behavior.

Fixes cockroachdb#9521

Release note (performance improvement): A limited scan now checks for conflicting latches in an optimistic manner, which means it will not conflict with latches that were held in the scan's full spans, but were not in the spans that were scanned until the limit was reached. This behavior can be turned off (along with optimistic locking) by changing the value of the cluster setting kv.concurrency.optimistic_eval_limited_scans.enabled to false.
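The optimistic-then-pessimistic control flow above can be sketched as follows. Only the cluster setting name comes from the text; `executeLimitedScan` and the injected closures are hypothetical stand-ins for evaluation and latch management.

```go
package main

import "fmt"

// executeLimitedScan models the gating and fallback described above:
// under the setting, try optimistic evaluation first; if the
// after-the-fact check finds a conflict, wait on the latches that were
// already inserted for the full spans and re-run pessimistically.
func executeLimitedScan(
	optimisticEvalEnabled bool, // kv.concurrency.optimistic_eval_limited_scans.enabled
	evalOptimistic func() (conflict bool),
	waitOnLatches func(),
	evalPessimistic func(),
) string {
	if optimisticEvalEnabled {
		if !evalOptimistic() {
			// No conflicting latch/lock in the spans actually read.
			return "optimistic"
		}
		// Conflict found after the fact: the full-span latches are
		// already in the spanlatch.Manager, so just wait on them.
		waitOnLatches()
	}
	evalPessimistic()
	return "pessimistic"
}

func main() {
	res := executeLimitedScan(true,
		func() bool { return false }, // no real contention
		func() {}, func() {})
	fmt.Println(res)
}
```

Note the asymmetry this models: a failed optimistic attempt costs a second evaluation, which is why the real-contention=true benchmarks below show a modest regression while the false-contention cases improve dramatically.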
66059: concurrency,kvserver: limited scans do optimistic latching r=sumeerbhola a=sumeerbhola

Latches for the full spans get inserted up front in the spanlatch.Manager, and the conflict checking happens after evaluation, only over the spans that were actually read. If there is a conflict, the existing inserted latches are waited on, and execution switches to pessimistic latching and locking. The existing cluster setting for optimistic locking is used to gate this behavior.

Numbers for the new OptimisticEval benchmark show improvement when real-contention=false, compared to master, which only had optimistic locking. There is a modest slowdown in the real-contention=true case since every optimistic read has to try twice. The benchmark has concurrent writes of two different kinds: latches represents 1PC writers, so no intents, while latches-and-locks represents writers that will leave intents. For the latter, when the optimistic read finds an intent during evaluation it cannot necessarily add it as a discovered lock, if there is also a conflicting latch (since it could be racing with intent resolution). This case is rare in these benchmarks (latches-and-locks/real-contention=true): 13% found an intent and had a conflicting latch when num-writers=1, and < 1% for the same case when num-writers=4. The remainder (the common case) was to find a conflict when looking for conflicting latches/locks using CheckOptimisticNoConflicts.

```
name                                                                     old time/op    new time/op    delta
OptimisticEvalForLocks/real-contention=false-16                          28.2µs ± 4%    28.5µs ± 5%    ~        (p=0.393 n=10+10)
OptimisticEvalForLocks/real-contention=true-16                           5.52ms ± 1%    5.52ms ± 1%    ~        (p=0.796 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16  220µs ±62%     89µs ±38%      -59.37%  (p=0.000 n=10+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16  213µs ±65%     155µs ±82%     -27.33%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16   1.33ms ± 3%    1.27ms ±16%    ~        (p=0.829 n=8+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16   2.02ms ±10%    2.25ms ± 9%    +11.31%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16            175µs ± 2%     45µs ± 5%      -74.05%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16            613µs ± 2%     44µs ± 3%      -92.74%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16             181µs ± 4%     179µs ± 5%     ~        (p=0.315 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16             624µs ± 3%     620µs ± 3%     ~        (p=0.247 n=10+10)

name                                                                     old alloc/op   new alloc/op   delta
OptimisticEvalForLocks/real-contention=false-16                          8.40kB ± 7%    8.33kB ± 3%    ~        (p=1.000 n=10+8)
OptimisticEvalForLocks/real-contention=true-16                           31.8kB ± 7%    32.6kB ± 9%    ~        (p=0.382 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16  23.9kB ±21%    17.8kB ±25%    -25.55%  (p=0.003 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16  24.1kB ±20%    19.4kB ±22%    -19.56%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16   104kB ± 1%     101kB ± 1%     -2.89%   (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16   197kB ± 6%     217kB ±11%     +10.19%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16            29.9kB ± 0%    13.4kB ± 1%    -55.07%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16            94.9kB ± 0%    14.5kB ± 1%    -84.74%  (p=0.000 n=9+8)
OptimisticEval/latches/real-contention=true/num-writers=1-16             29.9kB ± 0%    31.3kB ± 0%    +4.59%   (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16             94.8kB ± 0%    96.2kB ± 0%    +1.48%   (p=0.000 n=10+10)

name                                                                     old allocs/op  new allocs/op  delta
OptimisticEvalForLocks/real-contention=false-16                          68.6 ± 1%      69.6 ± 2%      +1.52%   (p=0.005 n=9+10)
OptimisticEvalForLocks/real-contention=true-16                           271 ± 2%       272 ± 2%       ~        (p=0.336 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16  149 ±17%       117 ±18%       -21.00%  (p=0.002 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16  151 ±16%       126 ±18%       -16.31%  (p=0.013 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16   610 ± 0%       559 ± 1%       -8.32%   (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16   1.12k ± 5%     1.19k ±12%     ~        (p=0.053 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16            162 ± 0%       91 ± 0%        -43.83%  (p=0.001 n=8+9)
OptimisticEval/latches/real-contention=false/num-writers=4-16            445 ± 0%       96 ± 0%        -78.44%  (p=0.000 n=9+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16             163 ± 0%       184 ± 0%       +13.07%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16             445 ± 0%       467 ± 0%       +4.94%   (p=0.000 n=10+10)
```

Fixes #9521

Release note (performance improvement): A limited scan now checks for conflicting latches in an optimistic manner, which means it will not conflict with latches that were held in the scan's full spans, but were not in the spans that were scanned until the limit was reached. This behavior can be turned off (along with optimistic locking) by changing the value of the cluster setting kv.concurrency.optimistic_eval_limited_scans.enabled to false.

Co-authored-by: sumeerbhola <[email protected]>
A query like

```sql
SELECT * FROM t WHERE indexed_column > $1 LIMIT 1
```

has a very large key span (from key $1 to the end of the table or the end of the range, whichever comes first), but really only depends on a small amount of data (from $1 to the first key greater than that value). The command queue only sees the former, so this query must wait behind any updates to any other rows in the table, not just the rows that it will eventually return.

We could minimize this contention at the expense of throughput as follows. For read-only commands with (small) limits, execute the command first, before putting it in the command queue. If it reaches its limit, narrow the span based on the keys that were actually touched. Put it in the command queue under the narrowed span. After waiting on the command queue, execute it again. If it doesn't hit the limit while staying inside the narrowed span, something has changed out from under us and we have to re-queue with a broader span.
There is probably a way to be clever and avoid the double execution in the common case, e.g. if the command queue allows the narrowed span to execute immediately we can use the results of the first execution.
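The narrowing step in the proposal above can be sketched as follows. This is a toy model over an in-memory sorted key slice; `evaluateLimitedScan` and the `Span` type are hypothetical, not the command queue's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a simple [Key, EndKey) interval over string keys.
type Span struct{ Key, EndKey string }

// evaluateLimitedScan runs the scan once and, if the limit is reached,
// shrinks the span to the keys actually touched, so the command queue
// would see a much smaller span on the second pass.
func evaluateLimitedScan(data []string, span Span, limit int) (results []string, narrowed Span) {
	sort.Strings(data) // scan in key order
	for _, k := range data {
		if k >= span.Key && k < span.EndKey {
			results = append(results, k)
			if len(results) == limit {
				break
			}
		}
	}
	narrowed = span
	if len(results) == limit {
		// The scan stopped early, so it only depends on keys up to and
		// including the last one returned; "\x00" yields the next
		// possible byte-string key after it.
		narrowed.EndKey = results[len(results)-1] + "\x00"
	}
	return results, narrowed
}

func main() {
	res, narrowed := evaluateLimitedScan([]string{"a", "b", "c", "d", "e"}, Span{"a", "z"}, 2)
	fmt.Println(res, narrowed.Key, narrowed.EndKey)
}
```

If the limit is never reached, the span stays wide, which corresponds to the case where the second execution must still wait under the original span.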
This is a major cause of slowness for photos (#9247). For example, this trace spends 100ms in the command queue on its first scan. There is probably a related problem with the timestamp cache, but I haven't confirmed its existence or impact yet.