
kv: Scans with limit acquire excessively large latches #9521

Closed
bdarnell opened this issue Sep 24, 2016 · 10 comments · Fixed by #66059
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-kv KV Team

Comments

@bdarnell
Contributor

A query like SELECT * FROM t WHERE indexed_column > $1 LIMIT 1 has a very large key Span (from key $1 to the end of the table or the end of the range, whichever comes first), but really only depends on a small amount of data (from $1 to the first key greater than that value). The command queue only sees the former, so this query must wait behind any updates to any other rows in the table, not just the rows that it will eventually return.

We could minimize this contention at the expense of throughput as follows. For read-only commands with (small) limits, execute the command first, before putting it in the command queue. If it reaches its limit, narrow the span based on the keys that were actually touched. Put it in the command queue under the narrowed span. After waiting on the command queue, execute it again. If it doesn't hit the limit while staying inside the narrowed span, something has changed out from under us and we have to re-queue with a broader span.

There is probably a way to be clever and avoid the double execution in the common case, e.g. if the command queue allows the narrowed span to execute immediately we can use the results of the first execution.

This is a major cause of slowness for photos (#9247). For example, this trace spends 100ms in the command queue on its first scan. There is probably a related problem with the timestamp cache, but I haven't confirmed its existence or impact yet.

@petermattis
Collaborator

That query seems to be an artifact of photos, not of the load photos is intended to model. photos uses that query to find a random user-id whose photos to list, but in a real app the user-id would be known.

@tamird
Contributor

tamird commented Sep 24, 2016

Since we already have a pagination mechanism via resume_span, could we instead insert into the command queue the maximum span that doesn't overlap with other, already-running commands? Then we'd paginate as usual if another read is required to meet the limit.


@bdarnell
Contributor Author

@petermattis Yes, this query is not a part of the load photos is intended to model. However, it is a useful enough pattern that it's worth optimizing (and we should probably come up with another load generator, or another transaction for photos, that exercises this).

@tamird Ah, that's a nice idea. I think there's some concern on the SQL side about what happens if we serve partial results when the client didn't ask for it, but in general returning partial results instead of blocking in the command queue seems like a good idea.

@andreimatei
Contributor

@tamird this pagination trick would help with this scan command not waiting for others before it, but wouldn't help with the scan blocking other commands that come after it (at least when the command queue is empty to begin with), right?

I like the idea of attempting an inconsistent read first to get a clue about the required span. @bdarnell, seems to me that double execution can be avoided easily in the uncontended case if the command queue (or the replica otherwise) would maintain sequence numbers for incoming commands (or the spans of recent commands) and so it could assert that no overlapping command went through while we were doing the inconsistent read. Or, actually, can't the timestamp cache (which seems to contain writes too) serve this purpose?

@tamird
Contributor

tamird commented Sep 26, 2016

@andreimatei I think that's an orthogonal problem. In the case we're discussing here, the ... LIMIT 1 query results in a very large span, and the corresponding wait on entering the command queue is very long. However, once the command executes, it is likely to finish very quickly (because it is limited to one row).

That said, I agree that we have a general granularity problem in the command queue (scans shouldn't hold up commands in their "wake"), but it doesn't seem immediately relevant here.

@tbg
Member

tbg commented Sep 26, 2016

We have this granularity problem on various levels. @vivekmenezes touches on something similar in #9419, but in the realm of intent resolution, where the problem is pretty dramatic given that the current code churns through a lot of data for nothing.

With the above query, we'll "always" end up waiting on at least a few Raft commands, which is also pretty drastic. I like @bdarnell's suggestion: we execute the read, check for getOverlaps in the command queue for the resulting span, and if there aren't any we get to use our previous result. If there is something outstanding, we bite the bullet and grab the whole span plus re-execute when it's our turn. Optimizing that second case further could be useful at some point too, but it's more complicated and less frequently useful.

@andreimatei
Contributor

Umm, can we just call getOverlaps and return the result? What if there was a quick write in there that finished by the time we call getOverlaps? Don't we need to either use the timestamp cache, or redo the read, this time going through the CommandQueue (with the small span), and then possibly redo it again if the consistent read failed to satisfy our limit this time?

@tbg
Member

tbg commented Sep 26, 2016

You're right, the read would need to be done while holding the lock. That is probably too expensive so maybe we're going to have to be real smart™ right from the get-go, unfortunately.

@petermattis
Collaborator

#13349 truncates the span added for scans when possible.

@petermattis petermattis added this to the Later milestone Feb 22, 2017
@bdarnell bdarnell added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-transactions Relating to MVCC and the transactional model. labels Apr 26, 2018
@tbg tbg added A-coreperf and removed A-disaster-recovery A-kv-transactions Relating to MVCC and the transactional model. A-kv-distribution Relating to rebalancing and leasing. A-kv-client Relating to the KV client and the KV interface. A-storage Relating to our storage engine (Pebble) on-disk storage. A-kv-replication Relating to Raft, consensus, and coordination. labels Jul 31, 2018
@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@nvanbenschoten nvanbenschoten added A-kv-transactions Relating to MVCC and the transactional model. and removed A-coreperf labels Oct 12, 2018
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Dec 27, 2018
Fixes cockroachdb#9521.
Supersedes cockroachdb#31904.

SQL has a tendency to create scans which cover a range's entire key
span, though looking only to return a finite number of results. These
requests end up blocking on writes that are holding latches over keys
that the scan will not actually touch. In reality, when there is a
scan with a limit, the actual affected key span ends up being a small
subset of the range's key span.

This change creates a new "optimistic evaluation" path for read-only
requests. When evaluating optimistically, the batch will sequence itself
with the latch manager, but will not wait to acquire all of its latches.
Instead, it begins evaluating immediately and verifies that it would not
have needed to wait on any latch acquisitions after-the-fact. When
performing this verification, it uses knowledge about the limited set
of keys that the scan actually looked at. If there are no conflicts, the
scan succeeds. If there are conflicts, the request waits for all of its
latch acquisition attempts to finish and re-evaluates.

This PR replaces cockroachdb#31904. The major difference between the two is that
this PR exploits the structure of the latch manager to efficiently
perform optimistic latch acquisition and after-the-fact verification
of conflicts. Doing this requires keeping no extra state because it
uses the immutable snapshots that the latch manager now captures during
sequencing. The other major difference is that this PR does not release
latches after a failed optimistic evaluation.

NOTE: a prevalent theory of the pathological case with this behavior
was that overestimated read latches would serialize with write latches,
causing all requests on a range to serialize. I wasn't seeing this
in practice. It turns out that the "timestamp awareness" in the
latch manager should avoid this behavior in most cases because later
writes will have higher timestamps than earlier reads. The effect of
this is that they won't be considered to interfere by the latch manager.
Still, large clusters with a high amount of clock skew could see a
bounded variant of this situation.

### Benchmark Results

```
name                                   old ops/sec  new ops/sec  delta
kv95/cores=16/nodes=3/splits=3          51.9k ± 0%   51.7k ± 1%     ~     (p=0.400 n=3+3)
kvS70-L1/cores=16/nodes=3/splits=3      24.1k ± 4%   27.7k ± 1%  +14.75%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      24.5k ± 1%   27.5k ± 1%  +12.08%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   16.0k ± 1%   16.6k ± 2%   +3.79%  (p=0.100 n=3+3)

name                                   old p50(ms)  new p50(ms)  delta
kv95/cores=16/nodes=3/splits=3           0.70 ± 0%    0.70 ± 0%     ~     (all equal)
kvS70-L1/cores=16/nodes=3/splits=3       1.07 ± 6%    0.90 ± 0%  -15.62%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3       1.10 ± 0%    0.90 ± 0%  -18.18%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3    1.80 ± 0%    1.67 ± 4%   -7.41%  (p=0.100 n=3+3)

name                                   old p99(ms)  new p99(ms)  delta
kv95/cores=16/nodes=3/splits=3           1.80 ± 0%    1.80 ± 0%     ~     (all equal)
kvS70-L1/cores=16/nodes=3/splits=3       5.77 ±32%    4.70 ± 0%     ~     (p=0.400 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3       5.00 ± 0%    4.70 ± 0%   -6.00%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3    6.90 ± 3%    7.33 ± 8%     ~     (p=0.400 n=3+3)
```

_S<num> = --span-percent=<num>, L<num> = --span-limit=<num>_

Release note (performance improvement): improved performance on workloads
which mix OLAP queries with inserts and updates.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 13, 2021
This optimistic checking happens after the evaluation. The evaluation
will already discover any conflicting intents, so the subsequent
checking of the lock table is primarily to find conflicting
unreplicated locks.

This structure should be easy to extend to optimistic latching.

Fixes cockroachdb#49973
Informs cockroachdb#9521

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 15, 2021
This optimistic checking happens after the evaluation. The evaluation
will already discover any conflicting intents, so the subsequent
checking of the lock table is primarily to find conflicting
unreplicated locks.

This structure should be easy to extend to optimistic latching.

Benchmark numbers from the new kv roachtest that does SFU to
introduce false contention:
name                                           old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq   5.68k ± 0%   9.96k ± 1%  +75.46%  (p=0.000 n=8+9)

name                                           old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    13.1 ± 0%     5.5 ± 0%  -58.02%  (p=0.000 n=8+8)

name                                           old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    18.9 ± 0%    16.8 ± 0%  -11.11%  (p=0.001 n=8+9)

name                                           old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    22.0 ± 0%    25.6 ± 2%  +16.57%  (p=0.000 n=8+9)

Fixes cockroachdb#49973
Informs cockroachdb#9521

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue May 12, 2021
This optimistic checking happens after the evaluation. The evaluation
will already discover any conflicting intents, so the subsequent
checking of the lock table is primarily to find conflicting
unreplicated locks.

This structure should be easy to extend to optimistic latching.

Benchmark numbers from the new kv roachtest that does SFU to
introduce false contention:
name                                           old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq   5.68k ± 0%   9.96k ± 1%  +75.46%  (p=0.000 n=8+9)

name                                           old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    13.1 ± 0%     5.5 ± 0%  -58.02%  (p=0.000 n=8+8)

name                                           old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    18.9 ± 0%    16.8 ± 0%  -11.11%  (p=0.001 n=8+9)

name                                           old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    22.0 ± 0%    25.6 ± 2%  +16.57%  (p=0.000 n=8+9)

Microbenchmark numbers for the OptimisticEval microbenchmark,
where the real-contention=true case typically causes optimistic
evaluation to retry.

name                                     old time/op    new time/op    delta
OptimisticEval/real-contention=false-16    5.76ms ± 5%    0.03ms ± 2%  -99.49%  (p=0.000 n=8+10)
OptimisticEval/real-contention=true-16     5.75ms ± 4%    5.63ms ± 5%     ~     (p=0.393 n=10+10)

name                                     old alloc/op   new alloc/op   delta
OptimisticEval/real-contention=false-16    34.3kB ±24%     9.9kB ± 2%  -71.29%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16     33.0kB ±20%    35.7kB ± 9%     ~     (p=0.065 n=8+8)

name                                     old allocs/op  new allocs/op  delta
OptimisticEval/real-contention=false-16       273 ±19%        83 ± 0%  -69.56%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16        268 ± 2%       308 ± 2%  +14.90%  (p=0.000 n=8+8)

Fixes cockroachdb#49973
Informs cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting locks in an optimistic manner, which means it will
not conflict with locks (typically unreplicated locks) that were
held in the scan's full spans, but were not in the spans that were
scanned until the limit was reached. This behavior can be turned
off by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans to false.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue May 18, 2021
This optimistic checking happens after the evaluation. The evaluation
will already discover any conflicting intents, so the subsequent
checking of the lock table is primarily to find conflicting
unreplicated locks.

This structure should be easy to extend to optimistic latching.

Benchmark numbers from the new kv roachtest that does SFU to
introduce false contention:
name                                           old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq   5.68k ± 0%   9.96k ± 1%  +75.46%  (p=0.000 n=8+9)

name                                           old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    13.1 ± 0%     5.5 ± 0%  -58.02%  (p=0.000 n=8+8)

name                                           old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    18.9 ± 0%    16.8 ± 0%  -11.11%  (p=0.001 n=8+9)

name                                           old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    22.0 ± 0%    25.6 ± 2%  +16.57%  (p=0.000 n=8+9)

Microbenchmark numbers for the OptimisticEval microbenchmark,
where the real-contention=true case typically causes optimistic
evaluation to retry.

name                                     old time/op    new time/op    delta
OptimisticEval/real-contention=false-16    5.76ms ± 5%    0.03ms ± 2%  -99.49%  (p=0.000 n=8+10)
OptimisticEval/real-contention=true-16     5.75ms ± 4%    5.63ms ± 5%     ~     (p=0.393 n=10+10)

name                                     old alloc/op   new alloc/op   delta
OptimisticEval/real-contention=false-16    34.3kB ±24%     9.9kB ± 2%  -71.29%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16     33.0kB ±20%    35.7kB ± 9%     ~     (p=0.065 n=8+8)

name                                     old allocs/op  new allocs/op  delta
OptimisticEval/real-contention=false-16       273 ±19%        83 ± 0%  -69.56%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16        268 ± 2%       308 ± 2%  +14.90%  (p=0.000 n=8+8)

Fixes cockroachdb#49973
Informs cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting locks in an optimistic manner, which means it will
not conflict with locks (typically unreplicated locks) that were
held in the scan's full spans, but were not in the spans that were
scanned until the limit was reached. This behavior can be turned
off by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue May 20, 2021
This optimistic checking happens after the evaluation. The evaluation
will already discover any conflicting intents, so the subsequent
checking of the lock table is primarily to find conflicting
unreplicated locks.

This structure should be easy to extend to optimistic latching.

Benchmark numbers from the new kv roachtest that does SFU to
introduce false contention:
name                                           old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq   5.68k ± 0%   9.96k ± 1%  +75.46%  (p=0.000 n=8+9)

name                                           old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    13.1 ± 0%     5.5 ± 0%  -58.02%  (p=0.000 n=8+8)

name                                           old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    18.9 ± 0%    16.8 ± 0%  -11.11%  (p=0.001 n=8+9)

name                                           old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    22.0 ± 0%    25.6 ± 2%  +16.57%  (p=0.000 n=8+9)

Microbenchmark numbers for the OptimisticEval microbenchmark,
where the real-contention=true case typically causes optimistic
evaluation to retry.

name                                     old time/op    new time/op    delta
OptimisticEval/real-contention=false-16    5.76ms ± 5%    0.03ms ± 2%  -99.49%  (p=0.000 n=8+10)
OptimisticEval/real-contention=true-16     5.75ms ± 4%    5.63ms ± 5%     ~     (p=0.393 n=10+10)

name                                     old alloc/op   new alloc/op   delta
OptimisticEval/real-contention=false-16    34.3kB ±24%     9.9kB ± 2%  -71.29%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16     33.0kB ±20%    35.7kB ± 9%     ~     (p=0.065 n=8+8)

name                                     old allocs/op  new allocs/op  delta
OptimisticEval/real-contention=false-16       273 ±19%        83 ± 0%  -69.56%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16        268 ± 2%       308 ± 2%  +14.90%  (p=0.000 n=8+8)

Fixes cockroachdb#49973
Informs cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting locks in an optimistic manner, which means it will
not conflict with locks (typically unreplicated locks) that were
held in the scan's full spans, but were not in the spans that were
scanned until the limit was reached. This behavior can be turned
off by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
craig bot pushed a commit that referenced this issue May 20, 2021
58670: concurrency,kvserver: limited scans optimistically check for locks r=sumeerbhola a=sumeerbhola

This optimistic checking happens after the evaluation. The evaluation
will already discover any conflicting intents, so the subsequent
checking of the lock table is primarily to find conflicting
unreplicated locks.

This structure should be easy to extend to optimistic latching.

Benchmark numbers from the new kv roachtest that does SFU to
introduce false contention:
```
name                                           old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq   5.68k ± 0%   9.96k ± 1%  +75.46%  (p=0.000 n=8+9)

name                                           old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    13.1 ± 0%     5.5 ± 0%  -58.02%  (p=0.000 n=8+8)

name                                           old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    18.9 ± 0%    16.8 ± 0%  -11.11%  (p=0.001 n=8+9)

name                                           old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    22.0 ± 0%    25.6 ± 2%  +16.57%  (p=0.000 n=8+9)
```

Microbenchmark numbers for the OptimisticEval microbenchmark,
where the real-contention=true case typically causes optimistic
evaluation to retry.
```
name                                     old time/op    new time/op    delta
OptimisticEval/real-contention=false-16    5.76ms ± 5%    0.03ms ± 2%  -99.49%  (p=0.000 n=8+10)
OptimisticEval/real-contention=true-16     5.75ms ± 4%    5.63ms ± 5%     ~     (p=0.393 n=10+10)

name                                     old alloc/op   new alloc/op   delta
OptimisticEval/real-contention=false-16    34.3kB ±24%     9.9kB ± 2%  -71.29%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16     33.0kB ±20%    35.7kB ± 9%     ~     (p=0.065 n=8+8)

name                                     old allocs/op  new allocs/op  delta
OptimisticEval/real-contention=false-16       273 ±19%        83 ± 0%  -69.56%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16        268 ± 2%       308 ± 2%  +14.90%  (p=0.000 n=8+8)
```

Fixes #49973
Informs #9521

Release note (performance improvement): A limited scan now checks
for conflicting locks in an optimistic manner, which means it will
not conflict with locks (typically unreplicated locks) that were
held in the scan's full spans, but were not in the spans that were
scanned until the limit was reached. This behavior can be turned
off by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.


Co-authored-by: sumeerbhola <[email protected]>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jun 3, 2021
Latches for the full spans get inserted up front in the
spanlatch.Manager, and the conflict checking happens after
evaluation, only over the spans that were actually read.
If there is a conflict, the existing inserted latches are waited
on, and execution switches to pessimistic latching and locking.
The existing cluster setting for optimistic locking is used to
gate this behavior.

Fixes cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting latches in an optimistic manner, which means it will
not conflict with latches that were held in the scan's full spans,
but were not in the spans that were scanned until the limit was
reached. This behavior can be turned off (along with optimistic
locking) by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jun 4, 2021
Latches for the full spans get inserted up front in the
spanlatch.Manager, and the conflict checking happens after
evaluation, only over the spans that were actually read.
If there is a conflict, the existing inserted latches are waited
on, and execution switches to pessimistic latching and locking.
The existing cluster setting for optimistic locking is used to
gate this behavior.

Fixes cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting latches in an optimistic manner, which means it will
not conflict with latches that were held in the scan's full spans,
but were not in the spans that were scanned until the limit was
reached. This behavior can be turned off (along with optimistic
locking) by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jun 8, 2021
Latches for the full spans get inserted up front in the
spanlatch.Manager, and the conflict checking happens after
evaluation, only over the spans that were actually read.
If there is a conflict, the existing inserted latches are waited
on, and execution switches to pessimistic latching and locking.
The existing cluster setting for optimistic locking is used to
gate this behavior.

Microbenchmark numbers:
name                                                             old time/op    new time/op    delta
OptimisticEvalForLocks/real-contention=false-16                    28.8µs ± 6%    29.4µs ± 5%     ~     (p=0.089 n=10+10)
OptimisticEvalForLocks/real-contention=true-16                     5.52ms ± 1%    5.54ms ± 1%     ~     (p=0.123 n=10+10)
OptimisticEvalForLatches/real-contention=false/num-writers=1-16    63.1µs ± 9%    32.7µs ± 4%  -48.21%  (p=0.000 n=10+10)
OptimisticEvalForLatches/real-contention=false/num-writers=4-16    66.8µs ± 3%    34.4µs ± 5%  -48.51%  (p=0.000 n=10+10)
OptimisticEvalForLatches/real-contention=false/num-writers=8-16    69.4µs ± 1%    35.7µs ± 2%  -48.47%  (p=0.000 n=10+10)
OptimisticEvalForLatches/real-contention=true/num-writers=1-16      127µs ± 5%     112µs ± 2%  -11.68%  (p=0.000 n=10+10)
OptimisticEvalForLatches/real-contention=true/num-writers=4-16      128µs ± 5%     112µs ± 2%  -12.38%  (p=0.000 n=9+10)
OptimisticEvalForLatches/real-contention=true/num-writers=8-16      140µs ± 2%     115µs ± 2%  -17.56%  (p=0.000 n=9+7)

name                                                             old alloc/op   new alloc/op   delta
OptimisticEvalForLocks/real-contention=false-16                    8.29kB ± 3%    8.38kB ± 5%     ~     (p=0.367 n=9+10)
OptimisticEvalForLocks/real-contention=true-16                     32.2kB ±10%   42.2kB ±108%     ~     (p=0.321 n=8+9)
OptimisticEvalForLatches/real-contention=false/num-writers=1-16    16.6kB ± 4%    13.1kB ± 2%  -21.07%  (p=0.000 n=9+9)
OptimisticEvalForLatches/real-contention=false/num-writers=4-16    16.6kB ± 7%    12.9kB ± 1%  -22.23%  (p=0.000 n=9+9)
OptimisticEvalForLatches/real-contention=false/num-writers=8-16    16.3kB ± 0%    12.7kB ± 1%  -22.20%  (p=0.000 n=9+9)
OptimisticEvalForLatches/real-contention=true/num-writers=1-16     26.8kB ± 0%    26.8kB ± 0%   +0.14%  (p=0.049 n=9+9)
OptimisticEvalForLatches/real-contention=true/num-writers=4-16     26.8kB ± 0%    26.8kB ± 0%   +0.12%  (p=0.027 n=8+9)
OptimisticEvalForLatches/real-contention=true/num-writers=8-16     26.8kB ± 0%    26.8kB ± 0%     ~     (p=1.000 n=10+8)

name                                                             old allocs/op  new allocs/op  delta
OptimisticEvalForLocks/real-contention=false-16                      68.6 ± 1%      70.0 ± 0%   +2.04%  (p=0.000 n=10+7)
OptimisticEvalForLocks/real-contention=true-16                        272 ± 3%       272 ± 2%     ~     (p=0.777 n=8+8)
OptimisticEvalForLatches/real-contention=false/num-writers=1-16       149 ± 0%       117 ± 1%  -21.41%  (p=0.000 n=8+10)
OptimisticEvalForLatches/real-contention=false/num-writers=4-16       150 ± 0%       116 ± 2%  -22.80%  (p=0.000 n=7+10)
OptimisticEvalForLatches/real-contention=false/num-writers=8-16       150 ± 0%       114 ± 1%  -24.20%  (p=0.000 n=9+10)
OptimisticEvalForLatches/real-contention=true/num-writers=1-16        261 ± 0%       261 ± 0%     ~     (all equal)
OptimisticEvalForLatches/real-contention=true/num-writers=4-16        261 ± 0%       261 ± 0%     ~     (p=0.471 n=9+9)
OptimisticEvalForLatches/real-contention=true/num-writers=8-16        262 ± 0%       261 ± 0%   -0.38%  (p=0.000 n=8+8)

Fixes cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting latches in an optimistic manner, which means it will
not conflict with latches that were held in the scan's full spans,
but were not in the spans that were scanned until the limit was
reached. This behavior can be turned off (along with optimistic
locking) by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jun 16, 2021
Latches for the full spans get inserted up front in the
spanlatch.Manager, and the conflict checking happens after
evaluation, only over the spans that were actually read.
If there is a conflict, the existing inserted latches are waited
on, and execution switches to pessimistic latching and locking.
The existing cluster setting for optimistic locking is used to
gate this behavior.

Numbers for the new OptimisticEval benchmark show improvement
when real-contention=false, compared to master which only had
optimistic locking. There is a modest slowdown in the
real-contention=true case since every optimistic read has to
try twice. The benchmark has concurrent writes of two different
kinds: latches represents 1PC writers, so no intents, while
latches-and-locks represent writers that will leave intents.

For the latter, when the optimistic read finds an intent during
evaluation it cannot necessarily add it as a discovered lock,
if there is also a conflicting latch (since it could be racing
with intent resolution). This case is rare in these benchmarks
(latches-and-locks/real-contention=true): 13% found an intent
and had a conflicting latch when num-writers=1 and < 1% for
the same case when num-writers=4. The remainder (the common
case) was to find a conflict when looking for conflicting
latches/locks using CheckOptimisticNoConflicts.

```
name                                                                     old time/op    new time/op    delta
OptimisticEvalForLocks/real-contention=false-16                            28.2µs ± 4%    28.5µs ± 5%     ~     (p=0.393 n=10+10)
OptimisticEvalForLocks/real-contention=true-16                             5.52ms ± 1%    5.52ms ± 1%     ~     (p=0.796 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16     220µs ±62%      89µs ±38%  -59.37%  (p=0.000 n=10+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16     213µs ±65%     155µs ±82%  -27.33%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16     1.33ms ± 3%    1.27ms ±16%     ~     (p=0.829 n=8+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16     2.02ms ±10%    2.25ms ± 9%  +11.31%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16               175µs ± 2%      45µs ± 5%  -74.05%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16               613µs ± 2%      44µs ± 3%  -92.74%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16                181µs ± 4%     179µs ± 5%     ~     (p=0.315 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16                624µs ± 3%     620µs ± 3%     ~     (p=0.247 n=10+10)

name                                                                     old alloc/op   new alloc/op   delta
OptimisticEvalForLocks/real-contention=false-16                            8.40kB ± 7%    8.33kB ± 3%     ~     (p=1.000 n=10+8)
OptimisticEvalForLocks/real-contention=true-16                             31.8kB ± 7%    32.6kB ± 9%     ~     (p=0.382 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16    23.9kB ±21%    17.8kB ±25%  -25.55%  (p=0.003 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16    24.1kB ±20%    19.4kB ±22%  -19.56%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16      104kB ± 1%     101kB ± 1%   -2.89%  (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16      197kB ± 6%     217kB ±11%  +10.19%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16              29.9kB ± 0%    13.4kB ± 1%  -55.07%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16              94.9kB ± 0%    14.5kB ± 1%  -84.74%  (p=0.000 n=9+8)
OptimisticEval/latches/real-contention=true/num-writers=1-16               29.9kB ± 0%    31.3kB ± 0%   +4.59%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16               94.8kB ± 0%    96.2kB ± 0%   +1.48%  (p=0.000 n=10+10)

name                                                                     old allocs/op  new allocs/op  delta
OptimisticEvalForLocks/real-contention=false-16                              68.6 ± 1%      69.6 ± 2%   +1.52%  (p=0.005 n=9+10)
OptimisticEvalForLocks/real-contention=true-16                                271 ± 2%       272 ± 2%     ~     (p=0.336 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16       149 ±17%       117 ±18%  -21.00%  (p=0.002 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16       151 ±16%       126 ±18%  -16.31%  (p=0.013 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16        610 ± 0%       559 ± 1%   -8.32%  (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16      1.12k ± 5%     1.19k ±12%     ~     (p=0.053 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16                 162 ± 0%        91 ± 0%  -43.83%  (p=0.001 n=8+9)
OptimisticEval/latches/real-contention=false/num-writers=4-16                 445 ± 0%        96 ± 0%  -78.44%  (p=0.000 n=9+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16                  163 ± 0%       184 ± 0%  +13.07%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16                  445 ± 0%       467 ± 0%   +4.94%  (p=0.000 n=10+10)
```

Fixes cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting latches in an optimistic manner, which means it will
not conflict with latches that were held in the scan's full spans,
but were not in the spans that were scanned until the limit was
reached. This behavior can be turned off (along with optimistic
locking) by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
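
For reference, a setting like this is toggled from a SQL session using standard cluster-setting syntax (a sketch; setting name taken from the release note above):

```sql
-- Disable optimistic evaluation for limited scans (and optimistic locking):
SET CLUSTER SETTING kv.concurrency.optimistic_eval_limited_scans.enabled = false;
```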
craig bot pushed a commit that referenced this issue Jun 22, 2021
66059: concurrency,kvserver: limited scans do optimistic latching r=sumeerbhola a=sumeerbhola

(commit message identical to the one above)

Co-authored-by: sumeerbhola <[email protected]>
@craig craig bot closed this as completed in 4321fa8 Jun 22, 2021