storage: evaluate limited scans optimistically without latching
Fixes #9521.
Supersedes #31904.

SQL tends to create scans that cover a range's entire key span even
though they only need to return a limited number of results. These
requests end up blocking on writes that hold latches over keys the
scan will never touch. In practice, when a scan has a limit, the key
span it actually reads is often a small subset of the range's key
span.

This change creates a new "optimistic evaluation" path for read-only
requests. When evaluating optimistically, the batch will sequence itself
with the latch manager, but will not wait to acquire all of its latches.
Instead, it begins evaluating immediately and verifies after the fact
that it would not have needed to wait on any latch acquisitions. When
performing this verification, it uses knowledge about the limited set
of keys that the scan actually looked at. If there are no conflicts, the
scan succeeds. If there are conflicts, the request waits for all of its
latch acquisition attempts to finish and re-evaluates.
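
As a rough illustration of the flow described above, here is a minimal
sketch. It is not the code in this change: `Span`, `store`, `scan`, and
`evalOptimistic` are hypothetical names, keys are plain strings, and only
write latches are modeled. The point is that the conflict check runs
against the span the scan actually read, not the span it declared.

```go
package main

// Span is a half-open key interval [Start, End). It is a hypothetical
// stand-in for illustration, not the type used by the latch manager.
type Span struct{ Start, End string }

// overlaps reports whether the two spans share at least one key.
func (s Span) overlaps(o Span) bool { return s.Start < o.End && o.Start < s.End }

// store is a toy replica: sorted keys plus the write latches that were
// already held when the scan sequenced itself.
type store struct {
	keys         []string // assumed sorted ascending
	writeLatches []Span
}

// scan returns up to limit keys from declared (limit <= 0 means no limit)
// along with the span that was actually read.
func (st *store) scan(declared Span, limit int) ([]string, Span) {
	var out []string
	for _, k := range st.keys {
		if k < declared.Start || k >= declared.End {
			continue
		}
		out = append(out, k)
		if len(out) == limit {
			// Limit reached: nothing past k was read.
			return out, Span{Start: declared.Start, End: k + "\x00"}
		}
	}
	// Limit not reached: the scan read the whole declared span.
	return out, declared
}

// evalOptimistic evaluates without waiting on latches, then checks whether
// any held write latch overlaps the span that was actually read. On a
// conflict it reports false and the caller is expected to wait for its
// latches and re-evaluate.
func (st *store) evalOptimistic(declared Span, limit int) ([]string, bool) {
	keys, touched := st.scan(declared, limit)
	for _, l := range st.writeLatches {
		if l.overlaps(touched) {
			return nil, false
		}
	}
	return keys, true
}
```

With keys `a, b, c, m, x` and a write latch over `[x, y)`, a scan of
`[a, z)` with limit 2 succeeds optimistically in this sketch, while the
same scan with limit 10 reads through `x` and has to fall back.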

This PR replaces #31904. The major difference between the two is that
this PR exploits the structure of the latch manager to efficiently
perform optimistic latch acquisition and after-the-fact verification
of conflicts. Doing this requires keeping no extra state because it
uses the immutable snapshots that the latch manager now captures during
sequencing. The other major difference is that this PR does not release
latches after a failed optimistic evaluation.
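
To make the snapshot idea concrete, here is a simplified sketch under
stated assumptions. The names `manager`, `latch`, `snapshot`, `sequence`,
and `conflicts` are hypothetical, all latches are treated as writes, and
the snapshot is a copied slice rather than whatever structure the latch
manager really uses. It is single-goroutine only and ignores timestamps.

```go
package main

// Span is a half-open key interval [Start, End); hypothetical type.
type Span struct{ Start, End string }

// overlaps reports whether the two spans share at least one key.
func (s Span) overlaps(o Span) bool { return s.Start < o.End && o.Start < s.End }

// latch is a hypothetical held latch. done is set when it is released.
type latch struct {
	span Span
	done bool
}

// manager holds the live set of latches in sequencing order.
type manager struct{ live []*latch }

// snapshot is the set of latches that were ahead of a request when it
// sequenced. Its membership never changes after capture.
type snapshot []*latch

// sequence inserts a latch for s and returns it together with a snapshot
// of everything that was sequenced before it.
func (m *manager) sequence(s Span) (*latch, snapshot) {
	snap := append(snapshot(nil), m.live...) // copy: later inserts are invisible
	l := &latch{span: s}
	m.live = append(m.live, l)
	return l, snap
}

// conflicts reports whether any unreleased latch in the snapshot overlaps
// the span the request actually touched. No state beyond the snapshot
// itself is consulted.
func (snap snapshot) conflicts(touched Span) bool {
	for _, l := range snap {
		if !l.done && l.span.overlaps(touched) {
			return true
		}
	}
	return false
}
```

Because the check only consults the snapshot, latches sequenced after the
request never show up as conflicts, and a latch released in the meantime
stops counting as one.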

NOTE: a prevalent theory of the pathological case with this behavior
was that overestimated read latches would serialize with write latches,
causing all requests on a range to serialize. I wasn't seeing this
in practice. It turns out that the "timestamp awareness" in the
latch manager should avoid this behavior in most cases because later
writes will have higher timestamps than earlier reads. The effect of
this is that they won't be considered to interfere by the latch manager.
Still, large clusters with significant clock skew could see a bounded
variant of this situation.
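
The timestamp rule can be sketched as follows. This is a simplified
model, not the latch manager's code: `Latch` and `interferes` are
hypothetical names and timestamps are plain integers. The assumption is
that a read and a write over overlapping keys only interfere when the
write's timestamp is at or below the read's.

```go
package main

// Span is a half-open key interval [Start, End); hypothetical type.
type Span struct{ Start, End string }

// overlaps reports whether the two spans share at least one key.
func (s Span) overlaps(o Span) bool { return s.Start < o.End && o.Start < s.End }

// Latch is a hypothetical latch with an access mode and a timestamp.
type Latch struct {
	Span  Span
	Write bool
	TS    int64
}

// interferes reports whether two latches must be serialized under the
// simplified rule described above.
func interferes(a, b Latch) bool {
	if !a.Span.overlaps(b.Span) {
		return false
	}
	switch {
	case a.Write && b.Write:
		return true // writes always serialize with each other
	case !a.Write && !b.Write:
		return false // reads never block reads
	}
	r, w := a, b
	if a.Write {
		r, w = b, a
	}
	// A write above the read's timestamp cannot change what the read sees.
	return w.TS <= r.TS
}
```

Under this model, an overestimated read latch at timestamp 10 does not
block a later write at timestamp 15. With clock skew, a later write can
be assigned a lower timestamp than an earlier read, which is when the two
would serialize.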

### Benchmark Results

```
name                                   old ops/sec  new ops/sec  delta
kv95/cores=16/nodes=3/splits=3          51.9k ± 0%   51.7k ± 1%     ~     (p=0.400 n=3+3)
kvS70-L1/cores=16/nodes=3/splits=3      24.1k ± 4%   27.7k ± 1%  +14.75%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3      24.5k ± 1%   27.5k ± 1%  +12.08%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3   16.0k ± 1%   16.6k ± 2%   +3.79%  (p=0.100 n=3+3)

name                                   old p50(ms)  new p50(ms)  delta
kv95/cores=16/nodes=3/splits=3           0.70 ± 0%    0.70 ± 0%     ~     (all equal)
kvS70-L1/cores=16/nodes=3/splits=3       1.07 ± 6%    0.90 ± 0%  -15.62%  (p=0.100 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3       1.10 ± 0%    0.90 ± 0%  -18.18%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3    1.80 ± 0%    1.67 ± 4%   -7.41%  (p=0.100 n=3+3)

name                                   old p99(ms)  new p99(ms)  delta
kv95/cores=16/nodes=3/splits=3           1.80 ± 0%    1.80 ± 0%     ~     (all equal)
kvS70-L1/cores=16/nodes=3/splits=3       5.77 ±32%    4.70 ± 0%     ~     (p=0.400 n=3+3)
kvS70-L5/cores=16/nodes=3/splits=3       5.00 ± 0%    4.70 ± 0%   -6.00%  (p=0.100 n=3+3)
kvS70-L1000/cores=16/nodes=3/splits=3    6.90 ± 3%    7.33 ± 8%     ~     (p=0.400 n=3+3)
```

_S<num> = --span-percent=<num>, L<num> = --span-limit=<num>_

Release note (performance improvement): improved performance on workloads
which mix OLAP queries with inserts and updates.
nvanbenschoten committed Dec 27, 2018
1 parent b48126f commit 91c2060
Showing 5 changed files with 419 additions and 70 deletions.
2 changes: 1 addition & 1 deletion pkg/storage/batcheval/command.go
@@ -26,7 +26,7 @@ import (

 // A Command is the implementation of a single request within a BatchRequest.
 type Command struct {
-	// DeclareKeys adds all keys this command touches to the given spanSet.
+	// DeclareKeys adds all keys this command touches to the given SpanSet.
 	DeclareKeys func(roachpb.RangeDescriptor, roachpb.Header, roachpb.Request, *spanset.SpanSet)

 	// Eval evaluates a command on the given engine. It should populate