
storage: MVCCGarbageCollect is slow in v22.2 compared to v22.1 #87277

Closed
aliher1911 opened this issue Sep 1, 2022 · 14 comments
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-performance Perf of queries or internals. Solution not expected to change functional behavior. GA-blocker T-storage Storage Team

Comments


aliher1911 commented Sep 1, 2022

Running BenchmarkMVCCGarbageCollect (modified to include MVCCStats updates, to make it more comparable to real runs):

22.1 (with stats update)

```
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=2/deleteVersions=1-16                226299       4816 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1-16             141714       7635 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=16-16             96882      11860 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=32-16             71068      16025 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=512-16             8449     142570 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1015-16            3842     275308 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1023-16            4149     274660 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=2/deleteVersions=1-16                746    1566034 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1-16             212    5182729 ns/op
>BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16-16           84   12559669 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=32-16            63   17568397 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=512-16            6  190913206 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1015-16           3  351730931 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1023-16           3  343130885 ns/op
```

22.1 (without stats update)

```
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=2/deleteVersions=1-16                228370       4773 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1-16             141380       7663 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=16-16            101109      11101 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=32-16             77886      14664 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=512-16             9637     125238 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1015-16            4796     237608 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1023-16            4208     242834 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=2/deleteVersions=1-16                774    1514281 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1-16             220    5165355 ns/op
>BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16-16           98   12588087 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=32-16            66   16583878 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=512-16            6  201204474 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1015-16           4  367584725 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1023-16           3  393604358 ns/op
```

Master (with stats)

```
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=2/deleteVersions=1/numRangeTs=0/updateStats=true-16                273270          4243 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1/numRangeTs=0/updateStats=true-16             157464          7689 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=16/numRangeTs=0/updateStats=true-16             96703         12978 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=32/numRangeTs=0/updateStats=true-16             65997         17460 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=512/numRangeTs=0/updateStats=true-16             6937        174425 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1015/numRangeTs=0/updateStats=true-16            3046        336960 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1023/numRangeTs=0/updateStats=true-16            3135        339966 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=2/deleteVersions=1/numRangeTs=0/updateStats=true-16                717       1647826 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1/numRangeTs=0/updateStats=true-16              64      17664351 ns/op
>BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16/numRangeTs=0/updateStats=true-16             5     207204312 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=32/numRangeTs=0/updateStats=true-16              3     397054718 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=512/numRangeTs=0/updateStats=true-16             1    7044993137 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1015/numRangeTs=0/updateStats=true-16            1   14448873505 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1023/numRangeTs=0/updateStats=true-16            3     403105760 ns/op
```

Master (without stats)

```
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=2/deleteVersions=1/numRangeTs=0/updateStats=false-16               287023          4198 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1/numRangeTs=0/updateStats=false-16            152658          7607 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=16/numRangeTs=0/updateStats=false-16           102878         11380 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=32/numRangeTs=0/updateStats=false-16            74265         16096 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=512/numRangeTs=0/updateStats=false-16            7830        151733 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1015/numRangeTs=0/updateStats=false-16           3432        297439 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1023/numRangeTs=0/updateStats=false-16           3717        287882 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=2/deleteVersions=1/numRangeTs=0/updateStats=false-16               703       1584928 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1/numRangeTs=0/updateStats=false-16             67      17417538 ns/op
>BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16/numRangeTs=0/updateStats=false-16            5     201719601 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=32/numRangeTs=0/updateStats=false-16             3     397685799 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=512/numRangeTs=0/updateStats=false-16            1    7004730468 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1015/numRangeTs=0/updateStats=false-16           1   14312205532 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1023/numRangeTs=0/updateStats=false-16           3     351458143 ns/op
```

If we compare the highlighted (marked with >) case:

```
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16-16                                  84   12559669 ns/op
BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16/numRangeTs=0/updateStats=true-16     5  207204312 ns/op
```

we see a ~16x degradation, which looks scary.

Flame graph of rerunning only this test
(running the 22.1 release branch with just an enginepb.MVCCStats pointer added to the GC invocation):

```
go test ./pkg/storage/. -bench=BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16 -run=XXXXX --test.cpuprofile mvcc-gc-cpu-release-good.pprof -v
```

[Screenshot: flame graph on the 22.1 release branch, 2022-09-01 16:14:04]

And on master
(now with extra options for range tombstones, but they are disabled):

```
go test ./pkg/storage/. -bench=BenchmarkMVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16/numRangeTs=0/updateStats=true -run=XXXXX --test.cpuprofile mvcc-gc-cpu-master-bad.pprof -v
```

[Screenshot: flame graph on master, 2022-09-01 16:14:16]

Jira issue: CRDB-19249

Epic CRDB-2624

@aliher1911 aliher1911 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication labels Sep 1, 2022
@aliher1911 aliher1911 self-assigned this Sep 1, 2022

blathers-crl bot commented Sep 1, 2022

cc @cockroachdb/replication

@aliher1911
Contributor Author

For reference, this is a change to the benchmark on master to add stats and range tombstones, used to test #87261

@erikgrinaker erikgrinaker added release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 5, 2022
@erikgrinaker erikgrinaker changed the title from "MVCCGarbageCollect is slow in v22.2.-1 compared to v22.1" to "storage: MVCCGarbageCollect is slow in v22.2.-1 compared to v22.1" Sep 5, 2022

erikgrinaker commented Sep 5, 2022

Thanks for running these benchmarks! This seems fairly pathological, so we'll definitely need to address this for 22.2. Seems likely to be an issue in Pebble.

Maybe it'd be worthwhile to come up with a minimal repro benchmark for the Pebble team to investigate?

Btw, for the future, we usually use benchstat to compare benchmark results, and do 5-10 runs of each benchmark to account for variance. It's also a good idea to stop any other processes on the system, and do another run first to warm up the OS/CPU for the workload -- I've seen this affect benchmarks as much as 10%. Something like:

$ make bench PKG=./pkg/storage BENCHES='BenchmarkMVCCGarbageCollect' TESTFLAGS="-v --count 10" | grep Benchmark >bench-22.1.txt
$ make bench PKG=./pkg/storage BENCHES='BenchmarkMVCCGarbageCollect' TESTFLAGS="-v --count 10" | grep Benchmark >bench-22.1.txt
$ git checkout master
$ make bench PKG=./pkg/storage BENCHES='BenchmarkMVCCGarbageCollect' TESTFLAGS="-v --count 10" | grep Benchmark >bench-master.txt
$ make bench PKG=./pkg/storage BENCHES='BenchmarkMVCCGarbageCollect' TESTFLAGS="-v --count 10" | grep Benchmark >bench-master.txt
$ benchstat bench-22.1.txt bench-master.txt

@aliher1911
Contributor Author

I tried a test with the same data as the GC test, but one that uses an MVCCIterator and SeekLT for every object sequentially:
#87373

The results, 22.1 vs master, are:

```
name       old time/op  new time/op  delta
SeekLT-16  694ms ± 2%   836ms ± 3%   +20.46%  (p=0.000 n=10+10)
SeekGE-16  621ms ± 1%   757ms ± 1%   +21.95%  (p=0.000 n=10+9)
```

@aliher1911
Contributor Author

Those tests use the in-mem engine; maybe that makes the difference more noticeable, since we don't have any IO operations involved and the impact of comparisons and such is more visible?


erikgrinaker commented Sep 5, 2022

20% is significant and worth looking into. However, it's pretty far from 1600%, disk IO or not -- do you have any thoughts on where the discrepancy is coming from? Could we use the same engine setup for this benchmark and the GC benchmark to rule that out? Also, enabling intents and range keys on the iterator might make a difference.

@aliher1911
Contributor Author

Looking at the stacks, those are not the same paths that Next() takes, so it would probably be a red herring. As Jackson mentioned, Next does switchToMinHeap internally, and that calls batchskl.(*Iterator).Next, which is causing the slowness. It doesn't happen when I just do SeekGE/SeekLT. Naively iterating with Next() over the data set doesn't trigger this behaviour either.
I'll dig further to see what this is about.
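To illustrate why a direction switch is expensive, here is a toy two-child merging iterator. This is an illustrative sketch, not Pebble's actual implementation; the types `child` and `mergeIter` are invented names. The point it demonstrates: after reverse positioning (SeekLT), a forward Next has to reposition every child iterator past the current merged key, which is the kind of work hiding behind switchToMinHeap.

```go
package main

import "fmt"

// child is a leaf iterator over a sorted int slice.
type child struct {
	keys []int
	pos  int // current index; -1 means not positioned
}

// mergeIter merges its children and tracks the iteration direction.
// Switching direction (Next after SeekLT) must reposition every child.
type mergeIter struct {
	kids    []*child
	reverse bool
	steps   int // counts child repositioning work
}

// SeekLT positions every child on its last key < target (reverse mode).
func (m *mergeIter) SeekLT(key int) (int, bool) {
	m.reverse = true
	for _, c := range m.kids {
		c.pos = -1
		for i := len(c.keys) - 1; i >= 0; i-- {
			m.steps++
			if c.keys[i] < key {
				c.pos = i
				break
			}
		}
	}
	return m.current()
}

// Next after reverse positioning: every child must be repositioned to its
// first key greater than the current merged key (the "min heap switch").
func (m *mergeIter) Next() (int, bool) {
	cur, ok := m.current()
	if !ok {
		return 0, false
	}
	if m.reverse {
		m.reverse = false
		for _, c := range m.kids {
			c.pos = -1
			for i, k := range c.keys {
				m.steps++
				if k > cur {
					c.pos = i
					break
				}
			}
		}
		return m.current()
	}
	// Forward Next: only the child at the minimum advances.
	for _, c := range m.kids {
		if c.pos >= 0 && c.keys[c.pos] == cur {
			c.pos++
			if c.pos >= len(c.keys) {
				c.pos = -1
			}
		}
	}
	return m.current()
}

// current returns the max (reverse) or min (forward) key across children.
func (m *mergeIter) current() (int, bool) {
	best, ok := 0, false
	for _, c := range m.kids {
		if c.pos < 0 {
			continue
		}
		k := c.keys[c.pos]
		if !ok || (m.reverse && k > best) || (!m.reverse && k < best) {
			best, ok = k, true
		}
	}
	return best, ok
}

func main() {
	m := &mergeIter{kids: []*child{
		{keys: []int{1, 3, 5}},
		{keys: []int{2, 4, 6}},
	}}
	k, _ := m.SeekLT(4) // reverse positioning; current = 3
	fmt.Println(k)
	k, _ = m.Next() // direction switch repositions both children; current = 4
	fmt.Println(k)
	fmt.Println("child repositioning steps:", m.steps)
}
```

In a real merging iterator the children include the batch iterator, so anything that makes a single child step expensive gets multiplied by every direction switch.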


aliher1911 commented Sep 5, 2022

I think the key to the slowness is an updated batch. Tests with just seek or seeklt/next don't show a significant slowdown by themselves. But if the test does:

```go
func(rw ReadWriter, iter MVCCIterator, key MVCCKey) error {
	iter.SeekLT(key)
	iter.Next()
	return rw.ClearMVCC(key)
}
```

for eligible keys, then the degradation is significant:

```
name           old time/op  new time/op  delta
SeekLTNext-16  2.71s ± 2%   39.78s ± 4%  +1370.12%  (p=0.000 n=9+10)
```

This test case is in the draft PR I posted above.

Also note that the function above is executed in:

```go
batch := eng.NewBatch()
defer batch.Close()
```

and if the batch is replaced by eng (creating the iterator and making updates directly), performance improves 10x.

@aliher1911
Contributor Author

To verify the idea that the batch causes the worst effects:

```
name                          old time/op  new time/op  delta
SeekLTNext/useBatch=false-16   354ms ± 6%   397ms ± 4%    +12.16%  (p=0.000 n=10+10)
SeekLTNext/useBatch=true-16    2.58s ± 7%  38.74s ± 2%  +1402.47%  (p=0.000 n=10+10)
```

The batch also affected performance in 22.1, giving a 7x slowdown; on master it gives a 98x slowdown.
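A toy model of why a mutated batch blows up under leaf-level visibility filtering (an illustrative sketch, not Pebble code; `entry` and `seekLTSteps` are made-up names): when every write lands after the iterator's visible seqnum, each SeekLT must step backward past the whole batch before concluding that nothing is visible, so total work grows quadratically with the number of batch writes.

```go
package main

import "fmt"

// entry models a point write in a batch skiplist: a key plus the
// sequence number at which it was written.
type entry struct {
	key, seq int
}

// seekLTSteps counts how many entries a leaf-level SeekLT must examine
// when visibility is enforced inside the batch iterator: it lands on the
// last entry < key and must keep stepping backward past every entry
// written after the iterator's visible sequence number.
func seekLTSteps(batch []entry, key, visibleSeq int) int {
	steps := 0
	for i := len(batch) - 1; i >= 0; i-- {
		if batch[i].key >= key {
			continue
		}
		steps++
		if batch[i].seq <= visibleSeq {
			return steps // found a visible entry
		}
	}
	return steps // scanned everything; nothing visible
}

func main() {
	// Simulate a GC-like workload: visit keys in increasing order,
	// writing one point delete per key into the batch, all after the
	// iterator's visible seqnum so none are visible to the iterator.
	const n = 1024
	var batch []entry
	total := 0
	for k := 0; k < n; k++ {
		total += seekLTSteps(batch, k, 0)
		batch = append(batch, entry{key: k, seq: k + 1})
	}
	fmt.Printf("total leaf steps for %d ops: %d (~n^2/2)\n", n, total)
}
```

For n = 1024 keys this scans n(n-1)/2 = 523776 entries in total, which matches the superlinear growth seen in the useBatch=true benchmark.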

@aliher1911
Contributor Author

cc: @jbowens

@erikgrinaker erikgrinaker added C-performance Perf of queries or internals. Solution not expected to change functional behavior. and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Sep 5, 2022
@erikgrinaker erikgrinaker changed the title from "storage: MVCCGarbageCollect is slow in v22.2.-1 compared to v22.1" to "storage: MVCCGarbageCollect is slow in v22.2 compared to v22.1" Sep 6, 2022

erikgrinaker commented Sep 6, 2022

Had a closer look at the minimal repro benchmark in #87373, and it's fairly trivial -- it uses a plain old pebbleIterator which essentially just calls straight through to the pebble.Iterator. Seems likely that the issue is with Pebble somehow, so I'm reassigning to storage for investigation.

@erikgrinaker erikgrinaker assigned jbowens and unassigned aliher1911 Sep 6, 2022
@erikgrinaker erikgrinaker added the T-storage Storage Team label Sep 6, 2022
@blathers-crl blathers-crl bot added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Sep 6, 2022
@erikgrinaker erikgrinaker removed the A-storage Relating to our storage engine (Pebble) on-disk storage. label Sep 6, 2022
@blathers-crl blathers-crl bot added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Sep 6, 2022
@erikgrinaker erikgrinaker added A-storage Relating to our storage engine (Pebble) on-disk storage. and removed A-kv-replication Relating to Raft, consensus, and coordination. A-storage Relating to our storage engine (Pebble) on-disk storage. labels Sep 6, 2022
@sumeerbhola
Collaborator

@jbowens I see the multiple Pebble issues opened in response to this, but I don't quite understand what it is about this use case that causes slowness in this release (compared to the past).


jbowens commented Sep 6, 2022

@sumeerbhola — The change to batch mutation visibility (cockroachdb/pebble#1640) is new in this release. MVCCGarbageCollect exercises the pathological behavior described in cockroachdb/pebble#1944. We visit GCKeys in increasing order, so we're accumulating sorted point deletes into the batch. The initial SeekGE for each MVCC user key is fine—there are no skiplist entries with a greater key. But if we don't find the first garbage key through stepping, we SeekLT to find the first garbage key's predecessor. On the batch iterator, this SeekLT first finds the last written point tombstone and then needs to Prev through the entirety of the batch to find that there are no keys currently visible in the batch. We then Next to move on to the garbage key, which then requires scanning back through the entire skiplist. Iterator bounds are enforced at each step.

I filed the issue cockroachdb/pebble#1944 as something we might want to explore in the longer term. For avoiding the regression this release, I'm going to look at whether we can enforce batch visibility at the merging iterator level rather than within the batch iterator itself. This would make batch visibility behave more like normal visibility. We don't need to step the batch iterator further unless we actually need to know if it contains a lesser key.
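A sketch of that proposed direction (toy code, not the actual Pebble change; `write` and `leafSeekLT` are invented names): if the batch leaf iterator stops at the nearest entry regardless of seqnum and leaves visibility checks to the merging level, a GC-like workload of accumulating sorted point deletes does constant work per SeekLT instead of scanning the whole batch.

```go
package main

import "fmt"

// write models a batch entry: a key plus the seqnum it was written at.
type write struct{ key, seq int }

// leafSeekLT positions on the last entry with key < target without any
// visibility filtering: the leaf stops at the nearest entry and lets the
// merging iterator decide, only when this entry surfaces as the current
// minimum/maximum, whether its seqnum makes it visible.
func leafSeekLT(batch []write, target int) (idx, steps int) {
	for i := len(batch) - 1; i >= 0; i-- {
		steps++
		if batch[i].key < target {
			return i, steps
		}
	}
	return -1, steps
}

func main() {
	// GC-like workload: visit keys in increasing order, adding one
	// point delete per key to the batch after each visit.
	const n = 1024
	var batch []write
	total := 0
	for k := 0; k < n; k++ {
		_, steps := leafSeekLT(batch, k)
		total += steps
		batch = append(batch, write{key: k, seq: k + 1})
	}
	fmt.Printf("total leaf steps for %d ops: %d (linear)\n", n, total)
}
```

Each SeekLT now touches exactly one entry (the most recent write), so total leaf work is linear in the number of operations rather than quadratic.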

jbowens added a commit to jbowens/pebble that referenced this issue Sep 6, 2022
Previously, visibility of batch point keys was enforced within the batch
iterator. This commit moves this visibility enforcement into the merging
iterator, alongside the visibility enforcement of committed keys. Span keys
(eg, range deletions and range keys) are still subject to visibility filtering
when the batch iterator is constructed, because it's necessary to process all
these spans to fragment them appropriately.

This change addresses the pathological behavior observed in
cockroachdb/cockroach#87277: filtering keys within leaf iterators can force the
leaf iterators to iterate through more keys than necessary. If many mutations
are made to a batch since the iterator was opened, the batch iterator may need
to iterate over many entries before finding a visible key. This iteration may
extend well beyond the next or previous key in the merging iterator.

Using the SeekLT benchmark from cockroachdb/cockroach#87373, relative to master:
```
name                          old time/op  new time/op  delta
SeekLTNext/useBatch=false-10   185ms ± 1%   185ms ± 2%     ~     (p=0.841 n=5+5)
SeekLTNext/useBatch=true-10    26.1s ± 1%    0.2s ± 2%  -99.34%  (p=0.008 n=5+5)
```

And relative to 22.1:
```
name                          old time/op  new time/op  delta
SeekLTNext/useBatch=false-10   182ms ± 6%   185ms ± 2%     ~     (p=0.690 n=5+5)
SeekLTNext/useBatch=true-10    2.71s ± 7%   0.17s ± 2%  -93.68%  (p=0.008 n=5+5)
```
jbowens added a commit to jbowens/pebble that referenced this issue Sep 7, 2022
jbowens added a commit to jbowens/pebble that referenced this issue Sep 7, 2022
jbowens added a commit to cockroachdb/pebble that referenced this issue Sep 7, 2022

jbowens commented Sep 19, 2022

Fixed by #87714.

@jbowens jbowens closed this as completed Sep 19, 2022