Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: optimize new SST iterators #83051

Closed
erikgrinaker opened this issue Jun 18, 2022 · 11 comments · Fixed by cockroachdb/pebble#1905
Closed

storage: optimize new SST iterators #83051

erikgrinaker opened this issue Jun 18, 2022 · 11 comments · Fixed by cockroachdb/pebble#1905
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-storage Storage Team

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jun 18, 2022

The new SST iterators in #82799, e.g. NewPebbleSSTMemIterator(), are much slower than the existing NewMemSSTIterator(), around 80% or so. Some of this is due to CRDB-side overhead, in particular the use of pebbleIterator and VerifyingMVCCIterator, but the majority appears to be in Pebble via the new NewExternalIter() SST iterator. We need to optimize these.

This can be seen with BenchmarkSSTIterator, by changing NewMemSSTIterator() to NewPebbleMemSSTIterator() in runSSTIterator() on top of #82799.

Many of the optimizations that address #82559 and #83049 would likely apply here as well.

Jira issue: CRDB-16822

Epic CRDB-2624

@erikgrinaker erikgrinaker added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-storage Relating to our storage engine (Pebble) on-disk storage. T-kv-replication labels Jun 18, 2022
@blathers-crl
Copy link

blathers-crl bot commented Jun 18, 2022

cc @cockroachdb/replication

@blathers-crl blathers-crl bot added the A-kv-replication Relating to Raft, consensus, and coordination. label Jun 18, 2022
@nicktrav nicktrav added the T-storage Storage Team label Jul 13, 2022
@jbowens
Copy link
Collaborator

jbowens commented Jul 26, 2022

I expect most of the slowdown comes from the fact that the Pebble iterator resolves duplicates. This requires copying every key and an extra key comparison.

@erikgrinaker
Copy link
Contributor Author

Oof, yeah, that isn't great. The probability of duplicates is very low though, I wonder if we could do something cheaper and then backtrack if we find a possible match.

msbutler added a commit to msbutler/cockroach that referenced this issue Jul 26, 2022
This is purely a refactor. The only prod refactor is in the SSTBatcher, where
it calls ComputeStatsForRange. This  should not have a significant perf hit
as the iterator only surfaces point keys and the bulk of sst_batcher comput
time is spent in doing ingestion. Currently the pebbleIterator is
slightly slower than the old iterator, but optimizations are coming soon
(cockroachdb#83051).

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 26, 2022
Previously BACKUP would not back up range tombstones. With this patch, BACKUPs
with revision_history will backup range tombstones. Non-revision history backups
are not affected by this diff because MVCCExportToSST filters all tombstones
out of the backup already.

Specifically, this patch replaces the iterators used in the backup_processor
with the pebbleIterator, which has baked in range key support. This refactor
introduces a 5% regression in backup runtime, even when the backup has no range
keys, though cockroachdb#83051 hopes to address this gap. See details below on the
benchmark experiment.

At this point a backup with range keys is restorable, thanks to cockroachdb#84214. Note
that the restore codebase still touches iterators that are not range key aware.
This is not a problem because restored data does not have range keys, nor do
the empty ranges restore dumps data into. These iterators (e.g. in SSTBatcher
and in CheckSSTConflicts) will be updated when cockroachdb#70428 gets fixed.

Fixes cockroachdb#71155

Release note: none

To benchmark this diff, the following commands were used on the following sha
a5ccdc3, with and without this commit, over
three trials:
```
roachprod create -n 5 --gce-machine-type=n2-standard-16 $CLUSTER
roachprod put $CLUSTER [build] cockroach

roachprod wipe $CLUSTER; roachprod start $CLUSTER;
roachprod run $CLUSTER:1 -- "./cockroach workload init bank --rows 1000000000"
roachprod sql $CLUSTER:1 -- -e "BACKUP INTO 'gs://somebucket/michael-rangkey?AUTH=implicit'"
```

The backup on the control binary took on average 478 seconds with a stdev of 13
seconds, while the backup with the treatment binary took on average 499 seconds
with stddev of 8 seconds.
craig bot pushed a commit that referenced this issue Jul 27, 2022
84875: backupccl: handle range keys in BACKUP r=erikgrinaker a=msbutler

Previously BACKUP would not back up range tombstones. With this patch, BACKUPs
with revision_history will backup range tombstones. Non-revision history backups
are not affected by this diff because MVCCExportToSST filters all tombstones
out of the backup already.

Specifically, this patch replaces the iterators used in the backup_processor
with the pebbleIterator, which has baked in range key support. This refactor
introduces a 5% regression in backup runtime, even when the backup has no range
keys, though #83051 hopes to address this gap. See details below on the
benchmark experiment.

At this point a backup with range keys is restorable, thanks to #84214. Note
that the restore codebase still touches iterators that are not range key aware.
This is not a problem because restored data does not have range keys, nor do
the empty ranges restore dumps data into. These iterators (e.g. in SSTBatcher
and in CheckSSTConflicts) will be updated when #70428 gets fixed.

Fixes #71155

Release note: none

To benchmark this diff, the following commands were used on the following sha
a5ccdc3, with and without this commit, over
three trials:
```
roachprod create -n 5 --gce-machine-type=n2-standard-16 $CLUSTER
roachprod put $CLUSTER [build] cockroach

roachprod wipe $CLUSTER; roachprod start $CLUSTER;
roachprod run $CLUSTER:1 -- "./cockroach workload init bank --rows 1000000000"
roachprod sql $CLUSTER:1 -- -e "BACKUP INTO 'gs://somebucket/michael-rangkey?AUTH=implicit'"
```

The backup on the control binary took on average 478 seconds with a stdev of 13
seconds, while the backup with the treatment binary took on average 499 seconds
with stddev of 8 seconds.

84883: kvserver: add server-side transaction retry metrics r=arulajmani a=arulajmani

This patch adds a few new metrics to track successful/failed
server-side transaction retries. Specifically, whenever we attempt
to retry a read or write batch or run into a read within uncertainty
interval error, we increment specific counters indicating if the
retry was successful or not.

Release note: None

85074: upgrades: add checkpointing for `raftAppliedIndexTermMigration` r=irfansharif a=erikgrinaker

Forward-port of #84909, for posterity.

----

The `raftAppliedIndexTermMigration` upgrade migration could be
unreliable. It iterates over all ranges and runs a `Migrate` request
which must be applied on all replicas. However, if any ranges merge or
replicas are unavailable, the migration fails and starts over from the
beginning. In large clusters with many ranges, this meant that it might
never complete.

This patch makes the upgrade more robust, by retrying each `Migrate`
request 5 times, and checkpointing the progress after every fifth batch
(1000 ranges), allowing resumption on failure. At some point this should
be made part of the migration infrastructure.

NB: This fix was initially submitted for 22.1, and even though the
migration will be removed for 22.2, it is forward-ported for posterity.

Release note: None

85086: eval: stop ignoring all ResolveOIDFromOID errors r=ajwerner a=rafiss

fixes #84448

The decision about whether an error is safe to ignore is made at the
place where the error is created/returned. This way, the callers don't
need to be aware of any new error codes that the implementation may
start returning in the future.

Release note (bug fix): Fixed incorrect error handling that could cause
casts to OID types to fail in some cases.

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
craig bot pushed a commit that referenced this issue Jul 28, 2022
85068: bulk: replace old SSTIterators with PebbleSSTIterator in bulk stack r=erikgrinaker a=msbutler

This is purely a refactor. The only prod refactor is in the SSTBatcher, where
it calls ComputeStatsForRange. This  should not have a significant perf hit
as the iterator only surfaces point keys and the bulk of sst_batcher comput
time is spent in doing ingestion. Currently the pebbleIterator is
slightly slower than the old iterator, but optimizations are coming soon
(#83051).

Release note: none

Co-authored-by: Michael Butler <[email protected]>
@jbowens
Copy link
Collaborator

jbowens commented Aug 1, 2022

@msbutler what structure do we have in the backups? Is it correct that we within each individual full backup and each individual backup we have an ordered series of nonoverlapping files? I think we should also be able to refactor the interface so that we can propagate this information to the Pebble iterator, and it can create num-backups levels in the merging iterator instead of num-files levels.

@msbutler
Copy link
Collaborator

msbutler commented Aug 1, 2022

The set of files produced by an individual backup may have overlapping key spans, however there will never be overlapping keys across files (i.e. if a@2 is in file foo, then a@anyTimestamp will be in file foo). CORRECTION: A key's versions can get split across different backup SSTs.

@jbowens
Copy link
Collaborator

jbowens commented Aug 2, 2022

I think we'll want to change the interface to accept a [][]sstable.ReadableFile, where each subslice must not contain overlapping point keys.

This will allow us to organize each backup into its own 'level', by implementing a simple level iterator that chains all the internal point key iterators within a backup. There's a bit of a complication that pebble.Iterator supports seeking, but we don't the bounds of these files. We don't require seeking for this use case, but it's a poor UX to expose a *pebble.Iterator that's only intended for forward scans. I think we could support seeking without too much work by binary searching among the level's files, First()-ing each to retrieve its smallest point key for comparison.

@nicktrav — did you want to pick this up after the min-table version work, or something else?

@msbutler
Copy link
Collaborator

msbutler commented Aug 2, 2022

@stevendanna I just thought of one wrinkle to my answer to Jackson's question above. Could you verify this? If the backup retries, we might write several SSTs that contain the same versioned key, however, only one of those SST's will ever get check pointed. So, from RESTORE's perspective, this still holds:

The set of files produced by an individual backup may have overlapping key spans, and a key's versions can get split across different backup ssts.

@nicktrav
Copy link
Collaborator

nicktrav commented Aug 2, 2022

@nicktrav — did you want to pick this up after the min-table version work, or something else?

SGTM.

@nicktrav nicktrav self-assigned this Aug 2, 2022
@itsbilal itsbilal assigned itsbilal and unassigned nicktrav Aug 8, 2022
@itsbilal
Copy link
Contributor

@jbowens Just to clarify, we'd still need the pausing behaviour for the simpler point-key levelIter right (i.e. the one that operates on observed file bounds and not FileMetadata and only needs to support forward iteration)? Since we could still have range deletes in one fake level that apply to points in another level.

Also on that note, we'd need a similar simpler levelIter for range keys, right? Or is it sufficient to still expose all the range key iterators to its merging iter, with the hope that there aren't many files with range keys?

@jbowens
Copy link
Collaborator

jbowens commented Aug 12, 2022

Just to clarify, we'd still need the pausing behaviour for the simpler point-key levelIter right (i.e. the one that operates on observed file bounds and not FileMetadata and only needs to support forward iteration)? Since we could still have range deletes in one fake level that apply to points in another level.

Yeah, I guess so. It's not needed for the intended use case of backups, but we should try to keep NewExternalIter very general, even if some of our optimizations are not general.

Also on that note, we'd need a similar simpler levelIter for range keys, right? Or is it sufficient to still expose all the range key iterators to its merging iter, with the hope that there aren't many files with range keys?

I think that's sufficient since range keys should be relatively rare, and I think range keys are allowed to overlap between backup files from the same backup.

@erikgrinaker
Copy link
Contributor Author

Recent benchmarks comparing the old sstIterator vs. the new pebbleIterator based one:

name                                      old time/op  new time/op  delta
SSTIterator/X/keys=1/verify=false-24      39.8µs ± 1%  45.5µs ± 4%  +14.35%  (p=0.000 n=9+10)
SSTIterator/X/keys=1/verify=true-24       39.8µs ± 1%  44.1µs ± 4%  +10.68%  (p=0.000 n=9+10)
SSTIterator/X/keys=100/verify=false-24    47.8µs ± 0%  57.1µs ± 1%  +19.56%  (p=0.000 n=8+10)
SSTIterator/X/keys=100/verify=true-24     50.3µs ± 1%  61.9µs ± 1%  +22.97%  (p=0.000 n=9+10)
SSTIterator/X/keys=10000/verify=false-24   850µs ± 0%  1498µs ± 1%  +76.28%  (p=0.000 n=8+9)
SSTIterator/X/keys=10000/verify=true-24   1.07ms ± 1%  1.92ms ± 0%  +80.40%  (p=0.000 n=10+9)

itsbilal added a commit to itsbilal/pebble that referenced this issue Aug 17, 2022
Currently, we create a new merging iterator level for each file
passed into NewExternalIter. This is unnecessary for most use-cases
of creating ExternalIters around lots of sstables, as we can externally
guarantee that many of those sstables won't have overlapping points
with each other. We can have the caller pass this knowledge
by specifying a [][]sstable.ReadableFile where each sub-slice
obeys level invariants for files within it, and is also already
sorted by user keys.

This change makes the interface change to allow for the above
optimization, and also adds a `simpleLevelIter` that implements
forward iteration within a single "level". For files that don't
contain range deletes, we shove all the point iters into one
`simpleLevelIter`, greatly reducing merging iterator levels
and speeding up its operations by a lot.

Fixes cockroachdb/cockroach#83051.
itsbilal added a commit to itsbilal/pebble that referenced this issue Aug 18, 2022
Currently, we create a new merging iterator level for each file
passed into NewExternalIter. This is unnecessary for most use-cases
of creating ExternalIters around lots of sstables, as we can externally
guarantee that many of those sstables won't have overlapping points
with each other. We can have the caller pass this knowledge
by specifying a [][]sstable.ReadableFile where each sub-slice
obeys level invariants for files within it, and is also already
sorted by user keys.

This change makes the interface change to allow for the above
optimization, and also adds a `simpleLevelIter` that implements
forward iteration within a single "level". For files that don't
contain range deletes, we shove all the point iters into one
`simpleLevelIter`, greatly reducing merging iterator levels
and speeding up its operations by a lot.

Fixes cockroachdb/cockroach#83051.
itsbilal added a commit to itsbilal/pebble that referenced this issue Aug 18, 2022
Currently, we create a new merging iterator level for each file
passed into NewExternalIter. This is unnecessary for most use-cases
of creating ExternalIters around lots of sstables, as we can externally
guarantee that many of those sstables won't have overlapping points
with each other. We can have the caller pass this knowledge
by specifying a [][]sstable.ReadableFile where each sub-slice
obeys level invariants for files within it, and is also already
sorted by user keys.

This change makes the interface change to allow for the above
optimization, and also adds a `simpleLevelIter` that implements
forward iteration within a single "level". For files that don't
contain range deletes, we shove all the point iters into one
`simpleLevelIter`, greatly reducing merging iterator levels
and speeding up its operations by a lot.

Fixes cockroachdb/cockroach#83051.
itsbilal added a commit to itsbilal/pebble that referenced this issue Aug 19, 2022
Currently, we create a new merging iterator level for each file
passed into NewExternalIter. This is unnecessary for most use-cases
of creating ExternalIters around lots of sstables, as we can externally
guarantee that many of those sstables won't have overlapping points
with each other. We can have the caller pass this knowledge
by specifying a [][]sstable.ReadableFile where each sub-slice
obeys level invariants for files within it, and is also already
sorted by user keys.

This change makes the interface change to allow for the above
optimization, and also adds a `simpleLevelIter` that implements
forward iteration within a single "level". For files that don't
contain range deletes, we shove all the point iters into one
`simpleLevelIter`, greatly reducing merging iterator levels
and speeding up its operations by a lot.

Fixes cockroachdb/cockroach#83051.
itsbilal added a commit to cockroachdb/pebble that referenced this issue Aug 19, 2022
Currently, we create a new merging iterator level for each file
passed into NewExternalIter. This is unnecessary for most use-cases
of creating ExternalIters around lots of sstables, as we can externally
guarantee that many of those sstables won't have overlapping points
with each other. We can have the caller pass this knowledge
by specifying a [][]sstable.ReadableFile where each sub-slice
obeys level invariants for files within it, and is also already
sorted by user keys.

This change makes the interface change to allow for the above
optimization, and also adds a `simpleLevelIter` that implements
forward iteration within a single "level". For files that don't
contain range deletes, we shove all the point iters into one
`simpleLevelIter`, greatly reducing merging iterator levels
and speeding up its operations by a lot.

Fixes cockroachdb/cockroach#83051.
craig bot pushed a commit that referenced this issue Aug 22, 2022
86259: storage: implement `RangeKeyChanged()` for `MVCCIncrementalIterator` r=tbg a=erikgrinaker

**storage: use `RangeKeyChanged()` in `MVCCIncrementalIterator`**

This patch uses `RangeKeyChanged()` to detect changes to range keys in
`MVCCIncrementalIterator`. There are no functional changes.

Some quick benchmarks, using catchup scans:

```
name                                                                 old time/op  new time/op  delta
CatchUpScan/mixed-case/withDiff=true/perc=0.00/numRangeKeys=0-24      538ms ± 1%   535ms ± 1%     ~     (p=0.211 n=10+9)
CatchUpScan/mixed-case/withDiff=true/perc=0.00/numRangeKeys=1-24      690ms ± 0%   590ms ± 1%  -14.56%  (p=0.000 n=9+10)
CatchUpScan/mixed-case/withDiff=true/perc=0.00/numRangeKeys=100-24    743ms ± 1%   646ms ± 1%  -13.13%  (p=0.000 n=9+9)
CatchUpScan/mixed-case/withDiff=false/perc=0.00/numRangeKeys=0-24     794ms ± 1%   794ms ± 0%     ~     (p=0.579 n=10+10)
CatchUpScan/mixed-case/withDiff=false/perc=0.00/numRangeKeys=1-24     966ms ± 0%   911ms ± 1%   -5.72%  (p=0.000 n=10+10)
CatchUpScan/mixed-case/withDiff=false/perc=0.00/numRangeKeys=100-24   974ms ± 0%   920ms ± 0%   -5.51%  (p=0.000 n=10+10)
```

Release justification: bug fixes and low-risk updates to new functionality.

Release note: None

**storage: implement `RangeKeyChanged()` for `MVCCIncrementalIterator`**

This patch implements `RangeKeyChanged()` for `MVCCIncrementalIterator`.
It only fires if the time bound range keys change, not if a
`Next(Key)IgnoringTime()` operation reveals additional range keys.

Resolves #86105.

Release justification: bug fixes and low-risk updates to new functionality

Release note: None
  
**storage: add `MVCCIncrementalIterator.RangeKeysIgnoringTime()`**

This patch changes `MVCCIncrementalIterator` range key behavior
following a `Next(Key)IgnoringTime()` call. Previously, range key
methods would then expose unfiltered range keys. Now, the standard range
key methods only apply to filtered range keys, and an additional
`RangeKeysIgnoringTime()` method provides access to unfiltered range
keys.

This implies that if such a call steps onto a range key that's entirely
outside of the time bounds then:

* `HasPointAndRange()` will return `false`,`false` if on a bare range
  key.

* `RangeKeyChanged()` will not fire, unless stepping off of a range key
  within the time bounds.

* `RangeBounds()` and `RangeKeys()` will return empty results.

This is done to avoid conditional range key handling following these
calls, except for the exact sites where the caller is interested in
the unfiltered range keys.

Release justification: bug fixes and low-risk updates to new functionality

Release note: None

86440: storage: use concrete `pebbleIterator` in `intentInterleavingIter` r=tbg a=erikgrinaker

**storage: tweak `unsafeMVCCIterator` construction**

Release justification: non-production code changes

Release note: None

**storage: inline some `intentInterleavingIter` methods**

This patch splits up `maybeSkipIntentRangeKeys()` and
`maybeSuppressRangeKeyChanged()` to allow for mid-stack inlining.

I doubt that the gains are as large as these microbenchmarks claim, and
there's a fair bit of variance between runs, but it can't hurt.

```
name                                                                         old time/op    new time/op    delta
MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-24               5.37µs ± 2%    5.43µs ± 2%  +1.13%  (p=0.041 n=10+10)
MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-24             38.2µs ± 2%    38.2µs ± 2%    ~     (p=0.971 n=10+10)
MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-24           2.79ms ± 2%    2.71ms ± 2%  -2.59%  (p=0.000 n=10+10)
MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-24        5.99µs ± 1%    5.99µs ± 2%    ~     (p=0.541 n=10+10)
MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-24      51.7µs ± 3%    52.1µs ± 1%    ~     (p=0.631 n=10+10)
MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-24    3.88ms ± 2%    3.87ms ± 1%    ~     (p=0.897 n=10+8)
MVCCComputeStats_Pebble/valueSize=32/numRangeKeys=0-24                          158ms ± 1%     155ms ± 1%  -2.34%  (p=0.000 n=10+9)
```

Touches #82559.

Release justification: bug fixes and low-risk updates to new functionality

Release note: None
  
**storage: use concrete `pebbleIterator` in `intentInterleavingIter`**

Since `intentInterleavingIter` always constructs the underlying
iterators from the given reader, and these readers always construct
`*pebbleIterator`, it can use the concrete type rather than interfaces
for both iterators. This avoids dynamic dispatch, yielding a decent
performance improvement.

Unfortunately, this requires disabling `unsafeMVCCIterator` inside
`intentInterleavingIter`. This wasn't always enabled anyway, since it
was omitted both for the engine iterator and when using an engine
directly (which doesn't have consistent iterators).

```
name                                                                         old time/op    new time/op    delta
MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-24               5.43µs ± 2%    5.45µs ± 2%    ~     (p=0.566 n=10+10)
MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-24             38.2µs ± 2%    37.0µs ± 1%  -3.02%  (p=0.000 n=10+10)
MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-24           2.71ms ± 2%    2.66ms ± 1%  -1.83%  (p=0.000 n=10+9)
MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-24        5.99µs ± 2%    5.86µs ± 2%  -2.15%  (p=0.000 n=10+10)
MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-24      52.1µs ± 1%    50.2µs ± 2%  -3.77%  (p=0.000 n=10+10)
MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-24    3.87ms ± 1%    3.83ms ± 1%  -1.26%  (p=0.000 n=8+10)
MVCCComputeStats_Pebble/valueSize=32/numRangeKeys=0-24                          155ms ± 1%     155ms ± 1%    ~     (p=0.842 n=9+10)
```

Touches #82559.

Release justification: low risk, high benefit changes to existing functionality

Release note: None

86478: storage: use concrete `pebbleIterator` in `verifyingIterator` r=tbg a=erikgrinaker

**storage: add Pebble SST iterator benchmarks**

Release justification: non-production code changes

Release note: None
  
**storage: use concrete `pebbleIterator` in `verifyingIterator`**

Gives a slight performance boost, since it avoid dynamic dispatch:

```
name                                                  old time/op  new time/op  delta
SSTIterator/keys=1/variant=pebble/verify=true-24      42.8µs ± 1%  42.4µs ± 1%  -0.79%  (p=0.043 n=10+10)
SSTIterator/keys=100/variant=pebble/verify=true-24    61.8µs ± 1%  60.7µs ± 1%  -1.64%  (p=0.000 n=10+10)
SSTIterator/keys=10000/variant=pebble/verify=true-24  1.91ms ± 0%  1.88ms ± 0%  -1.79%  (p=0.000 n=10+10)
```

An attempt was also made at using `RangeKeyChanged()` instead of
`HasPointAndRange()`, but this had no effect.

Touches #83051.

Release justification: bug fixes and low-risk updates to new functionality

Release note: None

86513: storage: use `RangeKeyChanged` in `ReadAsOfIterator` r=tbg a=erikgrinaker

This patch uses `RangeKeyChanged()` to detect range keys in
`ReadAsOfIterator`, and caches them to improve performance.
It also fixes a bug where the iterator would fail to detect tombstones
with a non-empty `MVCCValueHeader`.

Resolves #84714.

Release justification: bug fixes and low-risk updates to new functionality

Release note: None

86514: storage: use `RangeKeyChanged()` in `MVCCDeleteRangeUsingTombstone()` r=tbg a=erikgrinaker

Release justification: bug fixes and low-risk updates to new functionality

Release note: None

86529: storage: omit unnecessary range key calls during intent resolution r=tbg a=erikgrinaker

Release justification: bug fixes and low-risk updates to new functionality

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-storage Storage Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants