Skip to content

Commit

Permalink
Merge #93753
Browse files Browse the repository at this point in the history
93753: storage: handle MVCC range keys in `pebbleMVCCScanner` r=erikgrinaker a=erikgrinaker

**storage: add some `TestMVCCHistories` range key cases for `MVCCGet`**

Release note: None

**storage: simplify some `pebbleMVCCScanner` iteration code**

This also yields a nice performance boost for reverse scans, by removing unnecessary peeking which incurs key copies.

```
name                                                                         old time/op    new time/op    delta
MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=8/numRangeKeys=0-24         5.65µs ± 2%    5.62µs ± 1%    ~     (p=0.897 n=5+5)
MVCCReverseScan_Pebble/rows=1/versions=10/valueSize=8/numRangeKeys=0-24        22.7µs ± 0%    22.9µs ± 1%  +0.94%  (p=0.032 n=5+5)
MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=8/numRangeKeys=0-24       43.0µs ± 1%    40.3µs ± 1%  -6.18%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=100/versions=10/valueSize=8/numRangeKeys=0-24       352µs ± 1%     340µs ± 2%  -3.57%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=8/numRangeKeys=0-24     3.41ms ± 1%    3.21ms ± 1%  -5.68%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=10000/versions=10/valueSize=8/numRangeKeys=0-24    32.5ms ± 2%    31.1ms ± 6%    ~     (p=0.095 n=5+5)
```

Release note: None

**storage: handle MVCC range keys in `pebbleMVCCScanner`**

This patch removes `PointSynthesizingIter`, and instead implements MVCC range key handling in `pebbleMVCCScanner` directly. `PointSynthesizingIter` was a mistake. It was intended to reduce the complexity of integrating MVCC range key processing into `pebbleMVCCScanner`, but ended up introducing even more complexity at a significant performance cost.

`MVCCScan` point synthesis has changed slightly: while point keys were previously synthesized at the start bound of an MVCC range key, they are now only synthesized above existing point keys. This was mostly an artifact of having to do conflict checks based on synthesized points (to ensure bare range keys were handled), but this is no longer necessary since `pebbleMVCCScanner` can inspect the range keys directly.

`MVCCGet` point synthesis remains unchanged: we always synthesize point keys regardless of whether there is an existing point key below it. This is necessary for e.g. conflict checks on top of `MVCCGet`, such as transaction read refreshes.

`MVCCScan` shows a significant performance gain with range keys, and negligible overhead without them. We still see a hefty performance penalty when range keys are present though, compared to the `numRangeKeys=0` case, which likely needs optimization in Pebble.

```
name                                                                    old time/op    new time/op     delta
MVCCScan_Pebble/rows=1/versions=1/valueSize=8/numRangeKeys=0-24           5.31µs ± 1%     5.39µs ± 2%     ~     (p=0.095 n=5+5)
MVCCScan_Pebble/rows=1/versions=1/valueSize=8/numRangeKeys=1-24           10.7µs ± 1%     10.2µs ± 3%   -4.39%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=1/versions=1/valueSize=8/numRangeKeys=100-24          144µs ± 1%       94µs ± 2%  -34.54%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=1/versions=10/valueSize=8/numRangeKeys=0-24          20.4µs ± 2%     20.6µs ± 3%     ~     (p=0.421 n=5+5)
MVCCScan_Pebble/rows=1/versions=10/valueSize=8/numRangeKeys=1-24          28.0µs ± 2%     26.7µs ± 2%   -4.41%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=1/versions=10/valueSize=8/numRangeKeys=100-24         120µs ± 1%       82µs ± 1%  -31.89%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=100/versions=1/valueSize=8/numRangeKeys=0-24         33.0µs ± 1%     33.3µs ± 1%     ~     (p=0.056 n=5+5)
MVCCScan_Pebble/rows=100/versions=1/valueSize=8/numRangeKeys=1-24         55.0µs ± 1%     46.1µs ± 1%  -16.15%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=100/versions=1/valueSize=8/numRangeKeys=100-24        158µs ± 1%      142µs ± 1%  -10.15%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=100/versions=10/valueSize=8/numRangeKeys=0-24         137µs ± 1%      139µs ± 3%   +1.72%  (p=0.032 n=5+5)
MVCCScan_Pebble/rows=100/versions=10/valueSize=8/numRangeKeys=1-24         223µs ± 2%      187µs ± 0%  -15.79%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=100/versions=10/valueSize=8/numRangeKeys=100-24       322µs ± 1%      264µs ± 1%  -17.95%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=10000/versions=1/valueSize=8/numRangeKeys=0-24       2.41ms ± 1%     2.41ms ± 0%     ~     (p=0.690 n=5+5)
MVCCScan_Pebble/rows=10000/versions=1/valueSize=8/numRangeKeys=1-24       3.99ms ± 1%     3.19ms ± 0%  -20.08%  (p=0.016 n=5+4)
MVCCScan_Pebble/rows=10000/versions=1/valueSize=8/numRangeKeys=100-24     4.74ms ± 1%     3.77ms ± 1%  -20.53%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=10000/versions=10/valueSize=8/numRangeKeys=0-24      10.7ms ± 3%     10.7ms ± 3%     ~     (p=0.421 n=5+5)
MVCCScan_Pebble/rows=10000/versions=10/valueSize=8/numRangeKeys=1-24      18.1ms ± 1%     15.0ms ± 0%  -17.33%  (p=0.008 n=5+5)
MVCCScan_Pebble/rows=10000/versions=10/valueSize=8/numRangeKeys=100-24    23.1ms ± 4%     16.4ms ± 6%  -28.90%  (p=0.008 n=5+5)
```

`MVCCScan` in reverse shows a moderate improvement, primarily due to the additional checks needed to detect bare range keys in reverse since it can land on them without `RangeKeyChanged` firing.

```
name                                                                           old time/op    new time/op    delta
MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=8/numRangeKeys=0-24           5.69µs ± 2%    5.68µs ± 1%     ~     (p=0.841 n=5+5)
MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=8/numRangeKeys=1-24           11.0µs ± 2%    10.8µs ± 1%   -2.01%  (p=0.016 n=5+5)
MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=8/numRangeKeys=100-24          100µs ± 1%      95µs ± 1%   -5.67%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=1/versions=10/valueSize=8/numRangeKeys=0-24          23.1µs ± 1%    23.4µs ± 1%     ~     (p=0.056 n=5+5)
MVCCReverseScan_Pebble/rows=1/versions=10/valueSize=8/numRangeKeys=1-24          31.5µs ± 1%    31.4µs ± 1%     ~     (p=0.421 n=5+5)
MVCCReverseScan_Pebble/rows=1/versions=10/valueSize=8/numRangeKeys=100-24        87.5µs ± 1%    91.7µs ± 1%   +4.88%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=8/numRangeKeys=0-24         40.6µs ± 1%    41.3µs ± 1%   +1.60%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=8/numRangeKeys=1-24         60.8µs ± 1%    55.8µs ± 1%   -8.18%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=8/numRangeKeys=100-24        159µs ± 1%     145µs ± 0%   -8.49%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=100/versions=10/valueSize=8/numRangeKeys=0-24         342µs ± 1%     344µs ± 3%     ~     (p=1.000 n=5+5)
MVCCReverseScan_Pebble/rows=100/versions=10/valueSize=8/numRangeKeys=1-24         486µs ± 1%     457µs ± 1%   -6.05%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=100/versions=10/valueSize=8/numRangeKeys=100-24       701µs ± 3%     663µs ± 2%   -5.44%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=8/numRangeKeys=0-24       3.12ms ± 1%    3.17ms ± 0%   +1.36%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=8/numRangeKeys=1-24       4.51ms ± 2%    4.09ms ± 1%   -9.36%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=8/numRangeKeys=100-24     5.36ms ± 0%    4.66ms ± 1%  -13.21%  (p=0.008 n=5+5)
MVCCReverseScan_Pebble/rows=10000/versions=10/valueSize=8/numRangeKeys=0-24      32.5ms ± 8%    31.6ms ± 6%     ~     (p=0.310 n=5+5)
MVCCReverseScan_Pebble/rows=10000/versions=10/valueSize=8/numRangeKeys=1-24      45.7ms ± 6%    40.2ms ±10%  -12.14%  (p=0.016 n=5+5)
MVCCReverseScan_Pebble/rows=10000/versions=10/valueSize=8/numRangeKeys=100-24    64.5ms ± 6%    59.0ms ±11%   -8.50%  (p=0.032 n=5+5)
```

Scans across garbage show even better performance gains:

```
name                                                                                old time/op  new time/op  delta
MVCCScanGarbage_Pebble/rows=1/versions=1/numRangeKeys=0/tombstones=false-24         4.99µs ± 2%  4.81µs ± 1%   -3.64%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=1/numRangeKeys=0/tombstones=true-24          5.10µs ± 1%  5.03µs ± 1%   -1.33%  (p=0.032 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=1/numRangeKeys=1/tombstones=false-24         9.25µs ± 3%  8.47µs ± 2%   -8.41%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=1/numRangeKeys=1/tombstones=true-24          9.44µs ± 1%  8.94µs ± 3%   -5.33%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=1/numRangeKeys=100/tombstones=false-24        150µs ± 1%    96µs ± 1%  -36.27%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=1/numRangeKeys=100/tombstones=true-24        96.7µs ± 0%  95.4µs ± 1%   -1.33%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=10/numRangeKeys=0/tombstones=false-24        9.10µs ± 3%  9.24µs ± 2%     ~     (p=0.151 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=10/numRangeKeys=0/tombstones=true-24         9.43µs ± 2%  9.41µs ± 2%     ~     (p=0.841 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=10/numRangeKeys=1/tombstones=false-24        13.2µs ± 3%  12.6µs ± 2%   -4.73%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=10/numRangeKeys=1/tombstones=true-24         14.5µs ± 1%  13.8µs ± 2%   -4.60%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=10/numRangeKeys=100/tombstones=false-24       160µs ± 1%   105µs ± 1%  -34.67%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=1/versions=10/numRangeKeys=100/tombstones=true-24        106µs ± 1%   105µs ± 1%     ~     (p=0.548 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=1/numRangeKeys=0/tombstones=false-24       22.2µs ± 1%  22.6µs ± 1%   +2.21%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=1/numRangeKeys=0/tombstones=true-24        32.0µs ± 0%  32.2µs ± 1%   +0.80%  (p=0.016 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=1/numRangeKeys=1/tombstones=false-24       18.9µs ± 1%  18.2µs ± 1%   -3.64%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=1/numRangeKeys=1/tombstones=true-24        55.6µs ± 0%  45.4µs ± 1%  -18.27%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=1/numRangeKeys=100/tombstones=false-24      215µs ± 1%   120µs ± 2%  -44.45%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=1/numRangeKeys=100/tombstones=true-24       277µs ± 1%   148µs ± 1%  -46.36%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=10/numRangeKeys=0/tombstones=false-24       108µs ± 1%   109µs ± 3%     ~     (p=1.000 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=10/numRangeKeys=0/tombstones=true-24        122µs ± 0%   122µs ± 2%     ~     (p=0.421 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=10/numRangeKeys=1/tombstones=false-24      67.6µs ± 1%  67.0µs ± 1%     ~     (p=0.095 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=10/numRangeKeys=1/tombstones=true-24        200µs ± 0%   168µs ± 1%  -16.04%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=10/numRangeKeys=100/tombstones=false-24     324µs ± 1%   178µs ± 1%  -45.17%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=100/versions=10/numRangeKeys=100/tombstones=true-24      306µs ± 1%   274µs ± 1%  -10.33%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=1/numRangeKeys=0/tombstones=false-24     1.61ms ± 0%  1.65ms ± 2%   +2.27%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=1/numRangeKeys=0/tombstones=true-24      2.33ms ± 2%  2.35ms ± 1%     ~     (p=0.151 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=1/numRangeKeys=1/tombstones=false-24      141µs ± 0%   140µs ± 1%     ~     (p=0.310 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=1/numRangeKeys=1/tombstones=true-24      4.31ms ± 1%  3.25ms ± 0%  -24.59%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=1/numRangeKeys=100/tombstones=false-24   6.64ms ± 8%  1.35ms ± 1%  -79.67%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=1/numRangeKeys=100/tombstones=true-24    16.4ms ± 2%   4.0ms ± 1%  -75.65%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=10/numRangeKeys=0/tombstones=false-24    9.33ms ± 1%  9.24ms ± 0%     ~     (p=0.310 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=10/numRangeKeys=0/tombstones=true-24     10.3ms ± 1%  10.1ms ± 1%   -1.48%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=10/numRangeKeys=1/tombstones=false-24     213µs ± 2%   213µs ± 1%     ~     (p=0.841 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=10/numRangeKeys=1/tombstones=true-24     17.7ms ± 1%  14.4ms ± 3%  -18.61%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=10/numRangeKeys=100/tombstones=false-24  11.1ms ± 4%   2.9ms ± 4%  -73.89%  (p=0.008 n=5+5)
MVCCScanGarbage_Pebble/rows=10000/versions=10/numRangeKeys=100/tombstones=true-24   18.2ms ± 1%  15.2ms ± 4%  -16.45%  (p=0.008 n=5+5)
```

And `MVCCGet` performance remains largely unchanged, as expected given that the cost is dominated by the seek.

```
name                                                                     old time/op    new time/op    delta
MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=0-24        5.20µs ± 1%    5.23µs ± 2%    ~     (p=0.841 n=5+5)
MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=1-24        9.52µs ± 1%    9.23µs ± 1%  -3.07%  (p=0.008 n=5+5)
MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=100-24      92.4µs ± 1%    91.6µs ± 1%  -0.93%  (p=0.032 n=5+5)
MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=0-24       20.2µs ± 2%    20.2µs ± 3%    ~     (p=0.651 n=5+5)
MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=1-24       25.9µs ± 2%    25.5µs ± 1%    ~     (p=0.056 n=5+5)
MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=100-24     79.8µs ± 1%    80.1µs ± 2%    ~     (p=1.000 n=5+5)
MVCCGet_Pebble/batch=false/versions=100/valueSize=8/numRangeKeys=0-24      82.9µs ± 2%    83.3µs ± 1%    ~     (p=0.310 n=5+5)
MVCCGet_Pebble/batch=false/versions=100/valueSize=8/numRangeKeys=1-24      92.2µs ± 3%    90.2µs ± 1%    ~     (p=0.095 n=5+5)
MVCCGet_Pebble/batch=false/versions=100/valueSize=8/numRangeKeys=100-24     151µs ± 2%     149µs ± 3%    ~     (p=0.690 n=5+5)
MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=0-24         3.02µs ± 1%    3.04µs ± 1%    ~     (p=0.310 n=5+5)
MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=1-24         5.44µs ± 2%    5.29µs ± 3%    ~     (p=0.056 n=5+5)
MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=100-24       53.0µs ± 1%    52.7µs ± 1%    ~     (p=0.095 n=5+5)
MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=0-24        17.0µs ± 1%    17.1µs ± 2%    ~     (p=1.000 n=5+5)
MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=1-24        21.4µs ± 5%    20.9µs ± 3%    ~     (p=0.151 n=5+5)
MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=100-24      73.1µs ± 2%    72.7µs ± 1%    ~     (p=0.841 n=5+5)
MVCCGet_Pebble/batch=true/versions=100/valueSize=8/numRangeKeys=0-24       83.9µs ± 1%    84.3µs ± 1%    ~     (p=0.310 n=5+5)
MVCCGet_Pebble/batch=true/versions=100/valueSize=8/numRangeKeys=1-24       90.8µs ± 2%    90.5µs ± 1%    ~     (p=1.000 n=5+5)
MVCCGet_Pebble/batch=true/versions=100/valueSize=8/numRangeKeys=100-24      146µs ± 1%     146µs ± 0%    ~     (p=0.556 n=5+4)
```

Resolves #88908.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
  • Loading branch information
craig[bot] and erikgrinaker committed Dec 28, 2022
2 parents 6723e00 + 2e7292d commit 76e3ea8
Show file tree
Hide file tree
Showing 15 changed files with 397 additions and 3,312 deletions.
43 changes: 21 additions & 22 deletions docs/tech-notes/mvcc-range-tombstones.md
Original file line number Diff line number Diff line change
Expand Up @@ -402,39 +402,38 @@ However, callers may request tombstones to be emitted via the `Tombstones`
option, e.g. for conflict checks and rangefeed value diffs. We do not want to
burden the rest of the codebase with having to handle MVCC range tombstones
explicitly, and therefore do not expose them directly. Instead, we synthesize
MVCC point tombstones at the start of MVCC range tombstones and wherever they
overlap MVCC point keys, via `pointSynthesizingIter`.
MVCC point tombstones. A scan emits synthetic point tombstones above existing
point keys, while a get emits synthetic point tombstones if a range tombstone
overlaps the key, regardless of whether there is an existing point key below it.

Consider this example:

```
Time
4 [---|----------)
3 c3
6 [---|----------)
5 c5
4 [----------)
3
2 [----------)
1 c1 d1
1 d1
a b c d e Key
```

An `MVCCScan` with `Tombstones` across this span at timestamp >= 4 would emit
synthetic MVCC point tombstones at `a@4`, `b@4`, `c@4`, and `d@4`.

Note, however, that these synthetic point tombstones are not always stable,
because the start key of MVCC range tombstones can change, e.g. due to iterator
truncation, fragmentation, and CRDB range splits or merges. For example, an
`MVCCScan` across `[bar-foo)` would emit a synthetic MVCC point tombstone at
`bar@4`, because this is the truncated start bound of the range tombstone, but
a scan across `[abc-def)` would not see `bar@4`, but instead see `abc@4`. The
same would happen if the CRDB range was split or merged at the key `bar`.
An `MVCCScan` with `Tombstones` across this span at timestamp >= 6 would emit
synthetic MVCC point tombstones at `c@6` and `d@6`. However, a scan at timestamp
3 would only emit a synthetic tombstone at `d@2`, because the range tombstone is
below the point `c@5` and point tombstones are not synthesized below point keys.
Similarly, a scan across `[a-b)` at any timestamp would not emit anything.

Additionally, an `MVCCGet` will emit a synthetic point tombstone for the key
even if it has no existing point keys: `Get(bar)` at timestamp >= 4 would return
a tombstone `bar@4` even though there is no real point key at `bar`, as this
might be required e.g. for conflict checks. These synthetic tombstones would
similarly not be visible to an `MVCCScan`.

Callers must take care not to rely on these point tombstones being stable, or
use an `MVCCIterator` that exposes MVCC range tombstones directly if needed.
even if it has no existing point key below it, as these might be required for
conflict checks. For example, `Get(bar)` at timestamp >= 6 would return a
tombstone `bar@6` even though there is no real point key at `bar`. Similarly, a
`Get(c)` at timestamp 3 would return `c@2` even though the point key `c@5` is
above it. These synthetic tombstones would not be visible to an `MVCCScan`.

If callers need better visibility into range tombstones, they must use an
`MVCCIterator` that exposes them directly.

### KV APIs

Expand Down
1 change: 0 additions & 1 deletion pkg/storage/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,6 @@ go_library(
"pebble_iterator.go",
"pebble_merge.go",
"pebble_mvcc_scanner.go",
"point_synthesizing_iter.go",
"read_as_of_iterator.go",
"replicas_storage.go",
"row_counter.go",
Expand Down
32 changes: 10 additions & 22 deletions pkg/storage/mvcc.go
Original file line number Diff line number Diff line change
Expand Up @@ -959,13 +959,12 @@ func newMVCCIterator(
// In tombstones mode, if the most recent value is a deletion tombstone, the
// result will be a non-nil roachpb.Value whose RawBytes field is nil.
// Otherwise, a deletion tombstone results in a nil roachpb.Value. MVCC range
// tombstones will be emitted as synthetic point tombstones, regardless of whether
// there is an existing point key.
// tombstones are emitted as synthetic point tombstones, even if there is no
// existing point key at the position.
//
// NB: Synthetic tombstones generated for MVCC range tombstones may not be
// visible to an MVCCScan if there is no existing point key at the key, since
// MVCCScan only synthesizes them at the MVCC range tombstone's start key and
// around existing keys. Callers must not rely on such semantics.
// visible to an MVCCScan, since MVCCScan will only synthesize point tombstones
// above existing point keys.
//
// In inconsistent mode, if an intent is encountered, it will be placed in the
// dedicated return parameter. By contrast, in consistent mode, an intent will
Expand Down Expand Up @@ -3817,21 +3816,11 @@ type MVCCScanResult struct {
// In tombstones mode, if the most recent value for a key is a deletion
// tombstone, the scan result will contain a roachpb.KeyValue for that key whose
// RawBytes field is nil. Otherwise, the key-value pair will be omitted from the
// result entirely. For MVCC range tombstones, synthetic MVCC point tombstones
// will be emitted at the start of the range tombstone and where they overlap a
// point key.
//
// NB: Synthetic MVCC point tombstones emitted for MVCC range tombstones are
// not stable, nor are they fully deterministic. For example, the start key will
// be truncated by iterator bounds, so an MVCCScan over a given key span may see
// a synthetic point tombstone at its start (if it overlaps an MVCC range
// tombstone), but this will not be emitted if a broader span is used (a
// different point tombstone will be emitted instead). Similarly, a CRDB range
// split/merge will split/merge MVCC range tombstones, changing which point
// tombstones are emitted. Furthermore, MVCCGet will synthesize an MVCC point
// tombstone if it overlaps an MVCC range tombstone and there is no existing
// point key there, while an MVCCScan will not emit these. Callers must take
// care not to rely on such semantics for MVCC tombstones.
// result entirely. MVCC range tombstones will be emitted as synthetic point
// tombstones above existing point keys, but not below them and not if they
// don't overlap any point keys at all. This is unlike MVCCGet, which will
// always synthesize point tombstones if the key overlaps a range tombstone,
// regardless of whether a point key exists below it.
//
// When scanning inconsistently, any encountered intents will be placed in the
// dedicated result parameter. By contrast, when scanning consistently, any
Expand Down Expand Up @@ -3921,8 +3910,7 @@ func MVCCScanAsTxn(
// scan options specify an inconsistent scan, all "ignored" intents will be
// returned. In consistent mode, intents are only ever returned as part of a
// WriteIntentError. In Tombstones mode, MVCC range tombstones are emitted as
// synthetic point tombstones at their start key and around overlapping point
// keys.
// synthetic point tombstones above existing point keys.
func MVCCIterate(
ctx context.Context,
reader Reader,
Expand Down
5 changes: 1 addition & 4 deletions pkg/storage/mvcc_history_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ var (
// scan [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> [end=<key>] [inconsistent] [skipLocked] [tombstones] [reverse] [failOnMoreRecent] [localUncertaintyLimit=<int>[,<int>]] [globalUncertaintyLimit=<int>[,<int>]] [max=<max>] [targetbytes=<target>] [wholeRows[=<int>]] [allowEmpty]
// export [k=<key>] [end=<key>] [ts=<int>[,<int>]] [kTs=<int>[,<int>]] [startTs=<int>[,<int>]] [maxIntents=<int>] [allRevisions] [targetSize=<int>] [maxSize=<int>] [stopMidKey] [fingerprint]
//
// iter_new [k=<key>] [end=<key>] [prefix] [kind=key|keyAndIntents] [types=pointsOnly|pointsWithRanges|pointsAndRanges|rangesOnly] [pointSynthesis] [maskBelow=<int>[,<int>]]
// iter_new [k=<key>] [end=<key>] [prefix] [kind=key|keyAndIntents] [types=pointsOnly|pointsWithRanges|pointsAndRanges|rangesOnly] [maskBelow=<int>[,<int>]]
// iter_new_incremental [k=<key>] [end=<key>] [startTs=<int>[,<int>]] [endTs=<int>[,<int>]] [types=pointsOnly|pointsWithRanges|pointsAndRanges|rangesOnly] [maskBelow=<int>[,<int>]] [intents=error|aggregate|emit]
// iter_seek_ge k=<key> [ts=<int>[,<int>]]
// iter_seek_lt k=<key> [ts=<int>[,<int>]]
Expand Down Expand Up @@ -1599,9 +1599,6 @@ func cmdIterNew(e *evalCtx) error {

r := e.newReader()
iter := r.NewMVCCIterator(kind, opts)
if e.hasArg("pointSynthesis") {
iter = storage.NewPointSynthesizingIter(iter)
}
iter = newMetamorphicIterator(e.t, e.metamorphicIterSeed(), iter).(storage.MVCCIterator)
if opts.Prefix != iter.IsPrefix() {
return errors.Errorf("prefix iterator returned IsPrefix=false")
Expand Down
Loading

0 comments on commit 76e3ea8

Please sign in to comment.