kv: re-enable time-bound iterators for RefreshRange request
Closes cockroachdb#53348. The other two requests mentioned in that issue
(`ResolveIntentRange` and `EndTxn`) would no longer benefit from time-bound
iterators because, thanks to b5213fd, they no longer scan the MVCC keyspace.

Transaction refreshing is a form of an optimistic concurrency control validation
phase. Before a transaction can commit, if it will be committing at a timestamp
higher than its original timestamp, it issues point and ranged refresh requests
to the key spans it had previously read. The refresh requests scan a span of keys
and determine whether any new values have been written since the transaction
originally read the keys.
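
As a rough illustration of the check a refresh performs, here is a minimal
sketch over a toy in-memory model. The `version` type, the integer timestamps,
and the `refreshSpan` helper are hypothetical and are not part of the
CockroachDB code; the only point is that the window is exclusive at the
original read timestamp and inclusive at the new commit timestamp.

```go
package main

import "fmt"

// version is a toy stand-in for one MVCC version of a key: the key and the
// logical timestamp at which it was written. Hypothetical type.
type version struct {
	key string
	ts  int64
}

// refreshSpan returns the first version in the key span [start, end) that was
// written in the window (refreshFrom, refreshTo]. ok is false when no such
// version exists, meaning the earlier reads are still valid.
func refreshSpan(data []version, start, end string, refreshFrom, refreshTo int64) (v version, ok bool) {
	for _, cand := range data {
		inSpan := cand.key >= start && cand.key < end
		inWindow := cand.ts > refreshFrom && cand.ts <= refreshTo
		if inSpan && inWindow {
			return cand, true
		}
	}
	return version{}, false
}

func main() {
	data := []version{{"a", 5}, {"b", 12}, {"c", 30}}

	// The transaction read [a, d) at timestamp 10 and wants to commit at 20.
	if v, conflict := refreshSpan(data, "a", "d", 10, 20); conflict {
		fmt.Printf("refresh failed: %s written at %d\n", v.key, v.ts)
	}

	// Committing at 11 instead finds no newer write, so the refresh succeeds.
	_, conflict := refreshSpan(data, "a", "d", 10, 11)
	fmt.Println("conflict:", conflict)
}
```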

The use of time-bound iterators is an important optimization for ranged refresh
operations because we expect very few new writes between the time that a
transaction originally reads and the time that it refreshes.

Without this optimization, each refresh was redoing all of a transaction's reads
at their original cost. This effectively doubled the cost of reads for
transactions that had to refresh (or worse for those that refreshed multiple
times). With this optimization, refreshing a span of keys is expected to be
significantly cheaper than the original scan over that span of keys, because it
can ignore most files in the lower levels of the LSM.
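
To make the file-skipping idea concrete, the sketch below filters a list of
files by their timestamp bounds. It assumes each file carries the minimum and
maximum write timestamp of the keys it holds; the `sstFile` type, the file
names, and `filesToScan` are hypothetical and do not mirror the storage
engine's actual API.

```go
package main

import "fmt"

// sstFile is a hypothetical summary of one LSM file: its name and the minimum
// and maximum write timestamps of the keys it contains.
type sstFile struct {
	name         string
	minTS, maxTS int64
}

// filesToScan returns the names of the files whose timestamp range overlaps
// the refresh window (refreshFrom, refreshTo]. Every other file can be
// skipped without being read.
func filesToScan(files []sstFile, refreshFrom, refreshTo int64) []string {
	var out []string
	for _, f := range files {
		if f.maxTS > refreshFrom && f.minTS <= refreshTo {
			out = append(out, f.name)
		}
	}
	return out
}

func main() {
	files := []sstFile{
		{"L6/000001.sst", 1, 40},   // old data in a lower level
		{"L6/000002.sst", 5, 90},   // old data in a lower level
		{"L0/000101.sst", 95, 110}, // recent writes
	}
	// Only the file with recent writes overlaps the window (100, 120].
	fmt.Println(filesToScan(files, 100, 120))
}
```

A filter like this is coarse: a file that overlaps the window can still hold
keys outside it, so the caller still has to check each key's timestamp.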

RefreshRange requests were originally built to use time-bound iterators.
However, this optimization was disabled in 1eb3b2a due to concerns about
correctness. Since then, the correctness concerns have been addressed and
we have begun using time-bound iterators in a handful of places.

This commit re-enables time-bound iterators for `RefreshRange` requests. It does
so by using `MVCCIncrementalIterator`, which was enhanced to support additional
"intent policies" in 87c7f11. This commit uses the "emit" intent policy so that
`RefreshRange` will observe all values and all intents in the refresh time
window.
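
The effect of emitting intents can be sketched with a toy iteration. This is
a simplified model, not the real iterator: the `entry` type, the string
transaction IDs, and `checkWindow` are hypothetical, and the entries are
assumed to be already restricted to the refresh time window.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical item emitted by an iterator over the refresh
// window: either an intent owned by txnID or a versioned value.
type entry struct {
	key      string
	isIntent bool
	txnID    string // set only for intents
}

var (
	errIntent         = errors.New("refresh failed: intent from another transaction")
	errCommittedValue = errors.New("refresh failed: committed value")
)

// checkWindow walks the emitted entries in order. Each intent is assumed to be
// followed by its provisional value, which is skipped when the intent belongs
// to the refreshing transaction. Anything else in the window is a conflict.
func checkWindow(entries []entry, self string) error {
	for i := 0; i < len(entries); i++ {
		e := entries[i]
		switch {
		case e.isIntent && e.txnID == self:
			i++ // skip our own provisional value
		case e.isIntent:
			return fmt.Errorf("%w at key %q", errIntent, e.key)
		default:
			return fmt.Errorf("%w at key %q", errCommittedValue, e.key)
		}
	}
	return nil
}

func main() {
	own := []entry{{"a", true, "txn1"}, {"a", false, ""}}
	fmt.Println(checkWindow(own, "txn1")) // own intent only: no conflict

	other := []entry{{"b", true, "txn2"}, {"b", false, ""}}
	fmt.Println(checkWindow(other, "txn1")) // foreign intent: conflict
}
```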

----

Microbenchmarks:
```
name                                                      old time/op    new time/op    delta
RefreshRange/linear-keys/refresh_window=[95.00,99.00]-10     230ms ± 1%       0ms ± 1%   -99.99%  (p=0.000 n=9+9)
RefreshRange/linear-keys/refresh_window=[75.00,99.00]-10     185ms ± 1%       0ms ± 1%   -99.99%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,95.00]-10     185ms ± 1%       0ms ± 1%   -99.99%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[50.00,75.00]-10     123ms ± 1%       0ms ± 2%   -99.99%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,95.00]-10     123ms ± 1%       0ms ± 2%   -99.99%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,99.00]-10     123ms ± 0%       0ms ± 1%   -99.99%  (p=0.000 n=10+8)
RefreshRange/linear-keys/refresh_window=[99.00,99.00]-10     240ms ± 1%       0ms ± 1%   -99.96%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[95.00,95.00]-10     237ms ± 1%       0ms ± 2%   -99.96%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,75.00]-10     224ms ± 0%       0ms ± 1%   -99.95%  (p=0.000 n=9+9)
RefreshRange/linear-keys/refresh_window=[50.00,50.00]-10     207ms ± 1%       0ms ± 3%   -99.95%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,0.00]-10       174ms ± 1%       0ms ± 1%   -99.93%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,0.00]-10       189ms ± 1%       0ms ± 1%   -99.86%  (p=0.000 n=9+9)
RefreshRange/mixed-case/refresh_window=[0.00,0.00]-10        184ms ± 0%       0ms ± 0%   -99.85%  (p=0.000 n=8+9)
RefreshRange/mixed-case/refresh_window=[95.00,95.00]-10      252ms ± 0%       1ms ± 2%   -99.70%  (p=0.000 n=8+10)
RefreshRange/random-keys/refresh_window=[0.00,50.00]-10      412µs ± 1%      13µs ± 2%   -96.83%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,50.00]-10       413µs ± 1%      13µs ± 1%   -96.78%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,75.00]-10       292µs ± 1%      13µs ± 1%   -95.43%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,95.00]-10      245µs ± 1%      13µs ± 2%   -94.64%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,95.00]-10     245µs ± 0%      14µs ± 1%   -94.49%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[0.00,99.00]-10      238µs ± 1%      13µs ± 1%   -94.48%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,99.00]-10      237µs ± 1%      13µs ± 2%   -94.42%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,99.00]-10     237µs ± 2%      14µs ± 1%   -94.29%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[50.00,75.00]-10      292µs ± 1%      17µs ± 1%   -94.07%  (p=0.000 n=10+9)
RefreshRange/linear-keys/refresh_window=[0.00,75.00]-10      225µs ± 2%      14µs ± 1%   -94.00%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,99.00]-10      224µs ± 1%      13µs ± 1%   -93.99%  (p=0.000 n=10+8)
RefreshRange/linear-keys/refresh_window=[0.00,95.00]-10      224µs ± 1%      14µs ± 1%   -93.95%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,95.00]-10     244µs ± 0%      15µs ± 1%   -93.86%  (p=0.000 n=7+10)
RefreshRange/mixed-case/refresh_window=[0.00,99.00]-10       237µs ± 1%      15µs ± 1%   -93.82%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,50.00]-10      224µs ± 1%      14µs ± 1%   -93.76%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,95.00]-10       244µs ± 1%      15µs ± 1%   -93.75%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,99.00]-10     238µs ± 1%      15µs ± 1%   -93.72%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[75.00,99.00]-10      236µs ± 1%      15µs ± 2%   -93.64%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,99.00]-10      236µs ± 1%      15µs ± 1%   -93.63%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[50.00,95.00]-10      244µs ± 0%      16µs ± 1%   -93.58%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[75.00,95.00]-10      244µs ± 1%      16µs ± 0%   -93.55%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[0.00,75.00]-10      287µs ± 1%      19µs ± 1%   -93.20%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[95.00,99.00]-10     237µs ± 1%      17µs ± 1%   -92.69%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,75.00]-10     288µs ± 2%      23µs ± 1%   -91.98%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[99.00,99.00]-10      255ms ± 1%     122ms ± 1%   -52.02%  (p=0.000 n=9+9)
RefreshRange/random-keys/refresh_window=[75.00,75.00]-10     242ms ± 1%     152ms ± 1%   -37.02%  (p=0.000 n=10+9)
RefreshRange/random-keys/refresh_window=[99.00,99.00]-10     259ms ± 0%     354ms ± 1%   +36.73%  (p=0.000 n=7+9)
RefreshRange/random-keys/refresh_window=[95.00,95.00]-10     256ms ± 1%     353ms ± 1%   +37.65%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[75.00,75.00]-10      242ms ± 0%     398ms ± 1%   +64.38%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[50.00,50.00]-10      227ms ± 0%     392ms ± 1%   +72.65%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[50.00,50.00]-10     229ms ± 1%     512ms ± 1%  +123.45%  (p=0.000 n=9+9)

name                                                      old alloc/op   new alloc/op   delta
RefreshRange/linear-keys/refresh_window=[99.00,99.00]-10     195MB ± 0%       0MB ± 0%  -100.00%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[95.00,95.00]-10     188MB ± 0%       0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,95.00]-10      188MB ± 0%       0MB ± 1%  -100.00%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,75.00]-10     148MB ± 0%       0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,50.00]-10    98.8MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[99.00,99.00]-10      195MB ± 0%       0MB ± 3%  -100.00%  (p=0.000 n=9+8)
RefreshRange/linear-keys/refresh_window=[95.00,99.00]-10     188MB ± 0%       0MB ± 0%  -100.00%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,95.00]-10     148MB ± 0%       0MB ± 0%   -99.99%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,99.00]-10     148MB ± 0%       0MB ± 0%   -99.99%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[50.00,75.00]-10    99.0MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[50.00,95.00]-10    99.0MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=8+9)
RefreshRange/linear-keys/refresh_window=[50.00,99.00]-10    99.0MB ± 0%     0.0MB ± 0%   -99.99%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[75.00,75.00]-10      148MB ± 0%       0MB ±29%   -99.99%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,75.00]-10     148MB ± 0%       0MB ± 6%   -99.98%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[50.00,50.00]-10     98.7MB ± 0%     0.0MB ± 1%   -99.98%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[99.00,99.00]-10     195MB ± 0%       0MB ± 4%   -99.97%  (p=0.000 n=9+8)
RefreshRange/random-keys/refresh_window=[95.00,95.00]-10     188MB ± 0%       0MB ± 3%   -99.97%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[50.00,50.00]-10    98.7MB ± 0%     0.1MB ±12%   -99.92%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,0.00]-10      41.9kB ± 5%     1.1kB ± 0%   -97.27%  (p=0.000 n=9+9)
RefreshRange/linear-keys/refresh_window=[0.00,50.00]-10      208kB ± 0%       8kB ± 0%   -96.04%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[95.00,99.00]-10     208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[50.00,75.00]-10      208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=8+10)
RefreshRange/mixed-case/refresh_window=[0.00,75.00]-10       208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+9)
RefreshRange/mixed-case/refresh_window=[0.00,50.00]-10       207kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[0.00,95.00]-10      208kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,50.00]-10      207kB ± 0%       8kB ± 0%   -96.03%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,75.00]-10      208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,99.00]-10     208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,95.00]-10     208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[75.00,99.00]-10     208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,95.00]-10      208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,95.00]-10     208kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,75.00]-10      207kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,75.00]-10     207kB ± 0%       8kB ± 0%   -96.02%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[0.00,99.00]-10       208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,95.00]-10      208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[75.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,95.00]-10       208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,99.00]-10      208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[75.00,95.00]-10      208kB ± 0%       8kB ± 0%   -96.01%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,0.00]-10      29.3kB ± 4%     1.2kB ± 0%   -95.95%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[0.00,0.00]-10       9.07kB ±25%    1.13kB ± 0%   -87.51%  (p=0.000 n=10+10)

name                                                      old allocs/op  new allocs/op  delta
RefreshRange/linear-keys/refresh_window=[99.00,99.00]-10     18.8k ± 0%      0.0k ± 0%   -99.91%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[95.00,95.00]-10     18.1k ± 0%      0.0k ± 0%   -99.91%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,95.00]-10      17.5k ± 0%      0.0k ± 0%   -99.90%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[75.00,75.00]-10     14.4k ± 0%      0.0k ± 0%   -99.88%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,50.00]-10     9.81k ± 0%     0.02k ± 0%   -99.83%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[95.00,99.00]-10     18.2k ± 0%      0.1k ± 0%   -99.69%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[75.00,95.00]-10     14.4k ± 0%      0.1k ± 0%   -99.61%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[75.00,99.00]-10     14.4k ± 0%      0.1k ± 0%   -99.61%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[99.00,99.00]-10      18.2k ± 0%      0.1k ± 2%   -99.51%  (p=0.000 n=10+8)
RefreshRange/linear-keys/refresh_window=[50.00,95.00]-10     9.62k ± 0%     0.06k ± 0%   -99.40%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,75.00]-10     9.62k ± 0%     0.06k ± 0%   -99.40%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[50.00,99.00]-10     9.62k ± 0%     0.06k ± 0%   -99.40%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[75.00,75.00]-10      13.8k ± 0%      0.2k ±28%   -98.85%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[50.00,50.00]-10      9.25k ± 0%     0.17k ± 6%   -98.11%  (p=0.000 n=10+9)
RefreshRange/random-keys/refresh_window=[75.00,75.00]-10     14.2k ± 0%      0.3k ± 8%   -97.89%  (p=0.000 n=10+9)
RefreshRange/linear-keys/refresh_window=[0.00,0.00]-10         535 ± 3%        17 ± 0%   -96.82%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[99.00,99.00]-10     18.6k ± 0%      0.8k ± 3%   -95.56%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[95.00,95.00]-10     17.9k ± 0%      0.8k ± 2%   -95.34%  (p=0.000 n=10+8)
RefreshRange/random-keys/refresh_window=[0.00,0.00]-10         351 ± 3%        18 ± 0%   -94.88%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[50.00,50.00]-10     9.59k ± 0%     0.95k ±12%   -90.05%  (p=0.000 n=9+10)
RefreshRange/mixed-case/refresh_window=[0.00,0.00]-10         73.7 ±10%      17.0 ± 0%   -76.93%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,50.00]-10       80.0 ± 0%      56.0 ± 0%   -30.00%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,50.00]-10       79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=9+10)
RefreshRange/random-keys/refresh_window=[95.00,99.00]-10      79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,50.00]-10        79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,75.00]-10        79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,75.00]-10       79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[95.00,99.00]-10       79.0 ± 0%      56.0 ± 0%   -29.11%  (p=0.000 n=10+10)
RefreshRange/linear-keys/refresh_window=[0.00,75.00]-10       80.0 ± 0%      58.0 ± 0%   -27.50%  (p=0.000 n=9+10)
RefreshRange/linear-keys/refresh_window=[0.00,95.00]-10       80.0 ± 0%      58.0 ± 0%   -27.50%  (p=0.002 n=8+10)
RefreshRange/linear-keys/refresh_window=[0.00,99.00]-10       79.7 ± 1%      58.0 ± 0%   -27.23%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,75.00]-10       79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,95.00]-10       79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[0.00,99.00]-10       79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,75.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,95.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[50.00,99.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,95.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/random-keys/refresh_window=[75.00,99.00]-10      79.0 ± 0%      58.0 ± 0%   -26.58%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,95.00]-10        79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[0.00,99.00]-10        79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,95.00]-10       79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[50.00,99.00]-10       79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[75.00,95.00]-10       79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
RefreshRange/mixed-case/refresh_window=[75.00,99.00]-10       79.0 ± 0%      60.0 ± 0%   -24.05%  (p=0.000 n=10+10)
```

----

Release note (performance improvement): transaction read refresh operations
performed during optimistic concurrency control's validation phase now use a
time-bound file filter when scanning the LSM tree. This allows these operations
to avoid scanning files that contain no keys written since the transaction
originally performed its reads.
nvanbenschoten committed Jan 27, 2022
1 parent a0ee732 commit 57cc377
Showing 6 changed files with 400 additions and 41 deletions.
1 change: 1 addition & 0 deletions pkg/kv/kvserver/batcheval/.gitignore
@@ -0,0 +1 @@
+refresh_range_bench_data_*
5 changes: 5 additions & 0 deletions pkg/kv/kvserver/batcheval/BUILD.bazel
@@ -104,6 +104,7 @@ go_test(
         "cmd_lease_test.go",
         "cmd_query_resolved_timestamp_test.go",
         "cmd_recover_txn_test.go",
+        "cmd_refresh_range_bench_test.go",
         "cmd_refresh_range_test.go",
         "cmd_refresh_test.go",
         "cmd_resolve_intent_test.go",
@@ -149,6 +150,7 @@ go_test(
         "//pkg/testutils/sstutil",
         "//pkg/testutils/testcluster",
         "//pkg/util",
+        "//pkg/util/encoding",
         "//pkg/util/hlc",
         "//pkg/util/leaktest",
         "//pkg/util/log",
@@ -158,6 +160,9 @@ go_test(
         "//pkg/util/uint128",
         "//pkg/util/uuid",
         "@com_github_cockroachdb_errors//:errors",
+        "@com_github_cockroachdb_errors//oserror",
+        "@com_github_cockroachdb_pebble//:pebble",
+        "@com_github_cockroachdb_pebble//vfs",
         "@com_github_stretchr_testify//assert",
         "@com_github_stretchr_testify//require",
     ],
115 changes: 83 additions & 32 deletions pkg/kv/kvserver/batcheval/cmd_refresh_range.go
@@ -15,11 +15,25 @@ import (

 	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/result"
 	"github.com/cockroachdb/cockroach/pkg/roachpb"
+	"github.com/cockroachdb/cockroach/pkg/settings"
 	"github.com/cockroachdb/cockroach/pkg/storage"
+	"github.com/cockroachdb/cockroach/pkg/storage/enginepb"
+	"github.com/cockroachdb/cockroach/pkg/util"
+	"github.com/cockroachdb/cockroach/pkg/util/hlc"
 	"github.com/cockroachdb/cockroach/pkg/util/log"
+	"github.com/cockroachdb/cockroach/pkg/util/protoutil"
+	"github.com/cockroachdb/cockroach/pkg/util/uuid"
 	"github.com/cockroachdb/errors"
 )

+// refreshRangeTBIEnabled controls whether we use a TBI during ranged refreshes.
+var refreshRangeTBIEnabled = settings.RegisterBoolSetting(
+	settings.SystemOnly,
+	"kv.refresh_range.time_bound_iterators.enabled",
+	"use time-bound iterators when performing ranged transaction refreshes",
+	util.ConstantWithMetamorphicTestBool("kv.refresh_range.time_bound_iterators.enabled", true),
+)
+
 func init() {
 	RegisterReadOnlyCommand(roachpb.RefreshRange, DefaultDeclareKeys, RefreshRange)
 }
@@ -50,40 +64,77 @@ func RefreshRange(
 		return result.Result{}, errors.AssertionFailedf("empty RefreshFrom: %s", args)
 	}

-	// Iterate over values until we discover any value written after the
-	// original timestamp, but before or at the current timestamp. Note that we
-	// iterate inconsistently, meaning that intents - including our own - are
-	// collected separately and the callback is only invoked on the latest
-	// committed version. Note also that we include tombstones, which must be
-	// considered as updates on refresh.
 	log.VEventf(ctx, 2, "refresh %s @[%s-%s]", args.Span(), refreshFrom, refreshTo)
-	intents, err := storage.MVCCIterate(
-		ctx, reader, args.Key, args.EndKey, refreshTo,
-		storage.MVCCScanOptions{
-			Inconsistent: true,
-			Tombstones:   true,
-		},
-		func(kv roachpb.KeyValue) error {
-			if ts := kv.Value.Timestamp; refreshFrom.Less(ts) {
-				return roachpb.NewRefreshFailedError(roachpb.RefreshFailedError_REASON_COMMITTED_VALUE, kv.Key, ts)
-			}
-			return nil
-		})
-	if err != nil {
-		return result.Result{}, err
-	}
+	tbi := refreshRangeTBIEnabled.Get(&cArgs.EvalCtx.ClusterSettings().SV)
+	return result.Result{}, refreshRange(reader, tbi, args.Span(), refreshFrom, refreshTo, h.Txn.ID)
+}
+
+// refreshRange iterates over the specified key span until it discovers a value
+// written after the refreshFrom timestamp but before or at the refreshTo
+// timestamp. The iteration observes MVCC tombstones, which must be considered
+// as conflicts during a refresh. The iteration also observes intents, and any
+// intent that is not owned by the specified txn ID is considered a conflict.
+//
+// If such a conflict is found, the function returns an error. Otherwise, no
+// error is returned.
+func refreshRange(
+	reader storage.Reader,
+	timeBoundIterator bool,
+	span roachpb.Span,
+	refreshFrom, refreshTo hlc.Timestamp,
+	txnID uuid.UUID,
+) error {
+	// Construct an incremental iterator with the desired time bounds. Incremental
+	// iterators will emit MVCC tombstones by default and will emit intents when
+	// configured to do so (see IntentPolicy).
+	iter := storage.NewMVCCIncrementalIterator(reader, storage.MVCCIncrementalIterOptions{
+		EnableTimeBoundIteratorOptimization: timeBoundIterator,
+		EndKey:                              span.EndKey,
+		StartTime:                           refreshFrom, // exclusive
+		EndTime:                             refreshTo,   // inclusive
+		IntentPolicy:                        storage.MVCCIncrementalIterIntentPolicyEmit,
+	})
+	defer iter.Close()

-	// Check if any intents which are not owned by this transaction were written
-	// at or beneath the refresh timestamp.
-	for _, i := range intents {
-		// Ignore our own intents.
-		if i.Txn.ID == h.Txn.ID {
-			continue
+	var meta enginepb.MVCCMetadata
+	iter.SeekGE(storage.MakeMVCCMetadataKey(span.Key))
+	for {
+		if ok, err := iter.Valid(); err != nil {
+			return err
+		} else if !ok {
+			break
 		}
-		// Return an error if an intent was written to the span.
-		return result.Result{}, roachpb.NewRefreshFailedError(roachpb.RefreshFailedError_REASON_INTENT,
-			i.Key, i.Txn.WriteTimestamp)
-	}

-	return result.Result{}, nil
+		key := iter.Key()
+		if !key.IsValue() {
+			// Found an intent. Check whether it is owned by this transaction.
+			// If so, proceed with iteration. Otherwise, return an error.
+			if err := protoutil.Unmarshal(iter.UnsafeValue(), &meta); err != nil {
+				return errors.Wrapf(err, "unmarshaling mvcc meta: %v", key)
+			}
+			if meta.IsInline() {
+				// Ignore inline MVCC metadata. We don't expect to see this in practice
+				// when performing a refresh of an MVCC keyspace.
+				iter.Next()
+				continue
+			}
+			if meta.Txn.ID == txnID {
+				// Ignore the transaction's own intent and skip past the corresponding
+				// provisional key-value. To do this, scan to the timestamp immediately
+				// before (i.e. the key immediately after) the provisional key.
+				iter.SeekGE(storage.MVCCKey{
+					Key:       key.Key,
+					Timestamp: meta.Timestamp.ToTimestamp().Prev(),
+				})
+				continue
+			}
+			return roachpb.NewRefreshFailedError(roachpb.RefreshFailedError_REASON_INTENT,
+				key.Key, meta.Txn.WriteTimestamp)
+		}
+
+		// If a committed value is found, return an error.
+		return roachpb.NewRefreshFailedError(roachpb.RefreshFailedError_REASON_COMMITTED_VALUE,
+			key.Key, key.Timestamp)
+	}
+	return nil
 }
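
The "timestamp immediately before is the key immediately after" remark in the
own-intent branch relies on MVCC key ordering: keys sort ascending by user
key, with the metadata key first and then versions from newest to oldest. The
sketch below models that ordering with a toy `mvccKey` type, integer
timestamps, and a `seekGE` helper, all of which are hypothetical
simplifications and not the storage package's types. It has no time window,
so an older version remains visible after the seek.

```go
package main

import (
	"fmt"
	"sort"
)

// mvccKey is a simplified stand-in for a versioned key. Timestamp 0 marks the
// metadata (intent) key, which sorts before every version of the same key.
type mvccKey struct {
	key string
	ts  int64
}

// less orders keys ascending by user key, then metadata first, then newer
// timestamps before older ones.
func less(a, b mvccKey) bool {
	if a.key != b.key {
		return a.key < b.key
	}
	if a.ts == 0 || b.ts == 0 {
		return a.ts == 0 && b.ts != 0
	}
	return a.ts > b.ts
}

// seekGE returns the index of the first element of the sorted slice keys that
// is greater than or equal to target, or len(keys) if there is none.
func seekGE(keys []mvccKey, target mvccKey) int {
	return sort.Search(len(keys), func(i int) bool { return !less(keys[i], target) })
}

func main() {
	keys := []mvccKey{
		{"a", 0},  // intent metadata for "a"
		{"a", 20}, // provisional value written by the intent
		{"a", 10}, // older committed value
		{"b", 15},
	}
	// Seeking to the timestamp just below the provisional value lands on the
	// next entry in sort order, skipping the provisional value itself.
	i := seekGE(keys, mvccKey{"a", 20 - 1})
	fmt.Println(keys[i])
}
```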