rangefeed,storage: use time-bound iterator for with-diff catchup scan #80591

Merged
merged 1 commit on Apr 27, 2022

Conversation

sumeerbhola (Collaborator)

The MVCCIncrementalIterator already has support for selectively
seeing versions beyond the time bounds, via the NextIgnoringTime
method. This method is sufficient for with-diff catchup scans -- the
changes to MVCCIncrementalIterator here are limited to comment
improvements.

To utilize NextIgnoringTime in the CatchUpIterator we need to
distinguish between the reasons that a catchup scan wants to
iterate to an older version of a key. These are now handled by
different methods via the simpleCatchupIter interface. There
is a simpleCatchupIterAdapter to provide a trivial implementation
when the underlying iterator is a SimpleMVCCIterator.
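
To make the shape of this concrete, here is a minimal, self-contained sketch. Apart from the names simpleCatchupIter, simpleCatchupIterAdapter, NextIgnoringTime, and the idea of wrapping a SimpleMVCCIterator, everything below (the cut-down simpleIter interface, the toy sliceIter, all signatures) is an illustrative assumption, not the actual CockroachDB code:

    // Sketch only: simpleIter is a cut-down stand-in for SimpleMVCCIterator,
    // whose real method set is larger.
    package main

    import "fmt"

    type simpleIter interface {
        Valid() (bool, error)
        Next()    // step to the next key/version
        NextKey() // skip the remaining versions of the current key
    }

    // simpleCatchupIter adds NextIgnoringTime: a time-bound implementation
    // (backed by MVCCIncrementalIterator) uses it to step to an older version
    // of the current key even when that version lies outside the time bounds,
    // e.g. to read the previous value for a with-diff event.
    type simpleCatchupIter interface {
        simpleIter
        NextIgnoringTime()
    }

    // simpleCatchupIterAdapter wraps an iterator that has no time bounds, so
    // ignoring time is simply a regular Next.
    type simpleCatchupIterAdapter struct {
        simpleIter
    }

    func (a simpleCatchupIterAdapter) NextIgnoringTime() { a.Next() }

    // sliceIter is a toy iterator over fixed (key, value) pairs, newest first.
    type sliceIter struct {
        kvs [][2]string
        pos int
    }

    func (it *sliceIter) Valid() (bool, error) { return it.pos < len(it.kvs), nil }
    func (it *sliceIter) Next()                { it.pos++ }
    func (it *sliceIter) NextKey() {
        k := it.kvs[it.pos][0]
        for it.pos < len(it.kvs) && it.kvs[it.pos][0] == k {
            it.pos++
        }
    }

    func main() {
        base := &sliceIter{kvs: [][2]string{{"a", "v2"}, {"a", "v1"}, {"b", "v1"}}}
        var it simpleCatchupIter = simpleCatchupIterAdapter{simpleIter: base}
        for ok, _ := it.Valid(); ok; ok, _ = it.Valid() {
            fmt.Println("at", base.kvs[base.pos])
            it.NextIgnoringTime() // for the adapter this is just Next()
        }
    }

The interesting behavior lives in the time-bound implementation added by this PR; the adapter only exists so callers can use a single code path when the underlying iterator is a plain SimpleMVCCIterator.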

There is more testing of MVCCIncrementalIterator's NextIgnoringTime,
and there is now a TestCatchupScan that tests with and without diff.
BenchmarkCatchUpScan now also benchmarks with-diff. Results for
linear-keys, where time-bound iteration shows an improvement:

BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-16         	       3	 397238811 ns/op	253609173 B/op	 3006262 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=50.00-16        	       6	 192314659 ns/op	126847284 B/op	 1503167 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=75.00-16        	      12	  93243340 ns/op	63433929 B/op	  751606 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=95.00-16        	      66	  17636059 ns/op	12691501 B/op	  150356 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=99.00-16        	     296	   3566355 ns/op	 2560979 B/op	   30126 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16        	       2	 587807376 ns/op	253689672 B/op	 3006770 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-16       	       4	 286110763 ns/op	126893106 B/op	 1503413 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-16       	       8	 140012384 ns/op	63458643 B/op	  751740 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-16       	      43	  26210520 ns/op	12695466 B/op	  150380 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-16       	     200	   5381206 ns/op	 2561444 B/op	   30133 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=0.00-16        	       3	 379387200 ns/op	253577896 B/op	 3006228 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=50.00-16       	       4	 294533788 ns/op	130799366 B/op	 1503677 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=75.00-16       	       4	 254943713 ns/op	69418826 B/op	  752487 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=95.00-16       	       5	 220269848 ns/op	20291392 B/op	  151411 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16       	       5	 211813333 ns/op	10473760 B/op	   31231 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=0.00-16       	       3	 379575618 ns/op	253591349 B/op	 3006275 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=50.00-16      	       4	 282490494 ns/op	126852270 B/op	 1503388 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=75.00-16      	       5	 241938049 ns/op	63463176 B/op	  751985 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=95.00-16      	       5	 210848534 ns/op	12741640 B/op	  150944 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16      	       6	 196175587 ns/op	 2602912 B/op	   30652 allocs/op

Fixes #78974

Release note: None

@sumeerbhola sumeerbhola requested review from a team as code owners April 26, 2022 21:38
@stevendanna (Collaborator) left a comment

Nice. I honestly am not sure why we thought we needed to do more than this originally.

Given the size of this change, I wonder if we should consider backporting it.

@erikgrinaker (Contributor) commented Apr 27, 2022

While we're here, should we remove the useTBI parameter and kv.rangefeed.catchup_scan_iterator_optimization.enabled setting for 22.2? Related to #79728.

@miretskiy (Contributor) left a comment

+1 on backporting this. It affects existing customers and is one of the top reasons for escalations.
This is huge.

@sumeerbhola (Collaborator, Author) left a comment

Added backport labels for 21.2 and 22.1.

@sumeerbhola (Collaborator, Author) left a comment

While we're here, should we remove the useTBI parameter and kv.rangefeed.catchup_scan_iterator_optimization.enabled setting for 22.2?

I'll leave it for someone else since we will try to backport the changes in this PR.


@sumeerbhola (Collaborator, Author)

TFTRs!

@sumeerbhola (Collaborator, Author)

bors r=stevendanna,miretskiy

@craig (bot) commented Apr 27, 2022

Build succeeded:

@blathers-crl (bot) commented Apr 27, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 5a2c3e9 to blathers/backport-release-21.2-80591: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@shermanCRL (Contributor)

Do we intend to continue on the v21.2 backport (merge failed above)? While that may seem like ancient history, there are users still catching up on old versions, and this is a big win.

@sumeerbhola (Collaborator, Author)

@shermanCRL I'm happy to review a manual backport if someone wants to attempt it. The changes here were very small, so it should not be hard.

@jayshrivastava (Contributor)

I'll take on the manual backport 👍🏽

@ajwerner (Contributor)

There's something surprising in this data. See these two rows:

BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-16         	       3	 397238811 ns/op	253609173 B/op	 3006262 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16        	       2	 587807376 ns/op	253689672 B/op	 3006770 allocs/op

Why is withDiff=false slower than withDiff=true when useTBI=true?

@sumeerbhola (Collaborator, Author)

@ajwerner Indeed odd. @jayshrivastava can you try to rerun to see if this is reproducible? If so, we should grab cpu profiles for those 2 cases.
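
(For anyone picking this up: one way to grab per-case CPU profiles is to run the two cases separately so the profiles do not get mixed together. The package path below is an assumption about where BenchmarkCatchUpScan lives; adjust it to the actual location in the tree.)

    go test ./pkg/kv/kvserver/rangefeed -run '^$' \
      -bench 'BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00' \
      -cpuprofile cpu_withdiff.out
    go test ./pkg/kv/kvserver/rangefeed -run '^$' \
      -bench 'BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00' \
      -cpuprofile cpu_withoutdiff.out

The two profiles can then be compared with go tool pprof.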

craig bot pushed a commit that referenced this pull request Oct 28, 2022
90565: rangefeed,storage: use time-bound iterator for with-diff catchup scan r=jayshrivastava a=jayshrivastava

Backport of #80591

/cc https://github.com/orgs/cockroachdb/teams/release

Release note: None
Epic: None

Co-authored-by: sumeerbhola <[email protected]>
@jayshrivastava (Contributor)

@sumeerbhola Ran it a few times. withDiff=false is still slower than withDiff=true when useTBI=true. Attached the CPU profiles below.

BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-10                        4         287635781 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=50.00-10                       7         152578839 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=75.00-10                      15          74391858 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=95.00-10                      82          13435338 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=99.00-10                     367           2731949 ns/op

BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-10                       3         452019403 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-10                      5         233925617 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-10                      9         113711144 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-10                     51          21539938 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-10                    247           4255000 ns/op

pprof-cpu-profiles.zip

@ajwerner (Contributor)

That seems shocking to me. I'd expect withDiff=false to be faster.

@sumeerbhola (Collaborator, Author)

The profiles seem to compare all the with-diff benchmark runs against all the without-diff runs, rather than one particular benchmark of each kind, so they are a bit difficult to interpret. Still, I noticed that almost all the time is in loadBlock, probably because the block cache is tiny, so I/O dominates these benchmarks, which is not representative. The with-diff benchmarks spend all their time in NextIgnoringTime, which only moves the primary iter and not the timeBoundIter, while the without-diff benchmarks spend all their time in Next. My memory of the workload is poor, so please verify my understanding (I am pretty sure what I am saying here is at least partially incorrect): if each key has some versions that are relevant, the with-diff scan will use only NextIgnoringTime and NextKey and may end up never using the timeBoundIter, whereas the without-diff scan moves both iter and timeBoundIter, and the small cache can cause both to miss on the same block, so its cost can be unnecessarily inflated.
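
To make that last hypothesis concrete, here is a toy model; nothing below corresponds to the real iterator or block cache code, it only illustrates the "two cursors thrashing a tiny cache" idea. Stepping one cursor touches each block once, while stepping two cursors that read disjoint blocks through a one-slot cache misses on every step:

    // Toy model: a stand-in "incremental iterator" with a primary cursor and a
    // time-bound cursor that read from different files, plus a one-slot block
    // cache that counts misses.
    package main

    import "fmt"

    const keysPerBlock = 10

    type cache struct {
        resident string // the single block currently cached
        misses   int
    }

    func (c *cache) load(block string) {
        if c.resident != block {
            c.misses++
            c.resident = block
        }
    }

    type incrementalIter struct {
        primary, timeBound int
        c                  *cache
    }

    func (it *incrementalIter) blockOf(file string, key int) string {
        return fmt.Sprintf("%s/%d", file, key/keysPerBlock)
    }

    // NextIgnoringTime models the hypothesized with-diff stepping: only the
    // primary cursor moves, so each block is loaded once.
    func (it *incrementalIter) NextIgnoringTime() {
        it.primary++
        it.c.load(it.blockOf("primary", it.primary))
    }

    // Next models the hypothesized without-diff stepping: both cursors move,
    // and with a one-slot cache their blocks keep evicting each other.
    func (it *incrementalIter) Next() {
        it.primary++
        it.c.load(it.blockOf("primary", it.primary))
        it.timeBound++
        it.c.load(it.blockOf("tbi", it.timeBound))
    }

    func main() {
        const keys = 1000
        withDiff := &incrementalIter{c: &cache{}}
        for i := 0; i < keys; i++ {
            withDiff.NextIgnoringTime()
        }
        withoutDiff := &incrementalIter{c: &cache{}}
        for i := 0; i < keys; i++ {
            withoutDiff.Next()
        }
        fmt.Println("with-diff-style block misses:   ", withDiff.c.misses)
        fmt.Println("without-diff-style block misses:", withoutDiff.c.misses)
    }

With keys=1000 and keysPerBlock=10 this prints roughly 101 versus 2000 misses, which is the kind of inflation described in the comment above.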

Successfully merging this pull request may close these issues.

storage: expand TBI support to include with diff