rangefeed,storage: use time-bound iterator for with-diff catchup scan #80591

Merged
merged 1 commit on Apr 27, 2022

Conversation

sumeerbhola (Collaborator)

The MVCCIncrementalIterator already has support for selectively
seeing versions beyond the time bounds, via the NextIgnoringTime
method. This method is sufficient for with-diff catchup scans -- the
changes to MVCCIncrementalIterator here are limited to comment
improvements.

To utilize NextIgnoringTime in the CatchUpIterator we need to
distinguish between the reasons that a catchup scan wants to
iterate to an older version of a key. These are now handled by
different methods via the simpleCatchupIter interface. There
is a simpleCatchupIterAdapter to provide a trivial implementation
when the underlying iterator is a SimpleMVCCIterator.
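
To make the shape of this concrete, here is a minimal, self-contained sketch. Apart from the names simpleCatchupIter, simpleCatchupIterAdapter, NextIgnoringTime, and the idea of wrapping a SimpleMVCCIterator, everything below (the cut-down simpleIter interface, the toy sliceIter, all signatures) is an illustrative assumption, not the actual CockroachDB code:

    // Sketch only: simpleIter is a cut-down stand-in for SimpleMVCCIterator,
    // whose real method set is larger.
    package main

    import "fmt"

    type simpleIter interface {
        Valid() (bool, error)
        Next()    // step to the next key/version
        NextKey() // skip the remaining versions of the current key
    }

    // simpleCatchupIter adds NextIgnoringTime: a time-bound implementation
    // (backed by MVCCIncrementalIterator) uses it to step to an older version
    // of the current key even when that version lies outside the time bounds,
    // e.g. to read the previous value for a with-diff event.
    type simpleCatchupIter interface {
        simpleIter
        NextIgnoringTime()
    }

    // simpleCatchupIterAdapter wraps an iterator that has no time bounds, so
    // ignoring time is simply a regular Next.
    type simpleCatchupIterAdapter struct {
        simpleIter
    }

    func (a simpleCatchupIterAdapter) NextIgnoringTime() { a.Next() }

    // sliceIter is a toy iterator over fixed (key, value) pairs, newest first.
    type sliceIter struct {
        kvs [][2]string
        pos int
    }

    func (it *sliceIter) Valid() (bool, error) { return it.pos < len(it.kvs), nil }
    func (it *sliceIter) Next()                { it.pos++ }
    func (it *sliceIter) NextKey() {
        k := it.kvs[it.pos][0]
        for it.pos < len(it.kvs) && it.kvs[it.pos][0] == k {
            it.pos++
        }
    }

    func main() {
        base := &sliceIter{kvs: [][2]string{{"a", "v2"}, {"a", "v1"}, {"b", "v1"}}}
        var it simpleCatchupIter = simpleCatchupIterAdapter{simpleIter: base}
        for ok, _ := it.Valid(); ok; ok, _ = it.Valid() {
            fmt.Println("at", base.kvs[base.pos])
            it.NextIgnoringTime() // for the adapter this is just Next()
        }
    }

The interesting behavior lives in the time-bound implementation added by this PR; the adapter only exists so callers can use a single code path when the underlying iterator is a plain SimpleMVCCIterator.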

There is more testing of MVCCIncrementalIterator's NextIgnoringTime,
and there is now a TestCatchupScan that tests with and without diff.
BenchmarkCatchUpScan now also benchmarks with-diff. Results for
linear-keys, where time-bound iteration shows an improvement:

BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-16         	       3	 397238811 ns/op	253609173 B/op	 3006262 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=50.00-16        	       6	 192314659 ns/op	126847284 B/op	 1503167 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=75.00-16        	      12	  93243340 ns/op	63433929 B/op	  751606 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=95.00-16        	      66	  17636059 ns/op	12691501 B/op	  150356 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=99.00-16        	     296	   3566355 ns/op	 2560979 B/op	   30126 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16        	       2	 587807376 ns/op	253689672 B/op	 3006770 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-16       	       4	 286110763 ns/op	126893106 B/op	 1503413 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-16       	       8	 140012384 ns/op	63458643 B/op	  751740 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-16       	      43	  26210520 ns/op	12695466 B/op	  150380 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-16       	     200	   5381206 ns/op	 2561444 B/op	   30133 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=0.00-16        	       3	 379387200 ns/op	253577896 B/op	 3006228 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=50.00-16       	       4	 294533788 ns/op	130799366 B/op	 1503677 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=75.00-16       	       4	 254943713 ns/op	69418826 B/op	  752487 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=95.00-16       	       5	 220269848 ns/op	20291392 B/op	  151411 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16       	       5	 211813333 ns/op	10473760 B/op	   31231 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=0.00-16       	       3	 379575618 ns/op	253591349 B/op	 3006275 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=50.00-16      	       4	 282490494 ns/op	126852270 B/op	 1503388 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=75.00-16      	       5	 241938049 ns/op	63463176 B/op	  751985 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=95.00-16      	       5	 210848534 ns/op	12741640 B/op	  150944 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16      	       6	 196175587 ns/op	 2602912 B/op	   30652 allocs/op

Fixes #78974

Release note: None

@sumeerbhola sumeerbhola requested review from a team as code owners April 26, 2022 21:38
@stevendanna (Collaborator) left a comment

Nice. I honestly am not sure why we thought we needed to do more than this originally.

Given the size of this change, I wonder if we should consider backporting it.

@erikgrinaker (Contributor) commented Apr 27, 2022

While we're here, should we remove the useTBI parameter and kv.rangefeed.catchup_scan_iterator_optimization.enabled setting for 22.2? Related to #79728.

@miretskiy (Contributor) left a comment

+1 on backporting this. It affects existing customers and is one of the top reasons for escalations.
This is huge.

@sumeerbhola (Collaborator, Author) left a comment

Added backport labels for 21.2 and 22.1.

@sumeerbhola (Collaborator, Author) left a comment

While we're here, should we remove the useTBI parameter and kv.rangefeed.catchup_scan_iterator_optimization.enabled setting for 22.2?

I'll leave it for someone else since we will try to backport the changes in this PR.


@sumeerbhola (Collaborator, Author)

TFTRs!

@sumeerbhola (Collaborator, Author)

bors r=stevendanna,miretskiy

@craig (bot) commented Apr 27, 2022

Build succeeded:

@blathers-crl (bot) commented Apr 27, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 5a2c3e9 to blathers/backport-release-21.2-80591: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@shermanCRL (Contributor)

Do we intend to continue on the v21.2 backport (merge failed above)? While that may seem like ancient history, there are users still catching up on old versions, and this is a big win.

@sumeerbhola (Collaborator, Author)

@shermanCRL I'm happy to review a manual backport if someone wants to attempt it. The changes here were very small, so it should not be hard.

@jayshrivastava (Contributor)

I'll take on the manual backport 👍🏽

@ajwerner (Contributor)

There's something surprising in this data. See these two rows:

BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-16         	       3	 397238811 ns/op	253609173 B/op	 3006262 allocs/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16        	       2	 587807376 ns/op	253689672 B/op	 3006770 allocs/op

Why is withDiff=false slower than withDiff=true when useTBI=true?

@sumeerbhola (Collaborator, Author)

@ajwerner Indeed odd. @jayshrivastava can you try to rerun to see if this is reproducible? If so, we should grab cpu profiles for those 2 cases.
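
(For anyone picking this up: one way to grab per-case CPU profiles is to run the two cases separately so the profiles do not get mixed together. The package path below is an assumption about where BenchmarkCatchUpScan lives; adjust it to the actual location in the tree.)

    go test ./pkg/kv/kvserver/rangefeed -run '^$' \
      -bench 'BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00' \
      -cpuprofile cpu_withdiff.out
    go test ./pkg/kv/kvserver/rangefeed -run '^$' \
      -bench 'BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00' \
      -cpuprofile cpu_withoutdiff.out

The two profiles can then be compared with go tool pprof.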

craig bot pushed a commit that referenced this pull request Oct 28, 2022
90565: rangefeed,storage: use time-bound iterator for with-diff catchup scan r=jayshrivastava a=jayshrivastava

Backport of #80591

/cc https://github.com/orgs/cockroachdb/teams/release

Release note: None
Epic: None

Co-authored-by: sumeerbhola <[email protected]>
@jayshrivastava (Contributor)

@sumeerbhola Ran it a few times. withDiff=false is still slower than withDiff=true when useTBI=true. Attached the CPU profiles below.

BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-10                        4         287635781 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=50.00-10                       7         152578839 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=75.00-10                      15          74391858 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=95.00-10                      82          13435338 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=99.00-10                     367           2731949 ns/op

BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-10                       3         452019403 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-10                      5         233925617 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-10                      9         113711144 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-10                     51          21539938 ns/op
BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-10                    247           4255000 ns/op

pprof-cpu-profiles.zip

@ajwerner (Contributor)

That seems shocking to me. I'd expect withDiff=false to be faster.

@sumeerbhola (Collaborator, Author)

The profiles seem to compare all the with-diff benchmark runs against all the without-diff runs, rather than one particular benchmark of each kind, so they are a bit difficult to interpret. Still, I noticed that almost all the time is in loadBlock, probably because the block cache is tiny, so I/O dominates these benchmarks, which is not representative. The with-diff benchmarks spend all their time in NextIgnoringTime, which only moves the primary iter and not the timeBoundIter, while the without-diff benchmarks spend all their time in Next. My memory of the workload is poor, so please verify my understanding (I am pretty sure what I am saying here is at least partially incorrect): if each key has some versions that are relevant, the with-diff scan will use only NextIgnoringTime and NextKey and may end up never using the timeBoundIter, whereas the without-diff scan moves both iter and timeBoundIter, and the small cache can cause both to miss on the same block, so its cost can be unnecessarily inflated.
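
To make that last hypothesis concrete, here is a toy model; nothing below corresponds to the real iterator or block cache code, it only illustrates the "two cursors thrashing a tiny cache" idea. Stepping one cursor touches each block once, while stepping two cursors that read disjoint blocks through a one-slot cache misses on every step:

    // Toy model: a stand-in "incremental iterator" with a primary cursor and a
    // time-bound cursor that read from different files, plus a one-slot block
    // cache that counts misses.
    package main

    import "fmt"

    const keysPerBlock = 10

    type cache struct {
        resident string // the single block currently cached
        misses   int
    }

    func (c *cache) load(block string) {
        if c.resident != block {
            c.misses++
            c.resident = block
        }
    }

    type incrementalIter struct {
        primary, timeBound int
        c                  *cache
    }

    func (it *incrementalIter) blockOf(file string, key int) string {
        return fmt.Sprintf("%s/%d", file, key/keysPerBlock)
    }

    // NextIgnoringTime models the hypothesized with-diff stepping: only the
    // primary cursor moves, so each block is loaded once.
    func (it *incrementalIter) NextIgnoringTime() {
        it.primary++
        it.c.load(it.blockOf("primary", it.primary))
    }

    // Next models the hypothesized without-diff stepping: both cursors move,
    // and with a one-slot cache their blocks keep evicting each other.
    func (it *incrementalIter) Next() {
        it.primary++
        it.c.load(it.blockOf("primary", it.primary))
        it.timeBound++
        it.c.load(it.blockOf("tbi", it.timeBound))
    }

    func main() {
        const keys = 1000
        withDiff := &incrementalIter{c: &cache{}}
        for i := 0; i < keys; i++ {
            withDiff.NextIgnoringTime()
        }
        withoutDiff := &incrementalIter{c: &cache{}}
        for i := 0; i < keys; i++ {
            withoutDiff.Next()
        }
        fmt.Println("with-diff-style block misses:   ", withDiff.c.misses)
        fmt.Println("without-diff-style block misses:", withoutDiff.c.misses)
    }

With keys=1000 and keysPerBlock=10 this prints roughly 101 versus 2000 misses, which is the kind of inflation described in the comment above.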

Successfully merging this pull request may close these issues.

storage: expand TBI support to include with diff