rowexec: use CheckExistsRequest in LEFT SEMI/ANTI joins #53818

helenmhe-zz · 2020-09-02T08:49:37Z

Implements CheckExistsRequest as specified in #49790

asubiotto

Reviewed 1 of 18 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @helenmhe)

pkg/kv/kvclient/kvcoord/txn_interceptor_existence_cache.go, line 28 at r4 (raw file):

	"kv.transaction.existence_cache.enabled",
	"if enabled, key range existence knowledge is maintained for each transaction",
	true,

What happens to benchmarks if you disable this cache?

pkg/sql/row/kv_batch_fetcher.go, line 328 at r4 (raw file):

				ba.Requests[i].MustSetInner(&scans[i])
			}
			ba.Header.MaxSpanRequestKeys = 0

Hmm, I think it would be interesting to disable parallelization in this case and in the scan case (I think you can do this by always setting a max span request keys, although I'm not sure what the value should be) and get some benchmark numbers. This would make any difference between the request types much more noticeable.

helenmhe-zz

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)

pkg/kv/kvclient/kvcoord/txn_interceptor_existence_cache.go, line 28 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

What happens to benchmarks if you disable this cache?

At least for fk_test.go, I see a noticeable regression.

pkg/sql/row/kv_batch_fetcher.go, line 328 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Hmm, I think it would be interesting to disable parallelization in this case and in the scan case (I think you can do this by always setting a max span request keys, although I'm not sure what the value should be) and get some benchmark numbers. This would make any difference between the request types much more noticeable.

I'll try that. Would a max span request keys of 1 not work?

helenmhe-zz · 2020-09-03T19:31:35Z

CheckExistsRequest

CheckExistsRequest is currently gated by a flag that checks for join types that don’t require right-side columns, namely left semi/anti joins. In the kv_batch_fetcher, I set ba.Header.MaxSpanRequestKeys and ba.Header.TargetBytes to 0, which leaves the number of rows returned unlimited, since we aren’t really returning rows. When testing locally, this seemed to have a large impact on performance (parallel performed ~20% better than batch-size-limited), but I wasn’t able to reproduce anything with a low p-value on gceworker.
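As a rough illustration of the gating described above, the sketch below picks a request kind from the join type and leaves both limits at zero for existence checks. Everything here is a hypothetical stand-in (`joinType`, `needsRightCols`, `batchHeader`, `planLookup`, and the example limit values), not the actual rowexec or roachpb code; only the field names MaxSpanRequestKeys and TargetBytes are taken from the description above.

```go
package main

import "fmt"

// joinType is a hypothetical stand-in for the SQL join type enum.
type joinType int

const (
	innerJoin joinType = iota
	leftOuterJoin
	leftSemiJoin
	leftAntiJoin
)

// needsRightCols reports whether the join emits columns from the right
// (lookup) side. Semi and anti joins only ask whether a match exists.
func needsRightCols(t joinType) bool {
	return t != leftSemiJoin && t != leftAntiJoin
}

// batchHeader mirrors only the two limit fields mentioned above.
// A zero value means "no limit" in this sketch.
type batchHeader struct {
	MaxSpanRequestKeys int64
	TargetBytes        int64
}

// planLookup returns the request kind and the limits to apply.
// keyLimit and byteLimit are assumed caller-supplied batch limits.
func planLookup(t joinType, keyLimit, byteLimit int64) (string, batchHeader) {
	if needsRightCols(t) {
		return "Scan", batchHeader{MaxSpanRequestKeys: keyLimit, TargetBytes: byteLimit}
	}
	// No rows come back from an existence check, so leave both limits at 0.
	return "CheckExists", batchHeader{}
}

func main() {
	for _, t := range []joinType{innerJoin, leftSemiJoin, leftAntiJoin} {
		kind, h := planLookup(t, 10000, 1<<20)
		fmt.Printf("joinType=%d kind=%s limits=%+v\n", t, kind, h)
	}
}
```

To compare against a batch-size-limited variant, as suggested in the review, the `CheckExists` branch would return a non-zero key limit instead of the zero header.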

Testing

Switching to the inner loop in https://github.com/cockroachdb/cockroach/pull/53669/files#diff-2793c01ca6dd699f8ad1e26f24f700dbR1182 seems to have increased variance compared to before, when I was using -count 100 or so to modify b.N. So although the benchmark now runs faster, I don’t know whether it’s as desirable for getting statistically significant results. Results on gceworker tended to have slightly more variance for some reason, which led to very high p-values, so unfortunately I didn’t get any worthwhile results on gceworker after making this change.
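For reference, here is a minimal sketch of how repeated samples relate to b.N in Go's testing package. In that package, -count reruns the whole benchmark and each rerun contributes one sample (the n= values benchstat reports), while b.N inside each run is chosen by the framework to fill -benchtime. The `lookup` function and the key set are hypothetical placeholders for the real join query, and `collectSamples` only imitates -count inside a standalone program.

```go
package main

import (
	"fmt"
	"testing"
)

// lookup is a hypothetical stand-in for one left semi join lookup.
func lookup(keys map[int]struct{}, k int) bool {
	_, ok := keys[k]
	return ok
}

// benchLookup keeps setup outside the timed region and runs the
// measured work b.N times, where b.N is chosen by the framework.
func benchLookup(b *testing.B) {
	keys := make(map[int]struct{}, 1000)
	for i := 0; i < 1000; i++ {
		keys[i] = struct{}{}
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if !lookup(keys, i%1000) {
			b.Fatal("expected key to exist")
		}
	}
}

// collectSamples reruns the benchmark count times, imitating -count.
// Each rerun yields one independent sample for later comparison.
func collectSamples(count int) []testing.BenchmarkResult {
	testing.Init() // needed when calling Benchmark outside "go test"
	samples := make([]testing.BenchmarkResult, 0, count)
	for i := 0; i < count; i++ {
		samples = append(samples, testing.Benchmark(benchLookup))
	}
	return samples
}

func main() {
	for i, r := range collectSamples(3) {
		fmt.Printf("sample %d: N=%d ns/op=%d\n", i, r.N, r.NsPerOp())
	}
}
```

More samples narrow the confidence interval that benchstat computes, so a noisier benchmark generally needs a higher -count to reach a low p-value.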

The results I got prior to making the benchmark changes are shown below:

name                            old time/op    new time/op    delta
LeftSemiJoin/SingleRow/None-12    13.1ms ±13%    12.9ms ± 9%  -1.73%  (p=0.005 n=86+100)

name                            old alloc/op   new alloc/op   delta
LeftSemiJoin/SingleRow/None-12    4.61MB ± 2%    4.57MB ± 1%  -0.87%  (p=0.000 n=97+98)

name                            old allocs/op  new allocs/op  delta
LeftSemiJoin/SingleRow/None-12     28.4k ± 1%     30.3k ± 1%  +6.85%  (p=0.000 n=90+97)

Along with the corresponding profile call trees for Master, CheckExists, and Get (images omitted).

To force the GetRequest code path in BenchmarkLeftSemiJoin I had to create the foo index as UNIQUE.

Overall, it was difficult to isolate the difference between Scan and CheckExists in testing, even in a benchmark designed to stress the left semi join. I’m not sure why fk_test.go wasn’t hitting this code path as much; that could be worth investigating as well.

cockroach-teamcity · 2020-11-10T04:03:26Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ nvanbenschoten
❌ helenmhe

helenmhe-zz requested a review from a team September 2, 2020 14:30

asubiotto suggested changes Sep 2, 2020

helenmhe-zz commented Sep 2, 2020

helenmhe and others added 4 commits September 3, 2020 12:10

rowexec: add BenchmarkLeftSemiJoin

0bbd648

This commit adds BenchmarkLeftSemiJoin in order to test the usage of CheckExistsRequest over ScanRequests in left semi/anti joins. Release note: None

roachpb: add new CheckExistsRequest type

806f027

Release note: None

kv: introduce txnExistenceCache

bd3f8de

Release note: None

use checkExistsRequest

0a31b5d

Release note (<category, see below>): <what> <show> <why>

helenmhe-zz force-pushed the checkExists branch from 63e63a3 to 0a31b5d Compare September 3, 2020 16:11

asubiotto mentioned this pull request Sep 4, 2020

rowexec: improve LEFT SEMI/ANTI lookup join performance #49790

Closed

2 tasks

tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
