rowexec: use CheckExistsRequest in LEFT SEMI/ANTI joins #53818
base: master
Reviewed 1 of 18 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @helenmhe)
pkg/kv/kvclient/kvcoord/txn_interceptor_existence_cache.go, line 28 at r4 (raw file):
"kv.transaction.existence_cache.enabled", "if enabled, key range existence knowledge is maintained for each transaction", true,
What happens to benchmarks if you disable this cache?
pkg/sql/row/kv_batch_fetcher.go, line 328 at r4 (raw file):
	ba.Requests[i].MustSetInner(&scans[i])
}
ba.Header.MaxSpanRequestKeys = 0
Hmm, I think it would be interesting to disable parallelization in this case and in the scan case (I think you can do this by always setting a max span request keys, although I'm not sure what the value should be) and get some benchmark numbers. This would make any difference between the request types much more noticeable.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/kv/kvclient/kvcoord/txn_interceptor_existence_cache.go, line 28 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
What happens to benchmarks if you disable this cache?
At least for fk_test.go, I see a noticeable regression.
pkg/sql/row/kv_batch_fetcher.go, line 328 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Hmm, I think it would be interesting to disable parallelization in this case and in the scan case (I think you can do this by always setting a max span request keys, although I'm not sure what the value should be) and get some benchmark numbers. This would make any difference between the request types much more noticeable.
I'll try that. Would a max span request keys of 1 not work?
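To make the question above concrete, here is a minimal toy model in Go. It is not CockroachDB code: `span` and `evalBatch` are hypothetical names, and the model assumes that a key limit is shared by every request in the batch and forces the requests to be evaluated in order. If that assumption holds, a limit of 1 would not only disable parallelization, it would also cut the batch short after the first key.

```go
package main

import "fmt"

// span is a hypothetical half-open key range [start, end).
type span struct{ start, end string }

// evalBatch evaluates the spans in order against a sorted key list.
// ASSUMPTION: limit models a batch-wide key limit; 0 means "no limit".
// With a non-zero limit the spans must be walked sequentially so that the
// remaining budget can be carried from one span to the next.
func evalBatch(keys []string, spans []span, limit int) [][]string {
	out := make([][]string, len(spans))
	remaining := limit
	for i, s := range spans {
		for _, k := range keys {
			if k < s.start || k >= s.end {
				continue // key is outside this span
			}
			if limit > 0 && remaining == 0 {
				return out // budget exhausted: later spans stay empty
			}
			out[i] = append(out[i], k)
			if limit > 0 {
				remaining--
			}
		}
	}
	return out
}

func main() {
	keys := []string{"a1", "a2", "b1"}
	spans := []span{{"a", "b"}, {"b", "c"}}
	fmt.Println(evalBatch(keys, spans, 0))    // no limit
	fmt.Println(evalBatch(keys, spans, 1))    // limit of 1
	fmt.Println(evalBatch(keys, spans, 1000)) // limit larger than the result
}
```

Under this model, a limit that is comfortably larger than the expected result size keeps the results intact while still forcing sequential evaluation, which is the property the benchmark comparison needs.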
This commit adds BenchmarkLeftSemiJoin in order to test the usage of CheckExistsRequest over ScanRequests in left semi/anti joins.
Release note: None
63e63a3 to 0a31b5d
CheckExistsRequest
CheckExistsRequest is currently implemented with a flag checking for join types that don’t require right cols, namely left semi/anti joins. Currently in the kv_batch_fetcher, I set

Testing
Switching to the inner loop in https://github.com/cockroachdb/cockroach/pull/53669/files#diff-2793c01ca6dd699f8ad1e26f24f700dbR1182 seems to have increased variance compared to before, when I was using -count 100 or so to modify b.N, so despite running faster, I don’t know if it’s as desirable for getting statistically significant results. Results on the gceworker for some reason tended to have slightly more variance, leading to very high p-values, so unfortunately I didn’t get any worthwhile results on the gceworker after making this change. The results I got prior to making the benchmark changes are shown below:

Along with the corresponding profile call_trees: Master, CheckExists, Get.

To force the GetRequest code path in BenchmarkLeftSemiJoin, I had to create the foo index as UNIQUE. Overall, from the testing, it was a bit difficult to isolate the difference between using Scan vs CheckExists, even in a benchmark designed to stress the left semi join. I’m not entirely sure why
Implements CheckExistsRequest as specified in #49790
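
As an illustration of why left semi/anti joins do not need the right-side columns, here is a small self-contained Go sketch. It is not the rowexec implementation: `index`, `scan`, `exists`, and `semiOrAnti` are hypothetical names, and the right side is modeled as a sorted slice of join keys. A scan-style lookup returns every matching row, whereas the join only needs a yes/no answer per left row.

```go
package main

import (
	"fmt"
	"sort"
)

// index is a hypothetical stand-in for the right side of the join:
// a sorted list of join keys, possibly with duplicates.
type index []int

// scan returns every right-side entry equal to key, which is the kind of
// result a scan-style lookup would hand back.
func (ix index) scan(key int) []int {
	lo := sort.SearchInts(ix, key)
	hi := sort.SearchInts(ix, key+1)
	return ix[lo:hi]
}

// exists only reports whether at least one entry equals key.
func (ix index) exists(key int) bool {
	i := sort.SearchInts(ix, key)
	return i < len(ix) && ix[i] == key
}

// semiOrAnti returns the left rows that have a match (anti == false) or
// have no match (anti == true). No right-side row is ever materialized.
func semiOrAnti(left []int, right index, anti bool) []int {
	var out []int
	for _, l := range left {
		if right.exists(l) != anti {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	left := []int{1, 2, 3, 4}
	right := index{2, 2, 2, 4}
	fmt.Println("rows a scan returns for key 2:", len(right.scan(2)))
	fmt.Println("left semi:", semiOrAnti(left, right, false))
	fmt.Println("left anti:", semiOrAnti(left, right, true))
}
```

The gap between the two lookups grows with the number of duplicate matches per key, which may be one reason the difference is hard to see when the index is UNIQUE or matches are sparse; that is a guess rather than a measured result.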