distsql: add batching for secondary lookup joins #25815
Please ignore the first commit, which is from a dependent PR.
9fd2dcb to 1533557
Review status: 0 of 5 files reviewed at latest revision, all discussions resolved.

pkg/sql/distsqlrun/joinreader.go, line 388 at r2 (raw file):
I disentangled the index join and lookup join logic a bit because there was too much cognitive overhead. I think we could stand to go further in this direction down the line.
With regard to testing, I'd definitely like to at least see a test case that has multiple results in the outer join. There's a bunch of code here that tries to preserve the ordering of the results (I think) - that doesn't seem to be tested properly if there's no test case that has more than one output.

Reviewed 5 of 5 files at r1.

pkg/sql/distsqlrun/joinreader.go, line 388 at r2 (raw file):
Previously, solongordon (Solon) wrote…
👍 although I wish the disentangling were in its own commit - the code movement combined with the changes makes it hard to see what's new vs. what's copied.

pkg/sql/distsqlrun/joinreader.go, line 485 at r2 (raw file):
This doesn't have to be allocated every time, does it? We could allocate a slice up front and reuse it. At least put a TODO or something here so it's easy to spot if this shows up as a hot allocation spot in profiling later.

pkg/sql/distsqlrun/joinreader.go, line 494 at r2 (raw file):
Is it possible that
I would try to run all the logic tests with the lookup join flag set and see if something breaks (other than EXPLAIN (DISTSQL) statements and the like).

Review status: 4 of 5 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

pkg/sql/distsqlrun/joinreader.go, line 456 at r2 (raw file):
Nice, TIL

pkg/sql/distsqlrun/joinreader.go, line 553 at r2 (raw file):
why
Review status: 4 of 5 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.

pkg/sql/distsqlrun/joinreader.go, line 456 at r2 (raw file):
Previously, RaduBerinde wrote…
What is the
Yeah, I agree about more tests. I did just add some outer join tests with multiple results to the other PR. But really we ought to have some tests where the number of results exceeds the batch size. (I've manually tested this by setting the batch size very small and running logic tests.) I'll look into that. Also I'll see if I can try out Radu's suggestion.

Review status: 4 of 5 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.

pkg/sql/distsqlrun/joinreader.go, line 388 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Yeah, sorry about that.

pkg/sql/distsqlrun/joinreader.go, line 456 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Yeah, this was just my fingers misremembering the

pkg/sql/distsqlrun/joinreader.go, line 485 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
True. I added a TODO for now.

pkg/sql/distsqlrun/joinreader.go, line 494 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
As far as I understand, it shouldn't be possible, since each secondary index row should correspond to exactly one primary index row. But if there's some weird case I'm missing, I'd love to know.

pkg/sql/distsqlrun/joinreader.go, line 553 at r2 (raw file):
Previously, RaduBerinde wrote…
No good reason, fixed.
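
The suggestion in the line 485 thread (allocate a slice up front and reuse it rather than allocating per batch) can be sketched in Go as follows. This is only an illustration of the general pattern, not the code in joinreader.go: `lookupBuffer`, `scratch`, and `fill` are hypothetical names, and the `int` elements stand in for whatever the real slice holds.

```go
package main

import "fmt"

// lookupBuffer is a hypothetical helper (not a CockroachDB type) that owns
// a scratch slice allocated once and reused for every batch.
type lookupBuffer struct {
	scratch []int
}

// fill truncates the scratch slice to length zero, which keeps its backing
// array, and then appends n placeholder values. The returned slice is only
// valid until the next call to fill.
func (b *lookupBuffer) fill(n int) []int {
	b.scratch = b.scratch[:0]
	for i := 0; i < n; i++ {
		b.scratch = append(b.scratch, i)
	}
	return b.scratch
}

func main() {
	// Allocate capacity for one full batch up front.
	b := &lookupBuffer{scratch: make([]int, 0, 100)}
	for batch := 0; batch < 3; batch++ {
		rows := b.fill(100)
		fmt.Println(len(rows), cap(rows))
	}
}
```

The trade-off is that callers must not retain the returned slice across batches, since its contents are overwritten on the next call.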
e5b698e to 3e29991
I ended up just making the batch size a parameter and lowering it to 2 in unit tests.

Review status: 0 of 7 files reviewed at latest revision, 5 unresolved discussions.
3e29991 to 6cae2f4
The PR this depended on was merged and this was rebased. PTAL.
Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions.

pkg/sql/distsqlrun/joinreader.go, line 290 at r3 (raw file):
What's the point of making zero-sized slices? I'd just let it be

pkg/sql/distsqlrun/joinreader_test.go, line 232 at r3 (raw file):
[nit] could be a random value between 1 and 10, cheap way to test more cases

pkg/sql/distsqlrun/joinreader_test.go, line 350 at r3 (raw file):
Same

pkg/sql/distsqlrun/processors.proto, line 261 at r3 (raw file):
Whoa, why are you changing the
BTW bump the distsqlrun version and mention the change in the versions text file

pkg/sql/logictest/testdata/logic_test/lookup_join, line 395 at r3 (raw file):
It would be good to have some tests where the distsql plan is actually distributed. Maybe create a secondary table and split that and use it (perhaps in some subquery) instead of VALUES
Lookup joins on non-covering secondary indexes were previously making a separate primary index scan for every secondary index row. Now those scans are grouped together into batches of up to 100 spans. Release note: None
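
As a rough illustration of the batching described in the commit message, the Go sketch below groups a list of spans into batches of at most a given size, so that each batch could be issued as one scan instead of one scan per span. It is not the joinreader.go implementation: the `span` type and the `batchSpans` function are hypothetical stand-ins, and the key strings are made up for the example.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a key span; it is not the
// CockroachDB span type.
type span struct {
	start, end string
}

// batchSpans splits spans into consecutive groups of at most batchSize
// elements, preserving input order. The groups share the input's backing
// array; nothing is copied.
func batchSpans(spans []span, batchSize int) [][]span {
	if batchSize <= 0 {
		batchSize = 1 // guard against a non-positive size looping forever
	}
	var batches [][]span
	for len(spans) > 0 {
		n := batchSize
		if len(spans) < n {
			n = len(spans)
		}
		batches = append(batches, spans[:n])
		spans = spans[n:]
	}
	return batches
}

func main() {
	// Build 250 illustrative single-row spans.
	spans := make([]span, 250)
	for i := range spans {
		spans[i] = span{
			start: fmt.Sprintf("/t/primary/%d", i),
			end:   fmt.Sprintf("/t/primary/%d", i+1),
		}
	}
	for i, b := range batchSpans(spans, 100) {
		fmt.Printf("batch %d: %d spans\n", i, len(b))
	}
}
```

With 250 spans and a batch size of 100, this yields three batches of 100, 100, and 50 spans. Passing the batch size as a parameter, as the PR ended up doing, also makes it easy to exercise the multi-batch path in tests by choosing a small value such as 2.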
6cae2f4 to 67ae2ab
Review status: 0 of 4 files reviewed at latest revision, 9 unresolved discussions.

pkg/sql/distsqlrun/joinreader.go, line 290 at r3 (raw file):
Previously, RaduBerinde wrote…
Done.

pkg/sql/distsqlrun/joinreader_test.go, line 232 at r3 (raw file):
Previously, RaduBerinde wrote…
I like the idea, though I'm worried about introducing randomness into unit tests. It could lead to flakiness.

pkg/sql/distsqlrun/processors.proto, line 261 at r3 (raw file):
Previously, RaduBerinde wrote…
I think you must be seeing a diff from the rebase. (There was some renumbering between revisions of the PR this depended on.) And ohhh I didn't know about or had forgotten about the distsql version. That makes sense. I thought you were referring to the cockroach version in a previous comment. Done.

pkg/sql/logictest/testdata/logic_test/lookup_join, line 395 at r3 (raw file):
Previously, RaduBerinde wrote…
Yeah, makes sense. I'm going to create an issue for follow-up test enhancements since there are a couple others I'd like to make too.
Review status: 0 of 4 files reviewed at latest revision, 9 unresolved discussions.

pkg/sql/logictest/testdata/logic_test/lookup_join, line 395 at r3 (raw file):
Previously, solongordon wrote…
Created #25862
bors r+
25815: distsql: add batching for secondary lookup joins r=solongordon a=solongordon
Lookup joins on non-covering secondary indexes were previously making a separate primary index scan for every secondary index row. Now those scans are grouped together into batches of up to 100 spans.
Release note: None
Co-authored-by: Solon Gordon <[email protected]>
Review status: 0 of 4 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.

pkg/sql/distsqlrun/joinreader_test.go, line 232 at r3 (raw file):
Previously, solongordon wrote…
Many unit tests have randomness. If the test is flaky w.r.t. the batch size, that's definitely a bug.

pkg/sql/distsqlrun/processors.proto, line 261 at r3 (raw file):
Previously, solongordon wrote…
Ah sorry, I was looking at some partial diffs
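
One common way to reconcile the two positions in the line 232 thread (a random batch size between 1 and 10 versus the worry about flakiness) is to derive the random value from an explicit seed and report that seed, so any failure can be replayed. The Go sketch below shows the idea only; `batchSizeForSeed` is a hypothetical helper and is not taken from joinreader_test.go.

```go
package main

import (
	"fmt"
	"math/rand"
)

// batchSizeForSeed is a hypothetical helper that deterministically maps a
// seed to a batch size in the range [1, 10].
func batchSizeForSeed(seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	return 1 + rng.Intn(10) // Intn(10) returns a value in [0, 10)
}

func main() {
	// A test could pick the seed from the clock; it is fixed here so that
	// the run is reproducible. Printing the seed lets a failing run be
	// replayed with the same batch size.
	seed := int64(42)
	fmt.Printf("seed=%d batchSize=%d\n", seed, batchSizeForSeed(seed))
}
```

Because the batch size is a pure function of the seed, a test that fails for one batch size points at a real bug that can be reproduced, which matches the reviewer's argument.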
Build succeeded
25815: distsql: add batching for secondary lookup joins r=solongordon a=solongordon
Lookup joins on non-covering secondary indexes were previously making a separate primary index scan for every secondary index row. Now those scans are grouped together into batches of up to 100 spans.
Release note: None
Co-authored-by: Solon Gordon <[email protected]>