-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: support semi and anti join in joinNode #25037
Conversation
Please hold off on reviewing. There are a few problems I uncovered that I need to fix. |
What are the problems? I haven't gotten around to reviewing yet, but one of my comments was going to ask whether you could manage to share some of the test cases from the distributed joiners. @pbardea has started doing some of this already for the different joiner types in DistSQL, and it could be good to get that kind of test case sharing for the local processors as well. |
The ON condition evaluation still needs access to the right columns; currently it breaks if it needs a right-side column. We usually test things through logic tests. This type of low level tests are very tedious (most planNodes don't have such tests) - I added them in this case because there is no SQL syntax for anti/semi joins. The distsqlrun tests are even more tedious and the infrastructure is different enough that I think sharing tests would be a lot of work. The plan for better tests here is through optimizer execbuilder tests or logic tests (using correlated queries that result in these joins). I might do these in the same PR now that #24969 is merged. |
lemme know when you want a review |
63738a9
to
45adbd2
Compare
Updated, ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM. Please however add a comment reminder in the commit message and in the appropriate code locations of how semi/anti joins differ from regular joins. It's pretty clear throughout the change that they only reports columns from the left operand, but the change doesn't provide an intuition of what the result set looks like (which rows get selected on which condition and why).
If you feel lazy at least give a link to wikipedia or some other stable online resource that has a summary.
pkg/sql/join.go
Outdated
@@ -577,8 +580,15 @@ func (n *joinNode) Next(params runParams) (res bool, err error) { | |||
continue | |||
} | |||
foundMatch = true | |||
if n.joinType == sqlbase.JoinType_LEFT_ANTI { | |||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to explain/remind the reader what the anti join is about, and why in the case there's a match we break out of the loop.
pkg/sql/join.go
Outdated
|
||
n.pred.prepareRow(n.run.output, lrow, rrow) | ||
if n.joinType == sqlbase.JoinType_LEFT_SEMI { | ||
n.pred.prepareRow(n.run.output, lrow, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto reminder/explanation.
pkg/sql/join.go
Outdated
@@ -587,6 +597,9 @@ func (n *joinNode) Next(params runParams) (res bool, err error) { | |||
if _, err := n.run.buffer.AddRow(params.ctx, n.run.output); err != nil { | |||
return false, err | |||
} | |||
if n.joinType == sqlbase.JoinType_LEFT_SEMI { | |||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
45adbd2
to
8a80599
Compare
TFTR! I added explanations in the commit and in the places you suggested. Note that the join type constants are also documented: cockroach/pkg/sql/sqlbase/join_type.pb.go Line 24 in 95394c9
Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions. pkg/sql/join.go, line 584 at r1 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/join.go, line 588 at r1 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/join.go, line 601 at r1 (raw file): Previously, knz (kena) wrote…
Done. Comments from Reviewable |
Reviewed 8 of 9 files at r1, 1 of 1 files at r2. pkg/sql/join.go, line 611 at r2 (raw file):
add anti-join to the comment pkg/sql/opt/exec/execbuilder/testdata/join, line 225 at r2 (raw file):
after you rebase you'll probably get an error - just change Comments from Reviewable |
The distsql processors support semi and anti join; however, the optimizer doesn't yet play well with distsql; and when it will, it initially won't generate distsql plans directly. Adding support for semi and anti join in the joinNode. This will also allow us to generate distsql plans for these joins from an intermediary planNode tree. A semi join returns the rows from the left side that match at least one row from the right side (as per equality columns and ON condition). An anti join is an "inverted" semi join: it returns the rows from the left side that don't match any columns on the right side (as per equality columns and ON condition). Adding execbuilder tests for queries that are decorrelated into semi and anti joins. Release note: None
8a80599
to
dc6ff74
Compare
Review status: 7 of 10 files reviewed at latest revision, 5 unresolved discussions. pkg/sql/join.go, line 611 at r2 (raw file): Previously, rytaft wrote…
Done. pkg/sql/opt/exec/execbuilder/testdata/join, line 225 at r2 (raw file): Previously, rytaft wrote…
Done, thanks! Comments from Reviewable |
Reviewed 7 of 9 files at r1, 3 of 3 files at r3. Comments from Reviewable |
bors r+ Comments from Reviewable |
25037: sql: support semi and anti join in joinNode r=RaduBerinde a=RaduBerinde The distsql processors support semi and anti join; however, the optimizer doesn't yet play well with distsql; and when it will, it initially won't generate distsql plans directly. Adding support for semi and anti join in the joinNode. This will also allow us to generate distsql plans for these joins from an intermediary planNode tree. Release note: None 25089: cli: improve debuggability of CLI tests r=knz,windchan7 a=benesch Various fixes from debugging a failed CLI test with @windchan7. The culprit was a misbehaving retry loop that was tying up a goroutine during server quit for too long, but cascading failures in our CLI test infrastructure prevented these logs from surfacing anywhere useful. Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Nikhil Benesch <[email protected]>
Build succeeded |
The distsql processors support semi and anti join; however, the
optimizer doesn't yet play well with distsql; and when it will, it
initially won't generate distsql plans directly.
Adding support for semi and anti join in the joinNode. This will also
allow us to generate distsql plans for these joins from an
intermediary planNode tree.
Release note: None