Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support semi and anti join in joinNode #25037

Merged
merged 1 commit into from
Apr 26, 2018

Conversation

RaduBerinde
Copy link
Member

The distsql processors support semi and anti join; however, the
optimizer doesn't yet play well with distsql; and when it will, it
initially won't generate distsql plans directly.

Adding support for semi and anti join in the joinNode. This will also
allow us to generate distsql plans for these joins from an
intermediary planNode tree.

Release note: None

@RaduBerinde RaduBerinde requested review from pbardea, knz, andy-kimball and a team April 24, 2018 18:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde
Copy link
Member Author

Please hold off on reviewing. There are a few problems I uncovered that I need to fix.

@jordanlewis
Copy link
Member

What are the problems? I haven't gotten around to reviewing yet, but one of my comments was going to ask whether you could manage to share some of the test cases from the distributed joiners. @pbardea has started doing some of this already for the different joiner types in DistSQL, and it could be good to get that kind of test case sharing for the local processors as well.

@RaduBerinde
Copy link
Member Author

The ON condition evaluation still needs access to the right columns; currently it breaks if it needs a right-side column.

We usually test things through logic tests. This type of low level tests are very tedious (most planNodes don't have such tests) - I added them in this case because there is no SQL syntax for anti/semi joins. The distsqlrun tests are even more tedious and the infrastructure is different enough that I think sharing tests would be a lot of work. The plan for better tests here is through optimizer execbuilder tests or logic tests (using correlated queries that result in these joins). I might do these in the same PR now that #24969 is merged.

@knz
Copy link
Contributor

knz commented Apr 25, 2018

lemme know when you want a review

@RaduBerinde RaduBerinde force-pushed the local-semi-anti-join branch from 63738a9 to 45adbd2 Compare April 25, 2018 19:54
@RaduBerinde RaduBerinde requested a review from a team as a code owner April 25, 2018 19:54
@RaduBerinde
Copy link
Member Author

Updated, ready for review.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. Please however add a comment reminder in the commit message and in the appropriate code locations of how semi/anti joins differ from regular joins. It's pretty clear throughout the change that they only reports columns from the left operand, but the change doesn't provide an intuition of what the result set looks like (which rows get selected on which condition and why).

If you feel lazy at least give a link to wikipedia or some other stable online resource that has a summary.

pkg/sql/join.go Outdated
@@ -577,8 +580,15 @@ func (n *joinNode) Next(params runParams) (res bool, err error) {
continue
}
foundMatch = true
if n.joinType == sqlbase.JoinType_LEFT_ANTI {
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to explain/remind the reader what the anti join is about, and why in the case there's a match we break out of the loop.

pkg/sql/join.go Outdated

n.pred.prepareRow(n.run.output, lrow, rrow)
if n.joinType == sqlbase.JoinType_LEFT_SEMI {
n.pred.prepareRow(n.run.output, lrow, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto reminder/explanation.

pkg/sql/join.go Outdated
@@ -587,6 +597,9 @@ func (n *joinNode) Next(params runParams) (res bool, err error) {
if _, err := n.run.buffer.AddRow(params.ctx, n.run.output); err != nil {
return false, err
}
if n.joinType == sqlbase.JoinType_LEFT_SEMI {
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@RaduBerinde RaduBerinde force-pushed the local-semi-anti-join branch from 45adbd2 to 8a80599 Compare April 26, 2018 12:31
@RaduBerinde
Copy link
Member Author

TFTR! I added explanations in the commit and in the places you suggested. Note that the join type constants are also documented:

// A left semi join returns the rows from the left side that match at least


Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/join.go, line 584 at r1 (raw file):

Previously, knz (kena) wrote…

Add a comment to explain/remind the reader what the anti join is about, and why in the case there's a match we break out of the loop.

Done.


pkg/sql/join.go, line 588 at r1 (raw file):

Previously, knz (kena) wrote…

ditto reminder/explanation.

Done.


pkg/sql/join.go, line 601 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Apr 26, 2018

:lgtm:


Reviewed 8 of 9 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/join.go, line 611 at r2 (raw file):

		if !foundMatch && wantUnmatchedLeft {
			// If none of the rows matched the on condition and we are computing a
			// left or full outer join, we need to add a row with an empty

add anti-join to the comment


pkg/sql/opt/exec/execbuilder/testdata/join, line 225 at r2 (raw file):


exec-raw
CREATE TABLE t.c (u INT PRIMARY KEY, v INT);

after you rebase you'll probably get an error - just change t.c to c (and below too)


Comments from Reviewable

The distsql processors support semi and anti join; however, the
optimizer doesn't yet play well with distsql; and when it will, it
initially won't generate distsql plans directly.

Adding support for semi and anti join in the joinNode. This will also
allow us to generate distsql plans for these joins from an
intermediary planNode tree.

A semi join returns the rows from the left side that match at least
one row from the right side (as per equality columns and ON condition).
An anti join is an "inverted" semi join: it returns the rows from the
left side that don't match any columns on the right side (as per
equality columns and ON condition).

Adding execbuilder tests for queries that are decorrelated into semi
and anti joins.

Release note: None
@RaduBerinde RaduBerinde force-pushed the local-semi-anti-join branch from 8a80599 to dc6ff74 Compare April 26, 2018 14:31
@RaduBerinde
Copy link
Member Author

Review status: 7 of 10 files reviewed at latest revision, 5 unresolved discussions.


pkg/sql/join.go, line 611 at r2 (raw file):

Previously, rytaft wrote…

add anti-join to the comment

Done.


pkg/sql/opt/exec/execbuilder/testdata/join, line 225 at r2 (raw file):

Previously, rytaft wrote…

after you rebase you'll probably get an error - just change t.c to c (and below too)

Done, thanks!


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Apr 26, 2018

:lgtm: thanks!


Reviewed 7 of 9 files at r1, 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

bors r+


Comments from Reviewable

craig bot pushed a commit that referenced this pull request Apr 26, 2018
25037: sql: support semi and anti join in joinNode r=RaduBerinde a=RaduBerinde

The distsql processors support semi and anti join; however, the
optimizer doesn't yet play well with distsql; and when it will, it
initially won't generate distsql plans directly.

Adding support for semi and anti join in the joinNode. This will also
allow us to generate distsql plans for these joins from an
intermediary planNode tree.

Release note: None

25089: cli: improve debuggability of CLI tests r=knz,windchan7 a=benesch

Various fixes from debugging a failed CLI test with @windchan7. The culprit was a misbehaving retry loop that was tying up a goroutine during server quit for too long, but cascading failures in our CLI test infrastructure prevented these logs from surfacing anywhere useful.

Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Nikhil Benesch <[email protected]>
@craig
Copy link
Contributor

craig bot commented Apr 26, 2018

Build succeeded

@craig craig bot merged commit dc6ff74 into cockroachdb:master Apr 26, 2018
@RaduBerinde RaduBerinde deleted the local-semi-anti-join branch April 26, 2018 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants