-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: split disjunction in join conditions in more cases #97696
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix significantly improves Q19 in TPC-H:
name old time/op new time/op delta
Q19 6.86s ± 1% 1.16s ±11% -83.06% (p=0.029 n=4+4)
Reviewable status: complete! 0 of 0 LGTMs obtained
e168ed6
to
f0c4165
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)
pkg/sql/opt/xform/join_funcs.go
line 1812 at r1 (raw file):
if rightCol, ok := eqExpr.Right.(*memo.VariableExpr); ok { return (leftRelColSet.Contains(leftCol.Col) && rightRelColSet.Contains(rightCol.Col)) || (leftRelColSet.Contains(rightCol.Col) && rightRelColSet.Contains(leftCol.Col))
Is there a reason to be more restrictive here and require equality with a column in both tables? Previously, the following test using the tables in disjunction_in_join
would trigger the rewrite:
opt expect=SplitDisjunctionOfJoinTerms
SELECT * FROM a t1, a t2 WHERE (t1.a2 = t2.a2+1 OR t1.a3 = t2.a3+1) AND (t1.a1 = t2.a1+1 OR t1.a4 = t2.a4+1)
----
project
├── columns: a1:1!null a2:2!null a3:3!null a4:4!null a1:7!null a2:8!null a3:9!null a4:10!null
├── immutable
├── key: (1-4,7-10)
└── distinct-on
├── columns: t1.a1:1!null t1.a2:2!null t1.a3:3!null t1.a4:4!null t2.a1:7!null t2.a2:8!null t2.a3:9!null t2.a4:10!null
├── grouping columns: t1.a1:1!null t1.a2:2!null t1.a3:3!null t1.a4:4!null t2.a1:7!null t2.a2:8!null t2.a3:9!null t2.a4:10!null
├── immutable
├── key: (1-4,7-10)
└── union-all
├── columns: t1.a1:1!null t1.a2:2!null t1.a3:3!null t1.a4:4!null t2.a1:7!null t2.a2:8!null t2.a3:9!null t2.a4:10!null
├── left columns: t1.a1:13 t1.a2:14 t1.a3:15 t1.a4:16 t2.a1:19 t2.a2:20 t2.a3:21 t2.a4:22
├── right columns: t1.a1:25 t1.a2:26 t1.a3:27 t1.a4:28 t2.a1:31 t2.a2:32 t2.a3:33 t2.a4:34
├── immutable
├── project
│ ├── columns: t1.a1:13!null t1.a2:14!null t1.a3:15!null t1.a4:16!null t2.a1:19!null t2.a2:20!null t2.a3:21!null t2.a4:22!null
│ ├── immutable
│ ├── key: (13,15,16,19-22)
│ ├── fd: (20)-->(14)
│ └── inner-join (hash)
│ ├── columns: t1.a1:13!null t1.a2:14!null t1.a3:15!null t1.a4:16!null t2.a1:19!null t2.a2:20!null t2.a3:21!null t2.a4:22!null column37:37!null
│ ├── immutable
│ ├── key: (13,15,16,19-22)
│ ├── fd: (20)-->(37), (14)==(37), (37)==(14)
│ ├── scan a [as=t1]
│ │ ├── columns: t1.a1:13!null t1.a2:14!null t1.a3:15!null t1.a4:16!null
│ │ └── key: (13-16)
│ ├── project
│ │ ├── columns: column37:37!null t2.a1:19!null t2.a2:20!null t2.a3:21!null t2.a4:22!null
│ │ ├── immutable
│ │ ├── key: (19-22)
│ │ ├── fd: (20)-->(37)
│ │ ├── scan a [as=t2]
│ │ │ ├── columns: t2.a1:19!null t2.a2:20!null t2.a3:21!null t2.a4:22!null
│ │ │ └── key: (19-22)
│ │ └── projections
│ │ └── t2.a2:20 + 1 [as=column37:37, outer=(20), immutable]
│ └── filters
│ ├── (t1.a1:13 = (t2.a1:19 + 1)) OR (t1.a4:16 = (t2.a4:22 + 1)) [outer=(13,16,19,22), immutable]
│ └── t1.a2:14 = column37:37 [outer=(14,37), constraints=(/14: (/NULL - ]; /37: (/NULL - ]), fd=(14)==(37), (37)==(14)]
└── project
├── columns: t1.a1:25!null t1.a2:26!null t1.a3:27!null t1.a4:28!null t2.a1:31!null t2.a2:32!null t2.a3:33!null t2.a4:34!null
├── immutable
├── key: (25,26,28,31-34)
├── fd: (33)-->(27)
└── inner-join (hash)
├── columns: t1.a1:25!null t1.a2:26!null t1.a3:27!null t1.a4:28!null t2.a1:31!null t2.a2:32!null t2.a3:33!null t2.a4:34!null column38:38!null
├── immutable
├── key: (25,26,28,31-34)
├── fd: (33)-->(38), (27)==(38), (38)==(27)
├── scan a [as=t1]
│ ├── columns: t1.a1:25!null t1.a2:26!null t1.a3:27!null t1.a4:28!null
│ └── key: (25-28)
├── project
│ ├── columns: column38:38!null t2.a1:31!null t2.a2:32!null t2.a3:33!null t2.a4:34!null
│ ├── immutable
│ ├── key: (31-34)
│ ├── fd: (33)-->(38)
│ ├── scan a [as=t2]
│ │ ├── columns: t2.a1:31!null t2.a2:32!null t2.a3:33!null t2.a4:34!null
│ │ └── key: (31-34)
│ └── projections
│ └── t2.a3:33 + 1 [as=column38:38, outer=(33), immutable]
└── filters
├── (t1.a1:25 = (t2.a1:31 + 1)) OR (t1.a4:28 = (t2.a4:34 + 1)) [outer=(25,28,31,34), immutable]
└── t1.a3:27 = column38:38 [outer=(27,38), constraints=(/27: (/NULL - ]; /38: (/NULL - ]), fd=(27)==(38), (38)==(27)]
Also, don't we support lookup join with range predicates now? This check would disallow the rewrite to enable those types of joins. Then there could also be potential inverted joins with predicates such as ST_Intersects(geo_table2.geom, geo_table.geom)
, which may be OR'ed with other join terms. I guess the previous rule should not have included "equijoin" in the name since it actually enabled more joins than equijoin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @msirek)
pkg/sql/opt/xform/join_funcs.go
line 1812 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Is there a reason to be more restrictive here and require equality with a column in both tables? Previously, the following test using the tables in
disjunction_in_join
would trigger the rewrite:opt expect=SplitDisjunctionOfJoinTerms SELECT * FROM a t1, a t2 WHERE (t1.a2 = t2.a2+1 OR t1.a3 = t2.a3+1) AND (t1.a1 = t2.a1+1 OR t1.a4 = t2.a4+1) ---- project ├── columns: a1:1!null a2:2!null a3:3!null a4:4!null a1:7!null a2:8!null a3:9!null a4:10!null ├── immutable ├── key: (1-4,7-10) └── distinct-on ├── columns: t1.a1:1!null t1.a2:2!null t1.a3:3!null t1.a4:4!null t2.a1:7!null t2.a2:8!null t2.a3:9!null t2.a4:10!null ├── grouping columns: t1.a1:1!null t1.a2:2!null t1.a3:3!null t1.a4:4!null t2.a1:7!null t2.a2:8!null t2.a3:9!null t2.a4:10!null ├── immutable ├── key: (1-4,7-10) └── union-all ├── columns: t1.a1:1!null t1.a2:2!null t1.a3:3!null t1.a4:4!null t2.a1:7!null t2.a2:8!null t2.a3:9!null t2.a4:10!null ├── left columns: t1.a1:13 t1.a2:14 t1.a3:15 t1.a4:16 t2.a1:19 t2.a2:20 t2.a3:21 t2.a4:22 ├── right columns: t1.a1:25 t1.a2:26 t1.a3:27 t1.a4:28 t2.a1:31 t2.a2:32 t2.a3:33 t2.a4:34 ├── immutable ├── project │ ├── columns: t1.a1:13!null t1.a2:14!null t1.a3:15!null t1.a4:16!null t2.a1:19!null t2.a2:20!null t2.a3:21!null t2.a4:22!null │ ├── immutable │ ├── key: (13,15,16,19-22) │ ├── fd: (20)-->(14) │ └── inner-join (hash) │ ├── columns: t1.a1:13!null t1.a2:14!null t1.a3:15!null t1.a4:16!null t2.a1:19!null t2.a2:20!null t2.a3:21!null t2.a4:22!null column37:37!null │ ├── immutable │ ├── key: (13,15,16,19-22) │ ├── fd: (20)-->(37), (14)==(37), (37)==(14) │ ├── scan a [as=t1] │ │ ├── columns: t1.a1:13!null t1.a2:14!null t1.a3:15!null t1.a4:16!null │ │ └── key: (13-16) │ ├── project │ │ ├── columns: column37:37!null t2.a1:19!null t2.a2:20!null t2.a3:21!null t2.a4:22!null │ │ ├── immutable │ │ ├── key: (19-22) │ │ ├── fd: (20)-->(37) │ │ ├── scan a [as=t2] │ │ │ ├── columns: t2.a1:19!null t2.a2:20!null t2.a3:21!null t2.a4:22!null │ │ │ └── key: (19-22) │ │ └── projections │ │ └── t2.a2:20 + 1 [as=column37:37, outer=(20), immutable] │ └── filters │ ├── (t1.a1:13 = (t2.a1:19 + 1)) OR (t1.a4:16 = (t2.a4:22 + 1)) [outer=(13,16,19,22), immutable] │ └── t1.a2:14 = column37:37 [outer=(14,37), constraints=(/14: (/NULL - ]; /37: (/NULL - ]), fd=(14)==(37), (37)==(14)] └── project ├── columns: t1.a1:25!null t1.a2:26!null t1.a3:27!null t1.a4:28!null t2.a1:31!null t2.a2:32!null t2.a3:33!null t2.a4:34!null ├── immutable ├── key: (25,26,28,31-34) ├── fd: (33)-->(27) └── inner-join (hash) ├── columns: t1.a1:25!null t1.a2:26!null t1.a3:27!null t1.a4:28!null t2.a1:31!null t2.a2:32!null t2.a3:33!null t2.a4:34!null column38:38!null ├── immutable ├── key: (25,26,28,31-34) ├── fd: (33)-->(38), (27)==(38), (38)==(27) ├── scan a [as=t1] │ ├── columns: t1.a1:25!null t1.a2:26!null t1.a3:27!null t1.a4:28!null │ └── key: (25-28) ├── project │ ├── columns: column38:38!null t2.a1:31!null t2.a2:32!null t2.a3:33!null t2.a4:34!null │ ├── immutable │ ├── key: (31-34) │ ├── fd: (33)-->(38) │ ├── scan a [as=t2] │ │ ├── columns: t2.a1:31!null t2.a2:32!null t2.a3:33!null t2.a4:34!null │ │ └── key: (31-34) │ └── projections │ └── t2.a3:33 + 1 [as=column38:38, outer=(33), immutable] └── filters ├── (t1.a1:25 = (t2.a1:31 + 1)) OR (t1.a4:28 = (t2.a4:34 + 1)) [outer=(25,28,31,34), immutable] └── t1.a3:27 = column38:38 [outer=(27,38), constraints=(/27: (/NULL - ]; /38: (/NULL - ]), fd=(27)==(38), (38)==(27)]
Also, don't we support lookup join with range predicates now? This check would disallow the rewrite to enable those types of joins. Then there could also be potential inverted joins with predicates such as
ST_Intersects(geo_table2.geom, geo_table.geom)
, which may be OR'ed with other join terms. I guess the previous rule should not have included "equijoin" in the name since it actually enabled more joins than equijoin.
I made it more restrictive since I added a test with ... OR t2.a2 % t1.a3 = 2
and noticed that it was considered an "equijoin".
But this is a good point about the support for joins with projections -- I'll add this example to the test file.
Regarding lookup joins with range predicates and inverted joins with spatial functions -- the rule doesn't support those cases today. Should we remove the need for an "interesting" predicate to perform the transformation? Seems to me like most predicates will be interesting, and therefore we shouldn't worry too much about the cases where they are not. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)
pkg/sql/opt/xform/join_funcs.go
line 1812 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I made it more restrictive since I added a test with
... OR t2.a2 % t1.a3 = 2
and noticed that it was considered an "equijoin".But this is a good point about the support for joins with projections -- I'll add this example to the test file.
Regarding lookup joins with range predicates and inverted joins with spatial functions -- the rule doesn't support those cases today. Should we remove the need for an "interesting" predicate to perform the transformation? Seems to me like most predicates will be interesting, and therefore we shouldn't worry too much about the cases where they are not. What do you think?
The old code:
isJoinPred := func(predCols, leftRelColSet, rightRelColSet opt.ColSet) bool {
return leftRelColSet.Intersects(predCols) && rightRelColSet.Intersects(predCols)
was checking that we have a non-cartesian join, which should trigger the rewrite even in non-equijoin cases including lookup joins with range predicates, etc. We could restore this check or rewrite it to something else. I'm not sure if there's a benefit in removing that code entirely and matching on any predicate. We'd be trusting the costing code more in that case.
Previously, msirek (Mark Sirek) wrote…
This function was only called if the expression was an In order to be more restrictive and still support the other cases you mention, we'd need to add explicit checks for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)
pkg/sql/opt/xform/join_funcs.go
line 1812 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This function was only called if the expression was an
EqExpr
(see below), so it wasn't working for the cases you mention. I could call it from thedefault
case instead, but then if we combine this with my new function,boundBySingleTable
, we would effectively allow any predicate at all.In order to be more restrictive and still support the other cases you mention, we'd need to add explicit checks for
=
,<
,>
,>=
,<=
, and all of the relevant spatial functions. And then we'd probably still be missing some operators (e.g.,IN
, etc). I'm leaning more and more towards just treating all predicates as interesting.
I see. I agree, there are many different predicates which could be interesting, so supporting the rewrite for all predicates would be good. There may be some plan regressions, but that would be good if it uncovers costing bugs. If this is backported, we may wish to consider keeping the old isJoinPred
check in place (plus your new check) in the backport to make sure such regressions would only occur in unreleased v23.1, giving more time for testing to uncover problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)
pkg/sql/opt/xform/join_funcs.go
line 1812 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
I see. I agree, there are many different predicates which could be interesting, so supporting the rewrite for all predicates would be good. There may be some plan regressions, but that would be good if it uncovers costing bugs. If this is backported, we may wish to consider keeping the old
isJoinPred
check in place (plus your new check) in the backport to make sure such regressions would only occur in unreleased v23.1, giving more time for testing to uncover problems.
You just mentioned in the escalation review meeting you may create a session setting for this. I suppose with a session setting we could keep the fix the same in 23.1 and 22.2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 9 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek)
pkg/sql/opt/xform/join_funcs.go
line 1812 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
You just mentioned in the escalation review meeting you may create a session setting for this. I suppose with a session setting we could keep the fix the same in 23.1 and 22.2.
For the case of projections being used in the join condition, maybe we should expand ExtractJoinComparisons
to handle disjunctions (or create a similar rule that handles them). Maybe this isn't necessary if we do treat all predicates as interesting.
Alternatively, we could use the boundBySingleTable
logic to ensure that each side of the EqExpr
references (only) each input, without requiring that the operands be variables.
f0c4165
to
7809528
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a commit with a new session setting and updated the logic as discussed below. PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @msirek)
pkg/sql/opt/xform/join_funcs.go
line 1812 at r1 (raw file):
we could use the boundBySingleTable logic to ensure that each side of the EqExpr references (only) each input, without requiring that the operands be variables.
This wouldn't handle the other cases @msirek mentioned with inequality predicates and spatial functions. I'm going to just treat all predicates as interesting for now.
7809528
to
12cdf64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 15 files at r2, 9 of 9 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @msirek)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @rytaft)
…_joins This commit adds a new session setting, optimizer_use_improved_split_disjunction_for_joins, which will be used in the next commit. Release note (sql change): added a new session setting, optimizer_use_improved_split_disjunction_for_joins, which enables the optimizer to split disjunctions (OR expressions) in more cases in join conditions by building a UNION of two join expressions. If this setting is true, all disjunctions in inner, semi, and anti joins will be split. If false, only disjunctions potentially containing an equijoin condition will be split.
Prior to this commit, when a join condition included a disjunction (e.g. a OR b), in some cases we could remove the disjunction by splitting the join into a UNION of joins to create a more efficient plan. However, we were only performing this transformation if at least one side of the OR predicate contained an equijoin predicate (e.g., t1.col1 = t2.col1). There were other cases where we could have improved the plan by splitting the disjunction, but we did not do so. This commit improves our ability to optimize joins with disjunctions in the join condition when there is the possibility to push one or both sides of the disjunction below the join. This commit removes the requirement that the disjunction contains an equijoin predicate, and instead splits the disjunction in all cases where it is possible to do so, thus enabling more optimization opportunities. Fixes cockroachdb#97695 Release note (performance improvement): if the session setting optimizer_use_improved_split_disjunction_for_joins is true, the optimizer now creates a better query plan in some cases where an inner, semi, or anti join contains a join predicate with a disjuction (OR condition).
12cdf64
to
554b1dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @msirek)
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 88709d8 to blathers/backport-release-22.2-97696: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
sql: add session setting
optimizer_use_improved_split_disjunction_for_joins
This commit adds a new session setting,
optimizer_use_improved_split_disjunction_for_joins
, which will be used inthe next commit.
Release note (sql change): added a new session setting,
optimizer_use_improved_split_disjunction_for_joins
, which enables theoptimizer to split disjunctions (
OR
expressions) in more cases in joinconditions by building a
UNION
of two join expressions. If this settingis true, all disjunctions in inner, semi, and anti joins will be split.
If false, only disjunctions potentially containing an equijoin condition
will be split.
opt: split disjunction in join conditions in more cases
Prior to this commit, when a join condition included a disjunction
(e.g.
a OR b
), in some cases we could remove the disjunction by splittingthe join into a
UNION
of joins to create a more efficient plan. However,we were only performing this transformation if at least one side of the
OR
predicate contained an equijoin predicate (e.g.,
t1.col1 = t2.col1
). Therewere other cases where we could have improved the plan by splitting the
disjunction, but we did not do so.
This commit improves our ability to optimize joins with disjunctions in
the join condition when there is the possibility to push one or both sides
of the disjunction below the join. This commit removes the requirement that
the disjunction contains an equijoin predicate, and instead splits the
disjunction in all cases where it is possible to do so, thus enabling
more optimization opportunities.
Fixes #97695
Release note (performance improvement): if the session setting
optimizer_use_improved_split_disjunction_for_joins
is true, the optimizernow creates a better query plan in some cases where an inner, semi, or
anti join contains a join predicate with a disjuction (
OR
condition).