-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: allow join-elimination rules to apply in more cases #105214
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice performance improvements!
Reviewed 17 of 17 files at r1, 15 of 15 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Just left some minor comments/questions.
Reviewed 17 of 17 files at r1, 15 of 15 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/norm/project_funcs.go
line 862 at r1 (raw file):
for col, ok := from.Next(0); ok; col, ok = from.Next(col + 1) { if !to.Contains(col) { candidates := fds.ComputeEquivGroup(col)
nit: add TODO(mgartner): We don't need to compute the entire equivalence group, we only need to find the first equivalent column that's in the to set
.
pkg/sql/opt/norm/project_funcs.go
line 883 at r1 (raw file):
) memo.ProjectionsExpr { getReplacement := func(col opt.ColumnID) opt.ColumnID { candidates := fds.ComputeEquivGroup(col)
nit: add TODO(mgartner): We don't need to compute the entire equivalence group, we only need to find the first equivalent column that's in the to set
.
pkg/sql/opt/norm/rules/project.opt
line 36 at r1 (raw file):
(Project $left (MergeProjections
Are you convinced we don't need to call CanMergeProjections
before MergeProjections
? I think it's fine because the outer
projections, (RemapProjectionCols $projections $leftCols $fds)
, will only contain references to columns in $leftCols
, and the inner
projections, (ProjectRemappedCols $passthrough $leftCols $fds)
, will only produce columns in $passthrough
.
pkg/sql/opt/testutils/opttester/forcing_opt.go
line 78 at r1 (raw file):
if tester.Flags.DisableRules.Contains(int(ruleName)) { return false }
Good catch!
pkg/sql/opt/exec/execbuilder/testdata/subquery
line 182 at r1 (raw file):
# IN expression transformed into semi-join. query T EXPLAIN (VERBOSE) SELECT a FROM abc WHERE a IN (SELECT a FROM abc WHERE b < 0)
The comment doesn't make sense anymore. Maybe change the a
s to b
s here and the semi-join won't be eliminated.
pkg/sql/opt/memo/testdata/logprops/join
line 3012 at r2 (raw file):
├── multiplicity: left-rows(zero-or-one), right-rows(zero-or-one) ├── key: (6) ├── fd: ()-->(2,7), (1)-->(3), (6)-->(8), (1)==(6), (6)==(1), (2)==(7), (7)==(2), (3)==(8), (8)==(3)
Walk me through this test case. We can infer equality of 2==7 and 3==8 because 2 and 7 are held constant? Any rows passing the xyz.x = foo.x
filter and one of the constant equality filters would make 3 funcationally equivalent to 8, even though this is contradiction that no rows can satisfy?
pkg/sql/opt/norm/testdata/rules/join
line 1842 at r2 (raw file):
# -------------------------------------------------- # SimplifyLeftJoin + SimplifyRightJoin
Do we want to prevent these tests from changing since their suppose to show the left-join => inner-join transformation? I suppose we need to self-join in order to make that transformation, and that also enables the join to be eliminated entirely, so maybe it's fine to leave as-is?
546bce0
to
aff928a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/norm/project_funcs.go
line 862 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: add
TODO(mgartner): We don't need to compute the entire equivalence group, we only need to find the first equivalent column that's in the to set
.
Done.
pkg/sql/opt/norm/project_funcs.go
line 883 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: add
TODO(mgartner): We don't need to compute the entire equivalence group, we only need to find the first equivalent column that's in the to set
.
Done.
pkg/sql/opt/norm/rules/project.opt
line 36 at r1 (raw file):
Are you convinced we don't need to call
CanMergeProjections
beforeMergeProjections
? I think it's fine because theouter
projections,(RemapProjectionCols $projections $leftCols $fds)
, will only contain references to columns in$leftCols
, and theinner
projections,(ProjectRemappedCols $passthrough $leftCols $fds)
, will only produce columns in$passthrough
.
Yes, RemapProjectionCols
projections only reference left columns, while ProjectRemappedCols
only projects right columns. Added a comment.
pkg/sql/opt/testutils/opttester/forcing_opt.go
line 78 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Good catch!
Thanks!
pkg/sql/opt/exec/execbuilder/testdata/subquery
line 182 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The comment doesn't make sense anymore. Maybe change the
a
s tob
s here and the semi-join won't be eliminated.
It didn't want to use the semi-join when I used b
instead, but I got around it by just adding a duplicate table abc2
.
pkg/sql/opt/memo/testdata/logprops/join
line 3012 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Walk me through this test case. We can infer equality of 2==7 and 3==8 because 2 and 7 are held constant? Any rows passing the
xyz.x = foo.x
filter and one of the constant equality filters would make 3 funcationally equivalent to 8, even though this is contradiction that no rows can satisfy?
x
implies y
(and the same x
implies the same y
for both tables because it's a self-join), so if the x
values between xyz
and foo
are the same, the y
values must also be the same in order for there to be any matches. It's a contradiction in this case, but the optimizer doesn't see it because the filters get pushed down before the join's properties are calculated.
pkg/sql/opt/norm/testdata/rules/join
line 1842 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Do we want to prevent these tests from changing since their suppose to show the left-join => inner-join transformation? I suppose we need to self-join in order to make that transformation, and that also enables the join to be eliminated entirely, so maybe it's fine to leave as-is?
I think it's ok since the expected rules are still firing, but I'm happy to prevent the elimination if you think that'd be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 20 of 22 files at r3, 15 of 15 files at r4, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/exec/execbuilder/testdata/subquery
line 182 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
It didn't want to use the semi-join when I used
b
instead, but I got around it by just adding a duplicate tableabc2
.
👍
pkg/sql/opt/memo/testdata/logprops/join
line 3012 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
x
impliesy
(and the samex
implies the samey
for both tables because it's a self-join), so if thex
values betweenxyz
andfoo
are the same, they
values must also be the same in order for there to be any matches. It's a contradiction in this case, but the optimizer doesn't see it because the filters get pushed down before the join's properties are calculated.
👍
pkg/sql/opt/norm/testdata/rules/join
line 1842 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I think it's ok since the expected rules are still firing, but I'm happy to prevent the elimination if you think that'd be better.
No, I think this is fine as-is.
aff928a
to
aa2b106
Compare
Fixed a few test failures, I think everything should be passing now. |
TFTRs! bors r+ |
105214: opt: allow join-elimination rules to apply in more cases r=DrewKimball a=DrewKimball #### opt: allow join elimination rules to remap columns When it can be proven that a join does not add rows to or remove them from one of its inputs, the other input can often be removed, eliminating the join. However, this can only be done if the columns from the eliminated side are not needed. This patch allows the join elimination rules to remap columns from the eliminated side to the preserved side of the join, using the join's functional dependencies. For example: ``` CREATE TABLE xy (x INT PRIMARY KEY, y INT); CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x)); SELECT x, k, v FROM fk INNER JOIN xy ON v = x; ``` In the example above, the join could not previously be eliminated because the `x` column is required in the output. Now, the `x` column is remapped to the equivalent `v` column, allowing the join to be removed. Fixes #102614 Release note (performance improvement): The optimizer can now eliminate joins in more cases. #### opt: infer equality filters for self joins When a table is joined to itself with an equality that forms a key over both join inputs, it is possible to infer equality filters between each pair of columns at the same ordinal position in the base table. This patch improves the logical props builder to infer these self-join equalities in a join's FuncDepSet. This improves the quality of information available to optimization rules, and in particular, join elimination rules. Informs #102614 Release note: None Co-authored-by: Drew Kimball <[email protected]>
Build failed: |
bors retry |
Build failed (retrying...): |
Build failed: |
Looks like |
When it can be proven that a join does not add rows to or remove them from one of its inputs, the other input can often be removed, eliminating the join. However, this can only be done if the columns from the eliminated side are not needed. This patch allows the join elimination rules to remap columns from the eliminated side to the preserved side of the join, using the join's functional dependencies. For example: ``` CREATE TABLE xy (x INT PRIMARY KEY, y INT); CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x)); SELECT x, k, v FROM fk INNER JOIN xy ON v = x; ``` In the example above, the join could not previously be eliminated because the `x` column is required in the output. Now, the `x` column is remapped to the equivalent `v` column, allowing the join to be removed. Fixes cockroachdb#102614 Release note (performance improvement): The optimizer can now eliminate joins in more cases.
When a table is joined to itself with an equality that forms a key over both join inputs, it is possible to infer equality filters between each pair of columns at the same ordinal position in the base table. This patch improves the logical props builder to infer these self-join equalities in a join's FuncDepSet. This improves the quality of information available to optimization rules, and in particular, join elimination rules. Informs cockroachdb#102614 Release note: None
aa2b106
to
cb25ade
Compare
bors r+ |
Build succeeded: |
PR cockroachdb#105214 added logic to infer equivalences for a join that is performing a self-join with an equality between the same columns, and those columns form a key over both inputs. This missed the fact that, for nullable columns from the self-join table, the inferred equivalences are actually *lax*, as opposed to strict. A strict equivalence captures the behavior of the SQL `=` operator; strictly equivalent columns have the same value for every row and no NULL values. A lax equivalence is similar, but without the restriction on NULL values. If one column is NULL for a given row, the other must still be NULL at that row. Adding all the inferred equivalences as strict could cause other rules that depend on functional dependencies to perform incorrect transformations; e.g. adding an invalid filter. This patch moves the equivalence-inference logic to the join-elimination rules. This is valid because column remapping only cares about the "col1 and col2 have the same values for every row" semantics of lax equivalences. If some of those values are NULL, the remapping is still valid. This ensures that the inferred lax equivalences are only seen by the join-elimination rules, which know how to handle them. Fixes cockroachdb#105608 Release note: None
The `JoinOrderBuilder` is able to infer equality filters from the transitive closure of equalities in the reordered join tree for use in building new joins. However, it isn't valid to infer this equality for nullable columns. This is because the SQL `=` operator rejects NULL values, while functional dependency equivalences allow them. It was not previously possible to encounter this bug because inference of an "interesting" equality required an existing equality filter on both of the involved columns, which rejects NULLs anyway. However, cockroachdb#105214 added the ability to infer equivalences between self-join columns, which *do* allow nulls. This allowed the bug to manifest. This patch adds a check to the equality filter inference logic to ensure that at least one of the candidate columns is non-null. This works because if one column is non-null, all equivalent columns are also non-null. This ensures that a null-rejecting filter isn't incorrectly added to a nullable column. Fixes cockroachdb#105608 Release note: None
The `JoinOrderBuilder` is able to infer equality filters from the transitive closure of equalities in the reordered join tree for use in building new joins. However, it isn't valid to infer this equality for nullable columns. This is because the SQL `=` operator rejects NULL values, while functional dependency equivalences allow them. It was not previously possible to encounter this bug because inference of an "interesting" equality required an existing equality filter on both of the involved columns, which rejects NULLs anyway. However, cockroachdb#105214 added the ability to infer equivalences between self-join columns, which *do* allow nulls. This allowed the bug to manifest. This patch adds a check to the equality filter inference logic to ensure that at least one of the candidate columns is non-null. This works because if one column is non-null, all equivalent columns are also non-null. This ensures that a null-rejecting filter isn't incorrectly added to a nullable column. Fixes cockroachdb#105608 Release note: None
105866: opt: don't infer join equalities from equivalences on nullable columns r=DrewKimball a=DrewKimball The `JoinOrderBuilder` is able to infer equality filters from the transitive closure of equalities in the reordered join tree for use in building new joins. However, it isn't valid to infer this equality for nullable columns. This is because the SQL `=` operator rejects NULL values, while functional dependency equivalences allow them. It was not previously possible to encounter this bug because inference of an "interesting" equality required an existing equality filter on both of the involved columns, which rejects NULLs anyway. However, #105214 added the ability to infer equivalences between self-join columns, which *do* allow nulls. This allowed the bug to manifest. This patch adds a check to the equality filter inference logic to ensure that at least one of the candidate columns is non-null. This works because if one column is non-null, all equivalent columns are also non-null. This ensures that a null-rejecting filter isn't incorrectly added to a nullable column. Fixes #105608, Fixes #105598 Release note: None Co-authored-by: Drew Kimball <[email protected]>
Self-join equality inference was added by cockroachdb#105214, so that the `FuncDeps` for a self-join would include equalities between *every* pair of columns at the same ordinal position in the base table if there was an equality between key columns (also at the same ordinal position). However, the key column check was performed using the FDs of the join inputs rather than the base table's FDs. This could lead to incorrectly adding self-join equality filters. For example, consider the following: ``` CREATE TABLE xy (x INT NOT NULL, y INT NOT NULL); INSERT INTO xy VALUES (1, 1), (1, 2); SELECT * FROM xy a JOIN xy b ON a.x = b.x; SELECT * FROM (SELECT * FROM xy LIMIT 1) a JOIN (SELECT * FROM xy LIMIT 1) b ON a.x = b.x; ``` In the first query above, `a.x = b.x` does not consitute joining on key columns. But in the second query, both inputs have one row and so *any* set of columns is a key. However, there are no guarantees which row is being joined from each table - if `a` is the `(1, 1)` row and `b` is the `(1, 2)` row, inferring a `a.y = b.y` filter will incorrectly cause the join to return no rows. This patch fixes the problem by requiring the initial self-join equalities to form a key over the *base* table, not just the inputs of the join. Fixes cockroachdb#106371 Release note: None
Self-join equality inference was added by cockroachdb#105214, so that the `FuncDeps` for a self-join would include equalities between *every* pair of columns at the same ordinal position in the base table if there was an equality between key columns (also at the same ordinal position). However, the key column check was performed using the FDs of the join inputs rather than the base table's FDs. This could lead to incorrectly adding self-join equality filters. For example, consider the following: ``` CREATE TABLE t106371 (x INT NOT NULL, y INT NOT NULL); INSERT INTO t106371 VALUES (1, 1), (1, 2); SELECT * FROM t106371 a JOIN t106371 b ON a.x = b.x; SELECT * FROM (SELECT * FROM t106371 ORDER BY y DESC LIMIT 1) a JOIN (SELECT DISTINCT ON (x) * FROM t106371) b ON a.x = b.x; ``` In the first query above, `a.x = b.x` does not consitute joining on key columns. But in the second query, one input has one row and the other de-duplicated by the `x` column and so `x` is a key over both inputs. However, the query as written will select different rows for each input - `a` will return the `(1, 2)` row, while `b` will return the `(1, 1)` row. Inferring a `a.y = b.y` filter will incorrectly cause the join to return no rows. This patch fixes the problem by requiring the initial self-join equalities to form a key over the *base* table, not just the inputs of the join. Fixes cockroachdb#106371 Release note: None
106485: opt: only infer self-join equality with a key over the base table r=DrewKimball a=DrewKimball #### opt: only infer self-join equality with a key over the base table Self-join equality inference was added by #105214, so that the `FuncDeps` for a self-join would include equalities between *every* pair of columns at the same ordinal position in the base table if there was an equality between key columns (also at the same ordinal position). However, the key column check was performed using the FDs of the join inputs rather than the base table's FDs. This could lead to incorrectly adding self-join equality filters. For example, consider the following: ``` CREATE TABLE t106371 (x INT NOT NULL, y INT NOT NULL); INSERT INTO t106371 VALUES (1, 1), (1, 2); SELECT * FROM t106371 a JOIN t106371 b ON a.x = b.x; SELECT * FROM (SELECT * FROM t106371 ORDER BY y DESC LIMIT 1) a JOIN (SELECT DISTINCT ON (x) * FROM t106371) b ON a.x = b.x; ``` In the first query above, `a.x = b.x` does not consitute joining on key columns. But in the second query, one input has one row and the other de-duplicated by the `x` column and so `x` is a key over both inputs. However, the query as written will select different rows for each input - `a` will return the `(1, 2)` row, while `b` will return the `(1, 1)` row. Inferring a `a.y = b.y` filter will incorrectly cause the join to return no rows. This patch fixes the problem by requiring the initial self-join equalities to form a key over the *base* table, not just the inputs of the join. Fixes #106371 Release note: None 106779: acceptance: Fix DockerCSharp Test r=rimadeodhar a=rimadeodhar This PR updates the .net framework within the csproj setup to .net 6. Additionally, it also fixes the CS program that we run as a part of this test with the new npgsql DateTime changes made as a part of npgsql 6 version update. The old datetime types have been deprecated and needed to be removed. With these fixes, the test is running successfully and can be unskipped. Epic: none Fixes: #86852 Release note: none 107104: cli: fix debug zip with new columns for cluster settings r=maryliag a=maryliag The addition of new columns on cluster_settings view done on #104449 were causing debug zip fails to add the information from that table, since the query used to gather the information was doing a join and not considering the new columns. This commit updates the query to use the explicit columns, so even if new columns are added it won't be a problem in the future. It also adds tests for all custom querys used to generate the debug zip, so this type of issue would have been caught. The file `crdb_internal.cluster_settings.txt` in debug zips was empty due to this bug on v23.1.5 (only version affected by this bug). Fixes #107103 Release note (bug fix): Debug zip now are properly showing the information from cluster_settings. The file `crdb_internal.cluster_settings.txt` in debug zips was empty due to this bug on v23.1.5 (only version affected by this bug). Co-authored-by: Drew Kimball <[email protected]> Co-authored-by: rimadeodhar <[email protected]> Co-authored-by: maryliag <[email protected]>
opt: allow join elimination rules to remap columns
When it can be proven that a join does not add rows to or remove them
from one of its inputs, the other input can often be removed, eliminating
the join. However, this can only be done if the columns from the
eliminated side are not needed.
This patch allows the join elimination rules to remap columns from the
eliminated side to the preserved side of the join, using the join's
functional dependencies. For example:
In the example above, the join could not previously be eliminated because
the
x
column is required in the output. Now, thex
column is remappedto the equivalent
v
column, allowing the join to be removed.Fixes #102614
Release note (performance improvement): The optimizer can now eliminate
joins in more cases.
opt: infer equality filters for self joins
When a table is joined to itself with an equality that forms a key
over both join inputs, it is possible to infer equality filters
between each pair of columns at the same ordinal position in the base
table. This patch improves the logical props builder to infer these
self-join equalities in a join's FuncDepSet. This improves the quality
of information available to optimization rules, and in particular, join
elimination rules.
Informs #102614
Release note: None