opt: allow join-elimination rules to apply in more cases #105214

DrewKimball · 2023-06-20T18:05:54Z

opt: allow join elimination rules to remap columns

When it can be proven that a join does not add rows to or remove them
from one of its inputs, the other input can often be removed, eliminating
the join. However, this can only be done if the columns from the
eliminated side are not needed.

This patch allows the join elimination rules to remap columns from the
eliminated side to the preserved side of the join, using the join's
functional dependencies. For example:

CREATE TABLE xy (x INT PRIMARY KEY, y INT);
CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x));

SELECT x, k, v FROM fk INNER JOIN xy ON v = x;

In the example above, the join could not previously be eliminated because
the x column is required in the output. Now, the x column is remapped
to the equivalent v column, allowing the join to be removed.

Fixes #102614

Release note (performance improvement): The optimizer can now eliminate
joins in more cases.

opt: infer equality filters for self joins

When a table is joined to itself with an equality that forms a key
over both join inputs, it is possible to infer equality filters
between each pair of columns at the same ordinal position in the base
table. This patch improves the logical props builder to infer these
self-join equalities in a join's FuncDepSet. This improves the quality
of information available to optimization rules, and in particular, join
elimination rules.

Informs #102614

Release note: None

cockroach-teamcity · 2023-06-20T18:06:09Z

This change is

msirek

Very nice performance improvements!

Reviewed 17 of 17 files at r1, 15 of 15 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

mgartner

Very nice! Just left some minor comments/questions.

Reviewed 17 of 17 files at r1, 15 of 15 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)

pkg/sql/opt/norm/project_funcs.go line 862 at r1 (raw file):

	for col, ok := from.Next(0); ok; col, ok = from.Next(col + 1) {
		if !to.Contains(col) {
			candidates := fds.ComputeEquivGroup(col)

nit: add TODO(mgartner): We don't need to compute the entire equivalence group, we only need to find the first equivalent column that's in the to set.

pkg/sql/opt/norm/project_funcs.go line 883 at r1 (raw file):

) memo.ProjectionsExpr {
	getReplacement := func(col opt.ColumnID) opt.ColumnID {
		candidates := fds.ComputeEquivGroup(col)

nit: add TODO(mgartner): We don't need to compute the entire equivalence group, we only need to find the first equivalent column that's in the to set.

pkg/sql/opt/norm/rules/project.opt line 36 at r1 (raw file):

(Project
    $left
    (MergeProjections

Are you convinced we don't need to call CanMergeProjections before MergeProjections? I think it's fine because the outer projections, (RemapProjectionCols $projections $leftCols $fds), will only contain references to columns in $leftCols, and the inner projections, (ProjectRemappedCols $passthrough $leftCols $fds), will only produce columns in $passthrough.

pkg/sql/opt/testutils/opttester/forcing_opt.go line 78 at r1 (raw file):

		if tester.Flags.DisableRules.Contains(int(ruleName)) {
			return false
		}

Good catch!

pkg/sql/opt/exec/execbuilder/testdata/subquery line 182 at r1 (raw file):

# IN expression transformed into semi-join.
query T
EXPLAIN (VERBOSE) SELECT a FROM abc WHERE a IN (SELECT a FROM abc WHERE b < 0)

The comment doesn't make sense anymore. Maybe change the as to bs here and the semi-join won't be eliminated.

pkg/sql/opt/memo/testdata/logprops/join line 3012 at r2 (raw file):

 ├── multiplicity: left-rows(zero-or-one), right-rows(zero-or-one)
 ├── key: (6)
 ├── fd: ()-->(2,7), (1)-->(3), (6)-->(8), (1)==(6), (6)==(1), (2)==(7), (7)==(2), (3)==(8), (8)==(3)

Walk me through this test case. We can infer equality of 2==7 and 3==8 because 2 and 7 are held constant? Any rows passing the xyz.x = foo.x filter and one of the constant equality filters would make 3 funcationally equivalent to 8, even though this is contradiction that no rows can satisfy?

pkg/sql/opt/norm/testdata/rules/join line 1842 at r2 (raw file):


# --------------------------------------------------
# SimplifyLeftJoin + SimplifyRightJoin

Do we want to prevent these tests from changing since their suppose to show the left-join => inner-join transformation? I suppose we need to self-join in order to make that transformation, and that also enables the join to be eliminated entirely, so maybe it's fine to leave as-is?

DrewKimball

Thanks for taking a look!

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

pkg/sql/opt/norm/project_funcs.go line 862 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: add TODO(mgartner): We don't need to compute the entire equivalence group, we only need to find the first equivalent column that's in the to set.

Done.

pkg/sql/opt/norm/project_funcs.go line 883 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: add TODO(mgartner): We don't need to compute the entire equivalence group, we only need to find the first equivalent column that's in the to set.

Done.

pkg/sql/opt/norm/rules/project.opt line 36 at r1 (raw file):

Are you convinced we don't need to call CanMergeProjections before MergeProjections? I think it's fine because the outer projections, (RemapProjectionCols $projections $leftCols $fds), will only contain references to columns in $leftCols, and the inner projections, (ProjectRemappedCols $passthrough $leftCols $fds), will only produce columns in $passthrough.

Yes, RemapProjectionCols projections only reference left columns, while ProjectRemappedCols only projects right columns. Added a comment.

pkg/sql/opt/testutils/opttester/forcing_opt.go line 78 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Good catch!

Thanks!

pkg/sql/opt/exec/execbuilder/testdata/subquery line 182 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

The comment doesn't make sense anymore. Maybe change the as to bs here and the semi-join won't be eliminated.

It didn't want to use the semi-join when I used b instead, but I got around it by just adding a duplicate table abc2.

pkg/sql/opt/memo/testdata/logprops/join line 3012 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Walk me through this test case. We can infer equality of 2==7 and 3==8 because 2 and 7 are held constant? Any rows passing the xyz.x = foo.x filter and one of the constant equality filters would make 3 funcationally equivalent to 8, even though this is contradiction that no rows can satisfy?

x implies y (and the same x implies the same y for both tables because it's a self-join), so if the x values between xyz and foo are the same, the y values must also be the same in order for there to be any matches. It's a contradiction in this case, but the optimizer doesn't see it because the filters get pushed down before the join's properties are calculated.

pkg/sql/opt/norm/testdata/rules/join line 1842 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Do we want to prevent these tests from changing since their suppose to show the left-join => inner-join transformation? I suppose we need to self-join in order to make that transformation, and that also enables the join to be eliminated entirely, so maybe it's fine to leave as-is?

I think it's ok since the expected rules are still firing, but I'm happy to prevent the elimination if you think that'd be better.

mgartner

Reviewed 20 of 22 files at r3, 15 of 15 files at r4, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball)

pkg/sql/opt/exec/execbuilder/testdata/subquery line 182 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

It didn't want to use the semi-join when I used b instead, but I got around it by just adding a duplicate table abc2.

👍

pkg/sql/opt/memo/testdata/logprops/join line 3012 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

x implies y (and the same x implies the same y for both tables because it's a self-join), so if the x values between xyz and foo are the same, the y values must also be the same in order for there to be any matches. It's a contradiction in this case, but the optimizer doesn't see it because the filters get pushed down before the join's properties are calculated.

👍

pkg/sql/opt/norm/testdata/rules/join line 1842 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I think it's ok since the expected rules are still firing, but I'm happy to prevent the elimination if you think that'd be better.

No, I think this is fine as-is.

DrewKimball · 2023-06-23T19:17:26Z

Fixed a few test failures, I think everything should be passing now.

DrewKimball · 2023-06-24T04:32:24Z

TFTRs!

bors r+

105214: opt: allow join-elimination rules to apply in more cases r=DrewKimball a=DrewKimball #### opt: allow join elimination rules to remap columns When it can be proven that a join does not add rows to or remove them from one of its inputs, the other input can often be removed, eliminating the join. However, this can only be done if the columns from the eliminated side are not needed. This patch allows the join elimination rules to remap columns from the eliminated side to the preserved side of the join, using the join's functional dependencies. For example: ``` CREATE TABLE xy (x INT PRIMARY KEY, y INT); CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x)); SELECT x, k, v FROM fk INNER JOIN xy ON v = x; ``` In the example above, the join could not previously be eliminated because the `x` column is required in the output. Now, the `x` column is remapped to the equivalent `v` column, allowing the join to be removed. Fixes #102614 Release note (performance improvement): The optimizer can now eliminate joins in more cases. #### opt: infer equality filters for self joins When a table is joined to itself with an equality that forms a key over both join inputs, it is possible to infer equality filters between each pair of columns at the same ordinal position in the base table. This patch improves the logical props builder to infer these self-join equalities in a join's FuncDepSet. This improves the quality of information available to optimization rules, and in particular, join elimination rules. Informs #102614 Release note: None Co-authored-by: Drew Kimball <[email protected]>

craig · 2023-06-24T04:51:17Z

Build failed:

Bazel Essential CI (Cockroach)

DrewKimball · 2023-06-26T15:01:33Z

bors retry

craig · 2023-06-26T15:59:48Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-06-26T16:17:15Z

Build failed:

Bazel Essential CI (Cockroach)

DrewKimball · 2023-06-26T16:22:18Z

Looks like TestExecBuild_upsert is flaking, will investigate.

When it can be proven that a join does not add rows to or remove them from one of its inputs, the other input can often be removed, eliminating the join. However, this can only be done if the columns from the eliminated side are not needed. This patch allows the join elimination rules to remap columns from the eliminated side to the preserved side of the join, using the join's functional dependencies. For example: ``` CREATE TABLE xy (x INT PRIMARY KEY, y INT); CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x)); SELECT x, k, v FROM fk INNER JOIN xy ON v = x; ``` In the example above, the join could not previously be eliminated because the `x` column is required in the output. Now, the `x` column is remapped to the equivalent `v` column, allowing the join to be removed. Fixes cockroachdb#102614 Release note (performance improvement): The optimizer can now eliminate joins in more cases.

When a table is joined to itself with an equality that forms a key over both join inputs, it is possible to infer equality filters between each pair of columns at the same ordinal position in the base table. This patch improves the logical props builder to infer these self-join equalities in a join's FuncDepSet. This improves the quality of information available to optimization rules, and in particular, join elimination rules. Informs cockroachdb#102614 Release note: None

DrewKimball · 2023-06-26T18:39:03Z

bors r+

craig · 2023-06-26T19:35:07Z

Build succeeded:

Bazel Essential CI (Cockroach)

PR cockroachdb#105214 added logic to infer equivalences for a join that is performing a self-join with an equality between the same columns, and those columns form a key over both inputs. This missed the fact that, for nullable columns from the self-join table, the inferred equivalences are actually *lax*, as opposed to strict. A strict equivalence captures the behavior of the SQL `=` operator; strictly equivalent columns have the same value for every row and no NULL values. A lax equivalence is similar, but without the restriction on NULL values. If one column is NULL for a given row, the other must still be NULL at that row. Adding all the inferred equivalences as strict could cause other rules that depend on functional dependencies to perform incorrect transformations; e.g. adding an invalid filter. This patch moves the equivalence-inference logic to the join-elimination rules. This is valid because column remapping only cares about the "col1 and col2 have the same values for every row" semantics of lax equivalences. If some of those values are NULL, the remapping is still valid. This ensures that the inferred lax equivalences are only seen by the join-elimination rules, which know how to handle them. Fixes cockroachdb#105608 Release note: None

The `JoinOrderBuilder` is able to infer equality filters from the transitive closure of equalities in the reordered join tree for use in building new joins. However, it isn't valid to infer this equality for nullable columns. This is because the SQL `=` operator rejects NULL values, while functional dependency equivalences allow them. It was not previously possible to encounter this bug because inference of an "interesting" equality required an existing equality filter on both of the involved columns, which rejects NULLs anyway. However, cockroachdb#105214 added the ability to infer equivalences between self-join columns, which *do* allow nulls. This allowed the bug to manifest. This patch adds a check to the equality filter inference logic to ensure that at least one of the candidate columns is non-null. This works because if one column is non-null, all equivalent columns are also non-null. This ensures that a null-rejecting filter isn't incorrectly added to a nullable column. Fixes cockroachdb#105608 Release note: None

105866: opt: don't infer join equalities from equivalences on nullable columns r=DrewKimball a=DrewKimball The `JoinOrderBuilder` is able to infer equality filters from the transitive closure of equalities in the reordered join tree for use in building new joins. However, it isn't valid to infer this equality for nullable columns. This is because the SQL `=` operator rejects NULL values, while functional dependency equivalences allow them. It was not previously possible to encounter this bug because inference of an "interesting" equality required an existing equality filter on both of the involved columns, which rejects NULLs anyway. However, #105214 added the ability to infer equivalences between self-join columns, which *do* allow nulls. This allowed the bug to manifest. This patch adds a check to the equality filter inference logic to ensure that at least one of the candidate columns is non-null. This works because if one column is non-null, all equivalent columns are also non-null. This ensures that a null-rejecting filter isn't incorrectly added to a nullable column. Fixes #105608, Fixes #105598 Release note: None Co-authored-by: Drew Kimball <[email protected]>

Self-join equality inference was added by cockroachdb#105214, so that the `FuncDeps` for a self-join would include equalities between *every* pair of columns at the same ordinal position in the base table if there was an equality between key columns (also at the same ordinal position). However, the key column check was performed using the FDs of the join inputs rather than the base table's FDs. This could lead to incorrectly adding self-join equality filters. For example, consider the following: ``` CREATE TABLE xy (x INT NOT NULL, y INT NOT NULL); INSERT INTO xy VALUES (1, 1), (1, 2); SELECT * FROM xy a JOIN xy b ON a.x = b.x; SELECT * FROM (SELECT * FROM xy LIMIT 1) a JOIN (SELECT * FROM xy LIMIT 1) b ON a.x = b.x; ``` In the first query above, `a.x = b.x` does not consitute joining on key columns. But in the second query, both inputs have one row and so *any* set of columns is a key. However, there are no guarantees which row is being joined from each table - if `a` is the `(1, 1)` row and `b` is the `(1, 2)` row, inferring a `a.y = b.y` filter will incorrectly cause the join to return no rows. This patch fixes the problem by requiring the initial self-join equalities to form a key over the *base* table, not just the inputs of the join. Fixes cockroachdb#106371 Release note: None

Self-join equality inference was added by cockroachdb#105214, so that the `FuncDeps` for a self-join would include equalities between *every* pair of columns at the same ordinal position in the base table if there was an equality between key columns (also at the same ordinal position). However, the key column check was performed using the FDs of the join inputs rather than the base table's FDs. This could lead to incorrectly adding self-join equality filters. For example, consider the following: ``` CREATE TABLE t106371 (x INT NOT NULL, y INT NOT NULL); INSERT INTO t106371 VALUES (1, 1), (1, 2); SELECT * FROM t106371 a JOIN t106371 b ON a.x = b.x; SELECT * FROM (SELECT * FROM t106371 ORDER BY y DESC LIMIT 1) a JOIN (SELECT DISTINCT ON (x) * FROM t106371) b ON a.x = b.x; ``` In the first query above, `a.x = b.x` does not consitute joining on key columns. But in the second query, one input has one row and the other de-duplicated by the `x` column and so `x` is a key over both inputs. However, the query as written will select different rows for each input - `a` will return the `(1, 2)` row, while `b` will return the `(1, 1)` row. Inferring a `a.y = b.y` filter will incorrectly cause the join to return no rows. This patch fixes the problem by requiring the initial self-join equalities to form a key over the *base* table, not just the inputs of the join. Fixes cockroachdb#106371 Release note: None

106485: opt: only infer self-join equality with a key over the base table r=DrewKimball a=DrewKimball #### opt: only infer self-join equality with a key over the base table Self-join equality inference was added by #105214, so that the `FuncDeps` for a self-join would include equalities between *every* pair of columns at the same ordinal position in the base table if there was an equality between key columns (also at the same ordinal position). However, the key column check was performed using the FDs of the join inputs rather than the base table's FDs. This could lead to incorrectly adding self-join equality filters. For example, consider the following: ``` CREATE TABLE t106371 (x INT NOT NULL, y INT NOT NULL); INSERT INTO t106371 VALUES (1, 1), (1, 2); SELECT * FROM t106371 a JOIN t106371 b ON a.x = b.x; SELECT * FROM (SELECT * FROM t106371 ORDER BY y DESC LIMIT 1) a JOIN (SELECT DISTINCT ON (x) * FROM t106371) b ON a.x = b.x; ``` In the first query above, `a.x = b.x` does not consitute joining on key columns. But in the second query, one input has one row and the other de-duplicated by the `x` column and so `x` is a key over both inputs. However, the query as written will select different rows for each input - `a` will return the `(1, 2)` row, while `b` will return the `(1, 1)` row. Inferring a `a.y = b.y` filter will incorrectly cause the join to return no rows. This patch fixes the problem by requiring the initial self-join equalities to form a key over the *base* table, not just the inputs of the join. Fixes #106371 Release note: None 106779: acceptance: Fix DockerCSharp Test r=rimadeodhar a=rimadeodhar This PR updates the .net framework within the csproj setup to .net 6. Additionally, it also fixes the CS program that we run as a part of this test with the new npgsql DateTime changes made as a part of npgsql 6 version update. The old datetime types have been deprecated and needed to be removed. With these fixes, the test is running successfully and can be unskipped. Epic: none Fixes: #86852 Release note: none 107104: cli: fix debug zip with new columns for cluster settings r=maryliag a=maryliag The addition of new columns on cluster_settings view done on #104449 were causing debug zip fails to add the information from that table, since the query used to gather the information was doing a join and not considering the new columns. This commit updates the query to use the explicit columns, so even if new columns are added it won't be a problem in the future. It also adds tests for all custom querys used to generate the debug zip, so this type of issue would have been caught. The file `crdb_internal.cluster_settings.txt` in debug zips was empty due to this bug on v23.1.5 (only version affected by this bug). Fixes #107103 Release note (bug fix): Debug zip now are properly showing the information from cluster_settings. The file `crdb_internal.cluster_settings.txt` in debug zips was empty due to this bug on v23.1.5 (only version affected by this bug). Co-authored-by: Drew Kimball <[email protected]> Co-authored-by: rimadeodhar <[email protected]> Co-authored-by: maryliag <[email protected]>

mgartner · 2023-08-01T22:29:56Z

🚨 If we backport this, we'll need to also backport #105723, #107319, and #107995. 🚨

DrewKimball requested a review from a team as a code owner June 20, 2023 18:05

DrewKimball requested a review from yuzefovich June 20, 2023 18:05

yuzefovich requested review from mgartner and msirek and removed request for yuzefovich June 20, 2023 18:07

msirek approved these changes Jun 21, 2023

View reviewed changes

mgartner reviewed Jun 22, 2023

View reviewed changes

DrewKimball force-pushed the eliminate-join branch from 546bce0 to aff928a Compare June 23, 2023 07:59

DrewKimball commented Jun 23, 2023

View reviewed changes

mgartner approved these changes Jun 23, 2023

View reviewed changes

DrewKimball force-pushed the eliminate-join branch from aff928a to aa2b106 Compare June 23, 2023 19:17

DrewKimball requested review from a team as code owners June 23, 2023 19:17

DrewKimball removed request for a team June 23, 2023 19:17

DrewKimball added 2 commits June 26, 2023 10:42

DrewKimball force-pushed the eliminate-join branch from aa2b106 to cb25ade Compare June 26, 2023 16:42

craig bot merged commit 9bc0a0b into cockroachdb:master Jun 26, 2023

DrewKimball deleted the eliminate-join branch June 26, 2023 19:36

cockroach-teamcity mentioned this pull request Jun 27, 2023

PR #105214 - opt: allow join elimination rules to remap columns cockroachdb/docs#17350

Open

DrewKimball mentioned this pull request Jun 29, 2023

opt: move self-join equality inference to join elimination rules #105804

Closed

DrewKimball mentioned this pull request Jun 30, 2023

opt: don't infer join equalities from equivalences on nullable columns #105866

Merged

DrewKimball mentioned this pull request Jul 8, 2023

opt: only infer self-join equality with a key over the base table #106485

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: allow join-elimination rules to apply in more cases #105214

opt: allow join-elimination rules to apply in more cases #105214

DrewKimball commented Jun 20, 2023

cockroach-teamcity commented Jun 20, 2023

msirek left a comment

mgartner left a comment

DrewKimball left a comment

mgartner left a comment

DrewKimball commented Jun 23, 2023

DrewKimball commented Jun 24, 2023

craig bot commented Jun 24, 2023

DrewKimball commented Jun 26, 2023

craig bot commented Jun 26, 2023

craig bot commented Jun 26, 2023

DrewKimball commented Jun 26, 2023

DrewKimball commented Jun 26, 2023

craig bot commented Jun 26, 2023

mgartner commented Aug 1, 2023

opt: allow join-elimination rules to apply in more cases #105214

opt: allow join-elimination rules to apply in more cases #105214

Conversation

DrewKimball commented Jun 20, 2023

opt: allow join elimination rules to remap columns

opt: infer equality filters for self joins

cockroach-teamcity commented Jun 20, 2023

msirek left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

DrewKimball left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

DrewKimball commented Jun 23, 2023

DrewKimball commented Jun 24, 2023

craig bot commented Jun 24, 2023

DrewKimball commented Jun 26, 2023

craig bot commented Jun 26, 2023

craig bot commented Jun 26, 2023

DrewKimball commented Jun 26, 2023

DrewKimball commented Jun 26, 2023

craig bot commented Jun 26, 2023

mgartner commented Aug 1, 2023