Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: TryRemapJoinOuterColsRight prevents optimal plan #102614

Closed
mgartner opened this issue Apr 28, 2023 · 2 comments · Fixed by #105214
Closed

opt: TryRemapJoinOuterColsRight prevents optimal plan #102614

mgartner opened this issue Apr 28, 2023 · 2 comments · Fixed by #105214
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Apr 28, 2023

The TryRemapJoinOuterColsRight rule, which is newly introduced in v23.1, is preventing an optimal query plan from being produced.

Here's an example test case showing a suboptimal query plan first, and an optimal query plan when TryRemapJoinOuterColsRight is disabled:

exec-ddl
CREATE TABLE parent (
  id INT PRIMARY KEY
)
----

exec-ddl
CREATE TABLE child (
  id INT PRIMARY KEY,
  parent_id INT NOT NULL REFERENCES parent(id)
)
----

opt
SELECT
  id,
  (SELECT child.parent_id FROM parent WHERE id = child.parent_id)
FROM child
----
project
 ├── columns: id:1!null parent_id:9!null
 ├── key: (1)
 ├── fd: (1)-->(9)
 ├── inner-join (hash)
 │    ├── columns: child.id:1!null child.parent_id:2!null parent.id:5!null parent_id:8!null
 │    ├── multiplicity: left-rows(exactly-one), right-rows(zero-or-more)
 │    ├── key: (1)
 │    ├── fd: (1)-->(2), (5)==(2,8), (8)==(2,5), (2)==(5,8)
 │    ├── scan child
 │    │    ├── columns: child.id:1!null child.parent_id:2!null
 │    │    ├── key: (1)
 │    │    └── fd: (1)-->(2)
 │    ├── project
 │    │    ├── columns: parent_id:8!null parent.id:5!null
 │    │    ├── key: (5)
 │    │    ├── fd: (5)==(8), (8)==(5)
 │    │    ├── scan parent
 │    │    │    ├── columns: parent.id:5!null
 │    │    │    └── key: (5)
 │    │    └── projections
 │    │         └── parent.id:5 [as=parent_id:8, outer=(5)]
 │    └── filters
 │         └── parent.id:5 = child.parent_id:2 [outer=(2,5), constraints=(/2: (/NULL - ]; /5: (/NULL - ]), fd=(2)==(5), (5)==(2)]
 └── projections
      └── parent_id:8 [as=parent_id:9, outer=(8)]

opt disable=(TryRemapJoinOuterColsRight)
SELECT
  id,
  (SELECT child.parent_id FROM parent WHERE id = child.parent_id)
FROM child
----
project
 ├── columns: id:1!null parent_id:9!null
 ├── key: (1)
 ├── fd: (1)-->(9)
 ├── scan child
 │    ├── columns: child.id:1!null child.parent_id:2!null
 │    ├── key: (1)
 │    └── fd: (1)-->(2)
 └── projections
      └── child.parent_id:2 [as=parent_id:9, outer=(2)]

cc @DrewKimball

Jira issue: CRDB-27561

@mgartner mgartner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 28, 2023
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Apr 28, 2023
@mgartner
Copy link
Collaborator Author

This is a tricky case. EliminateJoinUnderProjectLeft doesn't fire because the parent.id:5 column on the RHS of the join used in the top-level project, so the RHS of the join is needed.

@rytaft
Copy link
Collaborator

rytaft commented May 9, 2023

From triage meeting:

  • Should this be an exploration rule rather than normalization?
  • We should probably make the join elimination rules more powerful
  • We shouldn't spend too much time on this. If it's too hard, can move this issue to Backlog

DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jun 20, 2023
When it can be proven that a join does not add rows to or remove them
from one of its inputs, the other input can often be removed, eliminating
the join. However, this can only be done if the columns from the
eliminated side are not needed.

This patch allows the join elimination rules to remap columns from the
eliminated side to the preserved side of the join, using the join's
functional dependencies. For example:
```
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x));

SELECT x, k, v FROM fk INNER JOIN xy ON v = x;
```
In the example above, the join could not previously be eliminated because
the `x` column is required in the output. Now, the `x` column is remapped
to the equivalent `v` column, allowing the join to be removed.

Fixes cockroachdb#102614

Release note (performance improvement): The optimizer can now eliminate
joins in more cases.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jun 20, 2023
When a table is joined to itself with an equality that forms a key
over both join inputs, it is possible to infer equality filters
between each pair of columns at the same ordinal position in the base
table. This patch improves the logical props builder to infer these
self-join equalities in a join's FuncDepSet. This improves the quality
of information available to optimization rules, and in particular, join
elimination rules.

Informs cockroachdb#102614

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jun 23, 2023
When it can be proven that a join does not add rows to or remove them
from one of its inputs, the other input can often be removed, eliminating
the join. However, this can only be done if the columns from the
eliminated side are not needed.

This patch allows the join elimination rules to remap columns from the
eliminated side to the preserved side of the join, using the join's
functional dependencies. For example:
```
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x));

SELECT x, k, v FROM fk INNER JOIN xy ON v = x;
```
In the example above, the join could not previously be eliminated because
the `x` column is required in the output. Now, the `x` column is remapped
to the equivalent `v` column, allowing the join to be removed.

Fixes cockroachdb#102614

Release note (performance improvement): The optimizer can now eliminate
joins in more cases.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jun 23, 2023
When a table is joined to itself with an equality that forms a key
over both join inputs, it is possible to infer equality filters
between each pair of columns at the same ordinal position in the base
table. This patch improves the logical props builder to infer these
self-join equalities in a join's FuncDepSet. This improves the quality
of information available to optimization rules, and in particular, join
elimination rules.

Informs cockroachdb#102614

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jun 23, 2023
When it can be proven that a join does not add rows to or remove them
from one of its inputs, the other input can often be removed, eliminating
the join. However, this can only be done if the columns from the
eliminated side are not needed.

This patch allows the join elimination rules to remap columns from the
eliminated side to the preserved side of the join, using the join's
functional dependencies. For example:
```
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x));

SELECT x, k, v FROM fk INNER JOIN xy ON v = x;
```
In the example above, the join could not previously be eliminated because
the `x` column is required in the output. Now, the `x` column is remapped
to the equivalent `v` column, allowing the join to be removed.

Fixes cockroachdb#102614

Release note (performance improvement): The optimizer can now eliminate
joins in more cases.
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jun 23, 2023
When a table is joined to itself with an equality that forms a key
over both join inputs, it is possible to infer equality filters
between each pair of columns at the same ordinal position in the base
table. This patch improves the logical props builder to infer these
self-join equalities in a join's FuncDepSet. This improves the quality
of information available to optimization rules, and in particular, join
elimination rules.

Informs cockroachdb#102614

Release note: None
craig bot pushed a commit that referenced this issue Jun 24, 2023
105214: opt: allow join-elimination rules to apply in more cases r=DrewKimball a=DrewKimball

#### opt: allow join elimination rules to remap columns

When it can be proven that a join does not add rows to or remove them
from one of its inputs, the other input can often be removed, eliminating
the join. However, this can only be done if the columns from the
eliminated side are not needed.

This patch allows the join elimination rules to remap columns from the
eliminated side to the preserved side of the join, using the join's
functional dependencies. For example:
```
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x));

SELECT x, k, v FROM fk INNER JOIN xy ON v = x;
```
In the example above, the join could not previously be eliminated because
the `x` column is required in the output. Now, the `x` column is remapped
to the equivalent `v` column, allowing the join to be removed.

Fixes #102614

Release note (performance improvement): The optimizer can now eliminate
joins in more cases.

#### opt: infer equality filters for self joins

When a table is joined to itself with an equality that forms a key
over both join inputs, it is possible to infer equality filters
between each pair of columns at the same ordinal position in the base
table. This patch improves the logical props builder to infer these
self-join equalities in a join's FuncDepSet. This improves the quality
of information available to optimization rules, and in particular, join
elimination rules.

Informs #102614

Release note: None

Co-authored-by: Drew Kimball <[email protected]>
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jun 26, 2023
When a table is joined to itself with an equality that forms a key
over both join inputs, it is possible to infer equality filters
between each pair of columns at the same ordinal position in the base
table. This patch improves the logical props builder to infer these
self-join equalities in a join's FuncDepSet. This improves the quality
of information available to optimization rules, and in particular, join
elimination rules.

Informs cockroachdb#102614

Release note: None
craig bot pushed a commit that referenced this issue Jun 26, 2023
105214: opt: allow join-elimination rules to apply in more cases r=DrewKimball a=DrewKimball

#### opt: allow join elimination rules to remap columns

When it can be proven that a join does not add rows to or remove them
from one of its inputs, the other input can often be removed, eliminating
the join. However, this can only be done if the columns from the
eliminated side are not needed.

This patch allows the join elimination rules to remap columns from the
eliminated side to the preserved side of the join, using the join's
functional dependencies. For example:
```
CREATE TABLE xy (x INT PRIMARY KEY, y INT);
CREATE TABLE fk (k INT PRIMARY KEY, v INT NOT NULL, FOREIGN KEY (v) REFERENCES xy (x));

SELECT x, k, v FROM fk INNER JOIN xy ON v = x;
```
In the example above, the join could not previously be eliminated because
the `x` column is required in the output. Now, the `x` column is remapped
to the equivalent `v` column, allowing the join to be removed.

Fixes #102614

Release note (performance improvement): The optimizer can now eliminate
joins in more cases.

#### opt: infer equality filters for self joins

When a table is joined to itself with an equality that forms a key
over both join inputs, it is possible to infer equality filters
between each pair of columns at the same ordinal position in the base
table. This patch improves the logical props builder to infer these
self-join equalities in a join's FuncDepSet. This improves the quality
of information available to optimization rules, and in particular, join
elimination rules.

Informs #102614

Release note: None

Co-authored-by: Drew Kimball <[email protected]>
@craig craig bot closed this as completed in a87cfe6 Jun 26, 2023
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jul 20, 2023
We recently added support for column remapping in the join elimination
rules that allows columns from the eliminated input of the join to be
mapped to equivalent columns from the preserved input. This allows
joins to be eliminated in more cases - in particular, the self-join
patterns that can arise from an `UPDATE ... FROM` statement where the
table in the `FROM` clause is the same as the table being updated.

This patch adds a setting, `optimizer_use_improved_join_elimination`,
which gates the column-remapping logic for the join-elimination rules.
The plan is to backport the column-remapping changes to 23.1 behind
this setting turned off by default.

Informs cockroachdb#102614

Release note: None
craig bot pushed a commit that referenced this issue Jul 21, 2023
106925: upgrademanager: simplify TestMigrationFailure r=yuzefovich a=knz

Informs #76378 
Epic: CRDB-18499

This also removes `TODOTestTenantDisabled`.

(I have verified the test works, albeit still flaky due to #106648, by temporarily removing the skip.)

Release note: None

107319: opt: add session setting for join elimination optimization r=DrewKimball a=DrewKimball

We recently added support for column remapping in the join elimination rules that allows columns from the eliminated input of the join to be mapped to equivalent columns from the preserved input. This allows joins to be eliminated in more cases - in particular, the self-join patterns that can arise from an `UPDATE ... FROM` statement where the table in the `FROM` clause is the same as the table being updated.

This patch adds a setting, `optimizer_use_improved_join_elimination`, which gates the column-remapping logic for the join-elimination rules. The plan is to backport the column-remapping changes to 23.1 behind this setting turned off by default.

Informs #102614

Release note: None

107392: go.mod: bump Pebble to 94f91669 r=RaduBerinde a=RaduBerinde

94f91669 db: rename Experimental.SharedStorage to RemoteStorage in Options
c62c9127 objstorage: rename objstorage/shared package to remote
692f3b61 vfs: move errorfs package to vfs dir to allow use in CockroachDB
5e550364 cmd/pebble: allow passing an explicit checkpoint directory path
9c3f337a cmd/replay: always log checkpoint initialization
0cd8f1b8 replay: calculate separate write stall metrics for each reason
f9d08867 cmd/pebble: use non-default block cache size in replay tool
d0e583f8 *: improve humanize format
5e76dfab db: minor update of comment for TargetByteDeletionRate
456a2a2a objstorage: rename Shared to Remote in the objstorage provider API

Release note (general change): Formatting of byte figures in Pebble logs has been improved; any tools that parse these logs might need updating.

Epic: none

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Drew Kimball <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants