Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: unnecessary cross-joins with when generating a lookup join with a lookup expression #79384

Closed
mgartner opened this issue Apr 4, 2022 · 1 comment · Fixed by #79389
Closed
Assignees
Labels
branch-release-21.1 Used to mark GA and release blockers, technical advisories, and bugs for 21.1 branch-release-21.2 Used to mark GA and release blockers, technical advisories, and bugs for 21.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Apr 4, 2022

Notice in the query plan below that the cross-join to produce the lookup_join_const_col_@7 column is unnecessary.

exec-ddl
CREATE TABLE t1234 (
  a INT,
  b INT,
  c INT,
  INDEX (a, b, c)
)
----

opt expect=GenerateLookupJoinsWithFilter
SELECT m FROM small JOIN t1234 ON b IN (1, 2, 3) AND c > 0 AND m = a
----
project
 ├── columns: m:1!null
 └── inner-join (lookup t1234@t1234_a_b_c_idx)
      ├── columns: m:1!null a:6!null b:7!null c:8!null
      ├── lookup expression
      │    └── filters
      │         ├── m:1 = a:6 [outer=(1,6), constraints=(/1: (/NULL - ]; /6: (/NULL - ]), fd=(1)==(6), (6)==(1)]
      │         ├── b:7 IN (1, 2, 3) [outer=(7), constraints=(/7: [/1 - /1] [/2 - /2] [/3 - /3]; tight)]
      │         └── c:8 > 0 [outer=(8), constraints=(/8: [/1 - ]; tight)]
      ├── fd: (1)==(6), (6)==(1)
      ├── inner-join (cross)
      │    ├── columns: m:1 "lookup_join_const_col_@7":12!null
      │    ├── multiplicity: left-rows(one-or-more), right-rows(zero-or-more)
      │    ├── scan small
      │    │    └── columns: m:1
      │    ├── values
      │    │    ├── columns: "lookup_join_const_col_@7":12!null
      │    │    ├── cardinality: [3 - 3]
      │    │    ├── (1,)
      │    │    ├── (2,)
      │    │    └── (3,)
      │    └── filters (true)
      └── filters (true)

Jira issue: CRDB-14777

@mgartner mgartner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 4, 2022
@mgartner mgartner self-assigned this Apr 4, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Apr 4, 2022
@mgartner
Copy link
Collaborator Author

mgartner commented Apr 4, 2022

This can cause incorrect query results:

statement ok
CREATE TABLE t79384a (
  k INT NOT NULL
)

statement ok
CREATE TABLE t79384b (
  a INT,
  b INT,
  c INT,
  INDEX (a, b, c)
)

statement ok
INSERT INTO t79384a VALUES (1)

statement ok
INSERT INTO t79384b VALUES (1, 1, 1)

# Tables t1 and t2 have a single row each, so this should never return more than
# one row.
query I
SELECT k FROM t79384a INNER LOOKUP JOIN t79384b ON k = a AND b IN (1, 2, 3) AND c > 0
----
1
1
1

mgartner added a commit to mgartner/cockroach that referenced this issue Apr 4, 2022
This commit fixes a bug that caused unnecessary cross-joins on the input
of lookup joins, causing both suboptimal query plans and incorrect query
results. The bug only affected lookup joins with lookup expressions.

Fixes cockroachdb#79384

Release note (bug fix): A bug has been fixed that caused the optimizer
to generate query plans with logically incorrect lookup joins. The bug
could only occur in queries with an inner join, e.g.,
`t1 INNER JOIN t2`, if all of the following were true:
  1. The join contained an equality condition between columns of both
     tables, e.g., `t1.a = t2.a`.
  2. A filter condition constrained a column to a set of specific
     values, e.g., `t2.b IN (1, 2, 3)`.
  3. A filter condition constrained a column to a range, e.g.,
     `t2.c > 0`.
  4. An index contained a column from each of the criteria above, e.g.,
     `INDEX t2(a, b, c)`.
This bug has been present since version 21.2.0.
craig bot pushed a commit that referenced this issue Apr 5, 2022
77742: sql: implement SHOW [ALL] CLUSTER SETTINGS FOR TENANT r=rafiss a=knz

All commits but the last 2 from #77740.
(Reviewers: only the last 2 commits belong to this PR.)

Informs #77471

Release justification: low risk, high benefit changes to existing functionality

79260: changefeedccl, backupresolver: refactor to hold on to mapping of target to descriptor r=[miretskiy,dt] a=HonoreDB


Changefeed statements need to resolve a bunch of table names at once,
 but unlike backups and grants they need to know which returned
descriptor corresponded to which input because they (now) take
target-specific options. We were reconstructing this awkwardly on
the calling side. This PR adds an optional parameter to the
 backupresolver method being used so that it can track which
 descriptor belongs to which input.

I'm probably being overly polite by making this optional,
but hey, it is a little extra memory footprint and not my package.

Release note: None

79324: changefeedccl: unify initial_scan option syntax r=sherman-grewal a=sherman-grewal

Resolves #79324

Currently, we have explicit options for each possible
behaviour that a user would like to achieve for
initial scans on changefeeds. For instance, a user
could specify:

- initial_scan
- no_initial_scan
- initial_scan_only

This seems a bit sprawling, and can inadvertently cause
contradictions in a changefeed statement. Hence, in this
PR we extend the option `initial_scan` to take on three
possible values: `'yes|no|only'`. Once this change
is made we will remove the explicit options from the
docs, but we will keep these options for backwards
compatibility.

Release note (enterprise change): Unify the syntax that
allows users to define the behaviour they would like
for initial scans on changefeeds by extending the
`initial_scan` option to take on three possible values:
`'yes|no|only'`.

Release justification: Small, safe refactor that will
improve the user experience when creating changefeeds.

Jira issue: CRDB-14693

79389: opt: do not generate unnecessary cross-joins on join input r=mgartner a=mgartner

#### opt: do not generate unnecessary cross-joins on lookup join input

This commit fixes a bug that caused unnecessary cross-joins on the input
of lookup joins, causing both suboptimal query plans and incorrect query
results. The bug only affected lookup joins with lookup expressions.

Fixes #79384

Release note (bug fix): A bug has been fixed that caused the optimizer
to generate query plans with logically incorrect lookup joins. The bug
can only occur in queries with an inner join, e.g., `t1 JOIN t2`, if all
of the following are true:
  1. The join contains an equality condition between columns of both
     tables, e.g., `t1.a = t2.a`.
  2. A query filter or `CHECK` constraint constrains a column to a set
     of specific values, e.g., `t2.b IN (1, 2, 3)`. In the case of a
     `CHECK` constraint, the column must be `NOT NULL`.
  3. A query filter or `CHECK` constraint constrains a column to a
     range, e.g., `t2.c > 0`. In the case of a `CHECK` constraint, the
     column must be `NOT NULL`.
  4. An index contains a column from each of the criteria above, e.g.,
     `INDEX t2(a, b, c)`.
This bug has been present since version 21.2.0.

#### opt: do not cross-join input of inverted semi-join

In #78685, we prevented `GenerateLookupJoins` from incorrect creating a
cross-join on the input of a semi-join, addressing #78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we
originally forgot to fix.

Informs #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a `@>` b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.


79454: docs: update alter changefeed diagram r=ericharmeling a=kathancox

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
Co-authored-by: Sherman Grewal <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Kathryn Hancox <[email protected]>
@craig craig bot closed this as completed in 3bf035f Apr 5, 2022
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 6, 2022
This commit fixes a bug that caused unnecessary cross-joins on the input
of lookup joins, causing both suboptimal query plans and incorrect query
results. The bug only affected lookup joins with lookup expressions.

Fixes cockroachdb#79384

Release note (bug fix): A bug has been fixed that caused the optimizer
to generate query plans with logically incorrect lookup joins. The bug
can only occur in queries with an inner join, e.g., `t1 JOIN t2`, if all
of the following are true:
  1. The join contains an equality condition between columns of both
     tables, e.g., `t1.a = t2.a`.
  2. A query filter or `CHECK` constraint constrains a column to a set
     of specific values, e.g., `t2.b IN (1, 2, 3)`. In the case of a
     `CHECK` constraint, the column must be `NOT NULL`.
  3. A query filter or `CHECK` constraint constrains a column to a
     range, e.g., `t2.c > 0`. In the case of a `CHECK` constraint, the
     column must be `NOT NULL`.
  4. An index contains a column from each of the criteria above, e.g.,
     `INDEX t2(a, b, c)`.
This bug has been present since version 21.2.0.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 6, 2022
This commit fixes a bug that caused unnecessary cross-joins on the input
of lookup joins, causing both suboptimal query plans and incorrect query
results. The bug only affected lookup joins with lookup expressions.

Fixes cockroachdb#79384

Release note (bug fix): A bug has been fixed that caused the optimizer
to generate query plans with logically incorrect lookup joins. The bug
can only occur in queries with an inner join, e.g., `t1 JOIN t2`, if all
of the following are true:
  1. The join contains an equality condition between columns of both
     tables, e.g., `t1.a = t2.a`.
  2. A query filter or `CHECK` constraint constrains a column to a set
     of specific values, e.g., `t2.b IN (1, 2, 3)`. In the case of a
     `CHECK` constraint, the column must be `NOT NULL`.
  3. A query filter or `CHECK` constraint constrains a column to a
     range, e.g., `t2.c > 0`. In the case of a `CHECK` constraint, the
     column must be `NOT NULL`.
  4. An index contains a column from each of the criteria above, e.g.,
     `INDEX t2(a, b, c)`.
This bug has been present since version 21.2.0.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 7, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit the some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
    test was added for this.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 8, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit the some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  4. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 11, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit the some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  4. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 11, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit has some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  4. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 12, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit has some other notable changes to the lookup join generation
logic:

  1. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  2. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  3. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
craig bot pushed a commit that referenced this issue Apr 12, 2022
79586: opt: do not build cross-joins on inputs of lookup joins r=mgartner a=mgartner

#### opt: do not build cross-joins on inputs of lookup joins

Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see #78681 and #79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see #66726).

This commit has some other notable changes to the lookup join generation
logic:

  1. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  2. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  3. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
@rytaft rytaft added C-technical-advisory Caused a technical advisory branch-release-21.2 Used to mark GA and release blockers, technical advisories, and bugs for 21.2 branch-release-21.1 Used to mark GA and release blockers, technical advisories, and bugs for 21.1 labels Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-21.1 Used to mark GA and release blockers, technical advisories, and bugs for 21.1 branch-release-21.2 Used to mark GA and release blockers, technical advisories, and bugs for 21.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants