Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: incorrect results due to cross-joining input of semi-join #78681

Closed
mgartner opened this issue Mar 28, 2022 · 0 comments · Fixed by #78685
Closed

opt: incorrect results due to cross-joining input of semi-join #78681

mgartner opened this issue Mar 28, 2022 · 0 comments · Fixed by #78685
Assignees
Labels
branch-release-21.1 Used to mark GA and release blockers, technical advisories, and bugs for 21.1 branch-release-21.2 Used to mark GA and release blockers, technical advisories, and bugs for 21.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Mar 28, 2022

GenerateLookupJoins can cross-join the input of the join with a set of constant values, from a CHECK constraint, in order to constrain index prefix columns. This enables lookup joins to be generated in more cases. It is valid for inner joins, but it is not valid for left, anti, and semi joins. We currently disable this only for left and anti-joins, but not semi-joins. This incorrect transformation can cause incorrect query results.

Example:

CREATE TABLE a (
  a INT NOT NULL
);

CREATE TABLE bc (
  b INT NOT NULL,
  c INT NOT NULL,
  CHECK (b IN (1, 2, 3)),
  INDEX (b, c)
);

# Insert stats so that a lookup semi-join is selected.
statement ok
ALTER TABLE a INJECT STATISTICS '[
  {
    "columns": ["a"],
    "created_at": "2018-05-01 1:00:00.00000+00:00",
    "row_count": 10,
    "distinct_count": 10
  }
]';
ALTER TABLE bc INJECT STATISTICS '[
  {
    "columns": ["b"],
    "created_at": "2018-05-01 1:00:00.00000+00:00",
    "row_count": 10000000,
    "distinct_count": 3
  }
]'

INSERT INTO a VALUES (1);

INSERT INTO bc VALUES (1, 1), (2, 1), (3, 1);

SELECT a FROM a WHERE EXISTS (SELECT * FROM bc WHERE a = c);
----
1
1
1

The query plan reveals the problem. Notice how the input to the semi-join is cross-joined with all the possible values of b. This incorrectly increases the number of rows output by the join.

EXPLAIN (OPT, VERBOSE) SELECT a FROM a WHERE EXISTS (SELECT * FROM bc WHERE a =
c);
----
semi-join (lookup bc@bc_b_c_idx)
 ├── columns: a:1
 ├── key columns: [11 1] = [5 6]
 ├── stats: [rows=10, distinct(1)=10, null(1)=0, avgsize(1)=4]
 ├── cost: 844.7775
 ├── distribution: test
 ├── inner-join (cross)
 │    ├── columns: a:1 "lookup_join_const_col_@5":11
 │    ├── multiplicity: left-rows(one-or-more), right-rows(zero-or-more)
 │    ├── stats: [rows=30, distinct(1)=10, null(1)=0, avgsize(1)=4, distinct(11)=3, null(11)=0, avgsize(11)=4]
 │    ├── cost: 35.7575
 │    ├── distribution: test
 │    ├── scan a
 │    │    ├── columns: a:1
 │    │    ├── stats: [rows=10, distinct(1)=10, null(1)=0, avgsize(1)=4]
 │    │    ├── cost: 35.22
 │    │    ├── distribution: test
 │    │    ├── prune: (1)
 │    │    └── unfiltered-cols: (1-4)
 │    ├── values
 │    │    ├── columns: "lookup_join_const_col_@5":11
 │    │    ├── cardinality: [3 - 3]
 │    │    ├── stats: [rows=3, distinct(11)=3, null(11)=0, avgsize(11)=4]
 │    │    ├── cost: 0.04
 │    │    ├── distribution: test
 │    │    ├── (1,)
 │    │    ├── (2,)
 │    │    └── (3,)
 │    └── filters (true)
 └── filters (true)

Version Affected

This bug has been present since version 21.1.0 and affects all later releases.

Workaround

The best known workaround is 1) determine the prefix columns in the lookup join for which the cross join is created (b in the example above) and 2) make that column nullable or remove the CHECK constraint. This will prevent the incorrect lookup join from being generated.

Jira issue: CRDB-14215

gz#11864

@mgartner mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-21.1 branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 labels Mar 28, 2022
@mgartner mgartner self-assigned this Mar 28, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Mar 28, 2022
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 28, 2022
This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear under all of the following conditions: 1) the query contains a
semi-join, such as queries in the form:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t2 WHERE t.a = t2.a`)`;
2) the inner table had an index containing the equality column, like
`t2.a` in the example query; 3) the index contained one or more
columns that prefix the equality column; and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 29, 2022
This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear under all of the following conditions: 1) the query contains a
semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a`);
2) the inner table has an index containing the equality column, like
`t2.a` in the example query; 3) the index contains one or more
columns that prefix the equality column; and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 29, 2022
This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true: 1) the query
contains a semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`,
2) the inner table has an index containing the equality column, like
`t2.a` in the example query, 3) the index contains one or more
columns that prefix the equality column, and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 29, 2022
This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true: 1) the query
contains a semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`,
2) the inner table has an index containing the equality column, like
`t2.a` in the example query, 3) the index contains one or more
columns that prefix the equality column, and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
craig bot pushed a commit that referenced this issue Mar 29, 2022
78418: workload: support fixture import in multi-tenant mode r=nvanbenschoten a=nvanbenschoten

Fixes #75449.

This commit fixes fixture import into tenants, which do not have access to the `crdb_internal.gossip_liveness` table. This fixes the following:
- `[cockroach] workload fixtures import`
- `[cockroach] workload init --data-loader=IMPORT`

Release justification: low-risk change that improves workload functionality.

78626: backupccl: add deprecation warning for SHOW BACKUP without subdir r=adityamaru,benbardin a=msbutler

Informs: #78153

Release note (sql change): The `SHOW BACKUP` cmd  without the `IN` keyword to
specify a subdirectory is deprecated and will be removed from a future release.
Users are recommended to only create collection based backups and view them
with `SHOW BACKUP FROM <backup> IN <collection>`

78653: sql: add logging for grant_role r=RichardJCai a=RichardJCai

GRANT ROLE now logs to the logpb.Channel_USER_ADMIN channel.
Example: GRANT role1, role2 TO role3, role4 logs
"RoleNames":"‹role1›",‹role2›","Members":"‹role3›",‹ role4›"

Release note: None

Fixes #78245

78685: opt: do not cross-join input of semi-join r=mgartner a=mgartner

This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see #59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true: 1) the query
contains a semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`,
2) the inner table has an index containing the equality column, like
`t2.a` in the example query, 3) the index contains one or more
columns that prefix the equality column, and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.


78940: backupccl: fix nil pointer exception in restore OnFailOrCancel r=dt,stevendanna a=adityamaru

The execCfg field on the restore resumer was not being set correctly
in the OnFailOrCancel hook. This would lead to a nil pointer exception
during cleanup.

Other jobs such as import, backup use the execCfg directly off the
JobExecContext instead of storing it on the resumer. An alternative to
this fix could be to change all `r.ExecCfg` to use the execCfg on the JobExecContext.

Fixes: #76720

Release note (bug fix): Fixes a nil pointer exception during the
cleanup of a failed or cancelled restore job.

78962: kvserver: make the StoreRebalancer interval a cluster setting r=aayushshah15 a=aayushshah15

Release note (ops change): the `kv.allocator.load_based_rebalancing_interval`
cluster setting now lets operators choose the interval at which each store in the
cluster will check for load-based lease or replica rebalancing opportunities.


78971: sql, migration: skip flaky TestPublicSchemaMigration test r=ajwerner a=RichardJCai

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: richardjcai <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Aayush Shah <[email protected]>
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 29, 2022
This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true: 1) the query
contains a semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`,
2) the inner table has an index containing the equality column, like
`t2.a` in the example query, 3) the index contains one or more
columns that prefix the equality column, and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
@craig craig bot closed this as completed in 1d7811d Mar 29, 2022
blathers-crl bot pushed a commit that referenced this issue Mar 29, 2022
This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see #59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true: 1) the query
contains a semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`,
2) the inner table has an index containing the equality column, like
`t2.a` in the example query, 3) the index contains one or more
columns that prefix the equality column, and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
mgartner added a commit that referenced this issue Mar 29, 2022
This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see #59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true: 1) the query
contains a semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`,
2) the inner table has an index containing the equality column, like
`t2.a` in the example query, 3) the index contains one or more
columns that prefix the equality column, and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
fqazi pushed a commit to fqazi/cockroach that referenced this issue Apr 4, 2022
This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true: 1) the query
contains a semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`,
2) the inner table has an index containing the equality column, like
`t2.a` in the example query, 3) the index contains one or more
columns that prefix the equality column, and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 5, 2022
In cockroachdb#78685, we prevented `GenerateLookupJoins` from incorrect creating a
cross-join on the input of a semi-join, addressing cockroachdb#78681. This commit
address the same issue with `GenerateInvertedJoins`, which we originally
forgot to fix.

Informs cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a @> b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 5, 2022
In cockroachdb#78685, we prevented `GenerateLookupJoins` from incorrect creating a
cross-join on the input of a semi-join, addressing cockroachdb#78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we
originally forgot to fix.

Informs cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a @> b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.
craig bot pushed a commit that referenced this issue Apr 5, 2022
77742: sql: implement SHOW [ALL] CLUSTER SETTINGS FOR TENANT r=rafiss a=knz

All commits but the last 2 from #77740.
(Reviewers: only the last 2 commits belong to this PR.)

Informs #77471

Release justification: low risk, high benefit changes to existing functionality

79260: changefeedccl, backupresolver: refactor to hold on to mapping of target to descriptor r=[miretskiy,dt] a=HonoreDB


Changefeed statements need to resolve a bunch of table names at once,
 but unlike backups and grants they need to know which returned
descriptor corresponded to which input because they (now) take
target-specific options. We were reconstructing this awkwardly on
the calling side. This PR adds an optional parameter to the
 backupresolver method being used so that it can track which
 descriptor belongs to which input.

I'm probably being overly polite by making this optional,
but hey, it is a little extra memory footprint and not my package.

Release note: None

79324: changefeedccl: unify initial_scan option syntax r=sherman-grewal a=sherman-grewal

Resolves #79324

Currently, we have explicit options for each possible
behaviour that a user would like to achieve for
initial scans on changefeeds. For instance, a user
could specify:

- initial_scan
- no_initial_scan
- initial_scan_only

This seems a bit sprawling, and can inadvertently cause
contradictions in a changefeed statement. Hence, in this
PR we extend the option `initial_scan` to take on three
possible values: `'yes|no|only'`. Once this change
is made we will remove the explicit options from the
docs, but we will keep these options for backwards
compatibility.

Release note (enterprise change): Unify the syntax that
allows users to define the behaviour they would like
for initial scans on changefeeds by extending the
`initial_scan` option to take on three possible values:
`'yes|no|only'`.

Release justification: Small, safe refactor that will
improve the user experience when creating changefeeds.

Jira issue: CRDB-14693

79389: opt: do not generate unnecessary cross-joins on join input r=mgartner a=mgartner

#### opt: do not generate unnecessary cross-joins on lookup join input

This commit fixes a bug that caused unnecessary cross-joins on the input
of lookup joins, causing both suboptimal query plans and incorrect query
results. The bug only affected lookup joins with lookup expressions.

Fixes #79384

Release note (bug fix): A bug has been fixed that caused the optimizer
to generate query plans with logically incorrect lookup joins. The bug
can only occur in queries with an inner join, e.g., `t1 JOIN t2`, if all
of the following are true:
  1. The join contains an equality condition between columns of both
     tables, e.g., `t1.a = t2.a`.
  2. A query filter or `CHECK` constraint constrains a column to a set
     of specific values, e.g., `t2.b IN (1, 2, 3)`. In the case of a
     `CHECK` constraint, the column must be `NOT NULL`.
  3. A query filter or `CHECK` constraint constrains a column to a
     range, e.g., `t2.c > 0`. In the case of a `CHECK` constraint, the
     column must be `NOT NULL`.
  4. An index contains a column from each of the criteria above, e.g.,
     `INDEX t2(a, b, c)`.
This bug has been present since version 21.2.0.

#### opt: do not cross-join input of inverted semi-join

In #78685, we prevented `GenerateLookupJoins` from incorrect creating a
cross-join on the input of a semi-join, addressing #78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we
originally forgot to fix.

Informs #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a `@>` b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.


79454: docs: update alter changefeed diagram r=ericharmeling a=kathancox

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
Co-authored-by: Sherman Grewal <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Kathryn Hancox <[email protected]>
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 6, 2022
In cockroachdb#78685, we prevented `GenerateLookupJoins` from incorrect creating a
cross-join on the input of a semi-join, addressing cockroachdb#78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we
originally forgot to fix.

Informs cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a @> b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 6, 2022
In cockroachdb#78685, we prevented `GenerateLookupJoins` from incorrect creating a
cross-join on the input of a semi-join, addressing cockroachdb#78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we
originally forgot to fix.

Informs cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a @> b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.
mgartner added a commit that referenced this issue Apr 6, 2022
In #78685, we prevented `GenerateLookupJoins` from incorrect creating a
cross-join on the input of a semi-join, addressing #78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we
originally forgot to fix.

Informs #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a @> b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 7, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit the some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
    test was added for this.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 8, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit the some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  4. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 11, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit the some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  4. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 11, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit has some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  4. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 12, 2022
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit has some other notable changes to the lookup join generation
logic:

  1. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  2. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  3. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
craig bot pushed a commit that referenced this issue Apr 12, 2022
79586: opt: do not build cross-joins on inputs of lookup joins r=mgartner a=mgartner

#### opt: do not build cross-joins on inputs of lookup joins

Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see #78681 and #79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see #66726).

This commit has some other notable changes to the lookup join generation
logic:

  1. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  2. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  3. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
@rytaft rytaft added C-technical-advisory Caused a technical advisory branch-release-21.2 Used to mark GA and release blockers, technical advisories, and bugs for 21.2 branch-release-21.1 Used to mark GA and release blockers, technical advisories, and bugs for 21.1 and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 labels Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-21.1 Used to mark GA and release blockers, technical advisories, and bugs for 21.1 branch-release-21.2 Used to mark GA and release blockers, technical advisories, and bugs for 21.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants