opt: do not cross-join input of semi-join #78685

Merged: 1 commit, Mar 29, 2022

@mgartner mgartner commented Mar 28, 2022

This commit fixes a logical correctness bug caused when
`GenerateLookupJoins` cross-joins the input of a semi-join with a set of
constant values to constrain the prefix columns of the lookup index. The
cross-join is an invalid transformation because it increases the size of
the join's input and can increase the size of the join's output.

We already avoid these cross-joins for left and anti-joins (see #59646).
When addressing those cases, the semi-join case was incorrectly assumed
to be safe.

Fixes #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true: 1) the query
contains a semi-join, such as queries in the form:
`SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`,
2) the inner table has an index containing the equality column, like
`t2.a` in the example query, 3) the index contains one or more
columns that prefix the equality column, and 4) the prefix columns are
`NOT NULL` and are constrained to a set of constant values via a `CHECK`
constraint or an `IN` condition in the filter.
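
To make the failure mode concrete, here is a minimal, self-contained Go sketch of why the cross-join is unsafe for a semi-join. It is a toy model, not optimizer code: the `indexKey` type, the `region` prefix column, and both functions are hypothetical names invented for illustration. It assumes an index on `t2(region, a)` where `region` is constrained to two constant values.

```go
package main

import "fmt"

// indexKey models one entry of a hypothetical index on t2(region, a).
// "region" plays the role of the constrained prefix column.
type indexKey struct {
	region string
	a      int
}

// semiJoin models correct semi-join semantics: each left row is emitted at
// most once, as soon as any allowed prefix value yields a matching entry.
func semiJoin(left []int, regions []string, index map[indexKey]bool) []int {
	var out []int
	for _, a := range left {
		for _, r := range regions {
			if index[indexKey{r, a}] {
				out = append(out, a)
				break // one match is enough; never emit a left row twice
			}
		}
	}
	return out
}

// crossJoinThenSemiJoin models the invalid plan: the input is first
// cross-joined with the constant prefix values, and every resulting
// (a, region) pair is then treated as its own semi-join input row.
func crossJoinThenSemiJoin(left []int, regions []string, index map[indexKey]bool) []int {
	type inputRow struct {
		a      int
		region string
	}
	var input []inputRow
	for _, a := range left {
		for _, r := range regions {
			input = append(input, inputRow{a, r})
		}
	}
	var out []int
	for _, row := range input {
		if index[indexKey{row.region, row.a}] {
			out = append(out, row.a) // emitted once per matching constant
		}
	}
	return out
}

func main() {
	index := map[indexKey]bool{
		{"east", 1}: true,
		{"west", 1}: true,
		{"east", 2}: true,
	}
	left := []int{1, 2, 3}
	regions := []string{"east", "west"}

	fmt.Println(semiJoin(left, regions, index))              // [1 2]
	fmt.Println(crossJoinThenSemiJoin(left, regions, index)) // [1 1 2]
}
```

In this model the row with `a = 1` matches under both prefix values, so the cross-joined plan returns it twice, while a semi-join must return it once.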

@mgartner mgartner requested review from rytaft and a team March 28, 2022 23:47
@mgartner mgartner requested a review from a team as a code owner March 28, 2022 23:47

@michae2 michae2 left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/xform/join_funcs.go, line 461 at r1 (raw file):

				if joinType == opt.LeftJoinOp || joinType == opt.SemiJoinOp || joinType == opt.AntiJoinOp {
					// We cannot use the method constructJoinWithConstants to create a cross
					// join for left or anti joins, because constructing a cross join with

nit: also add semi join to this comment

Code quote:

// join for left or anti joins, because constructing a cross join with

@rharding6373 rharding6373 left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)

@michae2 michae2 left a comment

:lgtm:

nit: formatting of the release note is a little messed up (missing backtick maybe?)

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)

@mgartner mgartner force-pushed the 78681-fix-bad-semi-joins branch from 68d87ef to 60cc2a4 Compare March 29, 2022 00:06

@mgartner mgartner left a comment

nit: formatting of the release note is a little messed up (missing backtick maybe?)

Fixed.

I also added a logic test.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)


pkg/sql/opt/xform/join_funcs.go, line 461 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…

nit: also add semi join to this comment

Done.
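
For readers following this thread without the file open, the guard under discussion can be sketched in isolation. This is a hypothetical, simplified model: `joinOp`, its constants, and `canCrossJoinInput` are illustrative names, not the optimizer's real types or functions. Treating inner joins as still eligible is an assumption based only on the condition quoted above.

```go
package main

import "fmt"

// joinOp is a hypothetical stand-in for the optimizer's join operator type;
// it is not the real opt.Operator.
type joinOp int

const (
	innerJoinOp joinOp = iota
	leftJoinOp
	semiJoinOp
	antiJoinOp
)

// canCrossJoinInput reports whether the join's input may be cross-joined
// with constant values to constrain index prefix columns. Left, semi, and
// anti joins are excluded because growing the input can grow the output.
func canCrossJoinInput(op joinOp) bool {
	switch op {
	case leftJoinOp, semiJoinOp, antiJoinOp:
		return false
	default:
		return true
	}
}

func main() {
	names := map[joinOp]string{
		innerJoinOp: "inner",
		leftJoinOp:  "left",
		semiJoinOp:  "semi",
		antiJoinOp:  "anti",
	}
	for _, op := range []joinOp{innerJoinOp, leftJoinOp, semiJoinOp, antiJoinOp} {
		fmt.Printf("%s: cross-join input allowed = %t\n", names[op], canCrossJoinInput(op))
	}
}
```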

@mgartner mgartner force-pushed the 78681-fix-bad-semi-joins branch from 60cc2a4 to 98a4d31 Compare March 29, 2022 13:55

@rytaft rytaft left a comment

:lgtm:

Reviewed 1 of 2 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)


-- commits, line 19 at r2:
nit: this makes it sound a bit like it can happen if any of them are true. I'd say "can appear if all of the following conditions are true"


-- commits, line 21 at r2:
nit: t2.a`); -> t2.a);` (fix in the PR description too)

@mgartner mgartner force-pushed the 78681-fix-bad-semi-joins branch from 98a4d31 to 1d7811d Compare March 29, 2022 14:52

@mgartner mgartner left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @mgartner)


-- commits, line 19 at r2:

Previously, rytaft (Rebecca Taft) wrote…

nit: this makes it sound a bit like it can happen if any of them are true. I'd say "can appear if all of the following conditions are true"

Done.


-- commits, line 21 at r2:

Previously, rytaft (Rebecca Taft) wrote…

nit: t2.a`); -> t2.a);` (fix in the PR description too)

Done.

@rytaft rytaft left a comment

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @mgartner)

@mgartner

TFTRs!

bors r+

mgartner added a commit to mgartner/cockroach that referenced this pull request Mar 29, 2022
PR cockroachdb#78685 changes the query plan of one query in a logic test in a way
that makes the test flakey. This commit guarantees that the test cannot
be flakey, regardless of the query plan.

Release note: None
craig bot commented Mar 29, 2022

Build succeeded:

@craig craig bot merged commit 897c2da into cockroachdb:master Mar 29, 2022
blathers-crl bot commented Mar 29, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 1d7811d to blathers/backport-release-21.1-78685: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.1.x failed. See errors above.


error creating merge commit from 1d7811d to blathers/backport-release-21.2-78685: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

craig bot pushed a commit that referenced this pull request Mar 29, 2022
78704: sql: propagate limit for top K sort correctly in tests r=yuzefovich a=yuzefovich

In 22.1 time frame we started propagating the value of K for top K sort
in the spec of the processor, and not in the post-processing spec, but
we forgot to update some of the tests accordingly.

Informs: #78592.

Release note: None

78949: kvserver: gossip l0sublevels instead of read amp r=kvoli a=kvoli

Previously read amplification was gossipped among stores to enable
future allocation decisions that would avoid candidates with high read
amplification. L0 Sublevels represents the number of levels with L0 and
is a portion of read amplification. This patch changes read amplification
to l0 sublevels, as it is a better indicator of store health.

Release justification: low risk, replace deprecated gossip signal.

Release note: None

78984: sql: deflake unique logic test r=mgartner a=mgartner

PR #78685 changes the query plan of one query in a logic test in a way
that makes the test flakey. This commit guarantees that the test cannot
be flakey, regardless of the query plan.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
@mgartner mgartner deleted the 78681-fix-bad-semi-joins branch March 29, 2022 21:15
fqazi pushed a commit to fqazi/cockroach that referenced this pull request Apr 4, 2022
PR cockroachdb#78685 changes the query plan of one query in a logic test in a way
that makes the test flakey. This commit guarantees that the test cannot
be flakey, regardless of the query plan.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this pull request Apr 5, 2022
In cockroachdb#78685, we prevented `GenerateLookupJoins` from incorrectly creating a
cross-join on the input of a semi-join, addressing cockroachdb#78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we originally
forgot to fix.

Informs cockroachdb#78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a @> b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.
craig bot pushed a commit that referenced this pull request Apr 5, 2022
77742: sql: implement SHOW [ALL] CLUSTER SETTINGS FOR TENANT r=rafiss a=knz

All commits but the last 2 from #77740.
(Reviewers: only the last 2 commits belong to this PR.)

Informs #77471

Release justification: low risk, high benefit changes to existing functionality

79260: changefeedccl, backupresolver: refactor to hold on to mapping of target to descriptor r=[miretskiy,dt] a=HonoreDB


Changefeed statements need to resolve a bunch of table names at once,
but unlike backups and grants they need to know which returned
descriptor corresponded to which input because they (now) take
target-specific options. We were reconstructing this awkwardly on
the calling side. This PR adds an optional parameter to the
backupresolver method being used so that it can track which
descriptor belongs to which input.

I'm probably being overly polite by making this optional,
but hey, it is a little extra memory footprint and not my package.

Release note: None

79324: changefeedccl: unify initial_scan option syntax r=sherman-grewal a=sherman-grewal

Resolves #79324

Currently, we have explicit options for each possible
behaviour that a user would like to achieve for
initial scans on changefeeds. For instance, a user
could specify:

- initial_scan
- no_initial_scan
- initial_scan_only

This seems a bit sprawling, and can inadvertently cause
contradictions in a changefeed statement. Hence, in this
PR we extend the option `initial_scan` to take on three
possible values: `'yes|no|only'`. Once this change
is made we will remove the explicit options from the
docs, but we will keep these options for backwards
compatibility.

Release note (enterprise change): Unify the syntax that
allows users to define the behaviour they would like
for initial scans on changefeeds by extending the
`initial_scan` option to take on three possible values:
`'yes|no|only'`.

Release justification: Small, safe refactor that will
improve the user experience when creating changefeeds.

Jira issue: CRDB-14693

79389: opt: do not generate unnecessary cross-joins on join input r=mgartner a=mgartner

#### opt: do not generate unnecessary cross-joins on lookup join input

This commit fixes a bug that caused unnecessary cross-joins on the input
of lookup joins, causing both suboptimal query plans and incorrect query
results. The bug only affected lookup joins with lookup expressions.

Fixes #79384

Release note (bug fix): A bug has been fixed that caused the optimizer
to generate query plans with logically incorrect lookup joins. The bug
can only occur in queries with an inner join, e.g., `t1 JOIN t2`, if all
of the following are true:
  1. The join contains an equality condition between columns of both
     tables, e.g., `t1.a = t2.a`.
  2. A query filter or `CHECK` constraint constrains a column to a set
     of specific values, e.g., `t2.b IN (1, 2, 3)`. In the case of a
     `CHECK` constraint, the column must be `NOT NULL`.
  3. A query filter or `CHECK` constraint constrains a column to a
     range, e.g., `t2.c > 0`. In the case of a `CHECK` constraint, the
     column must be `NOT NULL`.
  4. An index contains a column from each of the criteria above, e.g.,
     `INDEX t2(a, b, c)`.
This bug has been present since version 21.2.0.

#### opt: do not cross-join input of inverted semi-join

In #78685, we prevented `GenerateLookupJoins` from incorrectly creating a
cross-join on the input of a semi-join, addressing #78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we
originally forgot to fix.

Informs #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a @> b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.


79454: docs: update alter changefeed diagram r=ericharmeling a=kathancox

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
Co-authored-by: Sherman Grewal <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Kathryn Hancox <[email protected]>
mgartner added a commit that referenced this pull request Apr 6, 2022
In #78685, we prevented `GenerateLookupJoins` from incorrectly creating a
cross-join on the input of a semi-join, addressing #78681. This commit
addresses the same issue with `GenerateInvertedJoins`, which we
originally forgot to fix.

Informs #78681

Release note (bug fix): A bug has been fixed which caused the optimizer
to generate invalid query plans which could result in incorrect query
results. The bug, which has been present since version 21.1.0, can
appear if all of the following conditions are true:
  1. The query contains a semi-join, such as queries in the form
     `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a @> b.b)`.
  2. The inner table has a multi-column inverted index containing the
     inverted column in the filter.
  3. The index prefix columns are constrained to a set of values via the
     filter or a `CHECK` constraint, e.g., with an `IN` operator. In the
     case of a `CHECK` constraint, the column is `NOT NULL`.
Development

Successfully merging this pull request may close these issues.

opt: incorrect results due to cross-joining input of semi-join
5 participants