-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: fix bug with incorrect results returned by left lookup/inverted join #59646
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…join This commit fixes a bug in which incorrect results could be returned for left and anti lookup and inverted joins. This could happen when one of the prefix columns was constrained to more than one constant value, either due to a check constraint or an IN clause. The current implementation used to support lookup and inverted joins for this case creates a cross-join between the constant values and the input, thus artifically increasing the size of the input to the join. This works for inner and semi joins, but it is not correct for left or anti joins, because non-matching input rows will be returned multiple times in the output, which is incorrect. This commit fixes the bug by avoiding creating lookup or inverted joins in these cases. However, this is suboptimal for performance, and in a future commit we should try to find a way to create left/anti inverted/lookup joins safely for these cases. Fixes cockroachdb#59615 Release note (bug fix): Fixed a bug in which incorrect results could be returned for left and anti joins. This could happen when one of the columns on one side of the join was constrained to multiple constant values, either due to a check constraint or an IN clause. The bug resulted in non-matching input rows getting returned multiple times in the output, which is incorrect. This bug only affected previous alpha releases of 21.1, and has now been fixed.
mgartner
approved these changes
Feb 1, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
TFTR! bors r+ |
Build succeeded: |
mgartner
added a commit
to mgartner/cockroach
that referenced
this pull request
Mar 28, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes cockroachdb#78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear under all of the following conditions: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t WHERE EXISTS (SELECT * FROM t2 WHERE t.a = t2.a`)`; 2) the inner table had an index containing the equality column, like `t2.a` in the example query; 3) the index contained one or more columns that prefix the equality column; and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
mgartner
added a commit
to mgartner/cockroach
that referenced
this pull request
Mar 29, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes cockroachdb#78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear under all of the following conditions: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a`); 2) the inner table has an index containing the equality column, like `t2.a` in the example query; 3) the index contains one or more columns that prefix the equality column; and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
mgartner
added a commit
to mgartner/cockroach
that referenced
this pull request
Mar 29, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes cockroachdb#78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`, 2) the inner table has an index containing the equality column, like `t2.a` in the example query, 3) the index contains one or more columns that prefix the equality column, and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
mgartner
added a commit
to mgartner/cockroach
that referenced
this pull request
Mar 29, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes cockroachdb#78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`, 2) the inner table has an index containing the equality column, like `t2.a` in the example query, 3) the index contains one or more columns that prefix the equality column, and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
mgartner
added a commit
to mgartner/cockroach
that referenced
this pull request
Mar 29, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes cockroachdb#78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`, 2) the inner table has an index containing the equality column, like `t2.a` in the example query, 3) the index contains one or more columns that prefix the equality column, and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
This was referenced Mar 29, 2022
craig bot
pushed a commit
that referenced
this pull request
Mar 29, 2022
78418: workload: support fixture import in multi-tenant mode r=nvanbenschoten a=nvanbenschoten Fixes #75449. This commit fixes fixture import into tenants, which do not have access to the `crdb_internal.gossip_liveness` table. This fixes the following: - `[cockroach] workload fixtures import` - `[cockroach] workload init --data-loader=IMPORT` Release justification: low-risk change that improves workload functionality. 78626: backupccl: add deprecation warning for SHOW BACKUP without subdir r=adityamaru,benbardin a=msbutler Informs: #78153 Release note (sql change): The `SHOW BACKUP` cmd without the `IN` keyword to specify a subdirectory is deprecated and will be removed from a future release. Users are recommended to only create collection based backups and view them with `SHOW BACKUP FROM <backup> IN <collection>` 78653: sql: add logging for grant_role r=RichardJCai a=RichardJCai GRANT ROLE now logs to the logpb.Channel_USER_ADMIN channel. Example: GRANT role1, role2 TO role3, role4 logs "RoleNames":"‹role1›",‹role2›","Members":"‹role3›",‹ role4›" Release note: None Fixes #78245 78685: opt: do not cross-join input of semi-join r=mgartner a=mgartner This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see #59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes #78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`, 2) the inner table has an index containing the equality column, like `t2.a` in the example query, 3) the index contains one or more columns that prefix the equality column, and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter. 78940: backupccl: fix nil pointer exception in restore OnFailOrCancel r=dt,stevendanna a=adityamaru The execCfg field on the restore resumer was not being set correctly in the OnFailOrCancel hook. This would lead to a nil pointer exception during cleanup. Other jobs such as import, backup use the execCfg directly off the JobExecContext instead of storing it on the resumer. An alternative to this fix could be to change all `r.ExecCfg` to use the execCfg on the JobExecContext. Fixes: #76720 Release note (bug fix): Fixes a nil pointer exception during the cleanup of a failed or cancelled restore job. 78962: kvserver: make the StoreRebalancer interval a cluster setting r=aayushshah15 a=aayushshah15 Release note (ops change): the `kv.allocator.load_based_rebalancing_interval` cluster setting now lets operators choose the interval at which each store in the cluster will check for load-based lease or replica rebalancing opportunities. 78971: sql, migration: skip flaky TestPublicSchemaMigration test r=ajwerner a=RichardJCai Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Michael Butler <[email protected]> Co-authored-by: richardjcai <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Aditya Maru <[email protected]> Co-authored-by: Aayush Shah <[email protected]>
mgartner
added a commit
to mgartner/cockroach
that referenced
this pull request
Mar 29, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes cockroachdb#78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`, 2) the inner table has an index containing the equality column, like `t2.a` in the example query, 3) the index contains one or more columns that prefix the equality column, and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
blathers-crl bot
pushed a commit
that referenced
this pull request
Mar 29, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see #59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes #78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`, 2) the inner table has an index containing the equality column, like `t2.a` in the example query, 3) the index contains one or more columns that prefix the equality column, and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
mgartner
added a commit
that referenced
this pull request
Mar 29, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see #59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes #78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`, 2) the inner table has an index containing the equality column, like `t2.a` in the example query, 3) the index contains one or more columns that prefix the equality column, and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
fqazi
pushed a commit
to fqazi/cockroach
that referenced
this pull request
Apr 4, 2022
This commit fixes a logical correctness bug caused when `GenerateLookupJoins` cross-joins the input of a semi-join with a set of constant values to constrain the prefix columns of the lookup index. The cross-join is an invalid transformation because it increases the size of the join's input and can increase the size of the join's output. We already avoid these cross-joins for left and anti-joins (see cockroachdb#59646). When addressing those cases, the semi-join case was incorrectly assumed to be safe. Fixes cockroachdb#78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1) the query contains a semi-join, such as queries in the form: `SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a);`, 2) the inner table has an index containing the equality column, like `t2.a` in the example query, 3) the index contains one or more columns that prefix the equality column, and 4) the prefix columns are `NOT NULL` and are constrained to a set of constant values via a `CHECK` constraint or an `IN` condition in the filter.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This commit fixes a bug in which incorrect results could be returned for left
and anti lookup and inverted joins. This could happen when one of the prefix
columns was constrained to more than one constant value, either due to a check
constraint or an
IN
clause. The current implementation used to support lookupand inverted joins for this case creates a cross-join between the constant
values and the input, thus artifically increasing the size of the input to the
join. This works for inner and semi joins, but it is not correct for left or
anti joins, because non-matching input rows will be returned multiple times in
the output, which is incorrect.
This commit fixes the bug by avoiding creating lookup or inverted joins in
these cases. However, this is suboptimal for performance, and in a future
commit we should try to find a way to create left/anti inverted/lookup joins
safely for these cases.
Fixes #59615
Release note (bug fix): Fixed a bug in which incorrect results could be
returned for left and anti joins. This could happen when one of the columns on
one side of the join was constrained to multiple constant values, either due to
a check constraint or an IN clause. The bug resulted in non-matching input rows
getting returned multiple times in the output, which is incorrect. This bug
only affected previous alpha releases of 21.1, and has now been fixed.