-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: unnecessary cross-joins with when generating a lookup join with a lookup expression #79384
Labels
branch-release-21.1
Used to mark GA and release blockers, technical advisories, and bugs for 21.1
branch-release-21.2
Used to mark GA and release blockers, technical advisories, and bugs for 21.2
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-technical-advisory
Caused a technical advisory
T-sql-queries
SQL Queries Team
Comments
mgartner
added
the
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
label
Apr 4, 2022
This can cause incorrect query results:
|
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Apr 4, 2022
This commit fixes a bug that caused unnecessary cross-joins on the input of lookup joins, causing both suboptimal query plans and incorrect query results. The bug only affected lookup joins with lookup expressions. Fixes cockroachdb#79384 Release note (bug fix): A bug has been fixed that caused the optimizer to generate query plans with logically incorrect lookup joins. The bug could only occur in queries with an inner join, e.g., `t1 INNER JOIN t2`, if all of the following were true: 1. The join contained an equality condition between columns of both tables, e.g., `t1.a = t2.a`. 2. A filter condition constrained a column to a set of specific values, e.g., `t2.b IN (1, 2, 3)`. 3. A filter condition constrained a column to a range, e.g., `t2.c > 0`. 4. An index contained a column from each of the criteria above, e.g., `INDEX t2(a, b, c)`. This bug has been present since version 21.2.0.
craig bot
pushed a commit
that referenced
this issue
Apr 5, 2022
77742: sql: implement SHOW [ALL] CLUSTER SETTINGS FOR TENANT r=rafiss a=knz All commits but the last 2 from #77740. (Reviewers: only the last 2 commits belong to this PR.) Informs #77471 Release justification: low risk, high benefit changes to existing functionality 79260: changefeedccl, backupresolver: refactor to hold on to mapping of target to descriptor r=[miretskiy,dt] a=HonoreDB Changefeed statements need to resolve a bunch of table names at once, but unlike backups and grants they need to know which returned descriptor corresponded to which input because they (now) take target-specific options. We were reconstructing this awkwardly on the calling side. This PR adds an optional parameter to the backupresolver method being used so that it can track which descriptor belongs to which input. I'm probably being overly polite by making this optional, but hey, it is a little extra memory footprint and not my package. Release note: None 79324: changefeedccl: unify initial_scan option syntax r=sherman-grewal a=sherman-grewal Resolves #79324 Currently, we have explicit options for each possible behaviour that a user would like to achieve for initial scans on changefeeds. For instance, a user could specify: - initial_scan - no_initial_scan - initial_scan_only This seems a bit sprawling, and can inadvertently cause contradictions in a changefeed statement. Hence, in this PR we extend the option `initial_scan` to take on three possible values: `'yes|no|only'`. Once this change is made we will remove the explicit options from the docs, but we will keep these options for backwards compatibility. Release note (enterprise change): Unify the syntax that allows users to define the behaviour they would like for initial scans on changefeeds by extending the `initial_scan` option to take on three possible values: `'yes|no|only'`. Release justification: Small, safe refactor that will improve the user experience when creating changefeeds. Jira issue: CRDB-14693 79389: opt: do not generate unnecessary cross-joins on join input r=mgartner a=mgartner #### opt: do not generate unnecessary cross-joins on lookup join input This commit fixes a bug that caused unnecessary cross-joins on the input of lookup joins, causing both suboptimal query plans and incorrect query results. The bug only affected lookup joins with lookup expressions. Fixes #79384 Release note (bug fix): A bug has been fixed that caused the optimizer to generate query plans with logically incorrect lookup joins. The bug can only occur in queries with an inner join, e.g., `t1 JOIN t2`, if all of the following are true: 1. The join contains an equality condition between columns of both tables, e.g., `t1.a = t2.a`. 2. A query filter or `CHECK` constraint constrains a column to a set of specific values, e.g., `t2.b IN (1, 2, 3)`. In the case of a `CHECK` constraint, the column must be `NOT NULL`. 3. A query filter or `CHECK` constraint constrains a column to a range, e.g., `t2.c > 0`. In the case of a `CHECK` constraint, the column must be `NOT NULL`. 4. An index contains a column from each of the criteria above, e.g., `INDEX t2(a, b, c)`. This bug has been present since version 21.2.0. #### opt: do not cross-join input of inverted semi-join In #78685, we prevented `GenerateLookupJoins` from incorrect creating a cross-join on the input of a semi-join, addressing #78681. This commit addresses the same issue with `GenerateInvertedJoins`, which we originally forgot to fix. Informs #78681 Release note (bug fix): A bug has been fixed which caused the optimizer to generate invalid query plans which could result in incorrect query results. The bug, which has been present since version 21.1.0, can appear if all of the following conditions are true: 1. The query contains a semi-join, such as queries in the form `SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE a.a `@>` b.b)`. 2. The inner table has a multi-column inverted index containing the inverted column in the filter. 3. The index prefix columns are constrained to a set of values via the filter or a `CHECK` constraint, e.g., with an `IN` operator. In the case of a `CHECK` constraint, the column is `NOT NULL`. 79454: docs: update alter changefeed diagram r=ericharmeling a=kathancox Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Aaron Zinger <[email protected]> Co-authored-by: Sherman Grewal <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Kathryn Hancox <[email protected]>
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Apr 6, 2022
This commit fixes a bug that caused unnecessary cross-joins on the input of lookup joins, causing both suboptimal query plans and incorrect query results. The bug only affected lookup joins with lookup expressions. Fixes cockroachdb#79384 Release note (bug fix): A bug has been fixed that caused the optimizer to generate query plans with logically incorrect lookup joins. The bug can only occur in queries with an inner join, e.g., `t1 JOIN t2`, if all of the following are true: 1. The join contains an equality condition between columns of both tables, e.g., `t1.a = t2.a`. 2. A query filter or `CHECK` constraint constrains a column to a set of specific values, e.g., `t2.b IN (1, 2, 3)`. In the case of a `CHECK` constraint, the column must be `NOT NULL`. 3. A query filter or `CHECK` constraint constrains a column to a range, e.g., `t2.c > 0`. In the case of a `CHECK` constraint, the column must be `NOT NULL`. 4. An index contains a column from each of the criteria above, e.g., `INDEX t2(a, b, c)`. This bug has been present since version 21.2.0.
This was referenced Apr 6, 2022
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Apr 6, 2022
This commit fixes a bug that caused unnecessary cross-joins on the input of lookup joins, causing both suboptimal query plans and incorrect query results. The bug only affected lookup joins with lookup expressions. Fixes cockroachdb#79384 Release note (bug fix): A bug has been fixed that caused the optimizer to generate query plans with logically incorrect lookup joins. The bug can only occur in queries with an inner join, e.g., `t1 JOIN t2`, if all of the following are true: 1. The join contains an equality condition between columns of both tables, e.g., `t1.a = t2.a`. 2. A query filter or `CHECK` constraint constrains a column to a set of specific values, e.g., `t2.b IN (1, 2, 3)`. In the case of a `CHECK` constraint, the column must be `NOT NULL`. 3. A query filter or `CHECK` constraint constrains a column to a range, e.g., `t2.c > 0`. In the case of a `CHECK` constraint, the column must be `NOT NULL`. 4. An index contains a column from each of the criteria above, e.g., `INDEX t2(a, b, c)`. This bug has been present since version 21.2.0.
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Apr 7, 2022
Rules that generate lookup joins no longer build cross-joins on the input of the lookup join to constrain an index column to a set of constant values. Instead, a lookup expression is used which generates multiple lookup spans during execution. Building a cross-join added extra complexity to the optimizer logic and has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a cross-join could significantly increase the number of input rows, so it was likely less efficient than using a lookup expression. If an index column is constrained to a single constant value, we still project the value in the input of the lookup join so that it can be used as an equality column. This is more performant than using a lookup expression with an equality condition (see cockroachdb#66726). This commit the some other notable changes to the lookup join generation logic: 1. `inputProjections` is now initialized with a specific capacity to ensure that future appends will not cause allocations. 2. The logic has been reordered. The case when a single constant value is projected is handled before the multiple constant value or range cases. 3. An unnecessary projection is no longer added for single constant values when we revert to using a lookup expression. An explicit test was added for this. Note that this commit does not affect inverted joins. Cross-joins can still be generated on inverted join inputs. Release note: None
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Apr 8, 2022
Rules that generate lookup joins no longer build cross-joins on the input of the lookup join to constrain an index column to a set of constant values. Instead, a lookup expression is used which generates multiple lookup spans during execution. Building a cross-join added extra complexity to the optimizer logic and has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a cross-join could significantly increase the number of input rows, so it was likely less efficient than using a lookup expression. If an index column is constrained to a single constant value, we still project the value in the input of the lookup join so that it can be used as an equality column. This is more performant than using a lookup expression with an equality condition (see cockroachdb#66726). This commit the some other notable changes to the lookup join generation logic: 1. `inputProjections` is now initialized with a specific capacity to ensure that future appends will not cause allocations. 2. The logic has been reordered. The case when a single constant value is projected is handled before the multiple constant value or range cases. 3. An unnecessary projection is no longer added for single constant values when we revert to using a lookup expression. An explicit test was added for this. 4. Fix `isCanonicalLookupJoinFilter` so that expressions with inequalities and either an equality or `IN` are not used as lookup expressions. The `multiSpanGenerator` does not support these types of expressions, and would panic if it encountered them. This bug has existed for a while, but has been hidden because of the preference for the cross-join strategy. Note that there are several cases where a lookup join contains filters in the `ON` expression that are redundant with the lookup expression. I'm planning on addressing this in a future commit. Note that this commit does not affect inverted joins. Cross-joins can still be generated on inverted join inputs. Release note: None
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Apr 11, 2022
Rules that generate lookup joins no longer build cross-joins on the input of the lookup join to constrain an index column to a set of constant values. Instead, a lookup expression is used which generates multiple lookup spans during execution. Building a cross-join added extra complexity to the optimizer logic and has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a cross-join could significantly increase the number of input rows, so it was likely less efficient than using a lookup expression. If an index column is constrained to a single constant value, we still project the value in the input of the lookup join so that it can be used as an equality column. This is more performant than using a lookup expression with an equality condition (see cockroachdb#66726). This commit the some other notable changes to the lookup join generation logic: 1. `inputProjections` is now initialized with a specific capacity to ensure that future appends will not cause allocations. 2. The logic has been reordered. The case when a single constant value is projected is handled before the multiple constant value or range cases. 3. An unnecessary projection is no longer added for single constant values when we revert to using a lookup expression. An explicit test was added for this. 4. Fix `isCanonicalLookupJoinFilter` so that expressions with inequalities and either an equality or `IN` are not used as lookup expressions. The `multiSpanGenerator` does not support these types of expressions, and would panic if it encountered them. This bug has existed for a while, but has been hidden because of the preference for the cross-join strategy. Note that there are several cases where a lookup join contains filters in the `ON` expression that are redundant with the lookup expression. I'm planning on addressing this in a future commit. Note that this commit does not affect inverted joins. Cross-joins can still be generated on inverted join inputs. Release note: None
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Apr 11, 2022
Rules that generate lookup joins no longer build cross-joins on the input of the lookup join to constrain an index column to a set of constant values. Instead, a lookup expression is used which generates multiple lookup spans during execution. Building a cross-join added extra complexity to the optimizer logic and has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a cross-join could significantly increase the number of input rows, so it was likely less efficient than using a lookup expression. If an index column is constrained to a single constant value, we still project the value in the input of the lookup join so that it can be used as an equality column. This is more performant than using a lookup expression with an equality condition (see cockroachdb#66726). This commit has some other notable changes to the lookup join generation logic: 1. `inputProjections` is now initialized with a specific capacity to ensure that future appends will not cause allocations. 2. The logic has been reordered. The case when a single constant value is projected is handled before the multiple constant value or range cases. 3. An unnecessary projection is no longer added for single constant values when we revert to using a lookup expression. An explicit test was added for this. 4. Fix `isCanonicalLookupJoinFilter` so that expressions with inequalities and either an equality or `IN` are not used as lookup expressions. The `multiSpanGenerator` does not support these types of expressions, and would panic if it encountered them. This bug has existed for a while, but has been hidden because of the preference for the cross-join strategy. Note that there are several cases where a lookup join contains filters in the `ON` expression that are redundant with the lookup expression. I'm planning on addressing this in a future commit. Note that this commit does not affect inverted joins. Cross-joins can still be generated on inverted join inputs. Release note: None
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Apr 12, 2022
Rules that generate lookup joins no longer build cross-joins on the input of the lookup join to constrain an index column to a set of constant values. Instead, a lookup expression is used which generates multiple lookup spans during execution. Building a cross-join added extra complexity to the optimizer logic and has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a cross-join could significantly increase the number of input rows, so it was likely less efficient than using a lookup expression. If an index column is constrained to a single constant value, we still project the value in the input of the lookup join so that it can be used as an equality column. This is more performant than using a lookup expression with an equality condition (see cockroachdb#66726). This commit has some other notable changes to the lookup join generation logic: 1. The logic has been reordered. The case when a single constant value is projected is handled before the multiple constant value or range cases. 2. An unnecessary projection is no longer added for single constant values when we revert to using a lookup expression. An explicit test was added for this. 3. Fix `isCanonicalLookupJoinFilter` so that expressions with inequalities and either an equality or `IN` are not used as lookup expressions. The `multiSpanGenerator` does not support these types of expressions, and would panic if it encountered them. This bug has existed for a while, but has been hidden because of the preference for the cross-join strategy. Note that there are several cases where a lookup join contains filters in the `ON` expression that are redundant with the lookup expression. I'm planning on addressing this in a future commit. Note that this commit does not affect inverted joins. Cross-joins can still be generated on inverted join inputs. Release note: None
craig bot
pushed a commit
that referenced
this issue
Apr 12, 2022
79586: opt: do not build cross-joins on inputs of lookup joins r=mgartner a=mgartner #### opt: do not build cross-joins on inputs of lookup joins Rules that generate lookup joins no longer build cross-joins on the input of the lookup join to constrain an index column to a set of constant values. Instead, a lookup expression is used which generates multiple lookup spans during execution. Building a cross-join added extra complexity to the optimizer logic and has been shown to be error prone (see #78681 and #79384). Also, a cross-join could significantly increase the number of input rows, so it was likely less efficient than using a lookup expression. If an index column is constrained to a single constant value, we still project the value in the input of the lookup join so that it can be used as an equality column. This is more performant than using a lookup expression with an equality condition (see #66726). This commit has some other notable changes to the lookup join generation logic: 1. The logic has been reordered. The case when a single constant value is projected is handled before the multiple constant value or range cases. 2. An unnecessary projection is no longer added for single constant values when we revert to using a lookup expression. An explicit test was added for this. 3. Fix `isCanonicalLookupJoinFilter` so that expressions with inequalities and either an equality or `IN` are not used as lookup expressions. The `multiSpanGenerator` does not support these types of expressions, and would panic if it encountered them. This bug has existed for a while, but has been hidden because of the preference for the cross-join strategy. Note that there are several cases where a lookup join contains filters in the `ON` expression that are redundant with the lookup expression. I'm planning on addressing this in a future commit. Note that this commit does not affect inverted joins. Cross-joins can still be generated on inverted join inputs. Release note: None Co-authored-by: Marcus Gartner <[email protected]>
rytaft
added
C-technical-advisory
Caused a technical advisory
branch-release-21.2
Used to mark GA and release blockers, technical advisories, and bugs for 21.2
branch-release-21.1
Used to mark GA and release blockers, technical advisories, and bugs for 21.1
labels
Dec 6, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
branch-release-21.1
Used to mark GA and release blockers, technical advisories, and bugs for 21.1
branch-release-21.2
Used to mark GA and release blockers, technical advisories, and bugs for 21.2
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-technical-advisory
Caused a technical advisory
T-sql-queries
SQL Queries Team
Notice in the query plan below that the cross-join to produce the
lookup_join_const_col_@7
column is unnecessary.Jira issue: CRDB-14777
The text was updated successfully, but these errors were encountered: