-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt,sql: ensure validation of implicitly-partitioned partial unique indexes is efficient #65473
Labels
A-multiregion
Related to multi-region
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
T-multiregion
T-sql-queries
SQL Queries Team
Comments
rytaft
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
A-multiregion
Related to multi-region
T-multiregion
T-sql-queries
SQL Queries Team
labels
May 19, 2021
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Jun 16, 2021
This commit adds a new exploration rule, EliminateIndexJoinInsideGroupBy. EliminateIndexJoinInsideGroupBy removes an IndexJoin operator if it can be proven that the removal does not affect the output of the parent grouping operator. This is the case if: 1. Only columns from the index join's input are being used by the grouping operator. 2. The OrderingChoice of the grouping operator can be expressed with only columns from the index join's input. Or in other words, at least one column in every ordering group is one of the output columns from the index join's input. This rule is useful when using partial indexes. When generating partial index scans, expressions can be removed from filters because they exactly match expressions in partial index predicates and there is no need to apply the filter after the scan. Columns referenced in the removed expressions may no longer need to be fetched. Consider the example: CREATE TABLE t (i INT, s STRING, INDEX (i) WHERE s IN ('foo','bar')) SELECT DISTINCT i FROM t WHERE s IN ('foo','bar') The normalized expression for the SELECT query is: distinct-on ├── columns: i:1 ├── grouping columns: i:1 └── select ├── columns: i:1 s:2!null ├── scan t │ └── columns: i:1 s:2 └── filters └── s:2 IN ('foo','bar') GeneratePartialIndexScans will generate this expression: distinct-on ├── columns: i:1 ├── grouping columns: i:1 └── index-join t ├── columns: i:1 s:2!null └── scan t@secondary,partial └── columns: i:1 rowid:4!null The IndexJoin is created because the Select expression in the previous expression required s in order to apply the (s IN ('foo','bar')) filter. However, because rows in the partial index are already filtered by (s IN ('foo','bar')), column s does not need to be fetched. The IndexJoin can be eliminated, resulting in the expression: distinct-on ├── columns: i:1 ├── grouping columns: i:1 └── scan t@secondary,partial └── columns: i:1 rowid:4!null This rule is needed to ensure that SplitGroupByScanIntoUnionScans can fire in order to create an efficient plan to validate new implicitly partitioned partial unique indexes. Fixes cockroachdb#65473 Release note (performance improvement): Validation of a new partial UNIQUE index in a REGIONAL BY ROW table no longer requires an inefficient and memory-intensive hash aggregation query. The optimizer can now plan the validation query so that it uses all streaming operations, which are much more efficient.
craig bot
pushed a commit
that referenced
this issue
Jun 24, 2021
66565: opt: eliminate unnecessary index join inside group by r=rytaft a=rytaft This commit adds a new exploration rule, `EliminateIndexJoinInsideGroupBy`. `EliminateIndexJoinInsideGroupBy` removes an `IndexJoin` operator if it can be proven that the removal does not affect the output of the parent grouping operator. This is the case if: 1. Only columns from the index join's input are being used by the grouping operator. 2. The `OrderingChoice` of the grouping operator can be expressed with only columns from the index join's input. Or in other words, at least one column in every ordering group is one of the output columns from the index join's input. This rule is useful when using partial indexes. When generating partial index scans, expressions can be removed from filters because they exactly match expressions in partial index predicates and there is no need to apply the filter after the scan. Columns referenced in the removed expressions may no longer need to be fetched. Consider the example: ``` CREATE TABLE t (i INT, s STRING, INDEX (i) WHERE s IN ('foo','bar')) SELECT DISTINCT i FROM t WHERE s IN ('foo','bar') ``` The normalized expression for the `SELECT` query is: ``` distinct-on ├── columns: i:1 ├── grouping columns: i:1 └── select ├── columns: i:1 s:2!null ├── scan t │ └── columns: i:1 s:2 └── filters └── s:2 IN ('foo','bar') ``` `GeneratePartialIndexScans` will generate this expression: ``` distinct-on ├── columns: i:1 ├── grouping columns: i:1 └── index-join t ├── columns: i:1 s:2!null └── scan t@secondary,partial └── columns: i:1 rowid:4!null ``` The `IndexJoin` is created because the `Select` expression in the previous expression required `s` in order to apply the `(s IN ('foo','bar'))` filter. However, because rows in the partial index are already filtered by `(s IN ('foo','bar'))`, column s does not need to be fetched. The `IndexJoin` can be eliminated, resulting in the expression: ``` distinct-on ├── columns: i:1 ├── grouping columns: i:1 └── scan t@secondary,partial └── columns: i:1 rowid:4!null ``` This rule is needed to ensure that `SplitGroupByScanIntoUnionScans` can fire in order to create an efficient plan to validate new implicitly partitioned partial unique indexes. Fixes #65473 Release note (performance improvement): Validation of a new partial `UNIQUE` index in a `REGIONAL BY ROW` table no longer requires an inefficient and memory-intensive hash aggregation query. The optimizer can now plan the validation query so that it uses all streaming operations, which are much more efficient. 66840: backupccl: fix error during span-merging optimization r=pbardea a=pbardea This commit fixes a bug where a backup may crash if a full backup with revision history is taken on a table that has index-merging opportunities on descriptor revisions that lie before the GC TTL of the table. Since the merging of index-spans during backup planning is entirely an optimization (in an effort to reduce the number of spans we place a protectedts over), it is safe to ignore such errors and treat it as un-mergable. Fixes #66797. Release note (bug fix): Fix an error that backup would produce in some full backups with revision history. Previously some full backups would erroneously produce an error in the form of `batch timestamp <some_timestamp> must be after replica GC threshold <some_other_timestamp>`. Co-authored-by: Rebecca Taft <[email protected]> Co-authored-by: Paul Bardea <[email protected]>
rytaft
added a commit
to rytaft/cockroach
that referenced
this issue
Jul 6, 2021
This commit updates the exploration rule EliminateIndexJoinInsideGroupBy and renames it to EliminateIndexJoinOrProjectInsideGroupBy. The rule now removes either an IndexJoin or Project operator if it can be proven that the removal does not affect the output of the parent grouping operator. Removal of a Project is needed in cases where the partial index predicate constrains some columns to be constant, and therefore provides those columns as constant projections. If the projected columns are not actually needed by the GroupBy, however, the Project is not necessary and interferes with other rules matching, such as SplitGroupByScanIntoUnionScans. Informs cockroachdb#65473 Release note (performance improvement): Improved the efficiency of validation for some partial unique indexes in REGIONAL BY ROW tables by improving the query plan to use all streaming operations.
craig bot
pushed a commit
that referenced
this issue
Jul 6, 2021
66989: roachtest: check metrics during multi-region TPC-C runs r=ajstorm a=otan See individual commits for details. Resolves #65911 Resolves #66953 67263: opt,ccl: eliminate Project inside GroupBy, add tests for validation of partial unique indexes r=rytaft a=rytaft **opt: add a test-only check for `NullsAreDistinct` in `GroupingPrivate`** This commit adds a check that `NullsAreDistinct` is true iff the grouping operator is `UpsertDistinctOn` or `EnsureUpsertDistinctOn`. Release note: None **opt: add exploration rule to eliminate `Project` inside `GroupBy`** This commit updates the exploration rule `EliminateIndexJoinInsideGroupBy` and renames it to `EliminateIndexJoinOrProjectInsideGroupBy`. The rule now removes either an `IndexJoin` or `Project` operator if it can be proven that the removal does not affect the output of the parent grouping operator. Removal of a `Project` is needed in cases where the partial index predicate constrains some columns to be constant, and therefore provides those columns as constant projections. If the projected columns are not actually needed by the `GroupBy`, however, the `Project` is not necessary and interferes with other rules matching, such as `SplitGroupByScanIntoUnionScans`. Informs #65473 Release note (performance improvement): Improved the efficiency of validation for some partial unique indexes in `REGIONAL BY ROW` tables by improving the query plan to use all streaming operations. **ccl: add tests for validation of partial unique indexes** This commit adds tests to ensure that the expected query plan is used to validate new partial unique indexes in `REGIONAL BY ROW` tables. Informs #65473 Release note: None Co-authored-by: Oliver Tan <[email protected]> Co-authored-by: Rebecca Taft <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-multiregion
Related to multi-region
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
T-multiregion
T-sql-queries
SQL Queries Team
#65355 ensured that validation of implicitly partitioned full unique indexes is efficient. However, it did not improve efficiency for partial indexes. For example, validating the index
UNIQUE (a) WHERE b > 0
results in this plan, which uses a hash aggregate:Epic CRDB-2528
The text was updated successfully, but these errors were encountered: