58358: opt: prune partial index columns and simplify partial index projections r=rytaft a=mgartner

Fixes cockroachdb#51623

#### opt: do not derive prune columns for Upsert, Update, Delete

We no longer derive output prune columns for Upsert, Update, and Delete ops in `DerivePruneCols`. There are no PruneCols rules for these operators, so deriving their prune columns only performed unnecessary work. Other rules prune the fetch and return columns for these operators, and those rules do not rely on `DerivePruneCols`.

Release note: None

#### sql: remove logic to determine fetch cols in row updater

Previously, the `row.MakeUpdater` function had logic to determine the fetch columns required for an update operation. This is not necessary because the cost-based optimizer already determines the necessary fetch columns and plumbs them to `MakeUpdater` as the `requestedCols` argument.

Release note: None

#### opt: safer access to partial index predicates in TableMeta

Previously, partial index predicate expressions in TableMeta were the source of truth used within the optimizer to determine if an index is a partial index. However, partial index predicates are not added to TableMeta for all types of statements in optbuilder. Therefore, it was not safe to assume this was a source of truth.

This commit unexports the map of partial index predicates in TableMeta. Access to partial index predicates must now be done via `TableMeta.PartialIndexPredicate`. This function checks the catalog to determine if an index is a partial index, and panics if there is no corresponding predicate expression in the partial index predicate map. This makes the function an actual source of truth.

Release note: None

#### opt: move addPartialIndexPredicatesForTable to optbuilder/partial_index.go

Release note: None

#### opt: prune update/upsert fetch columns not needed for partial indexes

Indexed columns of partial indexes are now only fetched for UPDATE and UPSERT operations when needed.
They are pruned in cases where it is guaranteed that they are not needed to build old or new index entries. For example, consider the table and UPDATE:

    CREATE TABLE t (
      a INT PRIMARY KEY,
      b INT,
      c INT,
      d INT,
      INDEX (b) WHERE c > 0,
      FAMILY (a), FAMILY (b), FAMILY (c), FAMILY (d)
    )

    UPDATE t SET d = d + 1 WHERE a = 1

The partial index is guaranteed not to change with this UPDATE because neither its indexed columns nor the columns referenced in its predicate are mutating. Therefore, the existing values of b do not need to be fetched to maintain the state of the partial index. Furthermore, the primary index does not require the existing values of b because no columns in b's family are mutating. So, b can be pruned from the UPDATE's fetch columns.

Release note (performance improvement): Previously, indexed columns of partial indexes were always fetched for UPDATEs and UPSERTs. Now they are only fetched if they are required for maintaining the state of the index. If an UPDATE or UPSERT mutates columns that are neither indexed by a partial index nor referenced in a partial index predicate, the index's columns will no longer be fetched (assuming that they are not needed to maintain the state of other indexes, including the primary index).

#### opt: normalize partial index PUT/DEL projections to false

The `SimplifyPartialIndexProjections` normalization rule has been added. It normalizes synthesized partial index PUT and DEL columns to False when it is guaranteed that a mutation will not require changes to the associated partial index. This normalization can lead to further normalizations, such as pruning columns that the synthesized projections relied on.

The motivation for this change is to allow fully disjoint updates to different columns in the same row, when the columns are split across different families. By pruning columns not needed to maintain a partial index, we're not forced to scan all column families. This can ultimately reduce contention during updates.
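
As a rough illustration of the pruning condition described above, here is a minimal Go sketch. It is not the optimizer's implementation: `colSet`, `partialIndex`, `mayChange`, and `canPruneFetch` are hypothetical names invented for this example, and the model ignores details such as primary key columns and other indexes.

```go
package main

import "fmt"

// colSet is a hypothetical stand-in for the optimizer's column set type.
type colSet map[string]bool

// cols builds a colSet from column names.
func cols(names ...string) colSet {
	s := colSet{}
	for _, n := range names {
		s[n] = true
	}
	return s
}

// intersects reports whether s and o share at least one column.
func (s colSet) intersects(o colSet) bool {
	for c := range s {
		if o[c] {
			return true
		}
	}
	return false
}

// partialIndex is a simplified, hypothetical model of a partial index.
type partialIndex struct {
	indexed   colSet // columns stored in index entries
	predicate colSet // columns referenced by the index predicate
}

// mayChange reports whether mutating the given columns could alter the
// index: either an indexed column or a predicate column is mutating.
func (p partialIndex) mayChange(mutated colSet) bool {
	return mutated.intersects(p.indexed) || mutated.intersects(p.predicate)
}

// canPruneFetch reports whether an indexed column can be dropped from the
// fetch columns: the partial index cannot change, and no column in the
// indexed column's family is mutating.
func canPruneFetch(idx partialIndex, family, mutated colSet) bool {
	return !idx.mayChange(mutated) && !mutated.intersects(family)
}

func main() {
	// Models INDEX (b) WHERE c > 0 with FAMILY (b).
	idx := partialIndex{indexed: cols("b"), predicate: cols("c")}

	// UPDATE t SET d = d + 1: b can be pruned.
	fmt.Println(canPruneFetch(idx, cols("b"), cols("d"))) // true

	// UPDATE t SET c = c + 1: the predicate column mutates, so b is needed.
	fmt.Println(canPruneFetch(idx, cols("b"), cols("c"))) // false
}
```

Under the same assumptions, a `mayChange` result of false is also the case in which the synthesized PUT and DEL columns could be normalized to False.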

Release note (performance improvement): UPDATE and UPSERT operations on tables with partial indexes no longer evaluate partial index predicate expressions when it is guaranteed that the operation will not alter the state of the partial index. In some cases, this can eliminate fetching the existing value of columns that are referenced in partial index predicates.

58373: streamingccl: add ingestion job framework r=pbardea a=adityamaru

This change introduces a new StreamIngestionJob. It does not do much more than lay out the general outline of the job, which is very similar to other bulk jobs such as changefeed, backup, etc.

More precisely:
- Introduces StreamIngestionDetails job details proto
- Hooks up the dependency to a mock stream client
- Introduces a StreamIngestionProcessorSpec
- Sets up a simple DistSQL flow which round-robin assigns the partitions to the processors.

Most notable TODOs in job land which will be addressed in follow-up PRs:
- StreamIngestionPlanHook to create this job. It will involve figuring out SQL syntax.
- Introducing a ts watermark in both the job and processors. This watermark will represent the lowest resolved ts up to which all processors have ingested. Iron out semantics on job start and resumption.
- Introducing a StreamIngestionFrontier processor which will slurp the results from the StreamIngestionProcessors, and use them to keep track of the minimum resolved ts across all processors.

Fixes: cockroachdb#57399

Release note: None

58477: opt: prevent columns reuse in Union and UnionAll r=rytaft a=mgartner

#### opt: fix columns in SplitScanIntoUnionScans constraint

This commit fixes a minor bug in `SplitScanIntoUnionScans` that resulted in a scan's constraint containing columns not associated with the scan. This did not affect the correctness of results.
However, it appears that it did cause inaccurate stats calculations; I had to add histogram buckets to the tests to coerce the optimizer into choosing the same plan for the corresponding test.

Release note: None

#### opt: do not reuse columns for Unions in SplitScanIntoUnionScans

Unions generated in SplitScanIntoUnionScans no longer reuse column IDs from their left children as output column IDs. Reusing column IDs in this way has been shown to be dangerous (see cockroachdb#58434).

Release note: None

#### opt: add Union column ID check to CheckExpr

A check has been added to `CheckExpr` that asserts that the output columns of `Union`s and `UnionAll`s are not reused from the left or right inputs of the union. Reusing columns in this way is dangerous (see cockroachdb#58434).

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
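
The Union column ID check described above can be sketched as follows. This is a minimal illustration only, not the actual `CheckExpr` code: `checkUnionCols` is a hypothetical helper, and plain integers stand in for the optimizer's column IDs.

```go
package main

import "fmt"

// checkUnionCols is a hypothetical helper. It returns an error if any
// output column ID of a union also appears among the column IDs of its
// left or right input.
func checkUnionCols(left, right, out []int) error {
	// Record which input each column ID comes from.
	inputs := make(map[int]string, len(left)+len(right))
	for _, c := range left {
		inputs[c] = "left"
	}
	for _, c := range right {
		inputs[c] = "right"
	}
	// Any output ID found in the map is reused from an input.
	for _, c := range out {
		if side, ok := inputs[c]; ok {
			return fmt.Errorf("union output column %d reused from %s input", c, side)
		}
	}
	return nil
}

func main() {
	// Fresh output column IDs pass the check.
	fmt.Println(checkUnionCols([]int{1, 2}, []int{3, 4}, []int{5, 6}))

	// Reusing the left input's column IDs as outputs is rejected.
	fmt.Println(checkUnionCols([]int{1, 2}, []int{3, 4}, []int{1, 2}))
}
```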