sql: remove unnecessary index joins in partial index validation queries #70116

mgartner · 2021-09-13T13:25:27Z

A specific case of #54588 which is particularly impactful is when unnecessary index joins occur for the partial index validation query performed when adding a new partial index. This type of query results in an index join operating on every row in the partial index.

For example:

defaultdb> CREATE TABLE t (a INT, b INT, INDEX a_idx (a) WHERE b > 0);
CREATE TABLE

defaultdb> SELECT count(1) FROM t@a_idx WHERE b > 0;
  count
---------
      0
(1 row)

defaultdb> EXPLAIN SELECT count(1) FROM t@a_idx WHERE b > 0;
                    info
--------------------------------------------
  distribution: full
  vectorized: true

  • group (scalar)
  │
  └── • index join
      │ table: t@primary
      │
      └── • scan
            missing stats
            table: t@a_idx (partial index)
            spans: FULL SCAN
(12 rows)

We should add an exploration rule to eliminate these type of index joins so that adding a partial index to a large table is not significantly more expensive than adding a non-partial index.

The workaround for now is to create a partial index that stores all the columns that are referenced in its predicate.

The text was updated successfully, but these errors were encountered:

67737: opt: zigzag scan hints r=cucaroach a=cucaroach Fixes #35814 Enable table@{FORCE_ZIGZAG} and table@{FORCE_ZIGZAG=index} hints to lower the cost of zigzag plans to that they will be preferred over other plans. Note that this is a breaking change, if customer has an index called FORCE_ZIGZAG and is hinting with table@{FORCE_ZIGZAG} that will no longer do what it did before. Not sure what the ramifications of that are. Release note (sql change): Extend index scan hint to allow zigzag joins to be preferred. 69887: kv,migration: rm code handling legacy raft truncated state r=irfansharif a=irfansharif Fixes #66544. Fixes #66834. We fully migrated away from the replicated truncated state and the old range applied state keys (RaftAppliedIndexLegacyKey, LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in #58088. We're now guaranteed to never see the legacy truncated+applied state representations, making this code (finally) safe to delete. Release justification: cluster version cleanup Release note: None --- This turned out to be pretty hairy and invasive. I'll wait for a few CI runs and inspect the diffs here a bit more closely. Given the surface area here, I'm curious if we feel this is worth getting in during stability. Some questions to consider: - The various legacy keys that were removed were removed in their entirety. Our CLI to scan over store keys will no longer recognize these keys. Is that ok? - I don't think we need any checks to make sure we're not re-using some of these keys in the future -- but I'd like other opinions. - Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for the unreplicated truncated key as well, so just renamed it more appropriately. - v21.1 nodes have to interop with v20.2 nodes where the legacy representation was possible, and it identified such cases through booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState for example). This means that we still need to explicitly set them in v21.2 code. Crucially we'll stop reading these fields in v21.2 code, letting us remove these fields entirely in v22.1. - The treatment above does not extend to ReplicaState.UsingAppliedStateKey; that field was only being used to trigger an idempotent migration to start using the newer applied state key representation, not as a way to express that the replica should expect the legacy representation. Since [#58088](#58088) has already migrated every range in the system, we no longer need to trigger anything, and this field can be removed. 70059: roachtest: make --workload optional r=mgartner a=mgartner This commit makes the `--workload` flag optional because some roachtests do not require a workload binary. Release note: None 70120: opt: match ScalarGroupBy in EliminateIndexJoinOrProjectInsideGroupBy r=mgartner a=mgartner This commit extends EliminateIndexJoinOrProjectInsideGroupBy so that ScalarGroupBy expressions are also matched. This allows the rule to eliminate unnecessary index joins in more cases. The primary motivation for this change was to make partial index validation queries more efficient. These queries always have ScalarGroupBy expressions because they are in the form: SELECT count(1) FROM table@partial_index WHERE predicate Prior to this change, an index join was planned for these queries which would operate on every row in the partial index. This could be extremely expensive for large partial indexes. Fixes #70116 Release note (performance improvement): A limitation has been fixed that made creating partial indexes inefficient. 70162: ui: fix default behaviours of columns on stmt page on cc console r=maryliag a=maryliag When a CC Console user open the Statement page for the first time (no cache was created for column selector), this commits make sure that the default columns will be displayed. Fixes: #70160 Release justification: Category 4 Release note (bug fix): Default columns being displayed on Statements page on CC console when the user never made any selection Co-authored-by: Tommy Reilly <[email protected]> Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Marylia Gutierrez <[email protected]>

This commit extends EliminateIndexJoinOrProjectInsideGroupBy so that ScalarGroupBy expressions are also matched. This allows the rule to eliminate unnecessary index joins in more cases. The primary motivation for this change was to make partial index validation queries more efficient. These queries always have ScalarGroupBy expressions because they are in the form: SELECT count(1) FROM table@partial_index WHERE predicate Prior to this change, an index join was planned for these queries which would operate on every row in the partial index. This could be extremely expensive for large partial indexes. Fixes #70116 Release note (performance improvement): A limitation has been fixed that made creating partial indexes inefficient.

mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-quick-win Likely to be a quick win for someone experienced. labels Sep 13, 2021

mgartner self-assigned this Sep 13, 2021

blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 13, 2021

mgartner mentioned this issue Sep 13, 2021

opt: match ScalarGroupBy in EliminateIndexJoinOrProjectInsideGroupBy #70120

Merged

craig bot closed this as completed in 2751851 Sep 14, 2021

blathers-crl bot mentioned this issue Sep 14, 2021

release-21.2: opt: match ScalarGroupBy in EliminateIndexJoinOrProjectInsideGroupBy #70205

Merged

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: remove unnecessary index joins in partial index validation queries #70116

sql: remove unnecessary index joins in partial index validation queries #70116

mgartner commented Sep 13, 2021

sql: remove unnecessary index joins in partial index validation queries #70116

sql: remove unnecessary index joins in partial index validation queries #70116

Comments

mgartner commented Sep 13, 2021