Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: remove unnecessary index joins in partial index validation queries #70116

Closed
mgartner opened this issue Sep 13, 2021 · 0 comments · Fixed by #70120
Closed

sql: remove unnecessary index joins in partial index validation queries #70116

mgartner opened this issue Sep 13, 2021 · 0 comments · Fixed by #70120
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

A specific case of #54588 which is particularly impactful is when unnecessary index joins occur for the partial index validation query performed when adding a new partial index. This type of query results in an index join operating on every row in the partial index.

For example:

defaultdb> CREATE TABLE t (a INT, b INT, INDEX a_idx (a) WHERE b > 0);
CREATE TABLE

defaultdb> SELECT count(1) FROM t@a_idx WHERE b > 0;
  count
---------
      0
(1 row)

defaultdb> EXPLAIN SELECT count(1) FROM t@a_idx WHERE b > 0;
                    info
--------------------------------------------
  distribution: full
  vectorized: true

  • group (scalar)
  │
  └── • index join
      │ table: t@primary
      │
      └── • scan
            missing stats
            table: t@a_idx (partial index)
            spans: FULL SCAN
(12 rows)

We should add an exploration rule to eliminate these type of index joins so that adding a partial index to a large table is not significantly more expensive than adding a non-partial index.

The workaround for now is to create a partial index that stores all the columns that are referenced in its predicate.

@mgartner mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-quick-win Likely to be a quick win for someone experienced. labels Sep 13, 2021
@mgartner mgartner self-assigned this Sep 13, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 13, 2021
craig bot pushed a commit that referenced this issue Sep 14, 2021
67737: opt: zigzag scan hints r=cucaroach a=cucaroach

Fixes #35814

Enable table@{FORCE_ZIGZAG} and table@{FORCE_ZIGZAG=index} hints to lower the cost
of zigzag plans to that they will be preferred over other plans.

Note that this is a breaking change, if customer has an index called
FORCE_ZIGZAG and is hinting with table@{FORCE_ZIGZAG} that will no longer do what
it did before. Not sure what the ramifications of that are.

Release note (sql change): Extend index scan hint to allow zigzag joins to be preferred.


69887: kv,migration: rm code handling legacy raft truncated state r=irfansharif a=irfansharif

Fixes #66544. 
Fixes #66834.

We fully migrated away from the replicated truncated state
and the old range applied state keys (RaftAppliedIndexLegacyKey,
LeaseAppliedIndexLegacyKey, and RangeStatsLegacyKey) in #58088. We're
now guaranteed to never see the legacy truncated+applied state
representations, making this code (finally) safe to delete.

Release justification: cluster version cleanup
Release note: None

---

This turned out to be pretty hairy and invasive. I'll wait for a few CI
runs and inspect the diffs here a bit more closely. Given the surface
area here, I'm curious if we feel this is worth getting in during
stability. Some questions to consider:

- The various legacy keys that were removed were removed in their
  entirety. Our CLI to scan over store keys will no longer recognize these
  keys. Is that ok?
- I don't think we need any checks to make sure we're not re-using some
  of these keys in the future -- but I'd like other opinions.
- Looks like we re-purposed the LocalRaftTruncatedStateLegacySuffix for
  the unreplicated truncated key as well, so just renamed it more
  appropriately.
- v21.1 nodes have to interop with v20.2 nodes where the legacy
  representation was possible, and it identified such cases through
  booleans in proto messages (SnapshotRequest.Header.UseUnreplicatedTruncatedState
  for example). This means that we still need to explicitly set them in
  v21.2 code. Crucially we'll stop reading these fields in v21.2 code,
  letting us remove these fields entirely in v22.1.
- The treatment above does not extend to ReplicaState.UsingAppliedStateKey;
  that field was only being used to trigger an idempotent migration to
  start using the newer applied state key representation, not as a way
  to express that the replica should expect the legacy representation.
  Since [#58088](#58088) has already migrated every range in the system, we no
  longer need to trigger anything, and this field can be removed.

70059: roachtest: make --workload optional r=mgartner a=mgartner

This commit makes the `--workload` flag optional because some roachtests
do not require a workload binary.

Release note: None

70120: opt: match ScalarGroupBy in EliminateIndexJoinOrProjectInsideGroupBy r=mgartner a=mgartner

This commit extends EliminateIndexJoinOrProjectInsideGroupBy so that
ScalarGroupBy expressions are also matched. This allows the rule to
eliminate unnecessary index joins in more cases.

The primary motivation for this change was to make partial index
validation queries more efficient. These queries always have
ScalarGroupBy expressions because they are in the form:

    SELECT count(1) FROM table@partial_index WHERE predicate

Prior to this change, an index join was planned for these queries which
would operate on every row in the partial index. This could be extremely
expensive for large partial indexes.

Fixes #70116

Release note (performance improvement): A limitation has been fixed that
made creating partial indexes inefficient.

70162: ui: fix default behaviours of columns on stmt page on cc console r=maryliag a=maryliag

When a CC Console user open the Statement page for the first time
(no cache was created for column selector), this commits make sure
that the default columns will be displayed.

Fixes: #70160

Release justification: Category 4
Release note (bug fix): Default columns being displayed on Statements
page on CC console when the user never made any selection

Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
@craig craig bot closed this as completed in 2751851 Sep 14, 2021
blathers-crl bot pushed a commit that referenced this issue Sep 14, 2021
This commit extends EliminateIndexJoinOrProjectInsideGroupBy so that
ScalarGroupBy expressions are also matched. This allows the rule to
eliminate unnecessary index joins in more cases.

The primary motivation for this change was to make partial index
validation queries more efficient. These queries always have
ScalarGroupBy expressions because they are in the form:

    SELECT count(1) FROM table@partial_index WHERE predicate

Prior to this change, an index join was planned for these queries which
would operate on every row in the partial index. This could be extremely
expensive for large partial indexes.

Fixes #70116

Release note (performance improvement): A limitation has been fixed that
made creating partial indexes inefficient.
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant