opt: avoid choosing index with unreliable stats #64570

Open
RaduBerinde opened this issue May 3, 2021 · 39 comments · Fixed by #66979
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster T-sql-queries SQL Queries Team

Comments

@RaduBerinde
Member

RaduBerinde commented May 3, 2021

A customer ran into a case where they were doing a single-row UPSERT. Instead of choosing the primary index (which would scan at most 1 row), the optimizer chooses a secondary index. The index is chosen because, according to the histogram, the relevant value would match no rows. But the stats are stale, and the query actually reads through 100k+ rows from the index.

We have discussed augmenting the cost value with an "uncertainty range", which would address this problem (the primary index has <=1 row with 100% certainty, while the secondary index has an expected 0 rows but no upper bound). That would be a big change, but I believe we can also consider a more targeted fix, e.g. we could give a heavy cost "discount" to scan operators that have a known cardinality bound (or a penalty to scans with no cardinality bound).

Below is an illustration. The t_y_v_idx index could in principle return any number of rows.

> SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
> create table t (x int, y int, v int, index (y, v), primary key (x,y));
> insert into t values (1, 1, 1), (2, 2, 2), (3, 3, 3);
> create statistics foo from t;
> explain upsert into t values (10, 10, 10);
                                             info
----------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • upsert
  │ into: t(x, y, v)
  │ auto commit
  │ arbiter indexes: primary
  │
  └── • cross join (left outer)
      │ estimated row count: 1
      │
      ├── • values
      │     size: 3 columns, 1 row
      │
      └── • filter
          │ estimated row count: 1
          │ filter: x = 10
          │
          └── • scan
                estimated row count: 0 (<0.01% of the table; stats collected 32 seconds ago)
                table: t@t_y_v_idx
                spans: [/10 - /10]

The correct plan would be:

> explain upsert into t values (1, 1, 1);
                                         info
---------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • upsert
  │ into: t(x, y, v)
  │ auto commit
  │ arbiter indexes: primary
  │
  └── • cross join (left outer)
      │ estimated row count: 1
      │
      ├── • values
      │     size: 3 columns, 1 row
      │
      └── • scan
            estimated row count: 1 (31% of the table; stats collected 15 seconds ago)
            table: t@primary
            spans: [/1/1 - /1/1]
            locking strength: for update

gz#9150

Jira issue: CRDB-7134

Jira issue: CRDB-13889

gz#16142

gz#18109

@RaduBerinde RaduBerinde added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 3, 2021
@mgartner
Collaborator

mgartner commented May 4, 2021

Given that statistics are always somewhat out of date, and are only an estimate, would it be reasonable to put a floor on how far statistics can reduce an estimated row count? For example, what if statistics alone could never lower a row count estimate below 1, and only logical contradictions were allowed to produce a row count of 0? In this case the floor might have to be 1 plus some epsilon, so that the cost doesn't exactly match the primary index lookup.

@mgartner
Collaborator

mgartner commented May 4, 2021

It would be nice to get to this in the 21.2 release. cc @kevin-v-ngo @awoods187 for visibility.

@awoods187
Contributor

I'm in favor of doing this in 21.2, provided the opportunity cost isn't too high. Let's discuss the level of effort during the milestone planning meeting.

@RaduBerinde
Member Author

Saw another instance of this in the wild. The use case involved a "status" column and a partial index that restricts the status to "pending". The vast majority of rows have "done" status, with occasionally a few thousand rows showing as "pending". The automatic stats can happen to see zero "pending" rows or see a few thousand of them, depending on timing. When stats show zero rows, the partial index is estimated to be empty and becomes an enticing index for the optimizer. In this case, the reliable plan was to use the PK which guaranteed that we scan at most one row.
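
To make the shape of this concrete, here is a minimal sketch of such a schema (all names are hypothetical, not taken from the actual case):

CREATE TABLE tasks (
  id INT PRIMARY KEY,
  status STRING NOT NULL
);
-- Partial index covering only the rare "pending" rows.
CREATE INDEX pending_idx ON tasks (status) WHERE status = 'pending';
-- If stats happened to be collected while zero rows were pending, the
-- histogram claims the partial index is empty, so scanning it looks
-- cheaper than the at-most-one-row primary key lookup for a query like:
SELECT * FROM tasks WHERE id = 123 AND status = 'pending';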

@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@rytaft rytaft self-assigned this Jun 28, 2021
rytaft added a commit to rytaft/cockroach that referenced this issue Jun 28, 2021
This commit adds a new cost function, largeCardinalityRowCountPenalty,
which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows
worth of cost helps prevent surprising plans for very small tables or for
when stats are stale.

Fixes cockroachdb#64570

Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.
craig bot pushed a commit that referenced this issue Jun 28, 2021
66973: ui: surface the transaction restarts chart r=matthewtodd a=matthewtodd

Resolves #65856
    
Release note (ui change): The KV transaction restarts chart was moved from the "distributed" metrics to the "sql" metrics page so as to be close to the SQL transactions chart, for more prominent visibility.

66979: opt: add cost penalty for scans with large cardinality r=rytaft a=rytaft

**opt: ensure we prefer a reverse scan to sorting a forward scan**

This commit fixes an issue where in some edge cases the optimizer would
prefer sorting the output of a forward scan over performing a reverse scan
(when there is no need to sort the output of the reverse scan).

Release note (performance improvement): The optimizer now prefers
performing a reverse scan over a forward scan + sort if the reverse
scan eliminates the need for a sort and the plans are otherwise
equivalent. This was the case before in most cases, but some edge
cases with a small number of rows have been fixed.

**opt: add cost penalty for scans with large cardinality**

This commit adds a new cost function, `largeCardinalityRowCountPenalty`,
which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows
worth of cost helps prevent surprising plans for very small tables or for
when stats are stale.

Fixes #64570

Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.

Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
@craig craig bot closed this as completed in 215cdcb Jun 28, 2021
@rytaft
Collaborator

rytaft commented Jul 9, 2021

Unfortunately, it seems the fix with the cost penalty is not robust enough to handle some variations of this issue. For example, if the alternative plan has a cardinality bound of 10, we may not choose it.

This is going to require some more thought, so I'll reopen this issue for now.

@rytaft rytaft reopened this Jul 9, 2021
@mgartner
Collaborator

mgartner commented Jul 21, 2021

Just saw another variation on this. It's similar to the example from Radu above:

Saw another instance of this in the wild. The use case involved a "status" column and a partial index that restricts the status to "pending". The vast majority of rows have "done" status, with occasionally a few thousand rows showing as "pending". The automatic stats can happen to see zero "pending" rows or see a few thousand of them, depending on timing. When stats show zero rows, the partial index is estimated to be empty and becomes an enticing index for the optimizer. In this case, the reliable plan was to use the PK which guaranteed that we scan at most one row.

However, it's slightly different. There is an updated_at column and a FK-like column other_id. The query fetches recently updated rows with a specific other_id, e.g. WHERE other_id = 1234 AND updated_at > now() - '1 hour'. There's a 12-bucket hash sharded index on updated_at and a regular secondary index on other_id.
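
To make this concrete, here is a rough sketch of the schema and query described above (all names are hypothetical; the hash-sharded index syntax shown is the older form, while v22.1+ uses USING HASH WITH (bucket_count = 12)):

CREATE TABLE events (
  id INT PRIMARY KEY,
  other_id INT NOT NULL,
  updated_at TIMESTAMPTZ NOT NULL
);
-- 12-bucket hash-sharded index on updated_at.
CREATE INDEX updated_at_idx ON events (updated_at) USING HASH WITH BUCKET_COUNT = 12;
-- Regular secondary index on the FK-like column.
CREATE INDEX other_id_idx ON events (other_id);
-- Fetch recently updated rows with a specific other_id.
SELECT * FROM events WHERE other_id = 1234 AND updated_at > now() - INTERVAL '1 hour';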

The histograms for the updated_at column do not include the most recently updated values, so the optimizer prefers to scan the index on updated_at. It estimates that it will only scan 6 rows, but in reality it scans over 100k.

The better plan would be to scan the index on other_id which is estimated to scan only about 150 rows, and in reality only scans 2 rows.

Unfortunately, cardinality cannot help us here because other_id is not unique.

I've run into problems like this running Postgres in the past. Our solution was to make automated stats collection much more aggressive. Once a table gets very large, automatic stats collection rarely runs if it is triggered by some percentage of rows being mutated.

Imagine a table that gets 100k inserts per day. It's been around for 1,000 days, so it now has 100m rows. With our default sql.stats.automatic_collection.fraction_stale_rows of 0.2, 25m rows need to be inserted to trigger automatic stats (25m stale rows / 125m total rows = 0.2), which won't happen for another 250 days. That's 25m rows and 250 days' worth of values missing from the histograms.

To make automatic stats much more aggressive, you can set sql.stats.automatic_collection.fraction_stale_rows to 0 and set sql.stats.automatic_collection.min_stale_rows to some number, say 10k. This ensures that no matter how big the table is, you're guaranteed to collect stats for the most recent rows every so often.
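
Concretely, that tuning would look something like this (a sketch; 10k is just an example threshold):

SET CLUSTER SETTING sql.stats.automatic_collection.fraction_stale_rows = 0;
SET CLUSTER SETTING sql.stats.automatic_collection.min_stale_rows = 10000;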

Postgres has the ability to adjust these stats knobs at the table level. I don't believe we have that ability yet, but it would be useful here; a user needs to be able to tune these knobs per table based on the workload.

Here's how to tune auto-analyze (the Postgres mechanism that refreshes table statistics) for a specific table:

ALTER TABLE t SET (autovacuum_analyze_scale_factor = 0);
ALTER TABLE t SET (autovacuum_analyze_threshold = 10000);

@mgartner mgartner reopened this Jul 21, 2021
rytaft added a commit to rytaft/cockroach that referenced this issue Mar 8, 2023
Informs cockroachdb#64570

Release note (sql change): Added a new session setting,
optimizer_always_use_histograms, which ensures that the optimizer
always uses histograms when available to calculate the statistics
of every plan that it explores. Enabling this setting can prevent
the optimizer from choosing a suboptimal index when statistics for
a table are stale.
craig bot pushed a commit that referenced this issue Mar 8, 2023
98194: opt: add setting to always use histograms to calculate stats r=rytaft a=rytaft

Informs #64570

Release note (sql change): Added a new session setting, `optimizer_always_use_histograms`, which ensures that the optimizer always uses histograms when available to calculate the statistics of every plan that it explores. Enabling this setting can prevent the optimizer from choosing a suboptimal index when statistics for a table are stale.

Co-authored-by: Rebecca Taft <[email protected]>
rytaft added a commit to rytaft/cockroach that referenced this issue Mar 8, 2023
This commit enables the new session setting,
optimizer_always_use_histograms, by default. This ensures that the
optimizer always uses histograms when available to calculate the
statistics of every plan that it explores. Enabling this setting
can prevent the optimizer from choosing a suboptimal index when
statistics for a table are stale.

Informs cockroachdb#64570

Release note: None
craig bot pushed a commit that referenced this issue Mar 9, 2023
98248: opt: enable optimizer_always_use_histograms by default r=rytaft a=rytaft

This commit enables the new session setting, `optimizer_always_use_histograms`, by default. This ensures that the optimizer always uses histograms when available to calculate the statistics of every plan that it explores. Enabling this setting can prevent the optimizer from choosing a suboptimal index when statistics for a table are stale.

Informs #64570

Release note: None

Co-authored-by: Rebecca Taft <[email protected]>
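
For anyone on a version where this is not yet the default, the setting can be enabled manually (a minimal sketch; the ALTER ROLE form sets it as a default for all sessions):

SET optimizer_always_use_histograms = on;
ALTER ROLE ALL SET optimizer_always_use_histograms = on;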
@lin-crl
Contributor

lin-crl commented Mar 23, 2023

Hi, how do we estimate row counts for auto-incrementing columns like INT as part of the PK?
For example, the PK is (col1 INT, col2 INT).
I just opened a support case on behalf of a customer, and support may open an escalation later. Since col1 is auto-incrementing, a new value in col1 will always fall outside col1's histogram boundary. I wonder if we should use average row counts for col1 instead of 0 in this case. Thank you!

@michae2
Collaborator

michae2 commented Mar 23, 2023

Hi, how do we estimate row counts for auto-incrementing columns like INT as part of the PK? For example, the PK is (col1 INT, col2 INT). I just opened a support case on behalf of a customer, and support may open an escalation later. Since col1 is auto-incrementing, a new value in col1 will always fall outside col1's histogram boundary. I wonder if we should use average row counts for col1 instead of 0 in this case. Thank you!

To help with auto-increment columns, and other columns with predictable histogram changes, we added statistics forecasting in v22.2 (as described in #79872).

(If the problem is on v22.1, one way to check whether forecasting will help is to capture a statement bundle, and then use cockroach debug statement-bundle recreate with a v22.2 binary, and EXPLAIN the query again to see if forecasting is used.)
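
On versions with forecasting (v22.2+), you can also inspect the forecasted statistics directly (where t is a placeholder table name):

SHOW STATISTICS WITH FORECAST FOR TABLE t;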

@lin-crl
Contributor

lin-crl commented Mar 23, 2023

This is great to hear, @michae2! We'll validate it.

@vbabenkoru

vbabenkoru commented May 25, 2023

@michae2 Unfortunately, statistics forecasting doesn't work for this if one of the columns doesn't increase consistently.

@lin-crl
Contributor

lin-crl commented May 25, 2023

@vbabenkoru Could you share a bit more about how the column is used? Perhaps on a Slack channel, so we can discuss the specifics of your use case?

@vbabenkoru

I believe @mgartner's proposal would work in our case, since forecasting isn't always possible or reliable, and the main issue here is the default behavior of assuming 0 rows whenever the optimizer encounters a new value.

@mgartner
Collaborator

@vbabenkoru You mean the proposal here, correct? I have a prototype of it that I'm hoping to find time to finish soon.

@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023
@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Feb 1, 2024
@itsbilal itsbilal added the O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster label Apr 30, 2024
@itsbilal
Member

Ran into this on the DRT drt-chaos cluster, so adding O-testcluster. Internal conversation

@ajstorm
Collaborator

ajstorm commented May 2, 2024

Hit it on drt-large too, which resulted in this duplicate issue.
