opt: avoid choosing index with unreliable stats #64570

Open
RaduBerinde opened this issue May 3, 2021 · 39 comments · Fixed by #66979
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster T-sql-queries SQL Queries Team

Comments

@RaduBerinde
Member

RaduBerinde commented May 3, 2021

A customer ran into a case where they were doing a single-row UPSERT. Instead of choosing the primary index (which would scan at most 1 row), the optimizer chooses a secondary index. The index is chosen because, according to the histogram, the relevant value would match no rows. But the stats are stale, and the query actually reads through 100k+ rows from the index.

We have discussed augmenting the cost value with an "uncertainty range", which would address this problem (the primary index has <=1 row with 100% certainty, while the secondary index has an expected 0 rows but no upper bound). That would be a big change, but I believe we can also consider a more targeted fix, e.g. we could give a heavy cost "discount" to scan operators that have a known cardinality bound (or a penalty to scans with no cardinality bound).

Below is an illustration. The t_y_v_idx index could in principle return any number of rows.

> SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
> create table t (x int, y int, v int, index (y, v), primary key (x,y));
> insert into t values (1, 1, 1), (2, 2, 2), (3, 3, 3);
> create statistics foo from t;
> explain upsert into t values (10, 10, 10);
                                             info
----------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • upsert
  │ into: t(x, y, v)
  │ auto commit
  │ arbiter indexes: primary
  │
  └── • cross join (left outer)
      │ estimated row count: 1
      │
      ├── • values
      │     size: 3 columns, 1 row
      │
      └── • filter
          │ estimated row count: 1
          │ filter: x = 10
          │
          └── • scan
                estimated row count: 0 (<0.01% of the table; stats collected 32 seconds ago)
                table: t@t_y_v_idx
                spans: [/10 - /10]

The correct plan would be:

> explain upsert into t values (1, 1, 1);
                                         info
---------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • upsert
  │ into: t(x, y, v)
  │ auto commit
  │ arbiter indexes: primary
  │
  └── • cross join (left outer)
      │ estimated row count: 1
      │
      ├── • values
      │     size: 3 columns, 1 row
      │
      └── • scan
            estimated row count: 1 (31% of the table; stats collected 15 seconds ago)
            table: t@primary
            spans: [/1/1 - /1/1]
            locking strength: for update

gz#9150

Jira issue: CRDB-7134

Jira issue: CRDB-13889

gz#16142

gz#18109

@RaduBerinde RaduBerinde added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 3, 2021
@mgartner
Collaborator

mgartner commented May 4, 2021

Given that statistics are always somewhat out of date, and are only an estimate, would it be reasonable to put a floor on how far statistics can reduce an estimated row count? For example, what if statistics alone could never lower a row count estimate below 1, and only logical contradictions were allowed to produce a row count of 0? In this case the floor might have to be 1 plus some epsilon, so that the cost doesn't exactly match the primary index lookup.

@mgartner
Collaborator

mgartner commented May 4, 2021

It would be nice to get to this in the 21.2 release. cc @kevin-v-ngo @awoods187 for visibility.

@awoods187
Contributor

I'm in favor of doing this in 21.2, provided the opportunity cost isn't too high. Let's discuss the level of effort during the milestone planning meeting.

@RaduBerinde
Member Author

Saw another instance of this in the wild. The use case involved a "status" column and a partial index that restricts the status to "pending". The vast majority of rows have "done" status, with occasionally a few thousand rows showing as "pending". The automatic stats can happen to see zero "pending" rows or see a few thousand of them, depending on timing. When stats show zero rows, the partial index is estimated to be empty and becomes an enticing index for the optimizer. In this case, the reliable plan was to use the PK which guaranteed that we scan at most one row.
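
To make the shape of this concrete, here is a minimal sketch of such a schema (all names are hypothetical, not taken from the actual case):

CREATE TABLE tasks (
  id INT PRIMARY KEY,
  status STRING NOT NULL
);
-- Partial index covering only the rare "pending" rows.
CREATE INDEX pending_idx ON tasks (status) WHERE status = 'pending';
-- If stats happened to be collected while zero rows were pending, the
-- histogram claims the partial index is empty, so scanning it looks
-- cheaper than the at-most-one-row primary key lookup for a query like:
SELECT * FROM tasks WHERE id = 123 AND status = 'pending';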

@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@rytaft rytaft self-assigned this Jun 28, 2021
rytaft added a commit to rytaft/cockroach that referenced this issue Jun 28, 2021
This commit adds a new cost function, largeCardinalityRowCountPenalty,
which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows
worth of cost helps prevent surprising plans for very small tables or for
when stats are stale.

Fixes cockroachdb#64570

Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.
craig bot pushed a commit that referenced this issue Jun 28, 2021
66973: ui: surface the transaction restarts chart r=matthewtodd a=matthewtodd

Resolves #65856
    
Release note (ui change): The KV transaction restarts chart was moved from the "distributed" metrics to the "sql" metrics page so as to be close to the SQL transactions chart, for more prominent visibility.

66979: opt: add cost penalty for scans with large cardinality r=rytaft a=rytaft

**opt: ensure we prefer a reverse scan to sorting a forward scan**

This commit fixes an issue where in some edge cases the optimizer would
prefer sorting the output of a forward scan over performing a reverse scan
(when there is no need to sort the output of the reverse scan).

Release note (performance improvement): The optimizer now prefers
performing a reverse scan over a forward scan + sort if the reverse
scan eliminates the need for a sort and the plans are otherwise
equivalent. This was the case before in most cases, but some edge
cases with a small number of rows have been fixed.

**opt: add cost penalty for scans with large cardinality**

This commit adds a new cost function, `largeCardinalityRowCountPenalty`,
which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows
worth of cost helps prevent surprising plans for very small tables or for
when stats are stale.

Fixes #64570

Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.

Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
@craig craig bot closed this as completed in 215cdcb Jun 28, 2021
@rytaft
Collaborator

rytaft commented Jul 9, 2021

Unfortunately, it seems the fix with the cost penalty is not robust enough to handle some variations of this issue. For example, if the alternative plan has a cardinality bound of 10, we may not choose it.

This is going to require some more thought, so I'll reopen this issue for now.

@rytaft rytaft reopened this Jul 9, 2021
@mgartner
Collaborator

mgartner commented Jul 21, 2021

Just saw another variation on this. It's similar to the example from Radu above:

Saw another instance of this in the wild. The use case involved a "status" column and a partial index that restricts the status to "pending". The vast majority of rows have "done" status, with occasionally a few thousand rows showing as "pending". The automatic stats can happen to see zero "pending" rows or see a few thousand of them, depending on timing. When stats show zero rows, the partial index is estimated to be empty and becomes an enticing index for the optimizer. In this case, the reliable plan was to use the PK which guaranteed that we scan at most one row.

However, it's slightly different. There is an updated_at column and a FK-like column other_id. The query fetches recently updated rows with a specific other_id, e.g. WHERE other_id = 1234 AND updated_at > now() - '1 hour'. There's a 12-bucket hash sharded index on updated_at and a regular secondary index on other_id.
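
To make this concrete, here is a rough sketch of the schema and query described above (all names are hypothetical; the hash-sharded index syntax shown is the older form, while v22.1+ uses USING HASH WITH (bucket_count = 12)):

CREATE TABLE events (
  id INT PRIMARY KEY,
  other_id INT NOT NULL,
  updated_at TIMESTAMPTZ NOT NULL
);
-- 12-bucket hash-sharded index on updated_at.
CREATE INDEX updated_at_idx ON events (updated_at) USING HASH WITH BUCKET_COUNT = 12;
-- Regular secondary index on the FK-like column.
CREATE INDEX other_id_idx ON events (other_id);
-- Fetch recently updated rows with a specific other_id.
SELECT * FROM events WHERE other_id = 1234 AND updated_at > now() - INTERVAL '1 hour';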

The histograms for the updated_at column do not include the most recently updated values, so the optimizer prefers to scan the index on updated_at. It estimates that it will only scan 6 rows, but in reality it scans over 100k.

The better plan would be to scan the index on other_id which is estimated to scan only about 150 rows, and in reality only scans 2 rows.

Unfortunately, cardinality cannot help us here because other_id is not unique.

I've run into problems like this running Postgres in the past. Our solution was to make automated stats collection much more aggressive. Once a table gets very large, automatic stats collection rarely runs if it is triggered by some percentage of rows being mutated.

Imagine a table that gets 100k inserts per day. It's been around for 1,000 days, so it now has 100m rows. With our default sql.stats.automatic_collection.fraction_stale_rows of 0.2, 25m rows need to be inserted to trigger automatic stats (25m stale rows / 125m total rows = 0.2), which won't happen for another 250 days. That's 25m rows and 250 days' worth of values missing from the histograms.

To make automatic stats much more aggressive, you can set sql.stats.automatic_collection.fraction_stale_rows to 0 and set sql.stats.automatic_collection.min_stale_rows to some number, say 10k. This ensures that no matter how big the table is, you're guaranteed to collect stats for the most recent rows every so often.
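
Concretely, that tuning would look something like this (a sketch; 10k is just an example threshold):

SET CLUSTER SETTING sql.stats.automatic_collection.fraction_stale_rows = 0;
SET CLUSTER SETTING sql.stats.automatic_collection.min_stale_rows = 10000;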

Postgres has the ability to adjust these stats knobs at the table level. I don't believe we have that ability yet, but it would be useful here; a user needs to be able to tune these knobs per table based on the workload.

Here's how to tune auto-analyze (the Postgres mechanism that refreshes table statistics) for a specific table:

ALTER TABLE t SET (autovacuum_analyze_scale_factor = 0);
ALTER TABLE t SET (autovacuum_analyze_threshold = 10000);

@mgartner mgartner reopened this Jul 21, 2021
rytaft added a commit to rytaft/cockroach that referenced this issue Mar 8, 2023
Informs cockroachdb#64570

Release note (sql change): Added a new session setting,
optimizer_always_use_histograms, which ensures that the optimizer
always uses histograms when available to calculate the statistics
of every plan that it explores. Enabling this setting can prevent
the optimizer from choosing a suboptimal index when statistics for
a table are stale.
craig bot pushed a commit that referenced this issue Mar 8, 2023
98194: opt: add setting to always use histograms to calculate stats r=rytaft a=rytaft

Informs #64570

Release note (sql change): Added a new session setting, `optimizer_always_use_histograms`, which ensures that the optimizer always uses histograms when available to calculate the statistics of every plan that it explores. Enabling this setting can prevent the optimizer from choosing a suboptimal index when statistics for a table are stale.

Co-authored-by: Rebecca Taft <[email protected]>
rytaft added a commit to rytaft/cockroach that referenced this issue Mar 8, 2023
This commit enables the new session setting,
optimizer_always_use_histograms, by default. This ensures that the
optimizer always uses histograms when available to calculate the
statistics of every plan that it explores. Enabling this setting
can prevent the optimizer from choosing a suboptimal index when
statistics for a table are stale.

Informs cockroachdb#64570

Release note: None
craig bot pushed a commit that referenced this issue Mar 9, 2023
98248: opt: enable optimizer_always_use_histograms by default r=rytaft a=rytaft

This commit enables the new session setting, `optimizer_always_use_histograms`, by default. This ensures that the optimizer always uses histograms when available to calculate the statistics of every plan that it explores. Enabling this setting can prevent the optimizer from choosing a suboptimal index when statistics for a table are stale.

Informs #64570

Release note: None

Co-authored-by: Rebecca Taft <[email protected]>
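
For anyone on a version where this is not yet the default, the setting can be enabled manually (a minimal sketch; the ALTER ROLE form sets it as a default for all sessions):

SET optimizer_always_use_histograms = on;
ALTER ROLE ALL SET optimizer_always_use_histograms = on;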
@lin-crl
Contributor

lin-crl commented Mar 23, 2023

Hi, how do we estimate row counts for auto-incrementing columns like INT as part of the PK?
For example, the PK is (col1 INT, col2 INT).
I just opened a support case on behalf of a customer, and support may open an escalation later. Since col1 is auto-incrementing, a new value in col1 will always fall outside col1's histogram boundary. I wonder if we should use average row counts for col1 instead of 0 in this case. Thank you!

@michae2
Collaborator

michae2 commented Mar 23, 2023

Hi, how do we estimate row counts for auto-incrementing columns like INT as part of the PK? For example, the PK is (col1 INT, col2 INT). I just opened a support case on behalf of a customer, and support may open an escalation later. Since col1 is auto-incrementing, a new value in col1 will always fall outside col1's histogram boundary. I wonder if we should use average row counts for col1 instead of 0 in this case. Thank you!

To help with auto-increment columns, and other columns with predictable histogram changes, we added statistics forecasting in v22.2 (as described in #79872).

(If the problem is on v22.1, one way to check whether forecasting will help is to capture a statement bundle, and then use cockroach debug statement-bundle recreate with a v22.2 binary, and EXPLAIN the query again to see if forecasting is used.)
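
On versions with forecasting (v22.2+), you can also inspect the forecasted statistics directly (where t is a placeholder table name):

SHOW STATISTICS WITH FORECAST FOR TABLE t;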

@lin-crl
Contributor

lin-crl commented Mar 23, 2023

This is great to hear, @michae2! We'll validate it.

@vbabenkoru

vbabenkoru commented May 25, 2023

@michae2 Unfortunately, statistics forecasting doesn't work for this if one of the columns doesn't increase consistently.

@lin-crl
Contributor

lin-crl commented May 25, 2023

@vbabenkoru Could you share a bit more about how the column is used? Perhaps on a Slack channel, so we can discuss the specifics of your use case?

@vbabenkoru

I believe @mgartner's proposal would work in our case, since forecasting isn't always possible or reliable, and the main issue here is the default behavior of assuming 0 rows whenever the optimizer encounters a new value.

@mgartner
Collaborator

@vbabenkoru You mean the proposal here, correct? I have a prototype of it that I'm hoping to find time to finish soon.

@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023
@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Feb 1, 2024
@itsbilal itsbilal added the O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster label Apr 30, 2024
@itsbilal
Member

Ran into this on the DRT drt-chaos cluster, so adding O-testcluster. Internal conversation

@ajstorm
Collaborator

ajstorm commented May 2, 2024

Hit it on drt-large too, which resulted in this duplicate issue.
