
opt, sql: improve automatic statistics for small tables #56615

Open
rytaft opened this issue Nov 12, 2020 · 9 comments
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-sql-queries SQL Queries Team

Comments

@rytaft (Collaborator) commented Nov 12, 2020

We have recently had reports of the optimizer choosing a bad plan due to inaccurate or misleading statistics on small tables. This can happen for a couple of reasons:

  • The table is brand new. We always collect statistics when a table is created, so unless it was created with data already in the table using CREATE TABLE AS ..., the first statistics collection will show that there are 0 rows. To avoid constantly refreshing small tables, we have a setting that prevents refreshing until at least 500 rows have changed (see the sketch after this list). So we won't refresh the stats for this table again until at least 500 rows have been inserted. Therefore, the table could have 499 rows, but the optimizer will still think it has 0 rows. (We actually assume that all tables have at least one row to prevent creating really bad plans, but an efficient plan for 1 row could be very different from an efficient plan for 499 rows).
  • The size of the table is reflected correctly by the statistics, but scanning the table is much more expensive than we estimate it to be because there are many tombstones in the storage layer. This can happen when a small table is very frequently updated.
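
For concreteness, here is a simplified sketch of the kind of staleness check described in the first bullet. The function name, the 0.2 fraction, and the exact formula are illustrative assumptions, not the actual implementation:

```go
package main

import "fmt"

// shouldRefreshStats is an illustrative sketch (not the actual CockroachDB
// code) of the staleness check described above: a refresh is considered only
// once the number of changed rows crosses a threshold that combines a fixed
// floor of 500 rows with a fraction of the table's last known size.
func shouldRefreshStats(rowsChanged, statsRowCount int64) bool {
	const minStaleRows = 500  // the 500-row floor discussed above
	const fractionStale = 0.2 // assumed fraction of the current row count
	threshold := int64(float64(statsRowCount)*fractionStale) + minStaleRows
	return rowsChanged >= threshold
}

func main() {
	// A brand new table whose stats say 0 rows: 499 inserts are not enough.
	fmt.Println(shouldRefreshStats(499, 0)) // false
	fmt.Println(shouldRefreshStats(500, 0)) // true
}
```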

To fix the first issue, we have discussed a couple of ideas:

  • Always assume that every table has at least 500 rows. This has the advantage of being simple, and will probably do the right thing most of the time. It would also help ensure plan stability for small tables (regardless of whether they were created recently or not), and might help fix the second issue above related to tombstones. It could have some unintended consequences, though (e.g., not using a lookup/index join when we should). See the sketch after this list.
  • We could disable the 500-row check when triggering automatic statistics refreshes until we have at least 4-5 stats refreshes completed. This would give us more accurate stats for brand new tables, but might not help in the case where a table has been around for a while. It would not help with the tombstone issue above either.
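
A minimal sketch of the first idea; the clamp helper and the constant are hypothetical and only illustrate the intended behavior:

```go
package main

import "fmt"

// minAssumedRows and clampRowCount sketch the "always assume at least 500
// rows" idea: the optimizer would never plan for fewer rows than this,
// regardless of what the collected statistics say. Purely illustrative.
const minAssumedRows = 500

func clampRowCount(statsRowCount float64) float64 {
	if statsRowCount < minAssumedRows {
		return minAssumedRows
	}
	return statsRowCount
}

func main() {
	fmt.Println(clampRowCount(0))   // 500
	fmt.Println(clampRowCount(499)) // 500
	fmt.Println(clampRowCount(1e6)) // 1e+06
}
```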

To fix the second issue, there are some additional options:

  • We could actually include the number of tombstones per table as an additional statistic that we collect, and/or collect the average number of keys/bytes scanned or skipped over per table row
  • We could have some notion of "plan stability", which we could get by modifying the plan cache somehow. For example, instead of just invalidating the plan cache when stats change, we could save the plan for comparison against a new plan, and select the old plan if it's within some small cost factor (see the sketch after this list)
  • We could also just increase the cost of unconstrained, unlimited scans, but that seems like it could cause other problems and lead us to pick worse plans in cases where the table doesn't actually have a lot of tombstones
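
As a rough illustration of the plan-stability idea above, assuming a hypothetical cached-plan comparison with a made-up 10% tolerance:

```go
package main

import "fmt"

// memoizedPlan stands in for a cached plan and its estimated cost; the type
// and the 10% threshold below are illustrative, not an existing API.
type memoizedPlan struct {
	name string
	cost float64
}

// choosePlan sketches the plan-stability idea: when stats change, re-plan,
// but keep the previously cached plan unless the new plan is cheaper by more
// than a small factor.
func choosePlan(cached, fresh memoizedPlan) memoizedPlan {
	const stabilityFactor = 1.10
	if cached.cost <= fresh.cost*stabilityFactor {
		return cached // within tolerance: keep the old plan for stability
	}
	return fresh
}

func main() {
	cached := memoizedPlan{name: "lookup-join", cost: 105}
	fresh := memoizedPlan{name: "hash-join", cost: 100}
	fmt.Println(choosePlan(cached, fresh).name) // lookup-join: not worth switching
}
```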

To deal effectively with this issue, we'll probably want to implement some combination of the above ideas.

cc @RaduBerinde, @nvanbenschoten

Epic: CRDB-16930

Jira issue: CRDB-2921

Jira issue: CRDB-13904

@rytaft rytaft added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-optimizer SQL logical planning and optimizations. labels Nov 12, 2020
@rytaft (Collaborator, Author) commented Nov 12, 2020

Always assuming 500 rows seems like a nice and simple solution, but it does have a few complications/drawbacks in addition to the ones mentioned above. In particular, we need to figure out what to do about the column statistics. Column statistics consist of histograms, distinct counts, and null counts, and we use them in the optimizer to estimate the selectivity of predicates, estimate the cardinality of GROUP BY operations, and more. How will we change the histograms, distinct counts, and null counts to match up with the row count of 500? If the stats show that we have X rows, do we simply multiply everything by 500/X?
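
To make the question concrete, here is a sketch of the proportional scaling being discussed. The struct and field names are illustrative (the real column statistics also include histograms, which raise the same question), and the cap at the target row count shows why distinct counts can't just be multiplied blindly:

```go
package main

import (
	"fmt"
	"math"
)

// columnStats is a simplified stand-in for per-column statistics.
type columnStats struct {
	distinctCount float64
	nullCount     float64
}

// scaleStats sketches the "multiply everything by 500/X" option: null counts
// scale linearly, but distinct counts cannot simply be multiplied, since they
// are bounded by the new row count (and by the column's actual domain).
func scaleStats(rowCount, target float64, cols []columnStats) (float64, []columnStats) {
	if rowCount <= 0 {
		return target, cols // nothing meaningful to scale from
	}
	factor := target / rowCount
	scaled := make([]columnStats, len(cols))
	for i, c := range cols {
		scaled[i] = columnStats{
			distinctCount: math.Min(c.distinctCount*factor, target), // a guess at best
			nullCount:     c.nullCount * factor,
		}
	}
	return target, scaled
}

func main() {
	rows, cols := scaleStats(5, 500, []columnStats{{distinctCount: 3, nullCount: 1}})
	fmt.Println(rows, cols) // 500 [{300 100}]
}
```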

Presumably we would want to ignore any existing column statistics if the row count is very small (e.g., 0 or 1 rows), but at what point should we actually use those stats? When there are at least 5 rows? 10 rows?

This complication could be one reason to prefer the alternative idea listed above of allowing more frequent refreshes of small tables if there are fewer than 4-5 recent refreshes.

@nvanbenschoten (Member) commented

It could have some unintended consequences, though (e.g., not using a lookup/index join when we should)

Could you explain why always assuming a table has at least 500 rows would result in not using a lookup join when we should? If anything, I would expect it to have the opposite effect – using a lookup join when we shouldn't.

We could also just increase the cost of unconstrained, unlimited scans, but that seems like it could cause other problems and lead us to pick worse plans in cases where the table doesn't actually have a lot of tombstones

Are there cases that we know about where this would result in worse plans?

@rytaft (Collaborator, Author) commented Nov 12, 2020

Could you explain why always assuming a table has at least 500 rows would result in not using a lookup join when we should? If anything, I would expect it to have the opposite effect – using a lookup join when we shouldn't.

That could happen if the small table is the input to the lookup join. We might choose a hash or merge join because we think the lookup join will be too expensive, when in fact that would be the better plan.

Are there cases that we know about where this would result in worse plans?

I don't know of a particular issue/example, but I would think that if we had to choose between doing a full table scan of a small table v. doing a bunch of index/lookup joins so that we could do a constrained scan of a larger table, that would not be worth it.

There is definitely some tuning of the overall cost model that we can and should do, but it's dangerous to change the costs of different operators in isolation since it changes the relative costs of everything else. I'm not saying that increasing the cost of unconstrained scans is a bad idea, I just don't think we should do it without considering all the plan changes that will inevitably result.

@RaduBerinde (Member) commented

Could you explain why always assuming a table has at least 500 rows would result in not using a lookup join when we should? If anything, I would expect it to have the opposite effect – using a lookup join when we shouldn't.

That could happen if the small table is the input to the lookup join. We might choose a hash or merge join because we think the lookup join will be too expensive, when in fact that would be the better plan.

I see, like joining a 10 row table with a 10000 row table. 10 lookup joins is clearly better than a full scan of the big table, but 500 lookup joins could be too expensive. Maybe 500 is too much, maybe it should be around the smallest number where we prefer index joins for single-key accesses (over full table scans).
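
A back-of-the-envelope version of that tradeoff, with purely hypothetical per-row and per-lookup costs:

```go
package main

import "fmt"

func main() {
	// Hypothetical unit costs: sequential scan cost per row, and the larger
	// per-row cost of a lookup join (a random seek into the other table).
	const scanCostPerRow = 1.0
	const lookupCostPerRow = 25.0

	const bigTableRows = 10000.0
	fullScanCost := bigTableRows * scanCostPerRow // 10000

	for _, smallRows := range []float64{10, 500} {
		lookupCost := smallRows * lookupCostPerRow
		fmt.Printf("%4.0f input rows: lookup join ~%6.0f vs full scan ~%6.0f\n",
			smallRows, lookupCost, fullScanCost)
	}
	// With 10 input rows the lookup join (~250) easily beats the scan (~10000),
	// but at an assumed 500 rows (~12500) the scan looks cheaper, which is how
	// a blanket 500-row assumption could push the optimizer away from lookup joins.
}
```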

We could actually include the number of tombstones per table as an additional statistic that we collect, and/or collect the average number of keys/bytes scanned or skipped over per table row

One problem with this is that the tombstones can expire without any mutations to the table, and we won't know to refresh the stats. Maybe it should be a historical average of what we've seen rather than what was there when the stats last ran.

Increasing the cost of unconstrained scans seems too arbitrary to me. A constrained scan can have the same problem (especially something like (/NULL - ])

@rytaft (Collaborator, Author) commented Nov 12, 2020

Maybe 500 is too much, maybe it should be around the smallest number where we prefer index joins for single-key accesses (over full table scans).

That could work -- we could do a hybrid solution for the first issue: pick a minimum value that's larger than 1 but smaller than 500, and also trigger stats refreshes more frequently for small tables

Maybe it should be a historical average of what we've seen rather than what was there when the stats last ran.

That makes sense. We're already keeping the last 4-5 stats refreshes, so we could calculate the average over all of those refreshes.
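
A sketch of that kind of historical averaging, using a hypothetical per-row keys-scanned metric and made-up field names:

```go
package main

import "fmt"

// tableStatistic is a stand-in for one retained stats collection; the
// avgKeysPerRow field is the hypothetical "keys scanned or skipped per row"
// metric discussed above.
type tableStatistic struct {
	avgKeysPerRow float64
}

// historicalAvgKeysPerRow averages the metric over the last few collections
// we keep (roughly 4-5), so that a single collection taken just after the
// tombstones expired does not dominate the estimate.
func historicalAvgKeysPerRow(history []tableStatistic) float64 {
	if len(history) == 0 {
		return 1 // no information: assume one key per row
	}
	var sum float64
	for _, s := range history {
		sum += s.avgKeysPerRow
	}
	return sum / float64(len(history))
}

func main() {
	history := []tableStatistic{{8}, {6}, {1}, {1}}
	fmt.Println(historicalAvgKeysPerRow(history)) // 4
}
```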

To add another data point about how it's difficult to find the correct relative cost between operators: Here's an example where a full table scan is a better plan than doing a constrained scan + index join: #46677. This PR was an attempt to fix similar issues by making index joins more expensive: #54768.

@rytaft (Collaborator, Author) commented Jan 25, 2021

Another idea for the problem of no stats right when a new cluster starts up: determine some crude stats after data is first loaded based on the sizes of the tables on disk. The import process should also know how many rows were created.
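
A crude version of that estimate, assuming we know the table's approximate on-disk size and an average row width (both hypothetical inputs here):

```go
package main

import "fmt"

// estimateRowCount derives a rough row count from on-disk size, as a
// placeholder until the first real stats collection runs. Compression and
// MVCC overhead make this very approximate, which is fine for this purpose.
func estimateRowCount(tableBytes, avgRowBytes int64) int64 {
	if avgRowBytes <= 0 {
		return 0
	}
	return tableBytes / avgRowBytes
}

func main() {
	fmt.Println(estimateRowCount(64<<20, 128)) // 524288 rows for a 64 MiB table at 128 B/row
}
```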

@RaduBerinde (Member) commented

I have been playing around with various changes. Increasing the row count artificially is problematic - you have to figure out a way to do that to the distinct counts as well somehow, otherwise it doesn't work in many cases (e.g. it won't help with choosing lookup joins). A change like this causes many plan changes and seems pretty risky. I am also worried that it would lead to bad plans when tables are legitimately small.

A more promising change seems to be adding a small penalty for full table scans. It is enough to dissuade unnecessary full scans on small tables (e.g. for FKs). It still changes a few plans that will need to be checked.

@RaduBerinde RaduBerinde self-assigned this Feb 25, 2021
@rytaft (Collaborator, Author) commented Feb 26, 2021

A change like this causes many plan changes and seems pretty risky.

This is what I was finding as well when I was working on the fix to #60493 (I initially tried to fix it by artificially inflating the row count).

I think increasing the cost of full scans could help.

I also want to explore the option of triggering more frequent stats refreshes for small tables, perhaps until there are at least 4 stats refreshes (which is how much history we keep around). I might have time to try that out next week.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Mar 9, 2021
The main motivation for this change is to avoid seemingly bad plans
when the tables are empty or very small, in particular FK checks doing
full scans instead of using an index. This has caused confusion with
customers and users on multiple occasions.

In addition, even if the plans are good for small tables, they are
risky as the tables could get bigger than our stats show. Also, there
are other hidden costs to scanning more than we need (like contention).

This change makes a small cost adjustment - it adds the cost of 10
rows to all full scans. This is enough to discourage full scans on
small tables, and to prefer constrained scans even when the row count
is the same.

Informs cockroachdb#56615.

Release justification: low risk, high benefit changes to existing
functionality.

Release note (performance improvement): fixed cases where the
optimizer was doing unnecessary full table scans when the table was
very small (according to the last collected statistics).
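
The adjustment described in this commit message amounts to something like the following sketch; the names and unit costs are illustrative, and the real cost model has many more terms:

```go
package main

import "fmt"

const (
	rowCost            = 1.0 // hypothetical per-row scan cost unit
	fullScanRowPenalty = 10  // the flat "cost of 10 rows" described above
)

// scanCost adds a flat 10-row penalty to full scans, so a constrained scan
// is preferred even when both scans would return the same number of rows.
func scanCost(rows float64, isFullScan bool) float64 {
	cost := rows * rowCost
	if isFullScan {
		cost += fullScanRowPenalty * rowCost
	}
	return cost
}

func main() {
	fmt.Println(scanCost(3, true))  // 13: full scan of a tiny 3-row table
	fmt.Println(scanCost(3, false)) // 3: constrained scan of the same 3 rows
}
```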
craig bot pushed a commit that referenced this issue Mar 10, 2021
61680: opt: increase cost of full scans r=RaduBerinde a=RaduBerinde

The main motivation for this change is to avoid seemingly bad plans
when the tables are empty or very small, in particular FK checks doing
full scans instead of using an index. This has caused confusion with
customers and users on multiple occasions.

In addition, even if the plans are good for small tables, they are
risky as the tables could get bigger than our stats show. Also, there
are other hidden costs to scanning more than we need (like contention).

This change makes a relatively small cost adjustment - it adds the
cost of 10 rows to full scans. This is enough to discourage full scans
on small tables, and to prefer constrained scans even when the row
count is the same.

Informs #56615.
Fixes #56661.

Release justification: low risk, high benefit changes to existing
functionality.

Release note (performance improvement): fixed cases where the
optimizer was doing unnecessary full table scans when the table was
very small (according to the last collected statistics).

Co-authored-by: Radu Berinde <[email protected]>
@RaduBerinde (Member) commented

#61680 is the change that we wanted in 21.1 for this issue. Moving out of the 21.1 bucket.

@michae2 michae2 added the A-sql-table-stats Table statistics (and their automatic refresh). label Mar 27, 2023
@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023
@rytaft rytaft added O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA labels Feb 5, 2024
@exalate-issue-sync exalate-issue-sync bot removed the P-3 Issues/test failures with no fix SLA label Feb 5, 2024
@rytaft rytaft added the P-3 Issues/test failures with no fix SLA label Feb 5, 2024
@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Apr 23, 2024