sql/stats: forecast for different columnsets at different times #104174

Merged
merged 1 commit into from
Jun 1, 2023

Collaborator

@michae2 michae2 commented Jun 1, 2023

Before this change, all statistics forecasts for a table were at the same future time, determined as:

  time of most recent statistics collection (for any columnset)
  + average time between automatic collections (incl. all columnsets)

This commit changes the formula slightly to:

  time of most recent statistics collection (for **this** columnset)
  + average time between automatic collections (incl. all columnsets)

This means that columnsets that were not included in the most recent statistics collection will now have an older forecast time than columnsets that were included. Columnsets that were included in the most recent collection will still all share the same forecast time.

This will have two effects on the optimizer:

  1. When using the first table statistic to get a row estimate for the table, the statistics builder will now favor forecasts of columnsets included in the most recent statistics collection over forecasts of columnsets not included.
  2. Forecasts of columnsets not included in the most recent statistics collection will now be more similar to their most recent collection, but also potentially more stale.

Fixes: #103958

Release note (bug fix): Fix a rare bug where stale multi-column table statistics could cause table statistics forecasts to be inaccurate, leading to suboptimal query plans.

@michae2 michae2 requested review from msirek and cucaroach June 1, 2023 00:26
@michae2 michae2 requested a review from a team as a code owner June 1, 2023 00:26
@cockroach-teamcity
Member

This change is Reviewable

@michae2 michae2 added the backport-22.2.x and backport-23.1.x labels Jun 1, 2023
Collaborator

@mgartner mgartner left a comment

:lgtm:

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @michae2, and @msirek)


-- commits line 23 at r1:
nit: included in the ...

Contributor

@msirek msirek left a comment

Nice fix!
:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @cucaroach and @michae2)

@michae2
Collaborator Author

michae2 commented Jun 1, 2023

Thanks for the reviews!

bors r=mgartner,msirek

@craig
Contributor

craig bot commented Jun 1, 2023

Build succeeded:

@blathers-crl

blathers-crl bot commented Jun 1, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 9592779 to blathers/backport-release-22.2-104174: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Development

Successfully merging this pull request may close these issues.

sql/stats: forecast of stale stats supersedes fresh stats