sql/stats: forecast for different columnsets at different times #104174
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach, @michae2, and @msirek)
-- commits
line 23 at r1:
nit: included in the ...
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @cucaroach and @michae2)
Before this change, all statistics forecasts for a table were at the same future time, determined as:

    time of most recent statistics collection (for any columnset) +
    average time between automatic collections (incl. all columnsets)

This commit changes the formula slightly to:

    time of most recent statistics collection (for **this** columnset) +
    average time between automatic collections (incl. all columnsets)

Meaning columnsets that were _not_ included in the most recent statistics collection will now have an older forecast time than columnsets that _were_ included. Columnsets that were included in the most recent collection will still all have the same forecast time.

This will have two effects on the optimizer:

1. When using the first table statistic to get a row estimate for the table, the statistics builder will now favor forecasts of columnsets included in the most recent statistics collection over forecasts of columnsets not included.
2. Forecasts of columnsets not included in the most recent statistics collection will now be more similar to their most recent collection, but also potentially more stale.

Fixes: cockroachdb#103958

Release note (bug fix): Fix a rare bug where stale multi-column table statistics could cause table statistics forecasts to be inaccurate, leading to suboptimal query plans.
Thanks for the reviews!

bors r=mgartner,msirek

Build succeeded:
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from 9592779 to blathers/backport-release-22.2-104174: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.