Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need more info about automatic stats settings #4809

Closed
rytaft opened this issue May 15, 2019 · 0 comments · Fixed by #4925
Closed

Need more info about automatic stats settings #4809

rytaft opened this issue May 15, 2019 · 0 comments · Fixed by #4925
Assignees
Labels
C-doc-improvement P-2 Normal priority; secondary task T-missing-info
Milestone

Comments

@rytaft
Copy link
Contributor

rytaft commented May 15, 2019

The automatic stats docs (https://www.cockroachlabs.com/docs/v19.1/cost-based-optimizer.html#controlling-automatic-statistics) mention two of the settings used to control automatic stats: sql.stats.automatic_collection.fraction_stale_rows and sql.stats.automatic_collection.min_stale_rows, but they don't give enough info to really understand how they work. One thing that would be helpful would be a direct link to those settings on the settings page, but it would also help to add a bit more info in the "Controlling automatic statistics" section. Some points we should emphasize:

  • The target number of stale rows in a particular table before a refresh is actually equal to target_stale_rows = fraction_stale_rows * <size of table> + min_stale_rows.
  • We don't necessarily perform a refresh when exactly that target number of rows are stale, because the decision to start a refresh is probabilistic. After each mutation operation (INSERT, UPDATE, or DELETE), the probability of a refresh is P(refresh) = (# rows updated, inserted, deleted)/target_stale_rows. The result is that on average, we will perform a refresh after target_stale_rows are stale, but it may be higher or lower.
  • To ensure that we always have stats and they aren't too stale, we also always perform a refresh if there are no stats or if it's been a long time since the last refresh (a "long time" is defined as 2x the average time between refreshes over the last 4 refreshes).

I don't think we necessarily need all this detail, but I've put it here so you can judge what to include or not. Please let me know if you have any questions -- thanks!
cc @RaduBerinde

@rmloveland rmloveland added A-sql P-2 Normal priority; secondary task labels Jun 4, 2019
rmloveland added a commit that referenced this issue Jun 17, 2019
... and node restarts after stats deletion to clear caches.

Summary of changes:

- Add a new subsection to CBO page, 'Controlling statistics refresh
  rate', where we describe the cases when stats are refreshed in more
  detail.

- To match the structure of the above, we break the instructions for
  deleting stats into a new section 'Turning off statistics'

- Finally, tweak stats deletion instructions on CBO page and CREATE
  STATS page so both say that nodes must be restarted
  post-stats-deletion to clear caches.

Fixes #4809, #4872.
rmloveland added a commit that referenced this issue Jun 19, 2019
... and node restarts after stats deletion to clear caches.

Summary of changes:

- Add a new subsection to CBO page, 'Controlling statistics refresh
  rate', where we describe the cases when stats are refreshed in more
  detail.

- To match the structure of the above, we break the instructions for
  deleting stats into a new section 'Turning off statistics'

- Finally, tweak stats deletion instructions on CBO page and CREATE
  STATS page so both say that nodes must be restarted
  post-stats-deletion to clear caches.

Fixes #4809, #4872.
rmloveland added a commit that referenced this issue Jun 20, 2019
... and node restarts after stats deletion to clear caches.

Summary of changes:

- Add a new subsection to CBO page, 'Controlling statistics refresh
  rate', where we describe the cases when stats are refreshed in more
  detail.

- To match the structure of the above, we break the instructions for
  deleting stats into a new section 'Turning off statistics'

- Finally, tweak stats deletion instructions on CBO page and CREATE
  STATS page so both say that nodes must be restarted
  post-stats-deletion to clear caches.

Fixes #4809, #4872.
@jseldess jseldess added this to the 19.2 milestone Jun 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-doc-improvement P-2 Normal priority; secondary task T-missing-info
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants