-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document more auto stats features & knobs #4635
Conversation
Some of this PR is factual and some more conceptual. In addition to adding facts about settings I hope to provide more context around why users should mostly leave auto stats alone unless they are troubleshooting / know what they are doing. However what I have here may not be the right way to think about it (or express it). It may be making too strong a claim about leaving things alone, since there may be use cases that are pretty common for disabling / throttling stats, or running CREATE STATS manually - I'm honestly not sure, so I welcome any feedback. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great - thanks, @rmloveland! There are definitely workloads where it might make sense to turn off stats or decrease the frequency of refreshes, so maybe it's worth softening that language a bit. Not sure if you want to go into details, but workloads where the queries are very simple (e.g., lots of single key lookups or updates) probably won't benefit much from stats, and could be negatively impacted. Otherwise, I think the message is right that most users should probably leave the default settings as-is.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)
v19.1/cost-based-optimizer.md, line 93 at r1 (raw file):
{{site.data.alerts.callout_info}} Note that [schema changes](online-schema-changes.html) trigger automatic statistics collection for the affected table(s).
[nit] Seems a bit redundant that you have the word "Note" twice (it's not visible here, but visible in the UI)
v19.1/cost-based-optimizer.md, line 103 at r1 (raw file):
- `sql.stats.automatic_collection.fraction_stale_rows` - `sql.stats.automatic_collection.min_stale_rows`
Is it possible to link directly to the documentation for these settings in the cluster settings page?
v19.1/create-statistics.md, line 13 at r1 (raw file):
- Columns that are part of the primary key or an index (in other words, all indexed columns). - Up to 100 non-indexed columns.
This is only true if the column isn't specified explicitly
v19.1/create-statistics.md, line 53 at r1 (raw file):
### Create statistics on a default set of columns The `CREATE STATISTICS` statement shown below automatically figures out which columns to get statistics on — specifically, it chooses columns which are part of the primary key and/or an index.
This needs an update -- maybe you can just move the new stuff you added about 100 non-indexed columns down here
v19.1/create-statistics.md, line 75 at r1 (raw file):
{% include {{ page.version.version }}/misc/delete-statistics.md %} ### View statistics jobs
There are now two different commands to see user-created stats and system-created stats, so I think you might want to mention both here.
SHOW JOBS
will only show user-created stats (along with other jobs like IMPORT, etc).SHOW AUTOMATIC JOBS
will only show automatic stats jobs (Amruta is working on the docs page for this command).
I think you also might want to re-run the query below, since the output will look different (there won't be any rows with type AUTO CREATE STATS
if you use SHOW JOBS
). Alternatively, you could just change the example to use [SHOW AUTOMATIC JOBS]
instead of [SHOW JOBS]
.
Sorry this is a big change :(
0bb455e
to
2320b09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @rytaft! Re: the language, based on your comments I've left it mostly as is, with the slight update to say that
The information provided in this section is useful for troubleshooting or performance tuning by advanced users.
under the assumption that the folks who are more advanced will know who they are :-)
Regarding your other comments, I think they've been addressed, PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
v19.1/cost-based-optimizer.md, line 93 at r1 (raw file):
Previously, rytaft wrote…
[nit] Seems a bit redundant that you have the word "Note" twice (it's not visible here, but visible in the UI)
Thanks, I always forget that those little macros insert the "Note" text!
Fixed.
v19.1/cost-based-optimizer.md, line 103 at r1 (raw file):
Previously, rytaft wrote…
Is it possible to link directly to the documentation for these settings in the cluster settings page?
Yes!
Fixed.
v19.1/create-statistics.md, line 13 at r1 (raw file):
Previously, rytaft wrote…
This is only true if the column isn't specified explicitly
Fixed by updating to say:
"Up to 100 non-indexed columns (unless you specify which columns to create statistics on, as shown in this example (link)"
v19.1/create-statistics.md, line 53 at r1 (raw file):
Previously, rytaft wrote…
This needs an update -- maybe you can just move the new stuff you added about 100 non-indexed columns down here
Fixed by updating to use the new list of what's updated - thanks for the catch!
v19.1/create-statistics.md, line 75 at r1 (raw file):
Previously, rytaft wrote…
There are now two different commands to see user-created stats and system-created stats, so I think you might want to mention both here.
SHOW JOBS
will only show user-created stats (along with other jobs like IMPORT, etc).SHOW AUTOMATIC JOBS
will only show automatic stats jobs (Amruta is working on the docs page for this command).I think you also might want to re-run the query below, since the output will look different (there won't be any rows with type
AUTO CREATE STATS
if you useSHOW JOBS
). Alternatively, you could just change the example to use[SHOW AUTOMATIC JOBS]
instead of[SHOW JOBS]
.Sorry this is a big change :(
No worries, everything changes all the time :-)
Fixed by updating to add mention of both, describe when they happen, and show how to look at them.
Please let me know if I got the flavor right.
Note that I can't yet link to SHOW AUTO JOBS since it's not on master, but we can fix later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rmloveland!
Reviewed 3 of 3 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained
v19.1/create-statistics.md, line 75 at r1 (raw file):
Previously, rmloveland (Rich Loveland) wrote…
No worries, everything changes all the time :-)
Fixed by updating to add mention of both, describe when they happen, and show how to look at them.
Please let me know if I got the flavor right.
Note that I can't yet link to SHOW AUTO JOBS since it's not on master, but we can fix later.
Looks great - thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small edits, plus some style suggestions that you can take/leave. Otherwise, LGTM!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rmloveland)
v19.1/cost-based-optimizer.md, line 105 at r2 (raw file):
- [`sql.stats.automatic_collection.min_stale_rows`](cluster-settings.html#sql.stats.automatic_collection.min_stale_rows) If you need to turn off automatic statistics collection, follow the steps below.
Change the .
to a :
v19.1/cost-based-optimizer.md, line 114 at r2 (raw file):
~~~ 2. Look up what statistics were created by the automatic statistics generator using the [`SHOW STATISTICS`](show-statistics.html) statement.
Style suggestion: "2. Use the SHOW STATISTICS
statement to view automatically generated statistics."
v19.1/cost-based-optimizer.md, line 116 at r2 (raw file):
2. Look up what statistics were created by the automatic statistics generator using the [`SHOW STATISTICS`](show-statistics.html) statement. 3. Delete the automatically generated statistics using the instructions in [delete statistics](create-statistics.html#delete-statistics).
I think "delete statistics" should be capitalized since it's the section heading, but might also want to clarify and say "using the instructions in the Delete statistics section."
v19.1/create-statistics.md, line 10 at r2 (raw file):
Once you [create a table](create-table.html) and load data into it (e.g., [`INSERT`](insert.html), [`IMPORT`](import.html)), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs. `CREATE STATISTICS` automatically figures out which columns to get statistics on — specifically, it chooses:
Style suggestion: Replace "figures out" with "determines"
2320b09
to
d18de4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and various suggestions, @lhirata!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @lhirata)
v19.1/cost-based-optimizer.md, line 105 at r2 (raw file):
Previously, lhirata wrote…
Change the
.
to a:
Fixed.
v19.1/cost-based-optimizer.md, line 114 at r2 (raw file):
Previously, lhirata wrote…
Style suggestion: "2. Use the
SHOW STATISTICS
statement to view automatically generated statistics."
Fixed.
v19.1/cost-based-optimizer.md, line 116 at r2 (raw file):
Previously, lhirata wrote…
I think "delete statistics" should be capitalized since it's the section heading, but might also want to clarify and say "using the instructions in the Delete statistics section."
Fixed (went with capitalizing "Delete statistics").
v19.1/create-statistics.md, line 10 at r2 (raw file):
Previously, lhirata wrote…
Style suggestion: Replace "figures out" with "determines"
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @lhirata, @rmloveland, and @rytaft)
_includes/v19.1/sql/settings/settings.md, line 105 at r3 (raw file):
<tr><td><code>sql.query_cache.enabled</code></td><td>boolean</td><td><code>true</code></td><td>enable the query cache</td></tr> <tr><td><code>sql.stats.automatic_collection.enabled</code></td><td>boolean</td><td><code>true</code></td><td>automatic statistics collection mode</td></tr> <tr><td><a name="sql.stats.automatic_collection.fraction_stale_rows"></a><code>sql.stats.automatic_collection.fraction_stale_rows</code></td><td>float</td><td><code>0.2</code></td><td>target fraction of stale rows per table that will trigger a statistics refresh</td></tr>
This isn't a good idea. This file is auto-generated by cockroach and regularly copied over from that repo, so your change will be very short-lived. Instead, we should update cockroach to generate all of these settings with anchor tags like this. Can you open an issue in cockroach to do that?
v19.1/cost-based-optimizer.md, line 71 at r3 (raw file):
To control how often the automatic statistics jobs run on your cluster, adjust the following [cluster settings](cluster-settings.html). They define the target number of rows in a table that should be stale before statistics on that table are refreshed. - [`sql.stats.automatic_collection.fraction_stale_rows`](cluster-settings.html#sql.stats.automatic_collection.fraction_stale_rows)
See above. The anchor tags you're linking to aren't going to survive very long.
d18de4b
to
2e956ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review Jesse!
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @jseldess, @lhirata, and @rytaft)
_includes/v19.1/sql/settings/settings.md, line 105 at r3 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
This isn't a good idea. This file is auto-generated by cockroach and regularly copied over from that repo, so your change will be very short-lived. Instead, we should update cockroach to generate all of these settings with anchor tags like this. Can you open an issue in cockroach to do that?
Fixed by removing the anchor tags and links thereto.
Filed cockroachdb/cockroach#36733 to request the anchor tag feature.
v19.1/cost-based-optimizer.md, line 71 at r3 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
See above. The anchor tags you're linking to aren't going to survive very long.
Fixed by removing.
Hi @rmloveland - I noticed there is another page which mentions how to delete statistics, but it doesn't mention the "restart to clear cache part": https://www.cockroachlabs.com/docs/stable/create-statistics.html#delete-statistics Should we update that too? Or just have it link to the CBO section? |
Thanks @RaduBerinde , created #4872 to fix. |
Fixes #4455, #4570, #4517.
Summary of changes:
Add more context re: auto stats in general, and deemphasize turning it
off, tweaking the settings, or running CREATE STATS manually
Note that auto stats update after schema changes
Add docs on cluster settings for throttling auto stats