Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document more auto stats features & knobs #4635

Merged
merged 1 commit into from
Apr 10, 2019

Conversation

rmloveland
Copy link
Contributor

Fixes #4455, #4570, #4517.

Summary of changes:

  • Add more context re: auto stats in general, and deemphasize turning it
    off, tweaking the settings, or running CREATE STATS manually

  • Note that auto stats update after schema changes

  • Add docs on cluster settings for throttling auto stats

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rmloveland rmloveland requested a review from rytaft April 4, 2019 21:21
@rmloveland
Copy link
Contributor Author

@rytaft

Some of this PR is factual and some more conceptual. In addition to adding facts about settings I hope to provide more context around why users should mostly leave auto stats alone unless they are troubleshooting / know what they are doing.

However what I have here may not be the right way to think about it (or express it). It may be making too strong a claim about leaving things alone, since there may be use cases that are pretty common for disabling / throttling stats, or running CREATE STATS manually - I'm honestly not sure, so I welcome any feedback. Thanks!

Copy link
Contributor

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great - thanks, @rmloveland! There are definitely workloads where it might make sense to turn off stats or decrease the frequency of refreshes, so maybe it's worth softening that language a bit. Not sure if you want to go into details, but workloads where the queries are very simple (e.g., lots of single key lookups or updates) probably won't benefit much from stats, and could be negatively impacted. Otherwise, I think the message is right that most users should probably leave the default settings as-is.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)


v19.1/cost-based-optimizer.md, line 93 at r1 (raw file):

{{site.data.alerts.callout_info}}
Note that [schema changes](online-schema-changes.html) trigger automatic statistics collection for the affected table(s).

[nit] Seems a bit redundant that you have the word "Note" twice (it's not visible here, but visible in the UI)


v19.1/cost-based-optimizer.md, line 103 at r1 (raw file):

- `sql.stats.automatic_collection.fraction_stale_rows`
- `sql.stats.automatic_collection.min_stale_rows`

Is it possible to link directly to the documentation for these settings in the cluster settings page?


v19.1/create-statistics.md, line 13 at r1 (raw file):

- Columns that are part of the primary key or an index (in other words, all indexed columns).
- Up to 100 non-indexed columns.

This is only true if the column isn't specified explicitly


v19.1/create-statistics.md, line 53 at r1 (raw file):

### Create statistics on a default set of columns

The `CREATE STATISTICS` statement shown below automatically figures out which columns to get statistics on — specifically, it chooses columns which are part of the primary key and/or an index.

This needs an update -- maybe you can just move the new stuff you added about 100 non-indexed columns down here


v19.1/create-statistics.md, line 75 at r1 (raw file):

{% include {{ page.version.version }}/misc/delete-statistics.md %}

### View statistics jobs

There are now two different commands to see user-created stats and system-created stats, so I think you might want to mention both here.

  • SHOW JOBS will only show user-created stats (along with other jobs like IMPORT, etc).
  • SHOW AUTOMATIC JOBS will only show automatic stats jobs (Amruta is working on the docs page for this command).

I think you also might want to re-run the query below, since the output will look different (there won't be any rows with type AUTO CREATE STATS if you use SHOW JOBS). Alternatively, you could just change the example to use [SHOW AUTOMATIC JOBS] instead of [SHOW JOBS].

Sorry this is a big change :(

@rmloveland rmloveland force-pushed the 20190404-stats-all-the-things branch from 0bb455e to 2320b09 Compare April 8, 2019 16:04
Copy link
Contributor Author

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @rytaft! Re: the language, based on your comments I've left it mostly as is, with the slight update to say that

The information provided in this section is useful for troubleshooting or performance tuning by advanced users.

under the assumption that the folks who are more advanced will know who they are :-)

Regarding your other comments, I think they've been addressed, PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


v19.1/cost-based-optimizer.md, line 93 at r1 (raw file):

Previously, rytaft wrote…

[nit] Seems a bit redundant that you have the word "Note" twice (it's not visible here, but visible in the UI)

Thanks, I always forget that those little macros insert the "Note" text!

Fixed.


v19.1/cost-based-optimizer.md, line 103 at r1 (raw file):

Previously, rytaft wrote…

Is it possible to link directly to the documentation for these settings in the cluster settings page?

Yes!

Fixed.


v19.1/create-statistics.md, line 13 at r1 (raw file):

Previously, rytaft wrote…

This is only true if the column isn't specified explicitly

Fixed by updating to say:

"Up to 100 non-indexed columns (unless you specify which columns to create statistics on, as shown in this example (link)"


v19.1/create-statistics.md, line 53 at r1 (raw file):

Previously, rytaft wrote…

This needs an update -- maybe you can just move the new stuff you added about 100 non-indexed columns down here

Fixed by updating to use the new list of what's updated - thanks for the catch!


v19.1/create-statistics.md, line 75 at r1 (raw file):

Previously, rytaft wrote…

There are now two different commands to see user-created stats and system-created stats, so I think you might want to mention both here.

  • SHOW JOBS will only show user-created stats (along with other jobs like IMPORT, etc).
  • SHOW AUTOMATIC JOBS will only show automatic stats jobs (Amruta is working on the docs page for this command).

I think you also might want to re-run the query below, since the output will look different (there won't be any rows with type AUTO CREATE STATS if you use SHOW JOBS). Alternatively, you could just change the example to use [SHOW AUTOMATIC JOBS] instead of [SHOW JOBS].

Sorry this is a big change :(

No worries, everything changes all the time :-)

Fixed by updating to add mention of both, describe when they happen, and show how to look at them.

Please let me know if I got the flavor right.

Note that I can't yet link to SHOW AUTO JOBS since it's not on master, but we can fix later.

Copy link
Contributor

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks @rmloveland!

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


v19.1/create-statistics.md, line 75 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

No worries, everything changes all the time :-)

Fixed by updating to add mention of both, describe when they happen, and show how to look at them.

Please let me know if I got the flavor right.

Note that I can't yet link to SHOW AUTO JOBS since it's not on master, but we can fix later.

Looks great - thank you!

@rmloveland rmloveland requested a review from lnhsingh April 8, 2019 17:31
Copy link
Contributor

@lnhsingh lnhsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small edits, plus some style suggestions that you can take/leave. Otherwise, LGTM!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rmloveland)


v19.1/cost-based-optimizer.md, line 105 at r2 (raw file):

- [`sql.stats.automatic_collection.min_stale_rows`](cluster-settings.html#sql.stats.automatic_collection.min_stale_rows)

If you need to turn off automatic statistics collection, follow the steps below.

Change the . to a :


v19.1/cost-based-optimizer.md, line 114 at r2 (raw file):

    ~~~

2. Look up what statistics were created by the automatic statistics generator using the [`SHOW STATISTICS`](show-statistics.html) statement.

Style suggestion: "2. Use the SHOW STATISTICS statement to view automatically generated statistics."


v19.1/cost-based-optimizer.md, line 116 at r2 (raw file):

2. Look up what statistics were created by the automatic statistics generator using the [`SHOW STATISTICS`](show-statistics.html) statement.

3. Delete the automatically generated statistics using the instructions in [delete statistics](create-statistics.html#delete-statistics).

I think "delete statistics" should be capitalized since it's the section heading, but might also want to clarify and say "using the instructions in the Delete statistics section."


v19.1/create-statistics.md, line 10 at r2 (raw file):

Once you [create a table](create-table.html) and load data into it (e.g., [`INSERT`](insert.html), [`IMPORT`](import.html)), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs.

`CREATE STATISTICS` automatically figures out which columns to get statistics on — specifically, it chooses:

Style suggestion: Replace "figures out" with "determines"

@rmloveland rmloveland force-pushed the 20190404-stats-all-the-things branch from 2320b09 to d18de4b Compare April 9, 2019 15:52
Copy link
Contributor Author

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review and various suggestions, @lhirata!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lhirata)


v19.1/cost-based-optimizer.md, line 105 at r2 (raw file):

Previously, lhirata wrote…

Change the . to a :

Fixed.


v19.1/cost-based-optimizer.md, line 114 at r2 (raw file):

Previously, lhirata wrote…

Style suggestion: "2. Use the SHOW STATISTICS statement to view automatically generated statistics."

Fixed.


v19.1/cost-based-optimizer.md, line 116 at r2 (raw file):

Previously, lhirata wrote…

I think "delete statistics" should be capitalized since it's the section heading, but might also want to clarify and say "using the instructions in the Delete statistics section."

Fixed (went with capitalizing "Delete statistics").


v19.1/create-statistics.md, line 10 at r2 (raw file):

Previously, lhirata wrote…

Style suggestion: Replace "figures out" with "determines"

Fixed.

@rmloveland rmloveland requested a review from jseldess April 9, 2019 15:56
Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @lhirata, @rmloveland, and @rytaft)


_includes/v19.1/sql/settings/settings.md, line 105 at r3 (raw file):

<tr><td><code>sql.query_cache.enabled</code></td><td>boolean</td><td><code>true</code></td><td>enable the query cache</td></tr>
<tr><td><code>sql.stats.automatic_collection.enabled</code></td><td>boolean</td><td><code>true</code></td><td>automatic statistics collection mode</td></tr>
<tr><td><a name="sql.stats.automatic_collection.fraction_stale_rows"></a><code>sql.stats.automatic_collection.fraction_stale_rows</code></td><td>float</td><td><code>0.2</code></td><td>target fraction of stale rows per table that will trigger a statistics refresh</td></tr>

This isn't a good idea. This file is auto-generated by cockroach and regularly copied over from that repo, so your change will be very short-lived. Instead, we should update cockroach to generate all of these settings with anchor tags like this. Can you open an issue in cockroach to do that?


v19.1/cost-based-optimizer.md, line 71 at r3 (raw file):

To control how often the automatic statistics jobs run on your cluster, adjust the following [cluster settings](cluster-settings.html). They define the target number of rows in a table that should be stale before statistics on that table are refreshed.

- [`sql.stats.automatic_collection.fraction_stale_rows`](cluster-settings.html#sql.stats.automatic_collection.fraction_stale_rows)

See above. The anchor tags you're linking to aren't going to survive very long.

Fixes #4455, #4570, #4517.

Summary of changes:

- Add more context re: auto stats in general, and deemphasize turning it
  off, tweaking the settings, or running CREATE STATS manually

- Note that auto stats update after schema changes

- Add docs on cluster settings for throttling auto stats
@rmloveland rmloveland force-pushed the 20190404-stats-all-the-things branch from d18de4b to 2e956ec Compare April 10, 2019 20:38
Copy link
Contributor Author

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Jesse!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @jseldess, @lhirata, and @rytaft)


_includes/v19.1/sql/settings/settings.md, line 105 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

This isn't a good idea. This file is auto-generated by cockroach and regularly copied over from that repo, so your change will be very short-lived. Instead, we should update cockroach to generate all of these settings with anchor tags like this. Can you open an issue in cockroach to do that?

Fixed by removing the anchor tags and links thereto.

Filed cockroachdb/cockroach#36733 to request the anchor tag feature.


v19.1/cost-based-optimizer.md, line 71 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

See above. The anchor tags you're linking to aren't going to survive very long.

Fixed by removing.

@rmloveland rmloveland merged commit 718bd96 into master Apr 10, 2019
@rmloveland rmloveland deleted the 20190404-stats-all-the-things branch April 10, 2019 20:45
@RaduBerinde
Copy link
Member

Hi @rmloveland - I noticed there is another page which mentions how to delete statistics, but it doesn't mention the "restart to clear cache part": https://www.cockroachlabs.com/docs/stable/create-statistics.html#delete-statistics

Should we update that too? Or just have it link to the CBO section?

@rmloveland
Copy link
Contributor Author

Thanks @RaduBerinde , created #4872 to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revise optimizer table stats info with more context re: auto vs. non-auto
6 participants