Document more auto stats features & knobs
Fixes #4455, #4570, #4517.

Summary of changes:

- Add more context re: auto stats in general, and deemphasize turning it
  off, tweaking the settings, or running CREATE STATS manually

- Note that auto stats update after schema changes

- Add docs on cluster settings for throttling auto stats
rmloveland committed Apr 4, 2019
1 parent d43c1a8 commit 0bb455e
Showing 3 changed files with 52 additions and 39 deletions.
16 changes: 0 additions & 16 deletions _includes/v19.1/misc/automatic-statistics.md

This file was deleted.

41 changes: 29 additions & 12 deletions v19.1/cost-based-optimizer.md
@@ -82,26 +82,43 @@ This is not meant to be an exhaustive list. To check whether a particular query

## Table statistics

The cost-based optimizer can often find more performant query execution plans if it has access to statistical data on the contents of your database's tables. This statistical data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.
The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.

{% include {{ page.version.version }}/misc/automatic-statistics.md %}

To manually generate statistics for a table, run a [`CREATE STATISTICS`](create-statistics.html) statement like the one shown below. It automatically figures out which columns to get statistics on — specifically, it chooses:
<span class="version-tag">New in v19.1</span>: By default, CockroachDB generates table statistics automatically as tables are updated. It does this [using a background job](create-statistics.html#view-statistics-jobs) that automatically figures out which columns to get statistics on &mdash; specifically, it chooses:

- Columns that are part of the primary key or an index (in other words, all indexed columns).
- Up to 100 non-indexed columns.
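
The column-selection rule above can be modeled as a small function. This is a minimal Python sketch, not CockroachDB's implementation: the function name `columns_for_stats` is hypothetical, and it assumes non-indexed columns are taken in table column order, which the text does not specify.

```python
from typing import List, Sequence

# Cap on non-indexed columns, from the "up to 100" rule in the text.
MAX_NON_INDEXED_COLUMNS = 100


def columns_for_stats(all_columns: Sequence[str],
                      indexed_columns: Sequence[str]) -> List[str]:
    """Hypothetical model of which columns a statistics job covers.

    Assumes non-indexed columns are picked in table column order; the docs
    do not say which ones are chosen when a table has more than 100.
    """
    indexed = set(indexed_columns)
    # Rule 1: every column that is part of the primary key or an index.
    chosen = [c for c in all_columns if c in indexed]
    # Rule 2: at most MAX_NON_INDEXED_COLUMNS of the remaining columns.
    non_indexed = [c for c in all_columns if c not in indexed]
    chosen.extend(non_indexed[:MAX_NON_INDEXED_COLUMNS])
    return chosen


# Example: a hypothetical table with an indexed key, an indexed email
# column, and 150 unindexed payload columns.
columns = ["id", "email"] + [f"c{i}" for i in range(150)]
print(len(columns_for_stats(columns, ["id", "email"])))  # 102
```

Under these assumptions, the result contains both indexed columns plus the first 100 of the 150 other columns.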

Note that the above also describes the statistics gathered by the automatic statistics feature, since it runs a query similar to the one shown below.

{% include copy-clipboard.html %}
~~~ sql
> CREATE STATISTICS employees_stats FROM employees;
~~~

{{site.data.alerts.callout_info}}
Every time the [`CREATE STATISTICS`](create-statistics.html) statement is executed, it kicks off a background job. For more information, see [View statistics jobs](create-statistics.html#view-statistics-jobs).
Note that [schema changes](online-schema-changes.html) trigger automatic statistics collection for the affected table(s).
{{site.data.alerts.end}}

### Controlling automatic statistics

For best query performance, most users should leave automatic statistics enabled with the default settings. The information provided in this section is for troubleshooting.

To control how often the automatic statistics jobs run on your cluster, adjust the following [cluster settings](cluster-settings.html). They define the target number of rows in a table that should be stale before statistics on that table are refreshed.

- `sql.stats.automatic_collection.fraction_stale_rows`
- `sql.stats.automatic_collection.min_stale_rows`
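
The two settings combine into a staleness target. The Python sketch below assumes the target is `fraction_stale_rows * row_count + min_stale_rows` and uses assumed defaults of 0.2 and 500; check both the formula and your cluster's current values (for example with `SHOW CLUSTER SETTING`) before relying on it. It is a deterministic simplification, the real refresh scheduling may differ, and the function names are hypothetical.

```python
# Assumed defaults; confirm on your cluster with
#   SHOW CLUSTER SETTING sql.stats.automatic_collection.fraction_stale_rows;
#   SHOW CLUSTER SETTING sql.stats.automatic_collection.min_stale_rows;
ASSUMED_FRACTION_STALE_ROWS = 0.2
ASSUMED_MIN_STALE_ROWS = 500


def stale_row_target(row_count: int,
                     fraction_stale_rows: float = ASSUMED_FRACTION_STALE_ROWS,
                     min_stale_rows: int = ASSUMED_MIN_STALE_ROWS) -> float:
    """Modified rows tolerated before a refresh (assumed formula)."""
    return fraction_stale_rows * row_count + min_stale_rows


def needs_refresh(row_count: int, rows_modified: int,
                  fraction_stale_rows: float = ASSUMED_FRACTION_STALE_ROWS,
                  min_stale_rows: int = ASSUMED_MIN_STALE_ROWS) -> bool:
    """True once more rows have changed than the staleness target allows."""
    target = stale_row_target(row_count, fraction_stale_rows, min_stale_rows)
    return rows_modified > target


print(needs_refresh(10_000, 2_000))  # False: target is 2,500 rows
print(needs_refresh(10_000, 3_000))  # True
print(needs_refresh(0, 501))         # True: min_stale_rows alone sets the bar
```

Under these assumptions, `min_stale_rows` keeps small tables from being refreshed after every handful of writes, while `fraction_stale_rows` makes the threshold grow with table size.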

If you need to turn off automatic statistics collection, follow the steps below.

1. Run the following statement to disable the automatic statistics [cluster setting](cluster-settings.html):

{% include copy-clipboard.html %}
~~~ sql
> SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
~~~

2. Look up what statistics were created by the automatic statistics generator using the [`SHOW STATISTICS`](show-statistics.html) statement.

3. Delete the automatically generated statistics using the instructions in [delete statistics](create-statistics.html#delete-statistics).

4. Restart the nodes in your cluster to clear the statistics caches.

For instructions showing how to manually generate statistics, see the examples in the [`CREATE STATISTICS` documentation](create-statistics.html).

## Query plan cache

<span class="version-tag">New in v19.1</span>: CockroachDB uses a cache for the query plans generated by the optimizer. This can lead to faster query execution since the database can reuse a query plan that was previously calculated, rather than computing a new plan each time a query is executed.
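
The idea of reusing a previously computed plan can be sketched as a small least-recently-used cache keyed on the statement text. The class `PlanCache`, its capacity, and the LRU eviction policy are assumptions for illustration; they are not CockroachDB's plan cache implementation.

```python
from collections import OrderedDict
from typing import Callable


class PlanCache:
    """Toy LRU cache mapping statement text to a previously computed plan."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self._plans: OrderedDict = OrderedDict()

    def get_or_plan(self, sql: str, plan_fn: Callable[[str], object]) -> object:
        if sql in self._plans:
            self._plans.move_to_end(sql)  # cache hit: mark most recently used
            return self._plans[sql]
        plan = plan_fn(sql)  # cache miss: run the (expensive) optimizer
        self._plans[sql] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict least recently used plan
        return plan


def fake_optimizer(sql: str) -> str:
    print(f"planning: {sql}")
    return f"plan for {sql}"


cache = PlanCache(capacity=2)
cache.get_or_plan("SELECT * FROM employees", fake_optimizer)  # plans
cache.get_or_plan("SELECT * FROM employees", fake_optimizer)  # reuses plan
```

A real plan cache also has to stop reusing plans that are no longer valid (for example, after a schema change); this sketch leaves that out.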
34 changes: 23 additions & 11 deletions v19.1/create-statistics.md
@@ -7,6 +7,15 @@ Use the `CREATE STATISTICS` [statement](sql-statements.html) to generate table s

Once you [create a table](create-table.html) and load data into it (e.g., [`INSERT`](insert.html), [`IMPORT`](import.html)), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs.
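
As an illustration of how row counts and distinct counts can feed into cardinality estimates, the Python sketch below uses the textbook uniformity and independence assumptions. It is not necessarily the formula CockroachDB's optimizer uses, and the function names are hypothetical.

```python
import math
from typing import Sequence


def estimate_equality_rows(row_count: int, distinct_count: int) -> float:
    """Estimated rows for `col = <constant>`, assuming values are uniform."""
    if row_count <= 0 or distinct_count <= 0:
        return 0.0
    return row_count / distinct_count


def estimate_conjunction_rows(row_count: int,
                              distinct_counts: Sequence[int]) -> float:
    """Estimated rows for `a = x AND b = y AND ...`, assuming the columns
    are independent, so per-column selectivities multiply."""
    if row_count <= 0 or any(d <= 0 for d in distinct_counts):
        return 0.0
    return row_count / math.prod(distinct_counts)


print(estimate_equality_rows(10_000, 50))          # 200.0
print(estimate_conjunction_rows(10_000, [50, 4]))  # 50.0
```

For example, with 10,000 rows and 50 distinct values in a hypothetical `dept` column, `WHERE dept = 'sales'` is estimated at 200 rows under these assumptions.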

`CREATE STATISTICS` automatically figures out which columns to get statistics on &mdash; specifically, it chooses:

- Columns that are part of the primary key or an index (in other words, all indexed columns).
- Up to 100 non-indexed columns.

{{site.data.alerts.callout_info}}
<span class="version-tag">New in v19.1</span>: [Automatic statistics is enabled by default](cost-based-optimizer.html#table-statistics); most users don't need to issue `CREATE STATISTICS` statements directly.
{{site.data.alerts.end}}

## Synopsis

<div>
@@ -28,10 +37,6 @@ The user must have the `CREATE` [privilege](authorization.html#assign-privileges

## Examples

### Automatic table statistics

{% include {{ page.version.version }}/misc/automatic-statistics.md %}

### Create statistics on a specific column

{% include copy-clipboard.html %}
@@ -69,7 +74,7 @@ For more information about how the `AS OF SYSTEM TIME` clause works, including s

### View statistics jobs

Every time the `CREATE STATISTICS` statement is executed, it kicks off a background job. This is true for queries issued by your application as well as queries issued by the [automatic stats](#automatic-table-statistics) feature.
Every time the `CREATE STATISTICS` statement is executed, it kicks off a background job. This is true for queries issued by your application as well as queries issued by the [automatic stats feature](cost-based-optimizer.html#table-statistics).

To view statistics jobs, issue the following query that uses [`SHOW JOBS`](show-jobs.html).

@@ -79,12 +84,19 @@ To view statistics jobs, issue the following query that uses [`SHOW JOBS`](show-
~~~

~~~
job_id | job_type | description | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id
--------------------+--------------+-------------------------------------------------------------------------------------+-----------+-----------+----------------+----------------------------+----------------------------+----------------------------+----------------------------+--------------------+-------+----------------
429997863416791041 | CREATE STATS | CREATE STATISTICS employee_stats FROM test.public.employees AS OF SYSTEM TIME '-1m' | root | succeeded | | 2019-02-27 19:22:13.904065 | 2019-02-27 19:22:13.909684 | 2019-02-27 19:22:14.203006 | 2019-02-27 19:22:14.203007 | 1 | | 1
429996681838297089 | CREATE STATS | CREATE STATISTICS __auto__ FROM [67] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-02-27 19:16:13.314916 | 2019-02-27 19:16:13.317949 | 2019-02-27 19:16:13.63022 | 2019-02-27 19:16:13.630221 | 1 | | 1
429996676782456833 | CREATE STATS | CREATE STATISTICS __auto__ FROM [66] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-02-27 19:16:11.771999 | 2019-02-27 19:16:11.775159 | 2019-02-27 19:16:13.308078 | 2019-02-27 19:16:13.308079 | 1 | | 1
429996676018601985 | CREATE STATS | CREATE STATISTICS __auto__ FROM [65] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-02-27 19:16:11.538883 | 2019-02-27 19:16:11.542195 | 2019-02-27 19:16:11.762671 | 2019-02-27 19:16:11.762672 | 1 | | 1
job_id | job_type | description | statement | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id
--------------------+-------------------+-----------------------------------------------------+---------------------------------------------------------------+-----------+-----------+----------------+----------------------------+----------------------------+----------------------------+----------------------------+--------------------+-------+----------------
440126573959512065 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.order_line | CREATE STATISTICS __auto__ FROM [61] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:59:31.056986 | 2019-04-04 13:59:31.059442 | 2019-04-04 13:59:40.975497 | 2019-04-04 13:59:40.975498 | 1 | | 1
440126554231275521 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.stock | CREATE STATISTICS __auto__ FROM [60] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:59:25.036411 | 2019-04-04 13:59:25.040731 | 2019-04-04 13:59:31.053151 | 2019-04-04 13:59:31.053151 | 1 | | 1
440126352196435969 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.history | CREATE STATISTICS __auto__ FROM [56] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:58:23.380263 | 2019-04-04 13:58:23.384597 | 2019-04-04 13:58:25.023725 | 2019-04-04 13:58:25.023726 | 1 | | 1
440126345266462721 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.item | CREATE STATISTICS __auto__ FROM [59] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:58:21.265405 | 2019-04-04 13:58:21.267658 | 2019-04-04 13:58:23.377281 | 2019-04-04 13:58:23.377281 | 1 | | 1
440126345144532993 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.warehouse | CREATE STATISTICS __auto__ FROM [53] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:58:21.228193 | 2019-04-04 13:58:21.230397 | 2019-04-04 13:58:21.262612 | 2019-04-04 13:58:21.262613 | 1 | | 1
440126333637033985 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.customer | CREATE STATISTICS __auto__ FROM [55] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:58:17.716385 | 2019-04-04 13:58:17.718692 | 2019-04-04 13:58:21.225282 | 2019-04-04 13:58:21.225282 | 1 | | 1
440126328489476097 | AUTO CREATE STATS | Table statistics refresh for tpcc.public."order" | CREATE STATISTICS __auto__ FROM [57] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:58:16.145472 | 2019-04-04 13:58:16.148248 | 2019-04-04 13:58:17.713295 | 2019-04-04 13:58:17.713295 | 1 | | 1
440126319591227393 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.stock | CREATE STATISTICS __auto__ FROM [60] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:58:13.429948 | 2019-04-04 13:58:13.43343 | 2019-04-04 13:58:16.142435 | 2019-04-04 13:58:16.142436 | 1 | | 1
440126319390261249 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.district | CREATE STATISTICS __auto__ FROM [54] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:58:13.368614 | 2019-04-04 13:58:13.373541 | 2019-04-04 13:58:13.426379 | 2019-04-04 13:58:13.42638 | 1 | | 1
440126319248474113 | AUTO CREATE STATS | Table statistics refresh for tpcc.public.new_order | CREATE STATISTICS __auto__ FROM [58] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-04 13:58:13.325351 | 2019-04-04 13:58:13.330711 | 2019-04-04 13:58:13.363199 | 2019-04-04 13:58:13.3632 | 1 | | 1
(10 rows)
~~~
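
To tell automatic refreshes apart from manually issued `CREATE STATISTICS` jobs in output like the above, you can group rows by `job_type`. This is a hypothetical Python post-processing sketch over `(job_type, description)` pairs copied from the sample output; fetching the rows from the cluster is left out, and the `Table statistics refresh for ` prefix is taken from the sample rows rather than from a documented format.

```python
from collections import Counter
from typing import Iterable, Tuple

# Description prefix as it appears in the sample output; treat it as an
# observed format, not a documented contract.
AUTO_PREFIX = "Table statistics refresh for "


def auto_refreshes_per_table(jobs: Iterable[Tuple[str, str]]) -> Counter:
    """Count AUTO CREATE STATS jobs per table from (job_type, description) pairs."""
    counts: Counter = Counter()
    for job_type, description in jobs:
        if job_type == "AUTO CREATE STATS" and description.startswith(AUTO_PREFIX):
            counts[description[len(AUTO_PREFIX):]] += 1
    return counts


# A few (job_type, description) pairs copied from the sample output.
SAMPLE_JOBS = [
    ("AUTO CREATE STATS", "Table statistics refresh for tpcc.public.order_line"),
    ("AUTO CREATE STATS", "Table statistics refresh for tpcc.public.stock"),
    ("AUTO CREATE STATS", "Table statistics refresh for tpcc.public.history"),
    ("AUTO CREATE STATS", "Table statistics refresh for tpcc.public.stock"),
    ("CREATE STATS", "CREATE STATISTICS employee_stats FROM test.public.employees "
                     "AS OF SYSTEM TIME '-1m'"),
]

print(auto_refreshes_per_table(SAMPLE_JOBS).most_common(1))
# [('tpcc.public.stock', 2)]
```

On these rows, `tpcc.public.stock` shows two automatic refreshes, while the manual `CREATE STATS` job for `employee_stats` is not counted.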

## See Also
