Skip to content

Commit

Permalink
Document more auto stats features & knobs
Browse files Browse the repository at this point in the history
Fixes #4455, #4570, #4517.

Summary of changes:

- Add more context re: auto stats in general, and deemphasize turning it
  off, tweaking the settings, or running CREATE STATS manually

- Note that auto stats update after schema changes

- Add docs on cluster settings for throttling auto stats
  • Loading branch information
rmloveland committed Apr 9, 2019
1 parent bb8cd13 commit d18de4b
Show file tree
Hide file tree
Showing 4 changed files with 82 additions and 52 deletions.
16 changes: 0 additions & 16 deletions _includes/v19.1/misc/automatic-statistics.md

This file was deleted.

4 changes: 2 additions & 2 deletions _includes/v19.1/sql/settings/settings.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,9 +102,9 @@
<tr><td><code>sql.parallel_scans.enabled</code></td><td>boolean</td><td><code>true</code></td><td>parallelizes scanning different ranges when the maximum result size can be deduced</td></tr>
<tr><td><code>sql.query_cache.enabled</code></td><td>boolean</td><td><code>true</code></td><td>enable the query cache</td></tr>
<tr><td><code>sql.stats.automatic_collection.enabled</code></td><td>boolean</td><td><code>true</code></td><td>automatic statistics collection mode</td></tr>
<tr><td><code>sql.stats.automatic_collection.fraction_stale_rows</code></td><td>float</td><td><code>0.2</code></td><td>target fraction of stale rows per table that will trigger a statistics refresh</td></tr>
<tr><td><a name="sql.stats.automatic_collection.fraction_stale_rows"></a><code>sql.stats.automatic_collection.fraction_stale_rows</code></td><td>float</td><td><code>0.2</code></td><td>target fraction of stale rows per table that will trigger a statistics refresh</td></tr>
<tr><td><code>sql.stats.automatic_collection.max_fraction_idle</code></td><td>float</td><td><code>0.9</code></td><td>maximum fraction of time that automatic statistics sampler processors are idle</td></tr>
<tr><td><code>sql.stats.automatic_collection.min_stale_rows</code></td><td>integer</td><td><code>500</code></td><td>target minimum number of stale rows per table that will trigger a statistics refresh</td></tr>
<tr><td><a name="sql.stats.automatic_collection.min_stale_rows"></a><code>sql.stats.automatic_collection.min_stale_rows</code></td><td>integer</td><td><code>500</code></td><td>target minimum number of stale rows per table that will trigger a statistics refresh</td></tr>
<tr><td><code>sql.stats.post_events.enabled</code></td><td>boolean</td><td><code>false</code></td><td>if set, an event is shown for every CREATE STATISTICS job</td></tr>
<tr><td><code>sql.tablecache.lease.refresh_limit</code></td><td>integer</td><td><code>50</code></td><td>maximum number of tables to periodically refresh leases for</td></tr>
<tr><td><code>sql.trace.log_statement_execute</code></td><td>boolean</td><td><code>false</code></td><td>set to true to enable logging of executed statements</td></tr>
Expand Down
41 changes: 29 additions & 12 deletions v19.1/cost-based-optimizer.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,26 +51,43 @@ This is not meant to be an exhaustive list. To check whether a particular query

## Table statistics

The cost-based optimizer can often find more performant query execution plans if it has access to statistical data on the contents of your database's tables. This statistical data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.
The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.

{% include {{ page.version.version }}/misc/automatic-statistics.md %}

To manually generate statistics for a table, run a [`CREATE STATISTICS`](create-statistics.html) statement like the one shown below. It automatically figures out which columns to get statistics on &mdash; specifically, it chooses:
<span class="version-tag">New in v19.1</span>: By default, CockroachDB generates table statistics automatically as tables are updated. It does this [using a background job](create-statistics.html#view-statistics-jobs) that automatically determines which columns to get statistics on &mdash; specifically, it chooses:

- Columns that are part of the primary key or an index (in other words, all indexed columns).
- Up to 100 non-indexed columns.

Note that the above also describes the statistics gathered by the automatic statistics feature, since it runs a query similar to the one shown below.

{% include copy-clipboard.html %}
~~~ sql
> CREATE STATISTICS employees_stats FROM employees;
~~~

{{site.data.alerts.callout_info}}
Every time the [`CREATE STATISTICS`](create-statistics.html) statement is executed, it kicks off a background job. For more information, see [View statistics jobs](create-statistics.html#view-statistics-jobs).
[Schema changes](online-schema-changes.html) trigger automatic statistics collection for the affected table(s).
{{site.data.alerts.end}}

### Controlling automatic statistics

For best query performance, most users should leave automatic statistics enabled with the default settings. The information provided in this section is useful for troubleshooting or performance tuning by advanced users.

To control how often the automatic statistics jobs run on your cluster, adjust the following [cluster settings](cluster-settings.html). They define the target number of rows in a table that should be stale before statistics on that table are refreshed.

- [`sql.stats.automatic_collection.fraction_stale_rows`](cluster-settings.html#sql.stats.automatic_collection.fraction_stale_rows)
- [`sql.stats.automatic_collection.min_stale_rows`](cluster-settings.html#sql.stats.automatic_collection.min_stale_rows)

If you need to turn off automatic statistics collection, follow the steps below:

1. Run the following statement to disable the automatic statistics [cluster setting](cluster-settings.html):

{% include copy-clipboard.html %}
~~~ sql
> SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
~~~

2. Use the [`SHOW STATISTICS`](show-statistics.html) statement to view automatically generated statistics.

3. Delete the automatically generated statistics using the instructions in [Delete statistics](create-statistics.html#delete-statistics).

4. Restart the nodes in your cluster to clear the statistics caches.

For instructions showing how to manually generate statistics, see the examples in the [`CREATE STATISTICS` documentation](create-statistics.html).

## Query plan cache

<span class="version-tag">New in v19.1</span>: CockroachDB uses a cache for the query plans generated by the optimizer. This can lead to faster query execution since the database can reuse a query plan that was previously calculated, rather than computing a new plan each time a query is executed.
Expand Down
73 changes: 51 additions & 22 deletions v19.1/create-statistics.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,15 @@ Use the `CREATE STATISTICS` [statement](sql-statements.html) to generate table s

Once you [create a table](create-table.html) and load data into it (e.g., [`INSERT`](insert.html), [`IMPORT`](import.html)), table statistics can be generated. Table statistics help the cost-based optimizer determine the cardinality of the rows used in each query, which helps to predict more accurate costs.

`CREATE STATISTICS` automatically figures out which columns to get statistics on &mdash; specifically, it chooses:

- Columns that are part of the primary key or an index (in other words, all indexed columns).
- Up to 100 non-indexed columns (unless you specify which columns to create statistics on, as shown in [this example](#create-statistics-on-a-specific-column)).

{{site.data.alerts.callout_info}}
<span class="version-tag">New in v19.1</span>: [Automatic statistics is enabled by default](cost-based-optimizer.html#table-statistics); most users don't need to issue `CREATE STATISTICS` statements directly.
{{site.data.alerts.end}}

## Synopsis

<div>
Expand All @@ -28,10 +37,6 @@ The user must have the `CREATE` [privilege](authorization.html#assign-privileges

## Examples

### Automatic table statistics

{% include {{ page.version.version }}/misc/automatic-statistics.md %}

### Create statistics on a specific column

{% include copy-clipboard.html %}
Expand All @@ -45,7 +50,10 @@ Multi-column statistics are not supported yet.

### Create statistics on a default set of columns

The `CREATE STATISTICS` statement shown below automatically figures out which columns to get statistics on &mdash; specifically, it chooses columns which are part of the primary key and/or an index.
The `CREATE STATISTICS` statement shown below automatically figures out which columns to get statistics on &mdash; specifically, it chooses:

- Columns that are part of the primary key or an index (in other words, all indexed columns).
- Up to 100 non-indexed columns.

{% include copy-clipboard.html %}
~~~ sql
Expand All @@ -69,23 +77,44 @@ For more information about how the `AS OF SYSTEM TIME` clause works, including s

### View statistics jobs

Every time the `CREATE STATISTICS` statement is executed, it kicks off a background job. This is true for queries issued by your application as well as queries issued by the [automatic stats](#automatic-table-statistics) feature.

To view statistics jobs, issue the following query that uses [`SHOW JOBS`](show-jobs.html).

{% include copy-clipboard.html %}
~~~ sql
> SELECT * FROM [SHOW JOBS] WHERE job_type LIKE '%CREATE STATS%';
~~~

~~~
job_id | job_type | description | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id
--------------------+--------------+-------------------------------------------------------------------------------------+-----------+-----------+----------------+----------------------------+----------------------------+----------------------------+----------------------------+--------------------+-------+----------------
429997863416791041 | CREATE STATS | CREATE STATISTICS employee_stats FROM test.public.employees AS OF SYSTEM TIME '-1m' | root | succeeded | | 2019-02-27 19:22:13.904065 | 2019-02-27 19:22:13.909684 | 2019-02-27 19:22:14.203006 | 2019-02-27 19:22:14.203007 | 1 | | 1
429996681838297089 | CREATE STATS | CREATE STATISTICS __auto__ FROM [67] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-02-27 19:16:13.314916 | 2019-02-27 19:16:13.317949 | 2019-02-27 19:16:13.63022 | 2019-02-27 19:16:13.630221 | 1 | | 1
429996676782456833 | CREATE STATS | CREATE STATISTICS __auto__ FROM [66] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-02-27 19:16:11.771999 | 2019-02-27 19:16:11.775159 | 2019-02-27 19:16:13.308078 | 2019-02-27 19:16:13.308079 | 1 | | 1
429996676018601985 | CREATE STATS | CREATE STATISTICS __auto__ FROM [65] AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-02-27 19:16:11.538883 | 2019-02-27 19:16:11.542195 | 2019-02-27 19:16:11.762671 | 2019-02-27 19:16:11.762672 | 1 | | 1
~~~
Every time the `CREATE STATISTICS` statement is executed, it kicks off a background job. This is true for queries issued by your application as well as queries issued by the [automatic stats feature](cost-based-optimizer.html#table-statistics).

To view statistics jobs, there are two options:

1. Use [`SHOW JOBS`](show-jobs.html) to see all statistics jobs that were created by user queries (i.e., someone entering `CREATE STATISTICS` at the SQL prompt or via application code):

{% include copy-clipboard.html %}
~~~ sql
> SELECT * FROM [SHOW JOBS] WHERE job_type LIKE '%CREATE STATS%';
~~~

~~~
job_id | job_type | description | statement | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id
--------------------+--------------+------------------------------------------------------------------+-----------+-----------+-----------+----------------+----------------------------+----------------------------+----------------------------+----------------------------+--------------------+-------+----------------
441281249412743169 | CREATE STATS | CREATE STATISTICS salary_stats FROM employees.public.salaries | | root | succeeded | | 2019-04-08 15:52:30.040531 | 2019-04-08 15:52:30.046646 | 2019-04-08 15:52:32.757519 | 2019-04-08 15:52:32.757519 | 1 | | 1
441281163978637313 | CREATE STATS | CREATE STATISTICS employee_stats FROM employees.public.employees | | root | succeeded | | 2019-04-08 15:52:03.968099 | 2019-04-08 15:52:03.972557 | 2019-04-08 15:52:05.168809 | 2019-04-08 15:52:05.168809 | 1 | | 1
(2 rows)
~~~

2. Use `SHOW AUTOMATIC JOBS` to see statistics jobs that were created by the [automatic statistics feature](cost-based-optimizer.html#table-statistics):

{% include copy-clipboard.html %}
~~~ sql
> SELECT * FROM [SHOW AUTOMATIC JOBS] WHERE job_type LIKE '%CREATE STATS%';
~~~

~~~
job_id | job_type | description | statement | user_name | status | running_status | created | started | finished | modified | fraction_completed | error | coordinator_id
--------------------+-------------------+------------------------------------------------------------+-------------------------------------------------------------------------------------------+-----------+-----------+----------------+----------------------------+----------------------------+----------------------------+----------------------------+--------------------+-------+----------------
441280366254850049 | AUTO CREATE STATS | Table statistics refresh for employees.public.departments | CREATE STATISTICS __auto__ FROM [55] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-08 15:48:00.522119 | 2019-04-08 15:48:00.52663 | 2019-04-08 15:48:00.541608 | 2019-04-08 15:48:00.541608 | 1 | | 1
441280364809289729 | AUTO CREATE STATS | Table statistics refresh for employees.public.titles | CREATE STATISTICS __auto__ FROM [60] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-08 15:48:00.080971 | 2019-04-08 15:48:00.083117 | 2019-04-08 15:48:00.515766 | 2019-04-08 15:48:00.515767 | 1 | | 1
441280356286201857 | AUTO CREATE STATS | Table statistics refresh for employees.public.salaries | CREATE STATISTICS __auto__ FROM [59] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-08 15:47:57.479929 | 2019-04-08 15:47:57.482235 | 2019-04-08 15:48:00.075025 | 2019-04-08 15:48:00.075025 | 1 | | 1
441280352161693697 | AUTO CREATE STATS | Table statistics refresh for employees.public.employees | CREATE STATISTICS __auto__ FROM [58] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-08 15:47:56.221223 | 2019-04-08 15:47:56.223664 | 2019-04-08 15:47:57.474159 | 2019-04-08 15:47:57.474159 | 1 | | 1
441280352070434817 | AUTO CREATE STATS | Table statistics refresh for employees.public.dept_manager | CREATE STATISTICS __auto__ FROM [57] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-08 15:47:56.193375 | 2019-04-08 15:47:56.195813 | 2019-04-08 15:47:56.215114 | 2019-04-08 15:47:56.215114 | 1 | | 1
441280350791401473 | AUTO CREATE STATS | Table statistics refresh for employees.public.dept_emp | CREATE STATISTICS __auto__ FROM [56] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-08 15:47:55.803052 | 2019-04-08 15:47:55.806071 | 2019-04-08 15:47:56.187153 | 2019-04-08 15:47:56.187154 | 1 | | 1
441279760786096129 | AUTO CREATE STATS | Table statistics refresh for test.public.kv | CREATE STATISTICS __auto__ FROM [53] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s' | root | succeeded | | 2019-04-08 15:44:55.747725 | 2019-04-08 15:44:55.754582 | 2019-04-08 15:44:55.775664 | 2019-04-08 15:44:55.775665 | 1 | | 1
(7 rows)
~~~

## See Also

Expand Down

0 comments on commit d18de4b

Please sign in to comment.