From ee681b7af7efdec317e83593fe3feb37e402c2e2 Mon Sep 17 00:00:00 2001
From: Rich Loveland
Date: Mon, 17 Jun 2019 16:28:09 -0400
Subject: [PATCH] Update docs on automatic stats refresh rate

... and node restarts after stats deletion to clear caches.

Summary of changes:

- Add a new subsection to CBO page, 'Controlling statistics refresh
  rate', where we describe the cases when stats are refreshed in more
  detail.

- To match the structure of the above, we break the instructions for
  deleting stats into a new section 'Turning off statistics'

- Finally, tweak stats deletion instructions on CBO page and CREATE
  STATS page so both say that nodes must be restarted
  post-stats-deletion to clear caches.

Fixes #4809, #4872.
---
 _includes/v19.1/misc/delete-statistics.md |  2 ++
 _includes/v19.2/misc/delete-statistics.md |  2 ++
 v19.1/cost-based-optimizer.md             | 23 +++++++++++++++++++----
 v19.2/cost-based-optimizer.md             | 23 +++++++++++++++++++----
 4 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/_includes/v19.1/misc/delete-statistics.md b/_includes/v19.1/misc/delete-statistics.md
index 1e573e68d1a..a568055e583 100644
--- a/_includes/v19.1/misc/delete-statistics.md
+++ b/_includes/v19.1/misc/delete-statistics.md
@@ -12,4 +12,6 @@ To delete a named set of statistics (e.g, one named "my_stats"), run a query lik
 > DELETE FROM system.table_statistics WHERE name = 'my_stats';
 ~~~
 
+After deleting statistics, restart the nodes in your cluster to clear the statistics caches.
+
 For more information about the `DELETE` statement, see [`DELETE`](delete.html).
diff --git a/_includes/v19.2/misc/delete-statistics.md b/_includes/v19.2/misc/delete-statistics.md
index 1e573e68d1a..a568055e583 100644
--- a/_includes/v19.2/misc/delete-statistics.md
+++ b/_includes/v19.2/misc/delete-statistics.md
@@ -12,4 +12,6 @@ To delete a named set of statistics (e.g, one named "my_stats"), run a query lik
 > DELETE FROM system.table_statistics WHERE name = 'my_stats';
 ~~~
 
+After deleting statistics, restart the nodes in your cluster to clear the statistics caches.
+
 For more information about the `DELETE` statement, see [`DELETE`](delete.html).
diff --git a/v19.1/cost-based-optimizer.md b/v19.1/cost-based-optimizer.md
index 69061ff1ce4..133e594d1f8 100644
--- a/v19.1/cost-based-optimizer.md
+++ b/v19.1/cost-based-optimizer.md
@@ -67,10 +67,20 @@ The cost-based optimizer can often find more performant query plans if it has ac
 
 For best query performance, most users should leave automatic statistics enabled with the default settings. The information provided in this section is useful for troubleshooting or performance tuning by advanced users.
 
-To control how often the automatic statistics jobs run on your cluster, adjust the following [cluster settings](cluster-settings.html). They define the target number of rows in a table that should be stale before statistics on that table are refreshed.
+#### Controlling statistics refresh rate
 
-- `sql.stats.automatic_collection.fraction_stale_rows`
-- `sql.stats.automatic_collection.min_stale_rows`
+Statistics are refreshed in the following cases:
+
+1. When there are no statistics.
+2. When it's been a long time since the last refresh, where "long time" is defined according to a moving average of the time across the last several refreshes.
+3. After each mutation operation ([`INSERT`](insert.html), [`UPDATE`](update.html), or [`DELETE`](delete.html)), the probability of a refresh is calculated using a formula that takes the [cluster settings](cluster-settings.html) shown below as inputs. These settings define the target number of rows in a table that should be stale before statistics on that table are refreshed.
+
+| Setting                                              | Details                                                                              |
+|------------------------------------------------------+--------------------------------------------------------------------------------------|
+| `sql.stats.automatic_collection.fraction_stale_rows` | Target fraction of stale rows per table that will trigger a statistics refresh       |
+| `sql.stats.automatic_collection.min_stale_rows`      | Target minimum number of stale rows per table that will trigger a statistics refresh |
+
+#### Turning off statistics
 
 If you need to turn off automatic statistics collection, follow the steps below:
 
@@ -83,7 +93,12 @@ If you need to turn off automatic statistics collection, follow the steps below:
 
 2. Use the [`SHOW STATISTICS`](show-statistics.html) statement to view automatically generated statistics.
 
-3. Delete the automatically generated statistics using the instructions in [Delete statistics](create-statistics.html#delete-statistics).
+3. Delete the automatically generated statistics using the following statement:
+
+    {% include copy-clipboard.html %}
+    ~~~ sql
+    > DELETE FROM system.table_statistics WHERE true;
+    ~~~
 
 4. Restart the nodes in your cluster to clear the statistics caches.
 
diff --git a/v19.2/cost-based-optimizer.md b/v19.2/cost-based-optimizer.md
index 2e373bfbc2c..26bd0826932 100644
--- a/v19.2/cost-based-optimizer.md
+++ b/v19.2/cost-based-optimizer.md
@@ -67,10 +67,20 @@ By default, CockroachDB generates table statistics automatically as tables are u
 
 For best query performance, most users should leave automatic statistics enabled with the default settings. The information provided in this section is useful for troubleshooting or performance tuning by advanced users.
 
-To control how often the automatic statistics jobs run on your cluster, adjust the following [cluster settings](cluster-settings.html). They define the target number of rows in a table that should be stale before statistics on that table are refreshed.
+#### Controlling statistics refresh rate
 
-- `sql.stats.automatic_collection.fraction_stale_rows`
-- `sql.stats.automatic_collection.min_stale_rows`
+Statistics are refreshed in the following cases:
+
+1. When there are no statistics.
+2. When it's been a long time since the last refresh, where "long time" is defined according to a moving average of the time across the last several refreshes.
+3. After each mutation operation ([`INSERT`](insert.html), [`UPDATE`](update.html), or [`DELETE`](delete.html)), the probability of a refresh is calculated using a formula that takes the [cluster settings](cluster-settings.html) shown below as inputs. These settings define the target number of rows in a table that should be stale before statistics on that table are refreshed.
+
+| Setting                                              | Details                                                                              |
+|------------------------------------------------------+--------------------------------------------------------------------------------------|
+| `sql.stats.automatic_collection.fraction_stale_rows` | Target fraction of stale rows per table that will trigger a statistics refresh       |
+| `sql.stats.automatic_collection.min_stale_rows`      | Target minimum number of stale rows per table that will trigger a statistics refresh |
+
+#### Turning off statistics
 
 If you need to turn off automatic statistics collection, follow the steps below:
 
@@ -83,7 +93,12 @@ If you need to turn off automatic statistics collection, follow the steps below:
 
 2. Use the [`SHOW STATISTICS`](show-statistics.html) statement to view automatically generated statistics.
 
-3. Delete the automatically generated statistics using the instructions in [Delete statistics](create-statistics.html#delete-statistics).
+3. Delete the automatically generated statistics using the following statement:
+
+    {% include copy-clipboard.html %}
+    ~~~ sql
+    > DELETE FROM system.table_statistics WHERE true;
+    ~~~
 
 4. Restart the nodes in your cluster to clear the statistics caches.
 
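The two settings documented in the 'Controlling statistics refresh rate' table are ordinary cluster settings, so they can be inspected and changed from a SQL shell. The statements below are a minimal sketch rather than recommended values: `0.1` and `1000` are illustrative numbers chosen for this example, and the change assumes a user with permission to modify cluster settings.

```sql
-- Inspect the current values before changing anything.
SHOW CLUSTER SETTING sql.stats.automatic_collection.fraction_stale_rows;
SHOW CLUSTER SETTING sql.stats.automatic_collection.min_stale_rows;

-- Illustrative values only (assumptions, not recommendations):
-- lower the target fraction of stale rows and raise the minimum stale-row count.
SET CLUSTER SETTING sql.stats.automatic_collection.fraction_stale_rows = 0.1;
SET CLUSTER SETTING sql.stats.automatic_collection.min_stale_rows = 1000;
```

Because both settings define the target number of stale rows, raising either one should make refreshes less frequent, and lowering either one should make them more frequent.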