Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: introduce knobs for SQL stats to reduce the write traffic into system tables #78339

Closed
Azhng opened this issue Mar 23, 2022 · 1 comment · Fixed by #78446
Closed

sql: introduce knobs for SQL stats to reduce the write traffic into system tables #78339

Azhng opened this issue Mar 23, 2022 · 1 comment · Fixed by #78446
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@Azhng
Copy link
Contributor

Azhng commented Mar 23, 2022

Currently, SQL Stats writes to the system table whenever the in-memory store is full. This means potentially the rate of writing can be arbitrarily fast. We need to provide knobs to cap the write rate at some limit.

Jira issue: CRDB-14078

@Azhng Azhng added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Mar 23, 2022
Azhng added a commit to Azhng/cockroach that referenced this issue Mar 24, 2022
Resolves cockroachdb#78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted.
* `sql.stats.flush.periodically_discard_when_disabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.

Release note: None
Azhng added a commit to Azhng/cockroach that referenced this issue Mar 24, 2022
Resolves cockroachdb#78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted.
* `sql.stats.flush.periodically_discard_when_disabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.

Release note: None
Azhng added a commit to Azhng/cockroach that referenced this issue Mar 25, 2022
Resolves cockroachdb#78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.

Release note: None
@blathers-crl
Copy link

blathers-crl bot commented Mar 28, 2022

Hi @maryliag, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@maryliag maryliag added the branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 label Mar 28, 2022
@maryliag maryliag removed GA-blocker branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 labels Mar 28, 2022
@Azhng Azhng changed the title sql: SQL stats should continue to collect new statistics even when flush is disabled sql: introduce knobs for SQL stats to reduce the write traffic into system tables Mar 28, 2022
Azhng added a commit to Azhng/cockroach that referenced this issue Mar 28, 2022
Resolves cockroachdb#78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted. By default
  this cluster setting is set to 0.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.
  By default, this cluster setting is set to false.

Release note: None
Azhng added a commit to Azhng/cockroach that referenced this issue Mar 28, 2022
Resolves cockroachdb#78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted. By default
  this cluster setting is set to 0.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.
  By default, this cluster setting is set to false.

This commit also updated the stmt_grouping_in_explicit_txn data driven
test to ensure the output order is deterministic.

Release note: None
craig bot pushed a commit that referenced this issue Mar 29, 2022
75126: roachprod: move default env. vars into config.go and remove the duplicates r=srosenberg a=srosenberg

Both, COCKROACH_ENABLE_RPC_COMPRESSION=false and
COCKROACH_UI_RELEASE_NOTES_SIGNUP_DISMISSED=true are passed into ENV_VARS of
the systemd startup script (start.sh) by default.
The same behavior is preserved _modulo_ duplication.
Since roachprod can now be initialized either via CLI or API,
default env. vars have been consolidated into config.go (used by both).

Release note: None

78333: sql: record index usage stats in index joins r=maryliag,Azhng a=xinhaoz

Fixes #76173

Previously, the construction of index joins skipped the
recording of the  primary key for index usage stats.
This commit records the use of the primary key if the
statement is not of the EXPLAIN variety.

Release note (bug fix): Index usage stats are now properly
captured for index joins.

----------------------------------
Example: We should expect to see a read from the pkey after an index join.
```
root@:26257/defaultdb> CREATE TABLE test (c1 INT, c2 INT, c3 INT, INDEX(c1, c2));
CREATE TABLE


Time: 120ms total (execution 120ms / network 0ms)

root@:26257/defaultdb> EXPLAIN select c3 from TEST WHERE c2 > 1 AND c1 = 3;
                                              info
------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • index join
  │ estimated row count: 1
  │ table: test@test_pkey
  │
  └── • scan
        estimated row count: 1 (100% of the table; stats collected 49 seconds ago)
        table: test@test_c1_c2_idx
        spans: [/3/2 - /3]

  index recommendations: 1
  1. type: index replacement
     SQL commands: CREATE INDEX ON test (c1, c2) STORING (c3); DROP INDEX test@test_c1_c2_idx;
(15 rows)


Time: 1ms total (execution 1ms / network 0ms)

root@:26257/defaultdb> select c3 from TEST WHERE c2 > 1 AND c1 = 3;
  c3
------
(0 rows)


Time: 1ms total (execution 1ms / network 0ms)

root@:26257/defaultdb> select index_name, index_type, total_reads from crdb_internal.index_usage_statistics AS us JOIN crdb_internal.table_indexes ti ON us.index_id = ti.index_id AND us.table_id = ti.descriptor_id ORDER BY total_reads asc;
    index_name   | index_type | total_reads
-----------------+------------+--------------
  test_pkey      | primary    |           1
  test_c1_c2_idx | secondary  |           1
(2 rows)


Time: 7ms total (execution 6ms / network 0ms)
```

78446: sql: new SQL Stats cluster settings to improve write traffic r=Azhng a=Azhng

Resolves #78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.

This commit also updated the stmt_grouping_in_explicit_txn data driven
test to ensure the output order is deterministic.

Release note: None

78526: ci: add scripts for bazel-based `ui test`, `ui lint` ci jobs r=rail,jsbarag a=rickystewart

These scripts just do the same logic that `dev` does for the same
functions.

Release note: None

Co-authored-by: Stan Rosenberg <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Azhng <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@craig craig bot closed this as completed in f996b30 Mar 29, 2022
blathers-crl bot pushed a commit that referenced this issue Mar 29, 2022
Resolves #78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted. By default
  this cluster setting is set to 0.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.
  By default, this cluster setting is set to false.

This commit also updated the stmt_grouping_in_explicit_txn data driven
test to ensure the output order is deterministic.

Release note: None
blathers-crl bot pushed a commit that referenced this issue Mar 29, 2022
Resolves #78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted. By default
  this cluster setting is set to 0.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.
  By default, this cluster setting is set to false.

This commit also updated the stmt_grouping_in_explicit_txn data driven
test to ensure the output order is deterministic.

Release note: None
Azhng added a commit to Azhng/cockroach that referenced this issue Mar 29, 2022
Resolves cockroachdb#78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted. By default
  this cluster setting is set to 0.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.
  By default, this cluster setting is set to false.

This commit also updated the stmt_grouping_in_explicit_txn data driven
test to ensure the output order is deterministic.

Release note: None
fqazi pushed a commit to fqazi/cockroach that referenced this issue Apr 4, 2022
Resolves cockroachdb#78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted. By default
  this cluster setting is set to 0.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.
  By default, this cluster setting is set to false.

This commit also updated the stmt_grouping_in_explicit_txn data driven
test to ensure the output order is deterministic.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants