Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: index usage stats doesn't capture pk scan during index join #76173

Closed
Azhng opened this issue Feb 7, 2022 · 3 comments · Fixed by #78333
Closed

sql: index usage stats doesn't capture pk scan during index join #76173

Azhng opened this issue Feb 7, 2022 · 3 comments · Fixed by #78333
Assignees
Labels
A-sql-observability Related to observability of the SQL layer C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@Azhng
Copy link
Contributor

Azhng commented Feb 7, 2022

Suppose we have a plan as follow:

root@:26257/defaultdb> explain SELECT c1,c2,c3
    FROM scandirection
    WHERE big_id BETWEEN 10 and 20;
                                         info
--------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • index join
  │ estimated row count: 0
  │ table: scandirection@primary
  │
  └── • scan
        estimated row count: 0 (<0.01% of the table; stats collected 23 minutes ago)
        table: scandirection@idx_rev
        spans: [/20 - /10]

The index usage virtual table doesn't reflect the scan on the pk

root@:26257/defaultdb> SELECT ti.descriptor_name AS table_name, ti.index_name, total_reads, last_read
FROM crdb_internal.index_usage_statistics AS us
JOIN crdb_internal.table_indexes ti ON us.index_id = ti.index_id AND us.table_id = ti.descriptor_id
ORDER BY total_reads ASC;
   table_name   | index_name | total_reads |           last_read
----------------+------------+-------------+--------------------------------
  scandirection | primary    |           0 | NULL
  b2            | idx_big_id |           0 | NULL
  scandirection | idx_fw     |           0 | NULL
  contend       | primary    |           0 | NULL
  u             | primary    |           0 | NULL
  b1            | primary    |           0 | NULL
  b2            | primary    |           0 | NULL
  o             | primary    |           0 | NULL
  order         | primary    |           0 | NULL
  a             | primary    |           0 | NULL
  scandirection | idx_rev    |        2001 | 2022-02-07 17:01:01.901473+00

Jira issue: CRDB-13010

@Azhng Azhng added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-observability Related to observability of the SQL layer T-sql-observability labels Feb 7, 2022
@Azhng
Copy link
Contributor Author

Azhng commented Feb 7, 2022

Seems like during ConstructIndexJoin, the Scan operation on the Pk is planned directly without going through the ConstructScan method, (where the index usage is captured).

This means that we are definitely missing the usage stats during index join.

@glennfawcett
Copy link

Additionally, the total_reads column is useful but what I need even more is the total_read_rows to show how many rows were retrieved by for each index and pk.

@kevin-v-ngo
Copy link

Additionally, the total_reads column is useful but what I need even more is the total_read_rows to show how many rows were retrieved by for each index and pk.

Thanks for the feedback @glennfawcett! This is an improvement we're tracking. To help understand the scenario a bit more, what would you do/action you'd take knowing the number of rows retrieved for a particular index?

@xinhaoz xinhaoz assigned xinhaoz and unassigned THardy98 Mar 22, 2022
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Mar 23, 2022
Fixes cockroachdb#76173

Previously, the construction of index joins skipped the
recording of the  primary key for index usage stats.
This commit records the use of the primary key if the
statement is not of the EXPLAIN variety.

Release note (bug fix): Index usage stats are now properly
captured for index joins.
craig bot pushed a commit that referenced this issue Mar 29, 2022
75126: roachprod: move default env. vars into config.go and remove the duplicates r=srosenberg a=srosenberg

Both, COCKROACH_ENABLE_RPC_COMPRESSION=false and
COCKROACH_UI_RELEASE_NOTES_SIGNUP_DISMISSED=true are passed into ENV_VARS of
the systemd startup script (start.sh) by default.
The same behavior is preserved _modulo_ duplication.
Since roachprod can now be initialized either via CLI or API,
default env. vars have been consolidated into config.go (used by both).

Release note: None

78333: sql: record index usage stats in index joins r=maryliag,Azhng a=xinhaoz

Fixes #76173

Previously, the construction of index joins skipped the
recording of the  primary key for index usage stats.
This commit records the use of the primary key if the
statement is not of the EXPLAIN variety.

Release note (bug fix): Index usage stats are now properly
captured for index joins.

----------------------------------
Example: We should expect to see a read from the pkey after an index join.
```
root@:26257/defaultdb> CREATE TABLE test (c1 INT, c2 INT, c3 INT, INDEX(c1, c2));
CREATE TABLE


Time: 120ms total (execution 120ms / network 0ms)

root@:26257/defaultdb> EXPLAIN select c3 from TEST WHERE c2 > 1 AND c1 = 3;
                                              info
------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • index join
  │ estimated row count: 1
  │ table: test@test_pkey
  │
  └── • scan
        estimated row count: 1 (100% of the table; stats collected 49 seconds ago)
        table: test@test_c1_c2_idx
        spans: [/3/2 - /3]

  index recommendations: 1
  1. type: index replacement
     SQL commands: CREATE INDEX ON test (c1, c2) STORING (c3); DROP INDEX test@test_c1_c2_idx;
(15 rows)


Time: 1ms total (execution 1ms / network 0ms)

root@:26257/defaultdb> select c3 from TEST WHERE c2 > 1 AND c1 = 3;
  c3
------
(0 rows)


Time: 1ms total (execution 1ms / network 0ms)

root@:26257/defaultdb> select index_name, index_type, total_reads from crdb_internal.index_usage_statistics AS us JOIN crdb_internal.table_indexes ti ON us.index_id = ti.index_id AND us.table_id = ti.descriptor_id ORDER BY total_reads asc;
    index_name   | index_type | total_reads
-----------------+------------+--------------
  test_pkey      | primary    |           1
  test_c1_c2_idx | secondary  |           1
(2 rows)


Time: 7ms total (execution 6ms / network 0ms)
```

78446: sql: new SQL Stats cluster settings to improve write traffic r=Azhng a=Azhng

Resolves #78339

Previously, SQL Stats flushed to system table as soon as the in-memory
buffer is full. This means  the size of the system tables that back SQL
Stats could grow faster than the cleanup job. Additionally, when the
SQL Stats flush is disabled, the SQL Stats is unable to collect any more
new statement / transaction statistics when the in-memory store is full.

This commit introduces two non-public cluster settings:
* `sql.stats.flush.minimum_interval`: this setting limits minimum interval
  between each flush operation. If a flush operation is triggered sooner
  than what is allowed by the minimum interval, (e.g. when the in-memory
  SQL Stats store is full), the flush operation is aborted.
* `sql.stats.flush.force_cleanup.enabled`: which allows the
  in-memory SQL Stats to be cleared at the interval specified by
  `sql.stats.flush.interval`, even if the SQL Stats flush is disabled.

This commit also updated the stmt_grouping_in_explicit_txn data driven
test to ensure the output order is deterministic.

Release note: None

78526: ci: add scripts for bazel-based `ui test`, `ui lint` ci jobs r=rail,jsbarag a=rickystewart

These scripts just do the same logic that `dev` does for the same
functions.

Release note: None

Co-authored-by: Stan Rosenberg <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Azhng <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@craig craig bot closed this as completed in a57aa7c Mar 29, 2022
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Apr 1, 2022
Fixes cockroachdb#76173

Previously, the construction of index joins skipped the
recording of the  primary key for index usage stats.
This commit records the use of the primary key if the
statement is not of the EXPLAIN variety.

Release note (bug fix): Index usage stats are now properly
captured for index joins.
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Apr 1, 2022
Fixes cockroachdb#76173

Previously, the construction of index joins skipped the
recording of the  primary key for index usage stats.
This commit records the use of the primary key if the
statement is not of the EXPLAIN variety.

Release note (bug fix): Index usage stats are now properly
captured for index joins.
fqazi pushed a commit to fqazi/cockroach that referenced this issue Apr 4, 2022
Fixes cockroachdb#76173

Previously, the construction of index joins skipped the
recording of the  primary key for index usage stats.
This commit records the use of the primary key if the
statement is not of the EXPLAIN variety.

Release note (bug fix): Index usage stats are now properly
captured for index joins.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-observability Related to observability of the SQL layer C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants