# Statistics Generation

Last Update: February 2024

Original author: maryliag

This document details the statistics collected for both Statements and
Transactions, going over all columns of `system.statement_statistics` and
`system.transaction_statistics`.

Table of Contents:

- [Statement](#statement)
  - [Key Values](#key-values)
  - [Metadata](#metadata)
  - [Statistics](#statistics)
  - [Computed Columns](#computed-columns)
  - [Extra](#extra)
- [Transaction](#transaction)
  - [Key Values](#key-values-1)
  - [Metadata](#metadata-1)
  - [Statistics](#statistics-1)
  - [Computed Columns](#computed-columns-1)
  - [Extra](#extra-1)

## Statement

The columns stored per statement can be divided into: key values, metadata,
statistics, and computed columns.

### Key Values

These are the values used as keys on the table; their combination is unique
for each row.

- `aggregated_ts`: the timestamp on which the statistics are being grouped. The
  default period is 1h, meaning all executions at 1:xx are saved with an
  `aggregated_ts` of `1:00`, and so on. Ex: `2024-02-07 15:00:00+00`
- `fingerprint_id`: the fingerprint ID of the statement, which is generated by
  hashing the statement fingerprint (the query with the constants redacted), its
  database and failure status, and whether it was part of an implicit transaction.
  Ex: `\x0a5cc4970527253a`
- `transaction_fingerprint_id`: the fingerprint ID of the transaction that this
  statement was part of. Ex: `\xa53f79db832692e5`
- `plan_hash`: the hash of the plan gist (described below). Ex: `\x92fb94a0f3ac5a36`
- `app_name`: the application that ran the statement. Ex: `insights`
- `node_id`: the ID of the gateway node on which the statement was executed.
  This value is set to 0 if collection of the gateway node is disabled with the
  cluster setting `sql.metrics.statement_details.gateway_node.enabled`.
  Disabling this collection helps customers that have a lot of nodes, since each
  gateway node would otherwise generate a unique row for the same statement
  fingerprint. Ex: `1`
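
The hourly bucketing described for `aggregated_ts` amounts to truncating the
execution timestamp down to its aggregation interval. A minimal sketch (not
CockroachDB's actual code; it assumes the default 1h interval):

```python
from datetime import datetime, timezone

def aggregated_ts(exec_time: datetime) -> datetime:
    """Truncate an execution timestamp to its hourly aggregation bucket,
    so every execution at 15:xx lands in the 15:00 bucket."""
    return exec_time.replace(minute=0, second=0, microsecond=0)

ts = datetime(2024, 2, 7, 15, 37, 58, tzinfo=timezone.utc)
print(aggregated_ts(ts))  # 2024-02-07 15:00:00+00:00
```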

### Metadata

Metadata is a JSONB column with some information about the execution. The keys
of this JSONB are:

- `db`: the database on which the statement was executed. Ex: `insights`
- `distsql`: whether the statement was executed as distributed SQL. Ex: `true`
- `failed`: whether the statement execution failed. Ex: `false`
- `fullScan`: whether the statement had a `FULL SCAN` as part of the plan. Ex: `true`
- `implicitTxn`: whether the statement was part of an implicit transaction. Ex: `true`
- `query`: the statement itself with the parameters redacted. Ex:
  `SELECT balance FROM insights_workload_table_1 ORDER BY balance DESC LIMIT $1`
- `querySummary`: a shorter version of the statement that is used on the
  Console UI, to improve readability on tables with a lot of information, such
  as the tables on the SQL Activity pages. Ex: `SELECT balance FROM insights_workload_table_1`
- `stmtType`: the type of the statement[^1]. Ex: `TypeDML`
- `vec`: whether the statement was executed with the vectorized engine. Ex: `true`

### Statistics

The statistics column is a JSONB that can be divided into 2 groups:
`execution_statistics` and `statistics`. The majority of the numeric values in
this column are stored as an object with a `mean` and a `sqDiff` (squared
difference).

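The `mean`/`sqDiff` pairs can be maintained incrementally, without keeping
individual samples around. A sketch of such a running aggregate using
Welford's algorithm (the actual CockroachDB encoding may differ in details):

```python
class NumericStat:
    """Running mean and sum of squared differences from the mean."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.sq_diff = 0.0

    def record(self, value: float) -> None:
        # Welford's online update: numerically stable, O(1) per sample.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.sq_diff += delta * (value - self.mean)

s = NumericStat()
for v in [2.0, 4.0, 6.0]:
    s.record(v)
print(s.mean, s.sq_diff)  # 4.0 8.0
```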
The values from `execution_statistics` are only populated using sampling:

- `cnt`: how many times this fingerprint was sampled in the aggregation period.
  This is helpful to know how many executions these statistics are based on. Ex: `3`
- `contentionTime`: the time the statement was in contention with other statements. This time
  does not include the time it takes to stream results back to the client.
  Ex: `{"mean": 0.0035941523333333334, "sqDiff": 0.0000036056329785926688}`
- `cpuSQLNanos`: the SQL CPU time spent executing. It does not include SQL planning time nor
  KV execution time. Ex: `{"mean": 193173, "sqDiff": 767794898}`
- `maxDiskUsage`: the maximum disk usage. Ex: `{"mean": 0, "sqDiff": 0}`
- `maxMemUsage`: the maximum memory usage. Ex: `{"mean": 2.2528E+5, "sqDiff": 0}`
- `networkBytes`: the amount of data transferred over the network (e.g., between regions and
  nodes). If this value is 0, the statement was executed on a single node. Ex: `{"mean": 0, "sqDiff": 0}`
- `networkMsgs`: the number of messages sent over the network. Ex: `{"mean": 0, "sqDiff": 0}`
- `mvccIteratorStats`: statistics from the MVCC iterator. Ex: `{"blockBytes": {"mean": 0, "sqDiff": 0},
  "blockBytesInCache": {"mean": 0, "sqDiff": 0},
  "keyBytes": {"mean": 0, "sqDiff": 0},
  "pointCount": {"mean": 1006, "sqDiff": 0},
  "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0},
  "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0},
  "rangeKeyCount": {"mean": 0, "sqDiff": 0},
  "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0},
  "seekCount": {"mean": 1E+1, "sqDiff": 0},
  "seekCountInternal": {"mean": 1E+1, "sqDiff": 0},
  "stepCount": {"mean": 1E+3, "sqDiff": 0},
  "stepCountInternal": {"mean": 1006, "sqDiff": 0},
  "valueBytes": {"mean": 149912, "sqDiff": 0}}`

The values from `statistics` are updated on every execution:

- `bytesRead`: the aggregation of all bytes read from disk across all operators for this
  statement. Ex: `{"mean": 173717.41176470587, "sqDiff": 38420.117647061816}`
- `cnt`: the number of executions of this statement. Note this value will always be greater
  than or equal to the `cnt` on `execution_statistics`. Ex: `17`
- `firstAttemptCnt`: the number of first attempts of this statement. The difference between
  this value and `cnt` is the number of retries. Ex: `17`
- `idleLat`: the time between statement executions in a transaction. Ex: `{"mean": 0, "sqDiff": 0}`
- `indexes`: the list of indexes used by this statement. The format is `tableID@indexID`.
  Ex: `["115@1"]`
- `lastErrorCode`: the last error code if the statement failed. Ex: `XXUUU`
- `lastExecAt`: the timestamp of the last execution of the statement within this aggregated
  timestamp. Ex: `2024-02-07T15:37:58.201843Z`
- `latencyInfo`: information about latency. The values for `max` and `min` are based on all
  executions of the statement within the aggregated timestamp, while the percentiles are
  sampled and can include a period longer than the aggregated timestamp of the statement.
  Ex: `{"max": 2.82403375, "min": 0.002160625, "p50": 0.007772875, "p90": 0.007772875, "p99": 0.007772875}`
- `maxRetries`: the maximum number of retries. Ex: `0`
- `nodes`: a list of all nodes that were part of the statement execution. Note that even if
  the gateway node column is disabled, the gateway node still shows up here. Ex: `[1]`
- `numRows`: the number of rows affected by this statement. Ex: `{"mean": 365.70588235294116, "sqDiff": 1104041.5294117648}`
- `ovhLat`: overhead latency; includes transaction and retry management, error checking, etc.
  Ex: `{"mean": 0.000003968117647044402, "sqDiff": 2.116522637650619E-10}`
- `parseLat`: the time to transform the SQL string into an abstract syntax tree (AST).
  Ex: `{"mean": 0, "sqDiff": 0}`
- `planGists`: the plan gists used by the statement. A plan gist is a fingerprint for a query
  plan, representing the logical query plan with items like constants removed and table names
  replaced with table IDs. A plan gist can be decoded back into a plan. Ex: `["AgHmAQIAAgAAABgGAg=="]`
- `planLat`: the time to transform the AST into a logical query plan.
  Ex: `{"mean": 0.0005751862941176471, "sqDiff": 0.00003654328361436952}`
- `regions`: all regions on which the statement was executed. Ex: `["us-east1"]`
- `rowsRead`: the number of rows read by the statement. Ex: `{"mean": 1E+3, "sqDiff": 0}`
- `rowsWritten`: the number of rows written by the statement. Ex: `{"mean": 0, "sqDiff": 0}`
- `runLat`: the time to run the statement and fetch or compute the result rows.
  Ex: `{"mean": 0.37770100005882346, "sqDiff": 9.522893593797042}`
- `svcLat`: the time to service the query, from the start of parsing to the end of execution.
  Ex: `{"mean": 0.37828015447058816, "sqDiff": 9.518556061599973}`
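
A couple of useful quantities can be derived from these fields: the number of
retries is `cnt - firstAttemptCnt`, and the mean `svcLat` is approximately the
sum of the mean parse, plan, run, and overhead latencies. A sketch using the
example values above:

```python
def retries(cnt: int, first_attempt_cnt: int) -> int:
    """Total executions minus first attempts gives the number of retries."""
    return cnt - first_attempt_cnt

# Mean latencies (in seconds), taken from the examples above.
parse_lat = 0.0
plan_lat = 0.0005751862941176471
run_lat = 0.37770100005882346
ovh_lat = 0.000003968117647044402

# The phases approximately add up to the mean svcLat (0.37828015447058816).
svc_lat = parse_lat + plan_lat + run_lat + ovh_lat

print(retries(17, 17))  # 0
```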

There is also a value for `index_recommendations`, an array of strings, that exists both as
part of the `statistics` column and as its own column, to make `SELECT`s on this value more
efficient.
Ex: `{"creation : CREATE INDEX ON insights.public.insights_workload_table_1 (balance DESC);"}`


### Computed columns

These are values that exist in the statistics column, but are computed again
here as their own columns. This allows indexes to be created directly on these
columns, which is helpful when running `SELECT`s with filters on these values.
All of these indexes filter out `app_name` values that start with `$ internal`.
The definition of these values (when not described) is the same as for the
items with the same name listed above.

- `indexes_usage`: this column has an `INVERTED INDEX`. Ex: `["115@1"]`
- `execution_count`. Ex: `17`
- `service_latency`. Ex: `0.37828015447058816`
- `cpu_sql_nanos`. Ex: `193173`
- `contention_time`. Ex: `0.0035941523333333334`
- `p99_latency`. Ex: `0.007772875`
- `total_estimated_execution_time`: the value of the count multiplied by the
  mean service latency. Ex: `6.430762625999999`
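
The computed `total_estimated_execution_time` is simply the execution count
multiplied by the mean service latency, which reproduces the example values
above:

```python
def total_estimated_execution_time(execution_count: int, mean_svc_lat: float) -> float:
    """Estimated total time spent executing this fingerprint in the bucket."""
    return execution_count * mean_svc_lat

# Using the example values above: 17 executions at a mean svcLat of ~0.378s.
total = total_estimated_execution_time(17, 0.37828015447058816)
print(total)  # ~6.430762626
```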

### Extra

There are a few other columns that don't fit any of the categories above.

- `agg_interval`: the interval used to aggregate the executions. Ex: `01:00:00`
- `plan`: a sampled plan. As of 23.1+ this column is no longer populated, to improve
  performance. It can be populated again by enabling the cluster setting
  `sql.metrics.statement_details.plan_collection.enabled`. Ex: `{"Children": [], "Name": ""}` (with sampling disabled)

## Transaction

The columns stored per transaction can be divided into: key values, metadata,
statistics, and computed columns.

### Key Values

These are the values used as keys on the table; their combination is unique
for each row.

- `aggregated_ts`: the timestamp on which the statistics are being grouped. The
  default period is 1h, meaning all executions at 1:xx are saved with an
  `aggregated_ts` of `1:00`, and so on. Ex: `2024-02-06 18:00:00+00`
- `fingerprint_id`: the fingerprint ID of the transaction, which is generated
  by hashing the individual statement fingerprint IDs that comprised the
  transaction. Ex: `\xab97426ea38ee52d`
- `app_name`: the application that ran the transaction. Ex: `$ cockroach demo`
- `node_id`: the ID of the gateway node on which the transaction was executed.
  This value is set to 0 if collection of the gateway node is disabled with the
  cluster setting `sql.metrics.statement_details.gateway_node.enabled`.
  Disabling this collection helps customers that have a lot of nodes, since each
  gateway node would otherwise generate a unique row for the same transaction
  fingerprint. Ex: `1`
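
The derivation of a transaction fingerprint ID from its statement fingerprint
IDs can be illustrated as hashing the ordered list of child IDs into a stable
8-byte value. This is an illustration of the idea only; CockroachDB uses its
own internal 64-bit hash, not SHA-256:

```python
import hashlib

def txn_fingerprint_id(stmt_fingerprint_ids: list) -> str:
    """Illustrative only: derive a stable 8-byte ID from the ordered list
    of statement fingerprint IDs (hex strings, as in stmtFingerprintIDs)."""
    h = hashlib.sha256()
    for fid in stmt_fingerprint_ids:
        h.update(bytes.fromhex(fid))
    return h.hexdigest()[:16]  # 8 bytes, like the \x... values shown above

# The same statements in the same order always produce the same ID.
print(txn_fingerprint_id(["ab75c017ed028514"]) ==
      txn_fingerprint_id(["ab75c017ed028514"]))  # True
```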

### Metadata

Metadata is a JSONB column with some information about the execution.

- `stmtFingerprintIDs`: a list of all statement fingerprint IDs that were part of
  this transaction. The list size is limited by the cluster setting
  `sql.metrics.transaction_details.max_statement_ids` (default 1000). Ex: `["ab75c017ed028514"]`
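
The `max_statement_ids` cap behaves like a simple truncation of the recorded
list; a sketch:

```python
MAX_STATEMENT_IDS = 1000  # default of sql.metrics.transaction_details.max_statement_ids

def capped_stmt_fingerprint_ids(ids, limit=MAX_STATEMENT_IDS):
    """Keep at most `limit` statement fingerprint IDs for a transaction."""
    return ids[:limit]

print(len(capped_stmt_fingerprint_ids(["%016x" % i for i in range(1500)])))  # 1000
```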

### Statistics

The statistics column is a JSONB that can be divided into 2 groups:
`execution_statistics` and `statistics`. The majority of the numeric values in
this column are stored as an object with a `mean` and a `sqDiff` (squared
difference).

The values from `execution_statistics` are only populated using sampling:

- `cnt`: how many times this fingerprint was sampled in the aggregation period.
  This is helpful to know how many executions these statistics are based on. Ex: `1`
- `contentionTime`: the total time that all the statements of this transaction were in
  contention with other statements. This time does not include the time it takes to stream
  results back to the client.
  Ex: `{"mean": 0.0035941523333333334, "sqDiff": 0.0000036056329785926688}`
- `cpuSQLNanos`: the SQL CPU time spent executing all statements of this transaction. It
  does not include SQL planning time nor KV execution time. Ex: `{"mean": 193173, "sqDiff": 767794898}`
- `maxDiskUsage`: the maximum disk usage. Ex: `{"mean": 0, "sqDiff": 0}`
- `maxMemUsage`: the maximum memory usage. Ex: `{"mean": 2.2528E+5, "sqDiff": 0}`
- `networkBytes`: the amount of data transferred over the network (e.g., between regions and
  nodes). If this value is 0, the transaction was executed on a single node. Ex: `{"mean": 0, "sqDiff": 0}`
- `networkMsgs`: the number of messages sent over the network. Ex: `{"mean": 0, "sqDiff": 0}`
- `mvccIteratorStats`: statistics from the MVCC iterator. Ex: `{"blockBytes": {"mean": 0, "sqDiff": 0},
  "blockBytesInCache": {"mean": 0, "sqDiff": 0},
  "keyBytes": {"mean": 0, "sqDiff": 0},
  "pointCount": {"mean": 1006, "sqDiff": 0},
  "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0},
  "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0},
  "rangeKeyCount": {"mean": 0, "sqDiff": 0},
  "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0},
  "seekCount": {"mean": 1E+1, "sqDiff": 0},
  "seekCountInternal": {"mean": 1E+1, "sqDiff": 0},
  "stepCount": {"mean": 1E+3, "sqDiff": 0},
  "stepCountInternal": {"mean": 1006, "sqDiff": 0},
  "valueBytes": {"mean": 149912, "sqDiff": 0}}`

The values from `statistics` are updated on every execution:

- `bytesRead`: the aggregation of all bytes read from disk across all operators for this
  transaction. Ex: `{"mean": 173717.41176470587, "sqDiff": 38420.117647061816}`
- `cnt`: the number of executions of this transaction. Note this value will always be greater
  than or equal to the `cnt` on `execution_statistics`. Ex: `17`
- `commitLat`: the time to commit the transaction. Ex: `{"mean": 0.0000020357142857142854, "sqDiff": 1.8496728571428566E-12}`
- `idleLat`: the time between statement executions of the transaction. Ex: `{"mean": 0, "sqDiff": 0}`
- `maxRetries`: the maximum number of retries. Ex: `0`
- `numRows`: the number of rows affected by this transaction. Ex: `{"mean": 365.70588235294116, "sqDiff": 1104041.5294117648}`
- `retryLat`: the time spent retrying the transaction. Ex: `{"mean": 0, "sqDiff": 0}`
- `rowsRead`: the number of rows read by the transaction. Ex: `{"mean": 1E+3, "sqDiff": 0}`
- `rowsWritten`: the number of rows written by the transaction. Ex: `{"mean": 0, "sqDiff": 0}`
- `svcLat`: the time to service the transaction, from the start of parsing to the end of
  execution. Ex: `{"mean": 0.37828015447058816, "sqDiff": 9.518556061599973}`

### Computed columns

These are values that exist in the statistics column, but are computed again
here as their own columns. This allows indexes to be created directly on these
columns, which is helpful when running `SELECT`s with filters on these values.
All of these indexes filter out `app_name` values that start with `$ internal`.
The definition of these values (when not described) is the same as for the
items with the same name listed above.

- `execution_count`. Ex: `6`
- `service_latency`. Ex: `0.049141375`
- `cpu_sql_nanos`. Ex: `577084`
- `contention_time`. Ex: `0`
- `p99_latency`. Ex: `0`
- `total_estimated_execution_time`: the value of the count multiplied by the
  mean service latency. Ex: `0.049141375`

### Extra

There is another column that doesn't fit any of the categories above.

- `agg_interval`: the interval used to aggregate the executions. Ex: `01:00:00`

[^1]: A statement type can be:
    1. DDL (Data Definition Language): deals with database schemas and descriptions.
    2. DML (Data Manipulation Language): deals with data manipulation; used to store, modify, retrieve, delete, and update data in a database.
    3. DCL (Data Control Language): deals with commands such as GRANT; mostly concerned with rights, permissions, and other controls of the database system.
    4. TCL (Transaction Control Language): deals with transactions within a database.