Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
117677: server: wait for all KV nodes to observe tenant stop request r=stevendanna a=stevendanna

In order to make it possible for a caller to know that no new SQL
requests will be served by a tenant after a STOP command, this PR adds
a new STOPPING state. While nodes are in the STOPPING state, KV
requests will be rejected. We then wait for all nodes to observe the
STOPPING state before transitioning the cluster to NONE.

Epic: none

Release note: None

118504: roachtest: create secure clusters by default r=renatolabs a=DarrylWong

Previously, the default for roachtests was to create insecure clusters. This change now switches the default to create secure clusters, allowing tests to opt out as needed. To support this, the following changes were made:

1. Refactor PGUrl functions to use PGURLOptions instead of tenant/SQLInstance, which were usually not specified anyway. This change allows up to add an authMode option, which we currently only use to specify root user authentication but in the future can use to allow for non root authentication.
2. Add helpers to extract session ID and create an HTTP Client with the extracted session ID. This lets us create HTTP Requests to secure clusters.
3. Explicitly specify a connection string with a root certificate or a certs directory when connecting to a cluster. If not specified, the connection will be rejected as the default behavior is to authenticate with root and no certificates. Many tests already did this through specifying `{pgurl}`.

Follow up work would be to start using non root user everywhere now that we are running on secure clusters and have the ability to easily generate non root urls.

Release note: None
Epic: None
Informs: #63145

119007: catalog: fix tracking of id and name state r=rafiss a=rafiss

This allows us to re-enable the COMMENT ON statements in the
schemachange workload.

The previous code only updated the entries that were already in the
byIDState map. However, some descriptor IDs may not be in that map, so
instead we should add everything we just read into the map.

fixes #116795
Release note (bug fix): Fixed a bug where COMMENT statements could
fail with an "unexpected value" error if multiple COMMENT statements
were running concurrently.


119012: technotes: add tech notes for sql statistics r=maryliag a=maryliag

Add the tech notes of the values generated for statement and transaction statistics.

Part Of CRDB-35839

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: DarrylWong <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: maryliag <[email protected]>
  • Loading branch information
5 people committed Feb 9, 2024
5 parents c954900 + e9c00be + 7c3d18a + 978a826 + c2df6bf commit 5c84682
Show file tree
Hide file tree
Showing 114 changed files with 1,462 additions and 561 deletions.
76 changes: 76 additions & 0 deletions docs/generated/http/full.md
Original file line number Diff line number Diff line change
Expand Up @@ -1629,6 +1629,82 @@ Tier represents one level of the locality hierarchy.



## TenantServiceStatus

`GET /_status/tenant_service_status`

TenantServiceStatus returns the current service and data state of
the given tenant as known to the server orchestrator, which may
differ from the database state.

Support status: [reserved](#support-status)

#### Request Parameters







| Field | Type | Label | Description | Support status |
| ----- | ---- | ----- | ----------- | -------------- |
| node_id | [string](#cockroach.server.serverpb.TenantServiceStatusRequest-string) | | | [reserved](#support-status) |
| tenant_id | [uint64](#cockroach.server.serverpb.TenantServiceStatusRequest-uint64) | | | [reserved](#support-status) |







#### Response Parameters







| Field | Type | Label | Description | Support status |
| ----- | ---- | ----- | ----------- | -------------- |
| status_by_node_id | [TenantServiceStatusResponse.StatusByNodeIdEntry](#cockroach.server.serverpb.TenantServiceStatusResponse-cockroach.server.serverpb.TenantServiceStatusResponse.StatusByNodeIdEntry) | repeated | | [reserved](#support-status) |
| errors_by_node_id | [TenantServiceStatusResponse.ErrorsByNodeIdEntry](#cockroach.server.serverpb.TenantServiceStatusResponse-cockroach.server.serverpb.TenantServiceStatusResponse.ErrorsByNodeIdEntry) | repeated | | [reserved](#support-status) |






<a name="cockroach.server.serverpb.TenantServiceStatusResponse-cockroach.server.serverpb.TenantServiceStatusResponse.StatusByNodeIdEntry"></a>
#### TenantServiceStatusResponse.StatusByNodeIdEntry



| Field | Type | Label | Description | Support status |
| ----- | ---- | ----- | ----------- | -------------- |
| key | [int32](#cockroach.server.serverpb.TenantServiceStatusResponse-int32) | | | |
| value | [cockroach.multitenant.SQLInfo](#cockroach.server.serverpb.TenantServiceStatusResponse-cockroach.multitenant.SQLInfo) | | | |





<a name="cockroach.server.serverpb.TenantServiceStatusResponse-cockroach.server.serverpb.TenantServiceStatusResponse.ErrorsByNodeIdEntry"></a>
#### TenantServiceStatusResponse.ErrorsByNodeIdEntry



| Field | Type | Label | Description | Support status |
| ----- | ---- | ----- | ----------- | -------------- |
| key | [int32](#cockroach.server.serverpb.TenantServiceStatusResponse-int32) | | | |
| value | [string](#cockroach.server.serverpb.TenantServiceStatusResponse-string) | | | |






## TenantRanges

`GET /_status/tenant_ranges`
Expand Down
281 changes: 281 additions & 0 deletions docs/tech-notes/observability/statistics_generation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,281 @@
# Statistics Generation
Last Update: February 2024

Original author: maryliag

This document details the statistics collected for both Statements and
Transactions, going over all columns on system.statement_statistics and
system.transaction_statistics.

Table of Contents:

- [Statement](#statement)
- [Key Values](#key-values)
- [Metadata](#metadata)
- [Statistics](#statistics)
- [Computed Columns](#computed-columns)
- [Extra](#extra)
- [Transaction](#transaction)
- [Key Values](#key-values-1)
- [Metadata](#metadata-1)
- [Statistics](#statistics-1)
- [Computed Columns](#computed-columns-1)
- [Extra](#extra-1)


## Statement
The columns stored per statement can be divided in: key values, metadata,
statistics e computed columns.

### Key Values
These are the values used as keys on the table and their combination will be unique
for each new row.

- `aggregated_ts`: the timestamp in which the statistics are being grouped. The
default period is 1h, meaning all executions on 1:xx are saved with
`aggregated_ts` of `1:00` and so on. Ex: `2024-02-07 15:00:00+00`
- `fingerprint_id`: the fingerprint ID of the statement, which is generated by
hashing the statement fingerprint (the query with the constants redacted), its
database and failure status, and if it was part of an implicit transaction. Ex:
`\x0a5cc4970527253a`
- `transaction_fingerprint_id`: the transaction fingerprint ID of the transaction
to which this statement was part of. Ex: `\xa53f79db832692e5`
- `plan_hash`: the hash of the plan gist (described below). Ex: `\x92fb94a0f3ac5a36`
- `app_name`: the application that ran the statement. Ex: `insights`
- `node_id`: the ID of the gateway node in which the statement was executed.
This value will be set to 0 if we disable the collection of gateway node with
ths cluster setting `sql.metrics.statement_details.gateway_node.enabled`.
Disabling this value help customers that have a lot of nodes and will generate
a lot of unique rows for the same statement fingerprint. Ex: `1`

### Metadata
Metadata is a JSONB column with some information about the execution. The keys
of this JSONB are:

- `db`: the database on which the statement was executed on. Ex: `insights`
- `distsql`: if was executed as a distributed SQL. Ex: `true`
- `failed`: if the statement failed to be executed. Ex: `false`
- `fullScan`: if the statement had a `FULL SCAN` as part of the plan. Ex: `true`
- `implicitTxn`: if the statement was part of an implicit transaction. Ex: `true`
- `query`: the statement itself with the parameters redacted. Ex:
`SELECT balance FROM insights_workload_table_1 ORDER BY balance DESC LIMIT $1`
- `querySummary`: the shorter version of the statement that is used on the
Console UI, to improve readability on tables with a lot of information, such
as the table on SQL Activity pages. Ex: `SELECT balance FROM insights_workload_table_1`
- `stmtType`: the type of the statement[^1]. Ex: `TypeDML`
- `vec`: if was vectorized. Ex: `true`

### Statistics
The statistics column is a JSONB that can be divided in 2 groups: executions and
statistics. The majority of the numeric values on this column have its value as
an object with `mean` and a `sqDiff` (square difference).

The values from `execution_statistics` are only populated using sampling:

- `cnt`: how many times this fingerprint was sampled on the aggregation period.
This is helpful to know how many execution these statistics are based on. Ex: `3`
- `contentionTime`: the time the statement was in contention with other statements. This time
does not include the time it takes to stream results back to the client.
Ex: `{"mean": 0.0035941523333333334, "sqDiff": 0.0000036056329785926688}`
- `cpuSQLNanos`: the SQL CPU time spent executing. It does not include SQL planning time nor
KV execution time. Ex: `{"mean": 193173, "sqDiff": 767794898}`
- `maxDiskUsage`: the maximum disk usage. Ex: `{"mean": 0, "sqDiff": 0}`
- `maxMemUsage`: the maximum memory usage. Ex: `{"mean": 2.2528E+5, "sqDiff": 0}`
- `networkBytes`: the amount of data transferred over the network (e.g., between regions and
nodes). If this value is 0, the statement was executed on a single node. Ex: `{"mean": 0, "sqDiff": 0}`
- `networkMsgs`: the amount of network messages over the network. Ex: `{"mean": 0, "sqDiff": 0}`
- `mvccIteratorStats`: statistics by the MVCC iterator. Ex: `{"blockBytes": {"mean": 0, "sqDiff": 0},
"blockBytesInCache": {"mean": 0, "sqDiff": 0},
"keyBytes": {"mean": 0, "sqDiff": 0},
"pointCount": {"mean": 1006, "sqDiff": 0},
"pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0},
"rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0},
"rangeKeyCount": {"mean": 0, "sqDiff": 0},
"rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0},
"seekCount": {"mean": 1E+1, "sqDiff": 0},
"seekCountInternal": {"mean": 1E+1, "sqDiff": 0},
"stepCount": {"mean": 1E+3, "sqDiff": 0},
"stepCountInternal": {"mean": 1006, "sqDiff": 0},
"valueBytes": {"mean": 149912, "sqDiff": 0}}`


The values from `statistics` are updated on every execution:

- `bytesRead`: the aggregation of all bytes read from disk across all operators for this
statement. Ex: `{"mean": 173717.41176470587, "sqDiff": 38420.117647061816}`
- `cnt`: amount of executions of this statement. Notice this value will always be greater or
equal the `cnt` on `execution_statistics`. Ex: `17`
- `firstAttemptCnt`: amount of first attempts of this statement. The difference between this
value and the `cnt` will be the amount of retries. Ex: `17`
- `idleLat`: the time between statement executions in a transaction. Ex: `{"mean": 0, "sqDiff": 0}`
- `indexes`: the list of indexes used by this statement. The format will be `tableID@indexID`.
Ex: `["115@1"]`
- `lastErrorCode`: the last error code if the statement failed. Ex: `XXUUU`
- `lastExecAt`: the timestamp of the last execution withing this aggregated timestamp of the
statement. Ex: `2024-02-07T15:37:58.201843Z`
- `latencyInfo`: information about latency. The values for `max` and `min` are based on all
execution of the statement within the aggregated timestamp, and the percentiles are sampled
and can include a period longer than the aggregated timestamp of the statement.
Ex: `{"max": 2.82403375, "min": 0.002160625, "p50": 0.007772875, "p90": 0.007772875, "p99": 0.007772875}`
- `maxRetries`: amount of retries. Ex: `0`
- `nodes`: a list of all nodes that were part of the statement execution. Note that even if we
disable the gateway node column, it will still show the value here. Ex: `[1]`
- `numRows`: number of rows affected by this statement. Ex: `{"mean": 365.70588235294116, "sqDiff": 1104041.5294117648}`
- `ovhLat`: overhead latency, includes transaction and retry management, error checking, etc.
Ex: `{"mean": 0.000003968117647044402, "sqDiff": 2.116522637650619E-10}`
- `parseLat`: the time to transform the SQL string into an abstract syntax tree (AST).
Ex: `{"mean": 0, "sqDiff": 0}`
- `planGists`: the plan gist used by the statement. Plan gist is a fingerprint for a query
plan, representing the logical query plan with items like constants removed and table names
replaced with table IDs. A plan gist can be decoded to a plan. Ex: `["AgHmAQIAAgAAABgGAg=="]`
- `planLat`: the time to transform the AST into a logical query plan.
Ex: `{"mean": 0.0005751862941176471, "sqDiff": 0.00003654328361436952}`
- `regions`: all regions in which the statement was executed. Ex: `["us-east1"]`
- `rowsRead`: the amount of rows read by the statement. Ex: `{"mean": 1E+3, "sqDiff": 0}`
- `rowsWritten`: the amount of rows written by the statement. Ex: `{"mean": 0, "sqDiff": 0}`
- `runLat`: the time to run the statement and fetch or compute the result rows.
Ex: `{"mean": 0.37770100005882346, "sqDiff": 9.522893593797042}`
- `svcLat`: the time to service the query, from start of parse to end of execute.
Ex: `{"mean": 0.37828015447058816, "sqDiff": 9.518556061599973}`


There is also a value for `index_recommendations` as a array of strings, that exists both as
part of the `statistics` column and as its own column to improve on `select`s by this value.
Ex: `{"creation : CREATE INDEX ON insights.public.insights_workload_table_1 (balance DESC);"}`



### Computed columns
These are values that exist on the statistics column, but are computed again
here to be their own columns. This is helpful so indexes can be created
directly on these columns, which is helpful when running `select`s with filters
on these values. All the indexes filter out `app_name` that start with `$ internal`.
The definition of those values (when not described) are the
same from the items with the same name listed above.

- `indexes_usage`: this column have a `INVERTED INDEX`. Ex: `["115@1"]`
- `execution_count`. Ex: `17`
- `service_latency`. Ex: `0.37828015447058816`
- `cpu_sql_nanos`. Ex: `193173`
- `contention_time`. Ex: `0.0035941523333333334`
- `p99_latency`. Ex: `0.007772875`
- `total_estimated_execution_time`: the value os the count multiplied by the
mean service latency. Ex: `6.430762625999999`

### Extra
There are a few other columns that don't qualify as any of the categories above.

- `agg_interval`: the interval used to aggregate the executions. Ex: `01:00:00`
- `plan`: a sampled plan. As of 23.1+ this column is no longer populated to improve
performance. The value can be populated again by enabling the cluster setting
`sql.metrics.statement_details.plan_collection.enabled`. Ex: `{"Children": [], "Name": ""}` (with sampling disabled)

## Transaction
The columns stored per transaction can be divided in: key values, metadata,
statistics e computed columns.

### Key Values
These are the values used as keys on the table and their combination will be unique
for each new row.

- `aggregated_ts`: the timestamp in which the statistics are being grouped. The
default period is 1h, meaning all executions on 1:xx are saved with
`aggregated_ts` of `1:00` and so on. Ex: `2024-02-06 18:00:00+00`
- `fingerprint_id`: the fingerprint ID of the transaction, which is generated
by hashing the individual statement fingerprint IDs that comprised the
transaction. Ex: `\xab97426ea38ee52d`
- `app_name`: the application the ran the transaction. Ex: `$ cockroach demo`
- `node_id`: the ID of the gateway node in which the statement was executed.
This value will be set to 0 if we disable the collection of gateway node with
ths cluster setting `sql.metrics.statement_details.gateway_node.enabled`.
Disabling this value help customers that have a lot of nodes and will generate
a lot of unique rows for the same statement fingerprint. Ex: `1`


### Metadata
Metadata is a JSONB column with some information about the execution.

- `stmtFingerprintIDs`: a list of all statement fingerprint IDs that were part of
this transaction. We limit the list size with the cluster setting
`sql.metrics.transaction_details.max_statement_ids` (default 1000). Ex: `["ab75c017ed028514"]`


### Statistics
The statistics column is a JSONB that can be divided in 2 groups: executions and
statistics. The majority of the numeric values on this column have its value as
an object with `mean` and a `sqDiff` (square difference).

The values from `execution_statistics` are only populated using sampling:

- `cnt`: how many times this fingerprint was sampled on the aggregation period.
This is helpful to know how many execution these statistics are based on. Ex: `1`
- `contentionTime`: the total time that all the statements of this transaction were in
contention with other statements. This time does not include the time it takes to stream
results back to the client.
Ex: `{"mean": 0.0035941523333333334, "sqDiff": 0.0000036056329785926688}`
- `cpuSQLNanos`: the SQL CPU time spent by all statements of this transaction executing. It
does not include SQL planning time nor KV execution time. Ex: `{"mean": 193173, "sqDiff": 767794898}`
- `maxDiskUsage`: the maximum disk usage. Ex: `{"mean": 0, "sqDiff": 0}`
- `maxMemUsage`: the maximum memory usage. Ex: `{"mean": 2.2528E+5, "sqDiff": 0}`
- `networkBytes`: the amount of data transferred over the network (e.g., between regions and
nodes). If this value is 0, the transaction was executed on a single node. Ex: `{"mean": 0, "sqDiff": 0}`
- `networkMsgs`: the amount of network messages over the network. Ex: `{"mean": 0, "sqDiff": 0}`
- `mvccIteratorStats`: statistics by the MVCC iterator. Ex: `{"blockBytes": {"mean": 0, "sqDiff": 0},
"blockBytesInCache": {"mean": 0, "sqDiff": 0},
"keyBytes": {"mean": 0, "sqDiff": 0},
"pointCount": {"mean": 1006, "sqDiff": 0},
"pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0},
"rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0},
"rangeKeyCount": {"mean": 0, "sqDiff": 0},
"rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0},
"seekCount": {"mean": 1E+1, "sqDiff": 0},
"seekCountInternal": {"mean": 1E+1, "sqDiff": 0},
"stepCount": {"mean": 1E+3, "sqDiff": 0},
"stepCountInternal": {"mean": 1006, "sqDiff": 0},
"valueBytes": {"mean": 149912, "sqDiff": 0}}`


The values from `statistics` are updated on every execution:


- `bytesRead`: the aggregation of all bytes read from disk across all operators for this
transaction. Ex: `{"mean": 173717.41176470587, "sqDiff": 38420.117647061816}`
- `cnt`: amount of executions of this transaction. Notice this value will always be greater or
equal the `cnt` on `execution_statistics`. Ex: `17`
- `commitLat`: the time to commit the transaction. Ex: `{"mean": 0.0000020357142857142854, "sqDiff": 1.8496728571428566E-12}`
- `idleLat`: the time between statement executions of the transaction. Ex: `{"mean": 0, "sqDiff": 0}`
- `maxRetries`: amount of retries. Ex: `0`
- `numRows`: number of rows affected by this transaction. Ex: `{"mean": 365.70588235294116, "sqDiff": 1104041.5294117648}`
- `retryLat`: the time spent retrying the transaction. Ex: `{"mean": 0, "sqDiff": 0}`
- `rowsRead`: the amount of rows read by the transaction. Ex: `{"mean": 1E+3, "sqDiff": 0}`
- `rowsWritten`: the amount of rows written by the transaction. Ex: `{"mean": 0, "sqDiff": 0}`
- `svcLat`: the time to service the transaction, from start of parse to end of execute.
Ex: `{"mean": 0.37828015447058816, "sqDiff": 9.518556061599973}`

### Computed columns
These are values that exist on the statistics column, but are computed again
here to be their own columns. This is helpful so indexes can be created
directly on these columns, which is helpful when running `select`s with filters
on these values. All the indexes filter out `app_name` that start with `$ internal`.
The definition of those values (when not described) are the
same from the items with the same name listed above.

- `execution_count`. Ex: `6`
- `service_latency`. Ex: `0.049141375`
- `cpu_sql_nanos`. Ex: `577084`
- `contention_time`. Ex: `0`
- `p99_latency`. Ex: `0`
- `total_estimated_execution_time`: the value os the count multiplied by the
mean service latency. Ex: `0.049141375`

### Extra
There is another column that don't qualify as any of the categories above.

- `agg_interval`: the interval used to aggregate the executions. Ex: `01:00:00`


[^1]: A statement type can be:
1. DDL (Data Definition Language): deals with database schemas and descriptions.
2. DML (Data Manipulation Language) deals with data manipulation and it is used to store, modify, retrieve, delete and update data in a database.
3. DCL (Data Control Language) deals with commands such as GRANT and mostly concerned with rights, permissions and other controls of the database system.
4. TCL (Transaction Control Language) deals with a transaction within a database.
2 changes: 1 addition & 1 deletion pkg/bench/rttanalysis/testdata/benchmark_expectations
Original file line number Diff line number Diff line change
Expand Up @@ -114,5 +114,5 @@ exp,benchmark
3,UDFResolution/select_from_udf
1,VirtualTableQueries/select_crdb_internal.invalid_objects_with_1_fk
1,VirtualTableQueries/select_crdb_internal.tables_with_1_fk
11,VirtualTableQueries/virtual_table_cache_with_point_lookups
5,VirtualTableQueries/virtual_table_cache_with_point_lookups
15,VirtualTableQueries/virtual_table_cache_with_schema_change
5 changes: 4 additions & 1 deletion pkg/ccl/backupccl/backup_tenant_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,10 @@ func TestBackupTenantImportingTable(t *testing.T) {
}
// Destroy the tenant, then restore it.
tSrv.AppStopper().Stop(ctx)
if _, err := sqlDB.DB.ExecContext(ctx, "ALTER TENANT [10] STOP SERVICE; DROP TENANT [10] IMMEDIATE"); err != nil {
if _, err := sqlDB.DB.ExecContext(ctx, "ALTER TENANT [10] STOP SERVICE"); err != nil {
t.Fatal(err)
}
if _, err := sqlDB.DB.ExecContext(ctx, "DROP TENANT [10] IMMEDIATE"); err != nil {
t.Fatal(err)
}
if _, err := sqlDB.DB.ExecContext(ctx, "RESTORE TENANT 10 FROM $1", dst); err != nil {
Expand Down
Loading

0 comments on commit 5c84682

Please sign in to comment.