Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add all execution stats to the cloud telemetry logging channel #84729

Open
12 of 14 tasks
kevin-v-ngo opened this issue Jul 20, 2022 · 4 comments
Open
12 of 14 tasks

Add all execution stats to the cloud telemetry logging channel #84729

kevin-v-ngo opened this issue Jul 20, 2022 · 4 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability

Comments

@kevin-v-ngo
Copy link

kevin-v-ngo commented Jul 20, 2022

We collect execution statistics and surface them in crdb_internal.statement_statistics. We should also capture these statistics in our cloud telemetry logging channel for query events. A few fields that are missing include:

  • maxDiskUsage - Maximum temporary disk usage that occurred while executing this statement. This is set in cases where a query had to spill to disk
  • networkBytes - Number of bytes sent over the network.
  • bytesRead -  Number of bytes read from disk (done as part of sql: add estimated and actual statistics to telemetry logging #85169)
  • networkMsgs - Number of messages sent over the network
  • Retries - Total number of internal retries
  • nodes - Ordered list of nodes IDs on which the statement was executed
  • numRows - Number of rows returned or observed
  • ovhLat - Difference between svcLat and the sum of parseLat+planLat+runLat latencies
  • parseLat - Time to transform the SQL string into an abstract syntax tree (AST)
  • planLat - Time to transform the AST into a logical query plan
  • rowsRead - number of rows read from disk (done as part of sql: add estimated and actual statistics to telemetry logging #85169)
  • rowsWritten - number of rows written to disk (done as part of sql: add estimated and actual statistics to telemetry logging #85169)
  • runLat - time to run the query and fetch or compute the result rows
  • svcLat - time to service the query, from start of parse to end of execute

The full list is here: https://www.cockroachlabs.com/docs/stable/crdb-internal.html#statistics-column

Related issue: #84718

Jira issue: CRDB-17832

Epic CRDB-32141

@kevin-v-ngo kevin-v-ngo added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 20, 2022
@kevin-v-ngo
Copy link
Author

Related issue: #84718

@THardy98 THardy98 self-assigned this Sep 7, 2022
THardy98 pushed a commit to THardy98/cockroach that referenced this issue Sep 7, 2022
Partially addresses: cockroachdb#84729

This change adds:
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages

fields to the `SampledQuery` telemetry log.

Release justification: low risk, high benefit changes to existing
functionality

Release note (sql change): This change adds::
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages fields to the `SampledQuery` telemetry log.
THardy98 pushed a commit to THardy98/cockroach that referenced this issue Sep 7, 2022
Partially addresses: cockroachdb#84729

This change adds:
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages

fields to the `SampledQuery` telemetry log.

Release justification: low risk, high benefit changes to existing
functionality

Release note (sql change): This change adds::
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages fields to the `SampledQuery` telemetry log.
THardy98 pushed a commit to THardy98/cockroach that referenced this issue Sep 8, 2022
Partially addresses: cockroachdb#84729

This change adds:
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages

fields to the `SampledQuery` telemetry log.

Release justification: low risk, high benefit changes to existing
functionality

Release note (sql change): This change adds::
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages fields to the `SampledQuery` telemetry log.
craig bot pushed a commit that referenced this issue Sep 8, 2022
86813: pkg/keys: implement SafeFormatter for roachpb.Key r=andreimatei a=abarganier

Currently, when a key is logged, the entirety of the pretty-printed
key is redacted, hindering observability when parsing through redacted
logs (something that will become more common with upcoming compliance
requirements).

For example, prior to this patch, a pretty-printed key would appear
in the following way for the unredacted/redacted cases, respectively:

	- unredacted: ‹/Table/42/1222/"index key"›
	- redacted:   ‹x›

This patch addresses this by implementing the SafeFormatter interface
for `roachpb.Key` and `roachpb.RKey`, yielding the following result
when looking at the same example above:

	- unredacted: /Table/42/1222/‹"index key"›
	- redacted:   /Table/42/1222/‹x›

While the index key itself remains redacted, the ability to see the
specific table, index, and in the case of tenant tables, the tenant
itself, provides much better observability into which table & index
a log line is referring to than before.

Note that this implementation is only partial. It currently only
supports keys that fall in the `/Table` keyspace for application
tenants and system tenants. Keyspaces such as Meta1, Meta2, Local,
etc. are not yet supported, but can be added with much more ease
in the future now that the example has been set.

Finally, we remove the `maxLen` and related truncation logic from
`StringWithDirs`, as this is no longer used. Furthermore, the
truncation was invalid as it could have truncated a utf-8
sequence in the wrong place, making the result invalid utf-8.

This PR is a continuation of the work originally done by `@kzh` in
#67065. See the original PR for some initial discussions. 

Release note (security update): redacted logs will now reveal
pretty-printed keys, except for the index key values themselves.
For example `/Table/42/1222/‹x›` will be shown instead of `‹x›`
(which was shown previously). This improved redaction is available
for the `/Table` keyspace for both system and application tenants.
Other keyspaces such as `/Meta1`, `/Meta2`, `/Local`, etc. are not
yet supported.

Release justification: low risk, high benefit observability changes

Addresses #86316

87494: authors: add leon.fattakhov to authors r=coolcom200 a=coolcom200

Release note: None
Release justification: non-production code change.

87516: gc: end to end integration test r=tbg a=aliher1911

Currently tests are covering various components, but there's no end to end flow test that would verify that GC will issue GC requests that would successfully remove all relevant data.
This commit adds a TestCluster based test to verify GC behaviour.

Release justification: test only change
Release note: None
Fixes #84988

87527: sql: add query level execution stats to sampled query log r=THardy98 a=THardy98

Partially addresses: #84729

This change adds:
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages

fields to the `SampledQuery` telemetry log.

Release justification: low risk, high benefit changes to existing functionality

Release note (sql change): This change adds::
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages fields to the `SampledQuery` telemetry log.

87617: workload/schemachange: propagate worker ID to log r=ajwerner a=ajwerner

The schemachange workload can run with concurrency. It logs what workers do. It was hard to understand what different workers are doing because we never populated the worker ID.

Release note: None

Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Leon Fattakhov <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Sep 8, 2022
Partially addresses: #84729

This change adds:
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages

fields to the `SampledQuery` telemetry log.

Release justification: low risk, high benefit changes to existing
functionality

Release note (sql change): This change adds::
- network bytes sent
- maximum memory usage
- maximum disk usage
- KV bytes read
- KV rows read
- network messages fields to the `SampledQuery` telemetry log.
@kevin-v-ngo
Copy link
Author

Retries would've been helpful to understand workload suffering from contention (write after read, read uncertainty window, closed timestamp).

@j82w
Copy link
Contributor

j82w commented May 25, 2023

All MVCC and latency info added in: #102820

@j82w
Copy link
Contributor

j82w commented May 25, 2023

Nodes needs to wait on: #102774

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability
Projects
None yet
Development

No branches or pull requests

3 participants