Skip to content

Commit

Permalink
sql: add stats to telemetry sampled query event
Browse files Browse the repository at this point in the history
This adds the MVCC stats, cpu usage, and latency information to the
telemetry sampled query event. This is needed to make it match the
data available in the system statement_statistics table.

Epic: none
Closes: #89887

Release note (sql change): Adds the MVCC stats, cpu usage, and latency
information to the telemetry sampled query event.
  • Loading branch information
j82w committed May 5, 2023
1 parent 1d8e254 commit cb69f03
Show file tree
Hide file tree
Showing 8 changed files with 488 additions and 27 deletions.
22 changes: 22 additions & 0 deletions docs/generated/eventlog.md
Original file line number Diff line number Diff line change
Expand Up @@ -2901,6 +2901,28 @@ contains common SQL event/execution details.
| `TotalScanRowsWithoutForecastsEstimate` | Total number of rows read by all scans in the query, as estimated by the optimizer without using forecasts. | no |
| `NanosSinceStatsForecasted` | The greatest quantity of nanoseconds that have passed since the forecast time (or until the forecast time, if it is in the future, in which case it will be negative) for any table with forecasted stats scanned by this query. | no |
| `Indexes` | The list of indexes used by this query. | no |
| `CpuTimeNanos` | Collects the cumulative CPU time spent executing SQL operations in nanoseconds. Currently, it is only collected for statements without mutations that have a vectorized plan. | no |
| `KvGrpcCalls` | The number of grpc calls done to get data form KV nodes | no |
| `KvTimeNanos` | Cumulated time spent waiting for a KV request. This includes disk IO time and potentially network time (if any of the keys are not local). | no |
| `ServiceLatencyNanos` | The time to service the query, from start of parse to end of execute. | no |
| `OverheadLatencyNanos` | The difference between service latency and the sum of parse latency + plan latency + run latency . | no |
| `RunLatencyNanos` | The time to run the query and fetch or compute the result rows. | no |
| `PlanLatencyNanos` | The time to transform the AST into a logical query plan. | no |
| `IdleLatencyNanos` | The time between statement executions in a transaction | no |
| `ParseLatencyNanos` | The time to transform the SQL string into an abstract syntax tree (AST). | no |
| `MvccStepCount` | StepCount collects the number of times the iterator moved forward or backward over the DB's underlying storage keyspace. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `MvccStepCountInternal` | StepCountInternal collects the number of times the iterator moved forward or backward over LSM internal keys. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `MvccSeekCount` | SeekCount collects the number of times the iterator moved to a specific key/value pair in the DB's underlying storage keyspace. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `MvccSeekCountInternal` | SeekCountInternal collects the number of times the iterator moved to a specific LSM internal key. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `MvccBlockBytes` | BlockBytes collects the bytes in the loaded SSTable data blocks. For details, see pebble.InternalIteratorStats. | no |
| `MvccBlockBytesInCache` | BlockBytesInCache collects the subset of BlockBytes in the block cache. For details, see pebble.InternalIteratorStats. | no |
| `MvccKeyBytes` | KeyBytes collects the bytes in keys that were iterated over. For details, see pebble.InternalIteratorStats. | no |
| `MvccValueBytes` | ValueBytes collects the bytes in values that were iterated over. For details, see pebble.InternalIteratorStats. | no |
| `MvccPointCount` | PointCount collects the count of point keys iterated over. For details, see pebble.InternalIteratorStats. | no |
| `MvccPointsCoveredByRangeTombstones` | PointsCoveredByRangeTombstones collects the count of point keys that were iterated over that were covered by range tombstones. For details, see pebble.InternalIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `MvccRangeKeyCount` | RangeKeyCount collects the count of range keys encountered during iteration. For details, see pebble.RangeKeyIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `MvccRangeKeyContainedPoints` | RangeKeyContainedPoints collects the count of point keys encountered within the bounds of a range key. For details, see pebble.RangeKeyIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `MvccRangeKeySkippedPoints` | RangeKeySkippedPoints collects the count of the subset of ContainedPoints point keys that were skipped during iteration due to range-key masking. For details, see pkg/storage/engine.go, pebble.RangeKeyIteratorStats, and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |


#### Common fields
Expand Down
12 changes: 11 additions & 1 deletion pkg/sql/conn_executor.go
Original file line number Diff line number Diff line change
Expand Up @@ -2753,6 +2753,7 @@ func (ex *connExecutor) execCopyOut(
ex.server.TelemetryLoggingMetrics,
stmtFingerprintID,
&stats,
ex.statsCollector,
)
}()

Expand Down Expand Up @@ -2986,7 +2987,16 @@ func (ex *connExecutor) execCopyIn(
ex.planner.CurrentDatabase(),
)
var stats topLevelQueryStats
ex.planner.maybeLogStatement(ctx, ex.executorType, true, int(ex.state.mu.autoRetryCounter), ex.extraTxnState.txnCounter, numInsertedRows, 0 /* bulkJobId */, copyErr, ex.statsCollector.PhaseTimes().GetSessionPhaseTime(sessionphase.SessionQueryReceived), &ex.extraTxnState.hasAdminRoleCache, ex.server.TelemetryLoggingMetrics, stmtFingerprintID, &stats)
ex.planner.maybeLogStatement(ctx, ex.executorType, true,
int(ex.state.mu.autoRetryCounter), ex.extraTxnState.txnCounter,
numInsertedRows, 0, /* bulkJobId */
copyErr,
ex.statsCollector.PhaseTimes().GetSessionPhaseTime(sessionphase.SessionQueryReceived),
&ex.extraTxnState.hasAdminRoleCache,
ex.server.TelemetryLoggingMetrics,
stmtFingerprintID,
&stats,
ex.statsCollector)
}()

var copyErr error
Expand Down
2 changes: 2 additions & 0 deletions pkg/sql/conn_executor_exec.go
Original file line number Diff line number Diff line change
Expand Up @@ -1518,6 +1518,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
ex.server.TelemetryLoggingMetrics,
ppInfo.dispatchToExecutionEngine.stmtFingerprintID,
ppInfo.dispatchToExecutionEngine.queryStats,
ex.statsCollector,
)
},
})
Expand All @@ -1544,6 +1545,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
ex.server.TelemetryLoggingMetrics,
stmtFingerprintID,
&stats,
ex.statsCollector,
)
}
}()
Expand Down
78 changes: 62 additions & 16 deletions pkg/sql/exec_log.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgerror"
"github.com/cockroachdb/cockroach/pkg/sql/privilege"
"github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
"github.com/cockroachdb/cockroach/pkg/sql/sqlstats"
"github.com/cockroachdb/cockroach/pkg/util/envutil"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/log/eventpb"
Expand Down Expand Up @@ -165,10 +166,11 @@ func (p *planner) maybeLogStatement(
telemetryLoggingMetrics *TelemetryLoggingMetrics,
stmtFingerprintID appstatspb.StmtFingerprintID,
queryStats *topLevelQueryStats,
statsCollector sqlstats.StatsCollector,
) {
p.maybeLogStatementInternal(ctx, execType, isCopy, numRetries, txnCounter,
rows, bulkJobId, err, queryReceived, hasAdminRoleCache,
telemetryLoggingMetrics, stmtFingerprintID, queryStats,
telemetryLoggingMetrics, stmtFingerprintID, queryStats, statsCollector,
)
}

Expand All @@ -185,7 +187,8 @@ func (p *planner) maybeLogStatementInternal(
hasAdminRoleCache *HasAdminRoleCache,
telemetryMetrics *TelemetryLoggingMetrics,
stmtFingerprintID appstatspb.StmtFingerprintID,
queryStats *topLevelQueryStats,
topLevelQueryStats *topLevelQueryStats,
statsCollector sqlstats.StatsCollector,
) {
// Note: if you find the code below crashing because p.execCfg == nil,
// do not add a test "if p.execCfg == nil { do nothing }" !
Expand Down Expand Up @@ -424,18 +427,38 @@ func (p *planner) maybeLogStatementInternal(
txnID = p.txn.ID().String()
}

var stats execstats.QueryLevelStats
if queryLevelStats, ok := p.instrumentation.GetQueryLevelStats(); ok {
stats = *queryLevelStats
var queryLevelStats execstats.QueryLevelStats
if stats, ok := p.instrumentation.GetQueryLevelStats(); ok {
queryLevelStats = *stats
}

stats = telemetryMetrics.getQueryLevelStats(stats)
queryLevelStats = telemetryMetrics.getQueryLevelStats(queryLevelStats)
indexRecs := make([]string, 0, len(p.curPlan.instrumentation.indexRecs))
for _, rec := range p.curPlan.instrumentation.indexRecs {
indexRecs = append(indexRecs, rec.SQL)
}

phaseTimes := statsCollector.PhaseTimes()

// Collect the statistics.
idleLatRaw := phaseTimes.GetIdleLatency(statsCollector.PreviousPhaseTimes())
idleLatNanos := idleLatRaw.Nanoseconds()
runLatRaw := phaseTimes.GetRunLatency()
runLatNanos := runLatRaw.Nanoseconds()
parseLatNanos := phaseTimes.GetParsingLatency().Nanoseconds()
planLatNanos := phaseTimes.GetPlanningLatency().Nanoseconds()
// We want to exclude any overhead to reduce possible confusion.
svcLatRaw := phaseTimes.GetServiceLatencyNoOverhead()
svcLatNanos := svcLatRaw.Nanoseconds()

// processing latency: contributing towards SQL results.
processingLatNanos := parseLatNanos + planLatNanos + runLatNanos

// overhead latency: txn/retry management, error checking, etc
execOverheadNanos := svcLatNanos - processingLatNanos

skippedQueries := telemetryMetrics.resetSkippedQueryCount()

sampledQuery := eventpb.SampledQuery{
CommonSQLExecDetails: execDetails,
SkippedQueries: skippedQueries,
Expand All @@ -452,9 +475,9 @@ func (p *planner) maybeLogStatementInternal(
OutputRowsEstimate: p.curPlan.instrumentation.outputRows,
StatsAvailable: p.curPlan.instrumentation.statsAvailable,
NanosSinceStatsCollected: int64(p.curPlan.instrumentation.nanosSinceStatsCollected),
BytesRead: queryStats.bytesRead,
RowsRead: queryStats.rowsRead,
RowsWritten: queryStats.rowsWritten,
BytesRead: topLevelQueryStats.bytesRead,
RowsRead: topLevelQueryStats.rowsRead,
RowsWritten: topLevelQueryStats.rowsWritten,
InnerJoinCount: int64(p.curPlan.instrumentation.joinTypeCounts[descpb.InnerJoin]),
LeftOuterJoinCount: int64(p.curPlan.instrumentation.joinTypeCounts[descpb.LeftOuterJoin]),
FullOuterJoinCount: int64(p.curPlan.instrumentation.joinTypeCounts[descpb.FullOuterJoin]),
Expand All @@ -470,22 +493,45 @@ func (p *planner) maybeLogStatementInternal(
InvertedJoinCount: int64(p.curPlan.instrumentation.joinAlgorithmCounts[exec.InvertedJoin]),
ApplyJoinCount: int64(p.curPlan.instrumentation.joinAlgorithmCounts[exec.ApplyJoin]),
ZigZagJoinCount: int64(p.curPlan.instrumentation.joinAlgorithmCounts[exec.ZigZagJoin]),
ContentionNanos: stats.ContentionTime.Nanoseconds(),
ContentionNanos: queryLevelStats.ContentionTime.Nanoseconds(),
Regions: p.curPlan.instrumentation.regions,
NetworkBytesSent: stats.NetworkBytesSent,
MaxMemUsage: stats.MaxMemUsage,
MaxDiskUsage: stats.MaxDiskUsage,
KVBytesRead: stats.KVBytesRead,
KVRowsRead: stats.KVRowsRead,
NetworkMessages: stats.NetworkMessages,
NetworkBytesSent: queryLevelStats.NetworkBytesSent,
MaxMemUsage: queryLevelStats.MaxMemUsage,
MaxDiskUsage: queryLevelStats.MaxDiskUsage,
KVBytesRead: queryLevelStats.KVBytesRead,
KVRowsRead: queryLevelStats.KVRowsRead,
KvTimeNanos: queryLevelStats.KVTime.Nanoseconds(),
KvGrpcCalls: queryLevelStats.KVBatchRequestsIssued,
NetworkMessages: queryLevelStats.NetworkMessages,
CpuTimeNanos: queryLevelStats.CPUTime.Nanoseconds(),
IndexRecommendations: indexRecs,
Indexes: p.curPlan.instrumentation.indexesUsed,
ScanCount: int64(p.curPlan.instrumentation.scanCounts[exec.ScanCount]),
ScanWithStatsCount: int64(p.curPlan.instrumentation.scanCounts[exec.ScanWithStatsCount]),
ScanWithStatsForecastCount: int64(p.curPlan.instrumentation.scanCounts[exec.ScanWithStatsForecastCount]),
TotalScanRowsWithoutForecastsEstimate: p.curPlan.instrumentation.totalScanRowsWithoutForecasts,
NanosSinceStatsForecasted: int64(p.curPlan.instrumentation.nanosSinceStatsForecasted),
IdleLatencyNanos: idleLatNanos,
ServiceLatencyNanos: svcLatNanos,
RunLatencyNanos: runLatNanos,
PlanLatencyNanos: planLatNanos,
ParseLatencyNanos: parseLatNanos,
OverheadLatencyNanos: execOverheadNanos,
MvccBlockBytes: queryLevelStats.MvccBlockBytes,
MvccBlockBytesInCache: queryLevelStats.MvccBlockBytesInCache,
MvccKeyBytes: queryLevelStats.MvccKeyBytes,
MvccPointCount: queryLevelStats.MvccPointCount,
MvccPointsCoveredByRangeTombstones: queryLevelStats.MvccPointsCoveredByRangeTombstones,
MvccRangeKeyContainedPoints: queryLevelStats.MvccRangeKeyContainedPoints,
MvccRangeKeyCount: queryLevelStats.MvccRangeKeyCount,
MvccRangeKeySkippedPoints: queryLevelStats.MvccRangeKeySkippedPoints,
MvccSeekCountInternal: queryLevelStats.MvccSeeksInternal,
MvccSeekCount: queryLevelStats.MvccSeeks,
MvccStepCountInternal: queryLevelStats.MvccStepsInternal,
MvccStepCount: queryLevelStats.MvccSteps,
MvccValueBytes: queryLevelStats.MvccValueBytes,
}

p.logOperationalEventsOnlyExternally(ctx, isCopy, &sampledQuery)
} else {
telemetryMetrics.incSkippedQueryCount()
Expand Down
1 change: 0 additions & 1 deletion pkg/sql/telemetry_logging.go
Original file line number Diff line number Diff line change
Expand Up @@ -93,7 +93,6 @@ func (t *TelemetryLoggingMetrics) maybeUpdateLastEmittedTime(
defer t.mu.Unlock()

lastEmittedTime := t.mu.lastEmittedTime

if float64(newTime.Sub(lastEmittedTime))*1e-9 >= requiredSecondsElapsed {
t.mu.lastEmittedTime = newTime
return true
Expand Down
Loading

0 comments on commit cb69f03

Please sign in to comment.