release-23.1: sql: add stats to telemetry sampled query event #103841

Merged 1 commit on May 24, 2023.
22 changes: 22 additions & 0 deletions docs/generated/eventlog.md
@@ -2901,6 +2901,28 @@ contains common SQL event/execution details.
| `TotalScanRowsWithoutForecastsEstimate` | Total number of rows read by all scans in the query, as estimated by the optimizer without using forecasts. | no |
| `NanosSinceStatsForecasted` | The greatest quantity of nanoseconds that have passed since the forecast time (or until the forecast time, if it is in the future, in which case it will be negative) for any table with forecasted stats scanned by this query. | no |
| `Indexes` | The list of indexes used by this query. | no |
| `CpuTimeNanos` | Collects the cumulative CPU time spent executing SQL operations in nanoseconds. Currently, it is only collected for statements without mutations that have a vectorized plan. | no |
| `KvGrpcCalls` | The number of gRPC calls made to fetch data from KV nodes. | no |
| `KvTimeNanos` | Cumulative time spent waiting for KV requests. This includes disk IO time and potentially network time (if any of the keys are not local). | no |
| `ServiceLatencyNanos` | The time to service the query, from start of parse to end of execute. | no |
| `OverheadLatencyNanos` | The difference between service latency and the sum of parse, plan, and run latency (see the sketch after this table). | no |
| `RunLatencyNanos` | The time to run the query and fetch or compute the result rows. | no |
| `PlanLatencyNanos` | The time to transform the AST into a logical query plan. | no |
| `IdleLatencyNanos` | The time between statement executions in a transaction. | no |
| `ParseLatencyNanos` | The time to transform the SQL string into an abstract syntax tree (AST). | no |
| `MvccStepCount` | StepCount collects the number of times the iterator moved forward or backward over the DB's underlying storage keyspace. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `MvccStepCountInternal` | StepCountInternal collects the number of times the iterator moved forward or backward over LSM internal keys. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `MvccSeekCount` | SeekCount collects the number of times the iterator moved to a specific key/value pair in the DB's underlying storage keyspace. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `MvccSeekCountInternal` | SeekCountInternal collects the number of times the iterator moved to a specific LSM internal key. For details, see pkg/storage/engine.go and pkg/sql/opt/exec/factory.go. | no |
| `MvccBlockBytes` | BlockBytes collects the bytes in the loaded SSTable data blocks. For details, see pebble.InternalIteratorStats. | no |
| `MvccBlockBytesInCache` | BlockBytesInCache collects the subset of BlockBytes in the block cache. For details, see pebble.InternalIteratorStats. | no |
| `MvccKeyBytes` | KeyBytes collects the bytes in keys that were iterated over. For details, see pebble.InternalIteratorStats. | no |
| `MvccValueBytes` | ValueBytes collects the bytes in values that were iterated over. For details, see pebble.InternalIteratorStats. | no |
| `MvccPointCount` | PointCount collects the count of point keys iterated over. For details, see pebble.InternalIteratorStats. | no |
| `MvccPointsCoveredByRangeTombstones` | PointsCoveredByRangeTombstones collects the count of point keys that were iterated over that were covered by range tombstones. For details, see pebble.InternalIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `MvccRangeKeyCount` | RangeKeyCount collects the count of range keys encountered during iteration. For details, see pebble.RangeKeyIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `MvccRangeKeyContainedPoints` | RangeKeyContainedPoints collects the count of point keys encountered within the bounds of a range key. For details, see pebble.RangeKeyIteratorStats and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
| `MvccRangeKeySkippedPoints` | RangeKeySkippedPoints collects the count of the subset of ContainedPoints point keys that were skipped during iteration due to range-key masking. For details, see pkg/storage/engine.go, pebble.RangeKeyIteratorStats, and https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/mvcc-range-tombstones.md. | no |
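
A minimal sketch of how the new latency fields relate to one another, mirroring the computation this PR adds in `pkg/sql/exec_log.go`; the numeric values below are hypothetical and only illustrate the arithmetic:

```go
// Illustrative only: OverheadLatencyNanos is service latency minus the sum of
// parse, plan, and run latency (the "processing" latency).
package main

import "fmt"

func main() {
	// Hypothetical latencies, in nanoseconds.
	var (
		parseLatNanos int64 = 50_000
		planLatNanos  int64 = 400_000
		runLatNanos   int64 = 2_000_000
		svcLatNanos   int64 = 2_600_000 // ServiceLatencyNanos
	)

	// Processing latency: time contributing towards SQL results.
	processingLatNanos := parseLatNanos + planLatNanos + runLatNanos

	// Overhead latency: txn/retry management, error checking, etc.
	overheadLatNanos := svcLatNanos - processingLatNanos

	fmt.Println(overheadLatNanos) // 150000
}
```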


#### Common fields
12 changes: 11 additions & 1 deletion pkg/sql/conn_executor.go
@@ -2753,6 +2753,7 @@ func (ex *connExecutor) execCopyOut(
ex.server.TelemetryLoggingMetrics,
stmtFingerprintID,
&stats,
ex.statsCollector,
)
}()

@@ -2986,7 +2987,16 @@ func (ex *connExecutor) execCopyIn(
ex.planner.CurrentDatabase(),
)
var stats topLevelQueryStats
ex.planner.maybeLogStatement(ctx, ex.executorType, true, int(ex.state.mu.autoRetryCounter), ex.extraTxnState.txnCounter, numInsertedRows, 0 /* bulkJobId */, copyErr, ex.statsCollector.PhaseTimes().GetSessionPhaseTime(sessionphase.SessionQueryReceived), &ex.extraTxnState.hasAdminRoleCache, ex.server.TelemetryLoggingMetrics, stmtFingerprintID, &stats)
ex.planner.maybeLogStatement(ctx, ex.executorType, true,
int(ex.state.mu.autoRetryCounter), ex.extraTxnState.txnCounter,
numInsertedRows, 0, /* bulkJobId */
copyErr,
ex.statsCollector.PhaseTimes().GetSessionPhaseTime(sessionphase.SessionQueryReceived),
&ex.extraTxnState.hasAdminRoleCache,
ex.server.TelemetryLoggingMetrics,
stmtFingerprintID,
&stats,
ex.statsCollector)
}()

var copyErr error
2 changes: 2 additions & 0 deletions pkg/sql/conn_executor_exec.go
@@ -1518,6 +1518,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
ex.server.TelemetryLoggingMetrics,
ppInfo.dispatchToExecutionEngine.stmtFingerprintID,
ppInfo.dispatchToExecutionEngine.queryStats,
ex.statsCollector,
)
},
})
@@ -1544,6 +1545,7 @@ func (ex *connExecutor) dispatchToExecutionEngine(
ex.server.TelemetryLoggingMetrics,
stmtFingerprintID,
&stats,
ex.statsCollector,
)
}
}()
78 changes: 62 additions & 16 deletions pkg/sql/exec_log.go
@@ -25,6 +25,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgerror"
"github.com/cockroachdb/cockroach/pkg/sql/privilege"
"github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
"github.com/cockroachdb/cockroach/pkg/sql/sqlstats"
"github.com/cockroachdb/cockroach/pkg/util/envutil"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/log/eventpb"
@@ -165,10 +166,11 @@ func (p *planner) maybeLogStatement(
telemetryLoggingMetrics *TelemetryLoggingMetrics,
stmtFingerprintID appstatspb.StmtFingerprintID,
queryStats *topLevelQueryStats,
statsCollector sqlstats.StatsCollector,
) {
p.maybeLogStatementInternal(ctx, execType, isCopy, numRetries, txnCounter,
rows, bulkJobId, err, queryReceived, hasAdminRoleCache,
telemetryLoggingMetrics, stmtFingerprintID, queryStats,
telemetryLoggingMetrics, stmtFingerprintID, queryStats, statsCollector,
)
}

@@ -185,7 +187,8 @@ func (p *planner) maybeLogStatementInternal(
hasAdminRoleCache *HasAdminRoleCache,
telemetryMetrics *TelemetryLoggingMetrics,
stmtFingerprintID appstatspb.StmtFingerprintID,
queryStats *topLevelQueryStats,
topLevelQueryStats *topLevelQueryStats,
statsCollector sqlstats.StatsCollector,
) {
// Note: if you find the code below crashing because p.execCfg == nil,
// do not add a test "if p.execCfg == nil { do nothing }" !
@@ -424,18 +427,38 @@
txnID = p.txn.ID().String()
}

var stats execstats.QueryLevelStats
if queryLevelStats, ok := p.instrumentation.GetQueryLevelStats(); ok {
stats = *queryLevelStats
var queryLevelStats execstats.QueryLevelStats
if stats, ok := p.instrumentation.GetQueryLevelStats(); ok {
queryLevelStats = *stats
}

stats = telemetryMetrics.getQueryLevelStats(stats)
queryLevelStats = telemetryMetrics.getQueryLevelStats(queryLevelStats)
indexRecs := make([]string, 0, len(p.curPlan.instrumentation.indexRecs))
for _, rec := range p.curPlan.instrumentation.indexRecs {
indexRecs = append(indexRecs, rec.SQL)
}

phaseTimes := statsCollector.PhaseTimes()

// Collect the statistics.
idleLatRaw := phaseTimes.GetIdleLatency(statsCollector.PreviousPhaseTimes())
idleLatNanos := idleLatRaw.Nanoseconds()
runLatRaw := phaseTimes.GetRunLatency()
runLatNanos := runLatRaw.Nanoseconds()
parseLatNanos := phaseTimes.GetParsingLatency().Nanoseconds()
planLatNanos := phaseTimes.GetPlanningLatency().Nanoseconds()
// We want to exclude any overhead to reduce possible confusion.
svcLatRaw := phaseTimes.GetServiceLatencyNoOverhead()
svcLatNanos := svcLatRaw.Nanoseconds()

// processing latency: contributing towards SQL results.
processingLatNanos := parseLatNanos + planLatNanos + runLatNanos

// overhead latency: txn/retry management, error checking, etc
execOverheadNanos := svcLatNanos - processingLatNanos

skippedQueries := telemetryMetrics.resetSkippedQueryCount()

sampledQuery := eventpb.SampledQuery{
CommonSQLExecDetails: execDetails,
SkippedQueries: skippedQueries,
@@ -452,9 +475,9 @@
OutputRowsEstimate: p.curPlan.instrumentation.outputRows,
StatsAvailable: p.curPlan.instrumentation.statsAvailable,
NanosSinceStatsCollected: int64(p.curPlan.instrumentation.nanosSinceStatsCollected),
BytesRead: queryStats.bytesRead,
RowsRead: queryStats.rowsRead,
RowsWritten: queryStats.rowsWritten,
BytesRead: topLevelQueryStats.bytesRead,
RowsRead: topLevelQueryStats.rowsRead,
RowsWritten: topLevelQueryStats.rowsWritten,
InnerJoinCount: int64(p.curPlan.instrumentation.joinTypeCounts[descpb.InnerJoin]),
LeftOuterJoinCount: int64(p.curPlan.instrumentation.joinTypeCounts[descpb.LeftOuterJoin]),
FullOuterJoinCount: int64(p.curPlan.instrumentation.joinTypeCounts[descpb.FullOuterJoin]),
@@ -470,22 +493,45 @@
InvertedJoinCount: int64(p.curPlan.instrumentation.joinAlgorithmCounts[exec.InvertedJoin]),
ApplyJoinCount: int64(p.curPlan.instrumentation.joinAlgorithmCounts[exec.ApplyJoin]),
ZigZagJoinCount: int64(p.curPlan.instrumentation.joinAlgorithmCounts[exec.ZigZagJoin]),
ContentionNanos: stats.ContentionTime.Nanoseconds(),
ContentionNanos: queryLevelStats.ContentionTime.Nanoseconds(),
Regions: p.curPlan.instrumentation.regions,
NetworkBytesSent: stats.NetworkBytesSent,
MaxMemUsage: stats.MaxMemUsage,
MaxDiskUsage: stats.MaxDiskUsage,
KVBytesRead: stats.KVBytesRead,
KVRowsRead: stats.KVRowsRead,
NetworkMessages: stats.NetworkMessages,
NetworkBytesSent: queryLevelStats.NetworkBytesSent,
MaxMemUsage: queryLevelStats.MaxMemUsage,
MaxDiskUsage: queryLevelStats.MaxDiskUsage,
KVBytesRead: queryLevelStats.KVBytesRead,
KVRowsRead: queryLevelStats.KVRowsRead,
KvTimeNanos: queryLevelStats.KVTime.Nanoseconds(),
KvGrpcCalls: queryLevelStats.KVBatchRequestsIssued,
NetworkMessages: queryLevelStats.NetworkMessages,
CpuTimeNanos: queryLevelStats.CPUTime.Nanoseconds(),
IndexRecommendations: indexRecs,
Indexes: p.curPlan.instrumentation.indexesUsed,
ScanCount: int64(p.curPlan.instrumentation.scanCounts[exec.ScanCount]),
ScanWithStatsCount: int64(p.curPlan.instrumentation.scanCounts[exec.ScanWithStatsCount]),
ScanWithStatsForecastCount: int64(p.curPlan.instrumentation.scanCounts[exec.ScanWithStatsForecastCount]),
TotalScanRowsWithoutForecastsEstimate: p.curPlan.instrumentation.totalScanRowsWithoutForecasts,
NanosSinceStatsForecasted: int64(p.curPlan.instrumentation.nanosSinceStatsForecasted),
IdleLatencyNanos: idleLatNanos,
ServiceLatencyNanos: svcLatNanos,
RunLatencyNanos: runLatNanos,
PlanLatencyNanos: planLatNanos,
ParseLatencyNanos: parseLatNanos,
OverheadLatencyNanos: execOverheadNanos,
MvccBlockBytes: queryLevelStats.MvccBlockBytes,
MvccBlockBytesInCache: queryLevelStats.MvccBlockBytesInCache,
MvccKeyBytes: queryLevelStats.MvccKeyBytes,
MvccPointCount: queryLevelStats.MvccPointCount,
MvccPointsCoveredByRangeTombstones: queryLevelStats.MvccPointsCoveredByRangeTombstones,
MvccRangeKeyContainedPoints: queryLevelStats.MvccRangeKeyContainedPoints,
MvccRangeKeyCount: queryLevelStats.MvccRangeKeyCount,
MvccRangeKeySkippedPoints: queryLevelStats.MvccRangeKeySkippedPoints,
MvccSeekCountInternal: queryLevelStats.MvccSeeksInternal,
MvccSeekCount: queryLevelStats.MvccSeeks,
MvccStepCountInternal: queryLevelStats.MvccStepsInternal,
MvccStepCount: queryLevelStats.MvccSteps,
MvccValueBytes: queryLevelStats.MvccValueBytes,
}

p.logOperationalEventsOnlyExternally(ctx, isCopy, &sampledQuery)
} else {
telemetryMetrics.incSkippedQueryCount()
1 change: 0 additions & 1 deletion pkg/sql/telemetry_logging.go
@@ -93,7 +93,6 @@ func (t *TelemetryLoggingMetrics) maybeUpdateLastEmittedTime(
defer t.mu.Unlock()

lastEmittedTime := t.mu.lastEmittedTime

if float64(newTime.Sub(lastEmittedTime))*1e-9 >= requiredSecondsElapsed {
t.mu.lastEmittedTime = newTime
return true
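
A side note on the sampling check shown in the hunk above (not part of this change): the comparison scales nanoseconds to seconds by multiplying by `1e-9`. The self-contained sketch below, with made-up values, compares that form with `time.Duration.Seconds()`, which computes the same quantity up to floating-point rounding:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	lastEmittedTime := time.Now()
	newTime := lastEmittedTime.Add(12 * time.Second)
	requiredSecondsElapsed := 10.0

	// Form used in maybeUpdateLastEmittedTime: scale nanoseconds by 1e-9.
	a := float64(newTime.Sub(lastEmittedTime))*1e-9 >= requiredSecondsElapsed
	// Near-equivalent form using time.Duration's Seconds() helper.
	b := newTime.Sub(lastEmittedTime).Seconds() >= requiredSecondsElapsed

	fmt.Println(a, b) // true true
}
```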