-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add contention_events to cluster_execution_insights #90660
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, nice! Insights layer LGTM, would like to hear especially from queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits from me!
As Matthew mentioned, get a review from queries to make sure there isn't any other places that you should add the new events
Reviewed 9 of 21 files at r1, 3 of 7 files at r2, 6 of 9 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w)
pkg/sql/crdb_internal.go
line 6409 at r3 (raw file):
exec_node_ids INT[] NOT NULL, contention INTERVAL, contention_events JSONB,
nit: misalignment here
pkg/sql/execinfrapb/component_stats.proto
line 141 at r3 (raw file):
optional util.optional.Uint num_internal_seeks = 8 [(gogoproto.nullable) = false]; // contention_events hit at the statement level
nit: period
pkg/sql/execstats/stats.go
line 36 at r3 (raw file):
// contention time from the given recording or, if the recording is nil, from // the tracing span from the context. All contention events found in the trace // are included.
can you update this comment to reflect the new return values?
pkg/sql/sqlstats/insights/insights.proto
line 104 at r3 (raw file):
google.protobuf.Duration contention = 18 [(gogoproto.stdduration) = true]; repeated string index_recommendations = 19; // contention_events hit at the statement level
nit: period
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go
line 298 at r3 (raw file):
// "type": "object", // [{ // "blocking_txn_id": { "type": "string" },
is this right? you're using camelCase on all other keys and I can see your encodeJSON also using camelCase for this key
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go
line 301 at r3 (raw file):
// "durationMs": { "type": "number" }, // "rawIndexID": { "type": "number" }, // "tableId": { "type": "number" }
nit: misalignment
curious: why "raw" index id? seems like you use just indexID
as the key on your encodeJSON function
Previously, maryliag (Marylia Gutierrez) wrote…
I copied an outdated json. I will fix it. |
Previously, maryliag (Marylia Gutierrez) wrote…
Nope, the code is correct. I accidentally copied the wrong json. It's now fixed. |
Previously, maryliag (Marylia Gutierrez) wrote…
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @j82w)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I have some nits, but otherwise.
Reviewed 9 of 21 files at r1, 3 of 7 files at r2, 4 of 9 files at r3, 6 of 7 files at r4, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @j82w)
pkg/sql/crdb_internal.go
line 6494 at r4 (raw file):
contentionEvents := tree.DNull if insight.Statement.ContentionEvents != nil && len(insight.Statement.ContentionEvents) > 0 {
nit: no need for the non-nil check - len
and cap
of nil
slice is 0.
pkg/sql/crdb_internal.go
line 6539 at r4 (raw file):
)) if err != nil {
nit: hm, we have some inconsistency in how we're handling errors in this function - in some cases we use CombineErrors
and don't short-circuit, in others we short-circuit. I think we should choose one or the other.
pkg/sql/colexecop/operator.go
line 79 at r4 (raw file):
// GetCumulativeContentionTime returns the amount of time KV reads spent // contending. It must be safe for concurrent use. GetCumulativeContentionTime() (time.Duration, []roachpb.ContentionEvent)
nit: we should rename this method, e.g. to GetContentionInfo
.
pkg/sql/execstats/traceanalyzer.go
line 127 at r4 (raw file):
NetworkMessages int64 ContentionTime time.Duration ContentionEvent []roachpb.ContentionEvent
nit: do ContentionEvents
.
pkg/sql/execstats/traceanalyzer.go
line 161 at r4 (raw file):
s.NetworkMessages += other.NetworkMessages s.ContentionTime += other.ContentionTime if len(other.ContentionEvent) > 0 {
nit: you can omit the length check here too and just use append
- it'll have an exactly the same behavior.
pkg/sql/execstats/traceanalyzer.go
line 254 at r4 (raw file):
a.nodeLevelStats.KVTimeGroupedByNode[instanceID] += stats.KV.KVTime.Value() a.nodeLevelStats.ContentionTimeGroupedByNode[instanceID] += stats.KV.ContentionTime.Value() if len(stats.KV.ContentionEvents) > 0 {
nit: ditto
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go
line 297 at r4 (raw file):
// "title": "system.statement_statistics.contention_events", // "type": "object", // [{
nit: misalignment - seems like a couple of spaces are missing.
The original contention column will remain to make query operations faster. The events are being put into a json column because it's possible there could be multiple blocking events for a single statement. The json column avoids the complexity of adding another table and keeping it in sync with the insights table. The table can be joined with index_columns and tables to get the database name, table name, and index name that contention occurred on. This does not contain the blocking statement information, and the blocking fingerprint id. closes #88561 Release note (sql change): Adds contention_events to cluster_execution_insights. This is used to see which transaction is blocking the specific statement.
bors r+ |
Build failed (retrying...): |
Build succeeded: |
The original contention column will remain
to make query operations faster. The events
are being put into a json column because it's
possible there could be multiple blocking events
for a single statement. The json column avoids the complexity of adding another table and keeping it
in sync with the insights table.
The table can be joined with index_columns and tables to get the database name, table name, and index name that contention occurred on. This does not contain the blocking statement information, and the blocking fingerprint id.
closes: #88561
Release note (sql change): Adds contention_events
to cluster_execution_insights. This is used
to see which transaction is blocking the specific
statement.