Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: contention duration not recorded properly in SQL Stats #78675

Closed
Azhng opened this issue Mar 28, 2022 · 3 comments · Fixed by #79175
Closed

sql: contention duration not recorded properly in SQL Stats #78675

Azhng opened this issue Mar 28, 2022 · 3 comments · Fixed by #79175
Labels
A-sql-observability Related to observability of the SQL layer C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@Azhng
Copy link
Contributor

Azhng commented Mar 28, 2022

Discovered while working on #73566.

When a contention event is experienced, the contention event is recorded in the contention event registry. However, the same event is not reflected in the SQL Stats.

Observed using the following injected code:

conn_executor_exec.go

@@ -2188,8 +2188,27 @@ func (ex *connExecutor) recordTransactionFinish(
        }
        ex.metrics.EngineMetrics.SQLTxnLatency.RecordValue(txnTime.Nanoseconds())

+       s := "internal"
+       if ex.executorType == executorTypeExec {
+               s = "external"
+       }
+
        if contentionDuration := ex.extraTxnState.accumulatedStats.ContentionTime.Nanoseconds(); contentionDuration > 0 {
                ex.metrics.EngineMetrics.SQLContendedTxns.Inc(1)
+               if ex.executorType == executorTypeExec {
+                       fmt.Printf("[DEBUG] (%s)(%s): recording contention: %d (%s) | n%s\n",
+                               s,
+                               ev.txnID.String(),
+                               ex.metrics.EngineMetrics.SQLContendedTxns.Count(),
+                               ex.extraTxnState.accumulatedStats.ContentionTime.String(),
+                               ex.server.cfg.NodeID.SQLInstanceID().String())
+               }
+       } else {
+               if ex.executorType == executorTypeExec {
+                       fmt.Printf("[DEBUG]: (%s|n%s)(%s) no contention observed: %s\n",
+                               s, ex.server.cfg.NodeID.SQLInstanceID().String(),
+                               ev.txnID.String(), ex.extraTxnState.accumulatedStats.ContentionTime.String())
+               }
        }

        ex.txnIDCacheWriter.Record(contentionpb.ResolvedTxnID{

In distsql_running.go

--- a/pkg/sql/distsql_running.go
+++ b/pkg/sql/distsql_running.go
@@ -1004,6 +1004,7 @@ func (r *DistSQLReceiver) pushMeta(meta *execinfrapb.ProducerMetadata) execinfra
                                        contentionEvent.WaitingTxnID = r.txn.ID()
                                }
                                r.contentionRegistry.AddContentionEvent(contentionEvent)
+                               fmt.Printf("[DEBUG]: contention event for txnID: %s, %s\n", r.txn.ID(), contentionEvent.BlockingEvent.Duration.String())
                        })
                }
        }

The following log entry shows the discrepancy:

[DEBUG]: contention event for txnID: 789b206f-b507-4f25-ad24-c11a62f664db, 6.571134ms
[DEBUG]: (external|n2)(789b206f-b507-4f25-ad24-c11a62f664db) no contention observed: 0s

Same discrepancy can be observed in crdb_internal.transaction_contention_events table and the SQL Stats tables.

Jira issue: CRDB-14212

@Azhng Azhng added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-observability Related to observability of the SQL layer T-sql-observability labels Mar 28, 2022
@Azhng
Copy link
Contributor Author

Azhng commented Mar 28, 2022

Likely a bug within the trace analyzer

@Azhng
Copy link
Contributor Author

Azhng commented Mar 28, 2022

Trace Analyzer only analyze the ComponentStats data embedded in the traces.

Contention events is embedded in the trace as roachpb.ContentionEvent. This caused trace analyzer to skip over contention events in the tracing data, which in turn caused the contention events not to be bubbled up to the SQL Stats level. (both RecordStatementStats() and RecordTransactionStats() are not properly recording contention stats.

Though it seems like there's another code path that bubbles up contention stats to SQL Stats somehow. 😕.

@Azhng
Copy link
Contributor Author

Azhng commented Mar 29, 2022

Investigated further with help from @yuzefovich. Seems like in the default SET tracing=on, it's either

  • GetQueryLevelStats() is not being executed, or
  • Contention Event is not being added to the ComponentStats in the vectorized stats collector.

Using EXPLAIN ANALYZE can see the contention stats show up for statement stats, but not transaction stats.

Azhng added a commit to Azhng/cockroach that referenced this issue Mar 31, 2022
Resolves cockroachdb#78675

Previously, contention stats was not collected for SQL Stats even when
tracing was enabled. This was caused by two bugs:
1. instrumentationHelper skipping analyzing traces when tracing is enabled
2. transaction statistics ignore traces when tracing is turned on at the
higher level.

This commit ensures that contention stats is collected when tracing is
turned on.

Release note (bug fix): Contention statistics are now being collected for
SQL Stats when tracing is enabled.
Azhng added a commit to Azhng/cockroach that referenced this issue Apr 7, 2022
Resolves cockroachdb#78675

Previously, contention stats was not collected for SQL Stats even when
tracing was enabled. This was caused by two bugs:
1. instrumentationHelper skipping analyzing traces when tracing is enabled
2. transaction statistics ignore traces when tracing is turned on at the
higher level.

This commit ensures that contention stats is collected when tracing is
turned on.

Release note (bug fix): Contention statistics are now being collected for
SQL Stats when tracing is enabled.
@craig craig bot closed this as completed in 3582a90 Apr 11, 2022
blathers-crl bot pushed a commit that referenced this issue May 5, 2022
Resolves #78675

Previously, contention stats was not collected for SQL Stats even when
tracing was enabled. This was caused by two bugs:
1. instrumentationHelper skipping analyzing traces when tracing is enabled
2. transaction statistics ignore traces when tracing is turned on at the
higher level.

This commit ensures that contention stats is collected when tracing is
turned on.

Release note (bug fix): Contention statistics are now being collected for
SQL Stats when tracing is enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-observability Related to observability of the SQL layer C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants