In DB Console -> Metrics -> SQL of a mostly idling cluster
In an idling cluster where practically all SQL is internal, the number of active statements appears to be greater than the number of open transactions.
An open transaction can either execute a statement or do nothing. So at no point can the number of active (executing) statements exceed the number of open transactions.
Apparently some of the internal transactions are not properly reported/reflected in the metric.
Per Yahor: for “Open SQL transactions” we distinguish between external and internal txns (i.e. there are `sql.txns.open` and `sql.txns.open.internal` metrics), whereas for “Active SQL Statements” we don’t have this distinction. The execution engine currently is not aware of whether a query is external or not, but the connExecutor (responsible for txn handling) is aware.
The cleanest and most helpful way to fix the issue might be to rename the current `sql.txns.open` to a new `sql.txns.open.external` and add a new `sql.txns.open` that would be SUM(`sql.txns.open.internal` + `sql.txns.open.external`).
Or just change the graph so it charts the current `sql.txns.open` + `sql.txns.open.internal`?
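To make the proposed accounting concrete, here is a minimal Python sketch (not CockroachDB code). It assumes the current `sql.txns.open` gauge counts only external transactions, as described above. The `SqlSample` type, its field names, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlSample:
    """One point-in-time reading of the gauges discussed above (illustrative values only)."""
    open_external: int      # today's sql.txns.open (assumed to cover external txns only)
    open_internal: int      # sql.txns.open.internal
    active_statements: int  # active statements, not split into external/internal

    @property
    def open_total(self) -> int:
        # Proposed total: internal + external open transactions.
        return self.open_external + self.open_internal

    def invariant_holds(self) -> bool:
        # Every executing statement belongs to some open transaction,
        # so active statements should never exceed the total.
        return self.active_statements <= self.open_total


# Hypothetical reading from a mostly idle cluster dominated by internal work.
sample = SqlSample(open_external=1, open_internal=6, active_statements=4)

# Comparison implied by the current chart: statements vs. external txns only.
print(sample.active_statements <= sample.open_external)  # False

# Comparison against the proposed total.
print(sample.invariant_holds())  # True
```

With these made-up numbers the current chart would show 4 active statements against 1 open transaction, while the summed series restores the expected ordering.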
Environment:
CockroachDB version 21.1.7
Additional context
What was the impact?
Astute customers who pay attention to details get slightly confused, specifically when the cluster has a low-concurrency user workload. Note that a low-concurrency user workload does not mean the cluster is not used. On the contrary, we ran into this because background system jobs drove excessively high CPU usage, impacting the user workload.
So the ability to accurately account for external, internal, and total activity is essential for troubleshooting.
Jira issue: CRDB-9997