sql: surface KV contention statistics #55243
Labels: A-kv-observability, A-sql-observability, C-enhancement, meta-issue
A high priority for the KV team this release (21.1) is to make it easier for users to understand what is happening with their queries at the KV layer. Since the SQL Execution team is also responsible for query observability, a goal for 21.1 is to surface contention statistics provided by the KV team.
At a general level, we aim to surface these contention statistics in two ways:

- Per statement, by enriching EXPLAIN ANALYZE output and the statement details page with contention information.
- Globally, by providing a cluster-wide view of contention across tables and transactions.
Design Proposal
This section outlines the work needed on the backend. It does not explicitly lay out what will happen on the frontend, but takes the customer needs outlined above as a reference to guide the work.
At a basic level, SQL Execution will have access to the following protobufs emitted after completion of a statement:
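A hedged sketch of what such a contention event likely carries, expressed as a Go struct rather than the actual protobuf (the field names are assumptions based on how the events are used below):

```go
package contention

import "time"

// ContentionEvent sketches the per-request contention information the KV
// layer is expected to emit (see #55583); the real protobuf may differ in
// naming and shape.
type ContentionEvent struct {
	// Key is the key on which this statement's transaction conflicted with
	// another transaction.
	Key []byte
	// TxnID identifies the other transaction encountered at Key.
	TxnID [16]byte
	// Duration is the time spent contending against that transaction.
	Duration time.Duration
}
```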
There is more detail about how this protobuf will be produced in #55583.
Per-statement
These contention events will be used to enrich existing EXPLAIN ANALYZE plans. Any operator that sends KV requests will intercept these contention events, keep track of the cumulative time spent contending, and output that as another execution stat. These contention seconds can be aggregated across a whole query, much as other top-level stats are, and yield the percentage of time the query spends contending with other transactions: the ratio of cumulative contention time to the query's total cumulative time (i.e. the sum of cumulative disk, network, and execution time). This stat will be surfaced through EXPLAIN ANALYZE and the respective statement's page.
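As an illustration only (not the actual execution-statistics plumbing), and reusing the hypothetical ContentionEvent sketch above, the per-operator accumulation and the query-level percentage could look like:

```go
// cumulativeContentionTime sums the durations of all contention events
// intercepted by a single operator while executing its KV requests.
func cumulativeContentionTime(events []ContentionEvent) time.Duration {
	var total time.Duration
	for _, ev := range events {
		total += ev.Duration
	}
	return total
}

// contentionPercentage reports the share of a query's time spent contending,
// where totalTime is the sum of cumulative disk, network, and execution time
// aggregated across the whole query.
func contentionPercentage(contentionTime, totalTime time.Duration) float64 {
	if totalTime <= 0 {
		return 0
	}
	return 100 * float64(contentionTime) / float64(totalTime)
}
```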
The transaction IDs that the statement contended with will also be stored by the operator, although these will not be displayed in EXPLAIN ANALYZE; they will only be surfaced to the statements page, with enough information to display a list in which each row links to the page of the transaction that was contending with that statement. TODO(asubiotto): It's unclear at this moment how to tie a transaction ID back to a transaction page, since the transactions in the admin UI seem to be keyed by a hash of statement strings, but I haven't looked into it too much.
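The set of distinct contending transactions could be collected with a small dedup pass over the same events (again a sketch, not the real code):

```go
// contendingTxns returns the distinct IDs of the transactions that a
// statement contended with, deduplicated across all intercepted events.
func contendingTxns(events []ContentionEvent) [][16]byte {
	seen := make(map[[16]byte]struct{})
	var txns [][16]byte
	for _, ev := range events {
		if _, ok := seen[ev.TxnID]; !ok {
			seen[ev.TxnID] = struct{}{}
			txns = append(txns, ev.TxnID)
		}
	}
	return txns
}
```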
Global View
Global contention statistics show the operator the state of contention in the cluster. Since each statement provides a small piece of this puzzle, the contention events will be sent back to the gateway as metadata and processed by the DistSQL receiver on that gateway, similar to how the range cache is updated when a misplanned range is encountered.
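A rough sketch of the shape of this plumbing, using hypothetical stand-ins rather than the real DistSQL metadata and receiver types:

```go
// execMetadata stands in for the DistSQL producer metadata that trails a
// remote flow's results; the real type also carries errors, range cache
// updates for misplanned ranges, and other bookkeeping.
type execMetadata struct {
	ContentionEvents []ContentionEvent
}

// distSQLReceiver stands in for the gateway-side receiver that drains
// metadata from all flows of a query.
type distSQLReceiver struct {
	contentionEvents []ContentionEvent
}

// pushMeta is invoked for each piece of metadata arriving at the gateway;
// contention events are collected so they can later be forwarded to the
// statement stats and the node-local contention registry.
func (r *distSQLReceiver) pushMeta(meta execMetadata) {
	r.contentionEvents = append(r.contentionEvents, meta.ContentionEvents...)
}
```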
Contention View
The contention view is primarily a time series metric that gives the operator enough data to determine whether the cluster is suffering from contention, but no more. The goal of this view is to be a litmus test for whether further investigation is needed via the contention map and per-statement contention data. The proposal is to surface the number of contended queries per second. This should be relatively simple to plumb as a timeseries metric that the DistSQL receiver increments when a query receives at least one contention event.
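A minimal sketch of that metric, using sync/atomic as a stand-in for the real metric counter plumbing:

```go
// contendedQueries counts the queries that observed at least one contention
// event; exported as a timeseries, its rate is "contended queries per second".
var contendedQueries int64

// recordQueryContention is called once per query, after the DistSQL receiver
// has seen the query's final metadata.
func recordQueryContention(events []ContentionEvent) {
	if len(events) > 0 {
		atomic.AddInt64(&contendedQueries, 1)
	}
}
```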
Contention Map
This map view is still pending design, but we can start work on the backend given the general idea. This "global" view should also be queryable via a virtual table. This kind of API is reminiscent of other features like SHOW QUERIES, where each server has a local view of its queries and there is a cluster-level API, through both the DB Console and the SQL shell, to query the global state.

The proposal in this case is to keep an in-memory contention registry keyed by table ID/index ID pair that is updated by the DistSQL receiver when receiving contention events. When a StatusServer receives a request for a global contention view, it will broadcast a contention request to each node for its local contention view and merge the results with its own local view.

Diving a bit deeper, what should this contention registry look like? It depends on the questions we want to answer. At a high level, we want to be able to answer which tables are experiencing contention and allow the user to dive deeper into a table to understand which key/row range is experiencing contention and which transactions are responsible for it. Therefore, the proposal is a top-level map keyed by table ID with a value struct that looks something like:
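A hedged sketch of what such a value struct might look like, given the questions above (names and exact layout are assumptions, building on the hypothetical types used earlier):

```go
// tableContention is the value stored in the top-level registry map, which is
// keyed by table ID.
type tableContention struct {
	// indexes maps an index ID to the contention observed on that index.
	indexes map[uint32]*indexContention
	// lastEvent supports LRU-style eviction of whole table entries.
	lastEvent time.Time
}

// indexContention aggregates contention observed on a single index, keyed by
// the contended key.
type indexContention struct {
	keys map[string]*keyContention // key is string(ContentionEvent.Key)
}

// keyContention describes the contention observed on a single key.
type keyContention struct {
	numEvents          int
	cumulativeDuration time.Duration
	// txns is the set of transactions contended with on this key.
	txns map[[16]byte]struct{}
	// lastEvent supports LRU-style eviction of keys within an entry.
	lastEvent time.Time
}
```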
This map cannot grow unboundedly, so an eviction policy needs to be put into place. As a simple starter, we can also keep track of the last contention event for a given table and use an LRU policy with a maximum size for the map. In a given map entry, we could similarly use an LRU policy for keys.
Merging these structs should be relatively straightforward.
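For instance, merging two per-key entries (as a StatusServer would when combining node-local views) could look like this, continuing the hypothetical types above:

```go
// merge folds another node's view of contention on the same key into this
// one: counts and durations add up, contending transactions are unioned, and
// the most recent event time wins so eviction decisions stay sensible.
func (k *keyContention) merge(other *keyContention) {
	k.numEvents += other.numEvents
	k.cumulativeDuration += other.cumulativeDuration
	for txn := range other.txns {
		k.txns[txn] = struct{}{}
	}
	if other.lastEvent.After(k.lastEvent) {
		k.lastEvent = other.lastEvent
	}
}
```

The table- and index-level merges would do the same recursively, and the LRU limits would presumably be re-applied afterwards so the merged view stays bounded.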
Observations and remaining questions
When talking about a global contention view, we've talked about showing contended spans, but contention events only produce keys. I'll follow up with KV to better define what we're going to do here; otherwise, I'm guessing we can slightly modify the contention event and have the table reader or cfetcher annotate the contention proto with the span that was passed to the fetcher and produced the contention event.
Need to figure out how to use transaction IDs to get richer transaction information.
Work Items
Out of scope
Getting information for currently hanging requests is out of scope. The KV team will expose this to users through a debug page, so the information will be available. The proposal here is to surface the contention statistics that the KV team will provide through execution metadata. High contention is usually a performance problem, not a stuck-queries problem, so it will be enough for users to be able to visualize contention information for queries that have completed.