sql: surface KV contention statistics #55243
Labels: A-kv-observability, A-sql-observability, C-enhancement, meta-issue
A high priority for the KV team this release (21.1) is to make it easier for users to understand what is happening with their queries at the KV layer. Since the SQL Execution team is also responsible for query observability, a goal for 21.1 is to surface contention statistics provided by the KV team.
At a general level, we aim to surface these contention statistics in two ways:

- Per statement, by enriching EXPLAIN ANALYZE output and the statement details page with contention information.
- Globally, by providing a cluster-wide view of contention across tables and transactions.
Design Proposal
This section outlines the work needed on the backend. It does not explicitly lay out what will happen on the frontend, but takes the customer needs outlined above as a reference to guide the work.
At a basic level, SQL Execution will have access to the following protobufs emitted after completion of a statement:
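A hedged sketch of what such a contention event likely carries, expressed as a Go struct rather than the actual protobuf (the field names are assumptions based on how the events are used below):

```go
package contention

import "time"

// ContentionEvent sketches the per-request contention information the KV
// layer is expected to emit (see #55583); the real protobuf may differ in
// naming and shape.
type ContentionEvent struct {
	// Key is the key on which this statement's transaction conflicted with
	// another transaction.
	Key []byte
	// TxnID identifies the other transaction encountered at Key.
	TxnID [16]byte
	// Duration is the time spent contending against that transaction.
	Duration time.Duration
}
```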
There is more detail about how this protobuf will be produced in #55583.
Per-statement
These contention events will be used to enrich existing EXPLAIN ANALYZE plans. Any operator that sends KV requests will intercept these contention events, keep track of the cumulative time spent contending, and output that as another execution stat. These contention seconds can be aggregated across a whole query, much as other top-level stats are, and yield the percentage of time the query spends contending with other transactions: the ratio of cumulative contention time to the query's total cumulative time (i.e. the sum of cumulative disk, network, and execution time). This stat will be surfaced through EXPLAIN ANALYZE and the respective statement's page.
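As an illustration only (not the actual execution-statistics plumbing), and reusing the hypothetical ContentionEvent sketch above, the per-operator accumulation and the query-level percentage could look like:

```go
// cumulativeContentionTime sums the durations of all contention events
// intercepted by a single operator while executing its KV requests.
func cumulativeContentionTime(events []ContentionEvent) time.Duration {
	var total time.Duration
	for _, ev := range events {
		total += ev.Duration
	}
	return total
}

// contentionPercentage reports the share of a query's time spent contending,
// where totalTime is the sum of cumulative disk, network, and execution time
// aggregated across the whole query.
func contentionPercentage(contentionTime, totalTime time.Duration) float64 {
	if totalTime <= 0 {
		return 0
	}
	return 100 * float64(contentionTime) / float64(totalTime)
}
```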
The transaction IDs that the statement contended with will also be stored by the operator, although these will not be displayed in EXPLAIN ANALYZE; they will only be surfaced to the statements page, with enough information to display a list in which each row links to the page of the transaction that was contending with that statement. TODO(asubiotto): It's unclear at this moment how to tie a transaction ID back to a transaction page, since the transactions in the admin UI seem to be keyed by a hash of statement strings, but I haven't looked into it too much.
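The set of distinct contending transactions could be collected with a small dedup pass over the same events (again a sketch, not the real code):

```go
// contendingTxns returns the distinct IDs of the transactions that a
// statement contended with, deduplicated across all intercepted events.
func contendingTxns(events []ContentionEvent) [][16]byte {
	seen := make(map[[16]byte]struct{})
	var txns [][16]byte
	for _, ev := range events {
		if _, ok := seen[ev.TxnID]; !ok {
			seen[ev.TxnID] = struct{}{}
			txns = append(txns, ev.TxnID)
		}
	}
	return txns
}
```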
Global View
Global contention statistics show the operator the state of contention in the cluster. Since each statement provides a small piece of this puzzle, the contention events will be sent back to the gateway as metadata and processed by the DistSQL receiver on that gateway, similar to how the range cache is updated when a misplanned range is encountered.
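A rough sketch of the shape of this plumbing, using hypothetical stand-ins rather than the real DistSQL metadata and receiver types:

```go
// execMetadata stands in for the DistSQL producer metadata that trails a
// remote flow's results; the real type also carries errors, range cache
// updates for misplanned ranges, and other bookkeeping.
type execMetadata struct {
	ContentionEvents []ContentionEvent
}

// distSQLReceiver stands in for the gateway-side receiver that drains
// metadata from all flows of a query.
type distSQLReceiver struct {
	contentionEvents []ContentionEvent
}

// pushMeta is invoked for each piece of metadata arriving at the gateway;
// contention events are collected so they can later be forwarded to the
// statement stats and the node-local contention registry.
func (r *distSQLReceiver) pushMeta(meta execMetadata) {
	r.contentionEvents = append(r.contentionEvents, meta.ContentionEvents...)
}
```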
Contention View
The contention view is primarily a time series metric that gives the operator enough data to determine whether the cluster is suffering from contention, but no more. The goal of this view is to be a litmus test for whether further investigation is needed via the contention map and per-statement contention data. The proposal is to surface the number of contended queries per second. This should be relatively simple to plumb as a timeseries metric that the DistSQL receiver increments when a query receives at least one contention event.
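A minimal sketch of that metric, using sync/atomic as a stand-in for the real metric counter plumbing:

```go
// contendedQueries counts the queries that observed at least one contention
// event; exported as a timeseries, its rate is "contended queries per second".
var contendedQueries int64

// recordQueryContention is called once per query, after the DistSQL receiver
// has seen the query's final metadata.
func recordQueryContention(events []ContentionEvent) {
	if len(events) > 0 {
		atomic.AddInt64(&contendedQueries, 1)
	}
}
```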
Contention Map
This map view is still pending design, but we can start work on the backend given the general idea. This "global" view should also be queryable via a virtual table. This kind of API is reminiscent of other features like SHOW QUERIES, where each server has a local view of its queries and there is a cluster-level API, through both the DB Console and the SQL shell, to query the global state.

The proposal in this case is to keep an in-memory contention registry keyed by table ID/index ID pair that is updated by the DistSQL receiver when receiving contention events. When a StatusServer receives a request for a global contention view, it will broadcast a contention request to each node for its local contention view and merge the results with its own local view.

Diving a bit deeper, what should this contention registry look like? It depends on the questions we want to answer. At a high level, we want to be able to answer which tables are experiencing contention and allow the user to dive deeper into a table to understand which key/row range is experiencing contention and which transactions are responsible for it. Therefore, the proposal is a top-level map keyed by table ID with a value struct that looks something like:
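A hedged sketch of what such a value struct might look like, given the questions above (names and exact layout are assumptions, building on the hypothetical types used earlier):

```go
// tableContention is the value stored in the top-level registry map, which is
// keyed by table ID.
type tableContention struct {
	// indexes maps an index ID to the contention observed on that index.
	indexes map[uint32]*indexContention
	// lastEvent supports LRU-style eviction of whole table entries.
	lastEvent time.Time
}

// indexContention aggregates contention observed on a single index, keyed by
// the contended key.
type indexContention struct {
	keys map[string]*keyContention // key is string(ContentionEvent.Key)
}

// keyContention describes the contention observed on a single key.
type keyContention struct {
	numEvents          int
	cumulativeDuration time.Duration
	// txns is the set of transactions contended with on this key.
	txns map[[16]byte]struct{}
	// lastEvent supports LRU-style eviction of keys within an entry.
	lastEvent time.Time
}
```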
This map cannot grow unboundedly, so an eviction policy needs to be put into place. As a simple starter, we can also keep track of the last contention event for a given table and use an LRU policy with a maximum size for the map. In a given map entry, we could similarly use an LRU policy for keys.
Merging these structs should be relatively straightforward.
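For instance, merging two per-key entries (as a StatusServer would when combining node-local views) could look like this, continuing the hypothetical types above:

```go
// merge folds another node's view of contention on the same key into this
// one: counts and durations add up, contending transactions are unioned, and
// the most recent event time wins so eviction decisions stay sensible.
func (k *keyContention) merge(other *keyContention) {
	k.numEvents += other.numEvents
	k.cumulativeDuration += other.cumulativeDuration
	for txn := range other.txns {
		k.txns[txn] = struct{}{}
	}
	if other.lastEvent.After(k.lastEvent) {
		k.lastEvent = other.lastEvent
	}
}
```

The table- and index-level merges would do the same recursively, and the LRU limits would presumably be re-applied afterwards so the merged view stays bounded.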
Observations and remaining questions
When talking about a global contention view, we've talked about showing contended spans, but contention events only produce keys. I'll follow up with KV to better define what we're going to do here; otherwise, I'm guessing we can slightly modify the contention event and have the table reader or cfetcher annotate the contention proto with the span that was passed to the fetcher and produced the contention event.
Need to figure out how to use transaction IDs to get richer transaction information.
Work Items
Out of scope
Getting information for currently hanging requests is out of scope. The KV team will expose this to users through a debug page, so the information will be available. The proposal here is to surface the contention statistics that the KV team will provide through execution metadata. High contention is usually a performance problem, not a stuck-queries problem, so it will be enough for users to be able to visualize contention information for queries that have completed.