tsdb/ui: store-specific metrics don't filter by node (or store) correctly #102967
Labels
A-observability-inf
C-enhancement
O-support
O-testcluster
P-1
T-observability
Is your feature request related to a problem? Please describe.
Timeseries is stored under KV, like the rest of our data in CRDB.
A timeseries key contains a `source` field, which indicates information about which piece of infrastructure that metric is relevant to. For example, for nodeID = 789, you'd have a key such as:
The trailing `789` is the `source` component of the key. In this case, it tells us this metric was sourced from node ID 789.
However, we also use this source field in the key to indicate which store ID the metric originated from. Metrics like `cr.store.raft.commandsapplied` use the store ID in the source field, in place of the node ID. So, for this metric originating from store ID 456, on node ID 789, the key would look like:
This unfortunately creates problems with our timeseries chart UIs in DB Console. Both in our metric dashboards, as well as the custom timeseries chart tool in the advanced debug page, we have a `Node ID` filter dropdown that allows users to filter metrics to specific node IDs.
In practice, what this does is it sets a filter on the query request to that node ID as the `source` to look for in the TSDB key.
So, if you can imagine the following setup:
For a store-specific metric like `cr.store.raft.commandsapplied`, when you set the node ID filter in the UI to `NodeID 1`, it's setting the source filter on the query request to `1`. This means that the server is looking for keys that fit the following format:
However, given what we know about these store-specific metrics, and our above example, the only keys available for this metric will look like (one for each store ID):
This means that a request with the NodeID = 1 filter will come back empty for these metrics. Effectively, the node ID filter in DB Console is broken for store-specific metrics, in both the metric dashboards and the custom timeseries chart tool.
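The mismatch can be reproduced with a small model. This is only a sketch: it reduces each timeseries key to a `(metric name, source)` pair, which is not the real TSDB key encoding, and the store IDs `11` and `12`, the node-level metric name, and the sample values are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Simplified stand-in for a TSDB key: (metric name, source).
# NOT the real key encoding; IDs and values below are hypothetical.
SeriesKey = Tuple[str, str]

tsdb: Dict[SeriesKey, List[float]] = {
    # Node-level metric (stand-in name): the source is the node ID.
    ("cr.node.example", "1"): [10.0, 20.0],
    # Store-level metric: the source is the store ID, not the node ID.
    ("cr.store.raft.commandsapplied", "11"): [5.0, 7.0],
    ("cr.store.raft.commandsapplied", "12"): [3.0, 4.0],
}


def query(name: str, sources: List[str]) -> Dict[str, List[float]]:
    """Return the series for `name` whose source is one of `sources`."""
    return {src: tsdb[(name, src)] for src in sources if (name, src) in tsdb}


# The UI's "Node ID = 1" filter becomes a source filter of "1".
node_filter = ["1"]
node_result = query("cr.node.example", node_filter)
store_result = query("cr.store.raft.commandsapplied", node_filter)

print(node_result)   # the node-level series is found
print(store_result)  # nothing matches: store series are keyed by store ID
```

In this model the node-level query finds its series, while the store-level query matches nothing, because no store series carries the node ID as its source.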
Describe the solution you'd like
In the above example, if we filter the chart to NodeID = 1 for a store-specific metric, we should get back an aggregate of all the store metrics that exist on that node. To keep with our example, that means you'd want an aggregate of the following two keys:
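One way this could work is to expand the node ID into the store IDs on that node before querying, then combine the matching series. The sketch below assumes a node-to-store lookup (`stores_by_node`, hypothetical), hypothetical store IDs `11` and `12`, a simplified `(metric name, source)` key rather than the real encoding, and summation over time-aligned samples as the aggregate. It relies on store-specific metrics sharing the `cr.store.` prefix.

```python
from typing import Dict, List, Tuple

# Hypothetical node -> store mapping; a real implementation would need
# to obtain this from the cluster.
stores_by_node: Dict[str, List[str]] = {"1": ["11", "12"]}

# Simplified (metric name, source) keys with made-up sample values.
tsdb: Dict[Tuple[str, str], List[float]] = {
    ("cr.store.raft.commandsapplied", "11"): [5.0, 7.0],
    ("cr.store.raft.commandsapplied", "12"): [3.0, 4.0],
}


def is_store_metric(name: str) -> bool:
    """Store-specific metrics are prefixed with cr.store."""
    return name.startswith("cr.store.")


def sources_for_node(name: str, node_id: str) -> List[str]:
    """Translate a node ID filter into the sources to query."""
    if is_store_metric(name):
        return stores_by_node.get(node_id, [])
    return [node_id]


def query_node(name: str, node_id: str) -> List[float]:
    """Sum all matching series point by point (assumes aligned samples)."""
    series = [
        tsdb[(name, src)]
        for src in sources_for_node(name, node_id)
        if (name, src) in tsdb
    ]
    return [sum(points) for points in zip(*series)]


aggregate = query_node("cr.store.raft.commandsapplied", "1")
print(aggregate)
```

Whether the aggregate should be a sum, an average, or something else depends on the metric; the sum here is only a placeholder for that choice.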
Additionally, it might be a good idea to introduce a separate filter for store ID. Store-specific metrics are prefixed with `cr.store.*`, so we could conditionally show this filter dropdown in the UI depending on whether the metric is store-specific.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
This discussion came from a bug reported during an escalation here: https://cockroachlabs.slack.com/archives/C01CNRP6TSN/p1683629621213809
Jira issue: CRDB-27762