ui: `nodes_ui` endpoint response should be reduced in size for very large clusters #129408

dhartunian · 2024-08-21T14:57:22Z

Today, the nodes_ui endpoint that serves DB Console can balloon in size quite severely if the cluster has hundreds of nodes. We observed this on a cluster with 100s of dead nodes which made this payload grow to 37MiB.

This payload contains many pieces of information that are likely not immediately necessary to the function of DB Console. We should either reduce the amount of info here or break it up into separate requests, so that we can quickly load the nodes list on the overview page and get the app functional quickly when there are 100s of nodes.

Jira issue: CRDB-41527


The /_status/nodes_ui gRPC API is used by many db-console pages to show node-related information. This API is extremely heavy and includes all node and node store metrics. To give some perspective, the nodes_ui API call on the current drt-scale cluster has a payload size of ~8.4MB, and as a result the request takes ~2.75s to complete in db-console.

As a partial remedy, this patch filters the node and node store metrics down to only the metrics needed by db-console. This list of metrics was determined by the `MetricsConstants` variable defined here: https://github.com/cockroachdb/cockroach/blob/d5f328ea6f3efd8fbe631c97d59f7b74307d22f9/pkg/ui/workspaces/db-console/src/util/proto.ts#L55

This patch does not include any changes to the underlying data in KV, meaning the full NodeStatus objects (which include the metrics) are still fetched from KV and unmarshalled. That said, this patch reduces the cost of marshalling the full metrics payload back into a serverpb.NodeResponse protobuf, sending it over the wire, and decoding it into JSON.

Testing locally with a 20-node demo tpcc cluster, the nodes_ui payload on a new cluster was around 530kb before this change and 8kb after.

Resolves: cockroachdb#129408
Epic: None

Release note (performance improvement): the /_status/nodes_ui API no longer returns unnecessary metrics in its response. This decreases the payload size of the API and improves the load time of various db-console pages and components.
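The filtering described in the commit message amounts to applying an allowlist to each node's metrics map and to each of its stores' metrics maps before the response is encoded. The Go sketch that follows illustrates that idea under stated assumptions: `NodeStatus`, `filterMetrics`, and `trimNodeStatus` are hypothetical simplifications, not the actual `serverpb`/`statuspb` types or the CockroachDB implementation, and the allowlisted metric names are illustrative stand-ins for the entries of `MetricsConstants`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeStatus is a HYPOTHETICAL, heavily simplified stand-in for the real
// NodeStatus proto; it keeps only the fields needed to show metric filtering.
type NodeStatus struct {
	NodeID       int32                `json:"node_id"`
	Metrics      map[string]float64   `json:"metrics"`
	StoreMetrics []map[string]float64 `json:"store_metrics"`
}

// filterMetrics (hypothetical helper) returns a new map containing only the
// metrics whose names appear in the allowlist.
func filterMetrics(m map[string]float64, allowed map[string]struct{}) map[string]float64 {
	out := make(map[string]float64, len(allowed))
	for name, v := range m {
		if _, ok := allowed[name]; ok {
			out[name] = v
		}
	}
	return out
}

// trimNodeStatus (hypothetical helper) returns a copy of ns whose node and
// store metrics are restricted to the allowlist; ns itself is not modified.
func trimNodeStatus(ns NodeStatus, allowed map[string]struct{}) NodeStatus {
	trimmed := NodeStatus{NodeID: ns.NodeID, Metrics: filterMetrics(ns.Metrics, allowed)}
	for _, sm := range ns.StoreMetrics {
		trimmed.StoreMetrics = append(trimmed.StoreMetrics, filterMetrics(sm, allowed))
	}
	return trimmed
}

func main() {
	// Illustrative allowlist; the real list comes from MetricsConstants.
	allowed := map[string]struct{}{"sys.rss": {}, "capacity": {}}
	ns := NodeStatus{
		NodeID:       1,
		Metrics:      map[string]float64{"sys.rss": 1.5e9, "sql.select.count": 42, "txn.commits": 7},
		StoreMetrics: []map[string]float64{{"capacity": 1e12, "rocksdb.block.cache.hits": 99}},
	}
	// Compare the JSON-encoded size of the full and trimmed statuses.
	before, _ := json.Marshal(ns)
	after, _ := json.Marshal(trimNodeStatus(ns, allowed))
	fmt.Printf("before=%d bytes, after=%d bytes\n", len(before), len(after))
}
```

Because the filter runs only after the full NodeStatus has been read, this sketch matches the caveat in the commit message: the cost of fetching and unmarshalling from KV is unchanged, and only the response encoding and transfer shrink.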


136005: server: reapply "server: decrease nodes_ui response size" r=kyle-a-wong a=kyle-a-wong

This reverts commit 35d00d5.

This commit was originally reverted because it broke the custom chart component. This component previously depended on the nodes_ui metrics to populate the list of metrics queryable from TSDB. This was fixed in #135705, so this commit can be reapplied.

Fixes: #129408
Epic: none
Release note: none

Co-authored-by: Kyle Wong <[email protected]>

dhartunian added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) P-3 Issues/test failures with no fix SLA T-observability labels Aug 21, 2024

vidit-bhat added the O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster label Aug 21, 2024

exalate-issue-sync bot assigned kyle-a-wong Nov 12, 2024

kyle-a-wong mentioned this issue Nov 14, 2024

server: decrease nodes_ui response size #135186

Merged

craig bot closed this as completed in f927757 Nov 14, 2024

kyle-a-wong mentioned this issue Nov 14, 2024

release-24.3: server: decrease nodes_ui response size #135209

Merged

exalate-issue-sync bot reopened this Nov 18, 2024

kyle-a-wong mentioned this issue Nov 20, 2024

observability: long time to load the Cluster Overview and Hot Ranges #134200

Closed

kyle-a-wong added a commit to kyle-a-wong/cockroach that referenced this issue Nov 22, 2024

Reapply "server: decrease nodes_ui response size"

e23a530

This reverts commit 35d00d5.

Fixes: cockroachdb#129408
Epic: none
Release note: none

kyle-a-wong mentioned this issue Nov 22, 2024

server: reapply "server: decrease nodes_ui response size" #136005

Merged

craig bot closed this as completed in 30cbc00 Nov 22, 2024

