server: add `node_id` label to _status/vars output #98640

dhartunian · 2023-03-14T22:25:57Z

Previously, the Prometheus metrics output via `_status/vars` did not include any node labels. This caused challenges for customers who want to monitor large clusters, as it requires additional configuration on the scrape side to ensure a node ID is added to the metrics. That configuration is hard to maintain when nodes come and go in a cluster, because the scrape configuration must change as well.

This change adds a `node_id` Prometheus label to the metrics we output that matches the current node's ID. Since `_status/vars` is output from a single node, there is only ever one value that's appropriate here.

Secondary tenants will mark their metrics with either the nodeID of the shared-process system tenant, or the instanceID of the tenant process.

Resolves: #94763
Epic: None

Release note (ops change): Prometheus metrics available at the _status/vars path now contain a node_id label that identifies the node they were scraped from.
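For readers who want a concrete picture of the effect on scraped output, here is a minimal sketch in Go. It is not the code from this PR: `addNodeIDLabel` is a hypothetical helper that rewrites one line of Prometheus text exposition output, and the metric names and the `store` label in `main` are only illustrative. It assumes label values contain no spaces and that the label set has no trailing comma.

```go
package main

import (
	"fmt"
	"strings"
)

// addNodeIDLabel is a hypothetical helper (not part of CockroachDB) that adds
// a node_id label to one sample line in Prometheus text exposition format.
// Comment lines (# HELP, # TYPE) and blank lines are returned unchanged.
// Assumption: label values contain no spaces, and the existing label set,
// if any, has no trailing comma.
func addNodeIDLabel(line string, nodeID int) string {
	if line == "" || strings.HasPrefix(line, "#") {
		return line
	}
	sp := strings.IndexByte(line, ' ')
	if sp < 0 {
		return line // not a "series value" sample line
	}
	series, rest := line[:sp], line[sp:]
	label := fmt.Sprintf(`node_id="%d"`, nodeID)

	if strings.HasSuffix(series, "{}") {
		// Empty label set: fill it with the single new label.
		return series[:len(series)-1] + label + "}" + rest
	}
	if strings.HasSuffix(series, "}") {
		// Existing labels: append node_id before the closing brace.
		return series[:len(series)-1] + "," + label + "}" + rest
	}
	// No labels yet: create a label set holding only node_id.
	return series + "{" + label + "}" + rest
}

func main() {
	// Illustrative input lines; the node ID 7 is arbitrary.
	lines := []string{
		"# TYPE sql_conns gauge",
		"sql_conns 3",
		`capacity{store="1"} 42`,
	}
	for _, l := range lines {
		fmt.Println(addNodeIDLabel(l, 7))
	}
}
```

Because every series scraped from a given node carries the same `node_id` value, a scraper no longer needs per-target relabeling to tell nodes apart.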

cockroach-teamcity · 2023-03-14T22:26:08Z

This change is

aadityasondhi

I think you will have to update a bunch of tests with the new NodeID label (CI failures)

Reviewable status: complete! 1 of 0 LGTMs obtained

dhartunian · 2023-03-20T17:38:56Z

bors r=aadityasondhi

craig · 2023-03-20T19:49:16Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-03-20T23:03:13Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-03-21T02:59:52Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-03-21T05:07:20Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-03-21T08:59:24Z

Build failed:

Bazel Essential CI (Cockroach)

knz · 2023-03-21T09:00:33Z

bors r=aadityasondhi single on

craig · 2023-03-21T10:52:11Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-03-21T13:09:34Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

adityamaru · 2023-03-21T13:42:50Z

@dhartunian the batch seems to be failing on TestMetricsRecorderLabels:

 recorder_test.go:240: recorder did not yield expected time series collection; diff:
         [[0].Source: "1" != "7" [0].Datapoints[0].Value: 1 != 7 [1].Source: "1-123" != "7-123" [2].Source: "1" != "7" [3].Source: "1-123" != "7-123"]

dhartunian · 2023-03-21T14:02:26Z

bors r-

craig · 2023-03-21T14:02:32Z

Canceled.

aadityasondhi · 2023-03-21T14:11:17Z

Sorry, I think my tsdb changes landed prior to this. The source fields now have the Tenant IDs in them. #98077

dhartunian · 2023-03-22T13:54:22Z

bors r=aadityasondhi

craig · 2023-03-22T15:07:46Z

Build succeeded:

Bazel Essential CI (Cockroach)

blathers-crl · 2023-03-22T15:08:01Z

Encountered an error creating backports. Some common things that can go wrong:

The backport branch might have already existed.
There was a merge conflict.
The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.

Backport to branch 23.1.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

dhartunian requested a review from a team March 14, 2023 22:25

dhartunian requested review from a team as code owners March 14, 2023 22:25

aadityasondhi approved these changes Mar 15, 2023

dhartunian force-pushed the add-node-label-to-status-vars branch 2 times, most recently from d9c0d1c to 24fe9df Compare March 20, 2023 15:11

dhartunian force-pushed the add-node-label-to-status-vars branch from 24fe9df to f57e9f6 Compare March 21, 2023 14:44

dhartunian added the backport-23.1.x label (Flags PRs that need to be backported to 23.1) Mar 21, 2023

dhartunian force-pushed the add-node-label-to-status-vars branch from f57e9f6 to 2353a6a Compare March 21, 2023 22:15

craig bot merged commit 1e2ea17 into cockroachdb:master Mar 22, 2023

blathers-crl bot mentioned this pull request Mar 22, 2023

release-23.1: server: add node_id label to _status/vars output #99235

Merged

cockroach-teamcity mentioned this pull request Mar 23, 2023

PR #98640 - server: add node_id label to _status/vars output cockroachdb/docs#16581

Closed

dhartunian deleted the add-node-label-to-status-vars branch February 5, 2024 20:37
