server: add node_id label to _status/vars output #98640

Merged
merged 1 commit into from
Mar 22, 2023

dhartunian
Collaborator

Previously, the Prometheus metrics output via _status/vars did not include any node labels. This made monitoring large clusters harder for customers, because it required additional configuration on the scrape side to ensure a node ID is added to the metrics. That configuration is difficult to maintain when nodes come and go in a cluster, since the scrape configuration must change as well.

This change adds a node_id Prometheus label to the metrics we output, set to the current node's ID. Since _status/vars is output from a single node, there is only ever a single appropriate value.
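
For illustration only, here is a minimal Go sketch of what the change means for one line of scraped output in the Prometheus text exposition format. It is not the code from this PR: `addNodeIDLabel` is a hypothetical helper, the metric names are made up, and the string handling assumes well-formed sample lines.

```go
package main

import (
	"fmt"
	"strings"
)

// addNodeIDLabel is a hypothetical helper (not from the PR). It appends
// node_id="<id>" to a single sample line in the Prometheus text format.
// Comment lines (# HELP, # TYPE) and blank lines are returned unchanged.
func addNodeIDLabel(line string, nodeID int) string {
	trimmed := strings.TrimSpace(line)
	if trimmed == "" || strings.HasPrefix(trimmed, "#") {
		return line
	}
	label := fmt.Sprintf(`node_id="%d"`, nodeID)
	open := strings.Index(trimmed, "{")
	end := strings.LastIndex(trimmed, "}")
	if open >= 0 && end > open {
		// The sample already has labels: add ours before the closing brace.
		inner := strings.TrimSpace(trimmed[open+1 : end])
		sep := ","
		if inner == "" || strings.HasSuffix(inner, ",") {
			sep = ""
		}
		return trimmed[:end] + sep + label + trimmed[end:]
	}
	if open >= 0 || end >= 0 {
		// Unbalanced braces: leave the line alone rather than guess.
		return line
	}
	// No labels yet: insert a label set between the name and the value.
	sp := strings.IndexAny(trimmed, " \t")
	if sp < 0 {
		return line
	}
	return trimmed[:sp] + "{" + label + "}" + trimmed[sp:]
}

func main() {
	// Metric names below are illustrative assumptions.
	fmt.Println(addNodeIDLabel(`sql_conns 5`, 1))
	fmt.Println(addNodeIDLabel(`capacity{store="1"} 3`, 1))
}
```

With the label emitted by the node itself, the scrape side no longer needs per-node relabeling to tell the time series apart.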

Secondary tenants will mark their metrics with either the nodeID of the shared-process system tenant, or the instanceID of the tenant process.

Resolves: #94763
Epic: None

Release note (ops change): Prometheus metrics available at the _status/vars path now contain a node_id label that identifies the node they were scraped from.
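
As a rough sketch of how a consumer might use the new label, the Go snippet below pulls the `node_id` value out of one scraped sample line. `nodeIDOf` and the metric name are hypothetical, and the regular expression is a simplification that assumes no other label value contains text resembling `,node_id="..."`.

```go
package main

import (
	"fmt"
	"regexp"
)

// nodeIDLabel matches a node_id label at the start of a label set or after
// a comma, so that labels such as other_node_id are not matched.
var nodeIDLabel = regexp.MustCompile(`[{,]\s*node_id="([^"]*)"`)

// nodeIDOf is a hypothetical helper. It returns the node_id label value of
// a sample line, and false when the line carries no such label.
func nodeIDOf(line string) (string, bool) {
	m := nodeIDLabel.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// The metric name is an illustrative assumption.
	id, ok := nodeIDOf(`sql_conns{node_id="3"} 5`)
	fmt.Println(id, ok)
}
```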

@dhartunian dhartunian requested a review from a team March 14, 2023 22:25
@dhartunian dhartunian requested review from a team as code owners March 14, 2023 22:25
@cockroach-teamcity
Member

This change is Reviewable

Collaborator

@aadityasondhi aadityasondhi left a comment

:lgtm:

I think you will have to update a bunch of tests with the new NodeID label (CI failures)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@dhartunian dhartunian force-pushed the add-node-label-to-status-vars branch 2 times, most recently from d9c0d1c to 24fe9df Compare March 20, 2023 15:11
@dhartunian
Collaborator Author

bors r=aadityasondhi

@craig
Contributor

craig bot commented Mar 20, 2023

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 20, 2023

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 21, 2023

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 21, 2023

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 21, 2023

Build failed:

@knz
Contributor

knz commented Mar 21, 2023

bors r=aadityasondhi single on

@craig
Contributor

craig bot commented Mar 21, 2023

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 21, 2023

Build failed (retrying...):

@adityamaru
Contributor

@dhartunian the batch seems to be failing on TestMetricsRecorderLabels:

 recorder_test.go:240: recorder did not yield expected time series collection; diff:
         [[0].Source: "1" != "7" [0].Datapoints[0].Value: 1 != 7 [1].Source: "1-123" != "7-123" [2].Source: "1" != "7" [3].Source: "1-123" != "7-123"]

@dhartunian
Collaborator Author

bors r-

@craig
Contributor

craig bot commented Mar 21, 2023

Canceled.

@aadityasondhi
Collaborator

Sorry, I think my tsdb changes landed prior to this. The source fields now have the Tenant IDs in them. #98077

@dhartunian dhartunian force-pushed the add-node-label-to-status-vars branch from 24fe9df to f57e9f6 Compare March 21, 2023 14:44
@dhartunian dhartunian added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Mar 21, 2023
Previously, the output of the prometheus metrics via `_status/vars`
did not include any node labels. This caused challenges for
customers who want to monitor large clusters as it requires additional
configuration on the scrape side to ensure a node ID is added to the
metrics. This can be challenging to deal with when nodes come and go
in a cluster and the scrape configuration must change as well.

This change adds a `node_id` prometheus label to the metrics we
output that matches the current node's ID. Since `_status/vars` is
output from a single node there is only ever one single value that's
appropriate here.

Secondary tenants will mark their metrics with either the nodeID of
the shared-process system tenant, or the instanceID of the tenant
process.

Resolves: cockroachdb#94763
Epic: None

Release note (ops change): Prometheus metrics available at the
`_status/vars` path now contain a `node_id` label that identifies the
node they were scraped from.
@dhartunian dhartunian force-pushed the add-node-label-to-status-vars branch from f57e9f6 to 2353a6a Compare March 21, 2023 22:15
@dhartunian
Collaborator Author

bors r=aadityasondhi

@craig
Contributor

craig bot commented Mar 22, 2023

Build succeeded:

@craig craig bot merged commit 1e2ea17 into cockroachdb:master Mar 22, 2023
@blathers-crl

blathers-crl bot commented Mar 22, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Development

Successfully merging this pull request may close these issues.

Add node label to prometheus metrics
5 participants