release-23.1: server: add node_id label to _status/vars output
#99235
Previously, the Prometheus metrics output at `_status/vars` did not include any node labels. This made monitoring large clusters difficult for customers, because extra configuration was needed on the scrape side to attach a node ID to the metrics. That configuration is hard to maintain when nodes come and go, since the scrape configuration has to change along with the cluster.

This change adds a `node_id` Prometheus label to the metrics we output, set to the current node's ID. Since `_status/vars` is served by a single node, there is only ever one appropriate value for this label.

Secondary tenants will mark their metrics with either the nodeID of the shared-process system tenant or the instanceID of the tenant process.

Resolves: #94763
Epic: None

Release note (ops change): Prometheus metrics available at the `_status/vars` path now contain a `node_id` label that identifies the node they were scraped from.
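To show what the new label looks like in the Prometheus text exposition output, here is a minimal Go sketch. It is not the code from this PR: `addNodeIDLabel` is a hypothetical helper that post-processes exposition text, the metric names in `main` are illustrative, and placing `node_id` first among the labels is an assumption, not necessarily the ordering `_status/vars` produces.

```go
package main

import (
	"fmt"
	"strings"
)

// addNodeIDLabel is a hypothetical helper (not CockroachDB's implementation)
// that rewrites Prometheus text-format output so every sample line carries a
// node_id label. Comment lines (# HELP, # TYPE) and blank lines pass through.
func addNodeIDLabel(exposition, nodeID string) string {
	// %q quoting is adequate for numeric IDs; it is not a full implementation
	// of Prometheus label-value escaping.
	label := fmt.Sprintf("node_id=%q", nodeID)
	lines := strings.Split(exposition, "\n")
	for i, line := range lines {
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The metric name ends at the first '{' (existing labels) or space (no labels).
		idx := strings.IndexAny(line, "{ ")
		if idx < 0 {
			continue
		}
		if line[idx] == '{' {
			sep := ","
			if idx+1 < len(line) && line[idx+1] == '}' {
				sep = "" // empty label set: no separating comma needed
			}
			lines[i] = line[:idx+1] + label + sep + line[idx+1:]
		} else {
			// No label set yet: create one containing only node_id.
			lines[i] = line[:idx] + "{" + label + "}" + line[idx:]
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	in := "# TYPE sql_conns gauge\nsql_conns 3\nsql_exec_latency_bucket{le=\"0.5\"} 10"
	fmt.Println(addNodeIDLabel(in, "1"))
}
```

Running `main` prints the `# TYPE` line unchanged, followed by `sql_conns{node_id="1"} 3` and `sql_exec_latency_bucket{node_id="1",le="0.5"} 10`. Because every sample from one node gets the same value, a scraper no longer needs per-node relabeling rules to tell the sources apart.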
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
This change has been requested for a long time and will make debugging much easier for anyone who scrapes metrics from nodes and wants to know which node they came from without extra work. Our third-party metrics integrations have also asked for it to improve their ingest process.
The CI failure appears to be a flake unrelated to this change.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)
Backport 1/1 commits from #98640 on behalf of @dhartunian.
/cc @cockroachdb/release
Release justification: low-risk, high-impact addition of a feature that should be in the dot-zero release.