Add `node` label to prometheus metrics #94763
Comments
Hello, I am Blathers. I am here to help you get the issue triaged. I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Thanks for filing this @fanny-jiang. You're not the only one who has asked for this feature. We've got a busy roadmap at the moment but will follow up once we take a look at implementation options in a month or two.
98640: server: add `node_id` label to _status/vars output r=aadityasondhi a=dhartunian

Previously, the output of the prometheus metrics via `_status/vars` did not include any node labels. This caused challenges for customers who want to monitor large clusters as it requires additional configuration on the scrape side to ensure a node ID is added to the metrics. This can be challenging to deal with when nodes come and go in a cluster and the scrape configuration must change as well.

This change adds a `node_id` prometheus label to the metrics we output that matches the current node's ID. Since `_status/vars` is output from a single node there is only ever one single value that's appropriate here. Secondary tenants will mark their metrics with either the nodeID of the shared-process system tenant, or the instanceID of the tenant process.

Resolves: #94763
Epic: None
Release note (ops change): Prometheus metrics available at the `_status/vars` path now contain a `node_id` label that identifies the node they were scraped from.

99143: multitenant: NewIterator connector infinite retry loop r=stevendanna a=ecwall

Fixes #98822

This change fixes an infinite retry loop in `Connector.NewIterator` that would occur when the `GetRangeDescriptors` stream returned an auth error. An example trigger would be passing in a span that was outside of the calling tenant's keyspace. Now `NewIterator` correctly propagates auth errors up to the caller.

Release note: None

Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
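The merged change lives in CockroachDB's own metrics and exporter code, so the snippet below is not the actual implementation. It is a minimal sketch, assuming the standard Prometheus Go client, of the general technique the PR describes: attaching a constant per-node label to everything a registry exports. The node ID value, metric name, and port are made-up placeholders.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Placeholder node ID; a real server would use its own node or instance
	// identity, as the PR describes for shared-process vs. separate tenant processes.
	const nodeID = "1"

	reg := prometheus.NewRegistry()

	// Wrap the registerer so every metric registered through it carries a
	// constant node_id label, mirroring the label added to _status/vars output.
	wrapped := prometheus.WrapRegistererWith(prometheus.Labels{"node_id": nodeID}, reg)

	// Illustrative metric name only; not a real CockroachDB metric.
	queries := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "example_query_count",
		Help: "Example counter used to demonstrate the constant node_id label.",
	})
	wrapped.MustRegister(queries)
	queries.Inc()

	// The scrape output now includes: example_query_count{node_id="1"} 1
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```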
Is your feature request related to a problem? Please describe.
Currently, `node_id` is submitted as a gauge metric. It is not present as a label on any of the prometheus metrics emitted from the `_status/vars` monitoring endpoint. The CockroachDB Dedicated cloud Datadog integration tags metrics by node, whereas the self-hosted CockroachDB prometheus-based integration does not tag its metrics by node because the node label is not present. Having the `node_id` as a label will help with monitoring CockroachDB health by node.

Describe the solution you'd like
Submit a `node` label with each node-specific prometheus metric (see the illustrative scrape-output sample after this issue body).

Describe alternatives you've considered
Transform the `node_id` metric value into a tag within the integration and apply this tag to all the ingested metrics. There is a concern that the `node` tag may get applied to the wrong metrics, and this approach is hacky.

Additional context
Motivation is for the self-hosted CockroachDB Datadog integration to have feature parity with the CockroachDB Dedicated integration.
Jira issue: CRDB-23129
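For illustration only (the metric name and value below are invented), the request is that a plain scrape line such as:

```
example_query_count 123
```

would instead carry a per-node label, requested here as `node` and ultimately shipped as `node_id` in the PR above:

```
example_query_count{node_id="1"} 123
```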