Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add node label to prometheus metrics #94763

Closed
fanny-jiang opened this issue Jan 5, 2023 · 3 comments · Fixed by #98640
Closed

Add node label to prometheus metrics #94763

fanny-jiang opened this issue Jan 5, 2023 · 3 comments · Fixed by #98640
Assignees
Labels
A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Comments

@fanny-jiang
Copy link

fanny-jiang commented Jan 5, 2023

Is your feature request related to a problem? Please describe.
Currently, node_id is submitted as a gauge metric. It is not present as a label for any of the emitted prometheus metrics from the _status/vars monitoring endpoint. The CockroachDB Dedicated cloud Datadog integration tags metrics by node, whereas the self-hosted CockroachDB prometheus-based integration does not tag its metrics by node because the node label is not present.

Having the node_id as a label will help with monitoring CockroachDB health by node.

Describe the solution you'd like
Submit a node label with each node-specific prometheus metric.

Describe alternatives you've considered
Transform the node_id metric value into a tag within the integration and apply this tag to all the ingested metrics. There is a concern that the node tag may get applied to the wrong metrics and this approach is hacky.

Additional context
Motivation is for the self-hosted CockroachDB Datadog integration to have feature parity with the CockroachDB Dedicated integration.

Jira issue: CRDB-23129

@fanny-jiang fanny-jiang added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 5, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jan 5, 2023

Hello, I am Blathers. I am here to help you get the issue triaged.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jan 5, 2023
@fanny-jiang fanny-jiang changed the title Add node_id label to prometheus metrics Add node label to prometheus metrics Jan 5, 2023
@fanny-jiang
Copy link
Author

fanny-jiang commented Jan 18, 2023

If possible, clusterName and/or clusterID, and region would be great too to have as labels.

@dhartunian
Copy link
Collaborator

Thanks for filing this @fanny-jiang. You're not the only one who has asked for this feature. We've got a busy roadmap at the moment but will follow up once we take a look at implementation options in a month or two.

dhartunian added a commit to dhartunian/cockroach that referenced this issue Mar 14, 2023
Previously, the output of the prometheus metrics via `_status/
vars` did not include any node labels. This caused challenges for
customers who want to monitor large clusters as it requires additional
configuration on the scrape- side to ensure a node ID is added to the
metrics. This can be challenging to deal with when nodes come and go
in a cluster and the scrape configuration must change as well.

This change adds a `node_id` prometheus label to the metrics we
output that matches the current node's ID. Since `_status/vars` is
output from a single node there is only ever one single value that's
appropriate here.

Secondary tenants will mark their metrics with either the nodeID of
the shared- process system tenant, or the instanceID of the tenant
process.

Resolves: cockroachdb#94763
Epic: None

Release note (ops change): Prometheus metrics available at the
`_status/vars` path now contain a `node_id` label that identifies the
node they were scraped from.
dhartunian added a commit to dhartunian/cockroach that referenced this issue Mar 20, 2023
Previously, the output of the prometheus metrics via `_status/
vars` did not include any node labels. This caused challenges for
customers who want to monitor large clusters as it requires additional
configuration on the scrape- side to ensure a node ID is added to the
metrics. This can be challenging to deal with when nodes come and go
in a cluster and the scrape configuration must change as well.

This change adds a `node_id` prometheus label to the metrics we
output that matches the current node's ID. Since `_status/vars` is
output from a single node there is only ever one single value that's
appropriate here.

Secondary tenants will mark their metrics with either the nodeID of
the shared- process system tenant, or the instanceID of the tenant
process.

Resolves: cockroachdb#94763
Epic: None

Release note (ops change): Prometheus metrics available at the
`_status/vars` path now contain a `node_id` label that identifies the
node they were scraped from.
craig bot pushed a commit that referenced this issue Mar 22, 2023
98640: server: add `node_id` label to _status/vars output r=aadityasondhi a=dhartunian

Previously, the output of the prometheus metrics via `_status/ vars` did not include any node labels. This caused challenges for customers who want to monitor large clusters as it requires additional configuration on the scrape- side to ensure a node ID is added to the metrics. This can be challenging to deal with when nodes come and go in a cluster and the scrape configuration must change as well.

This change adds a `node_id` prometheus label to the metrics we output that matches the current node's ID. Since `_status/vars` is output from a single node there is only ever one single value that's appropriate here.

Secondary tenants will mark their metrics with either the nodeID of the shared- process system tenant, or the instanceID of the tenant process.

Resolves: #94763
Epic: None

Release note (ops change): Prometheus metrics available at the `_status/vars` path now contain a `node_id` label that identifies the node they were scraped from.

99143: multitenant: NewIterator connector infinite retry loop r=stevendanna a=ecwall

Fixes #98822

This change fixes an infinite retry loop in `Connector.NewIterator` that would occur when the `GetRangeDescriptors` stream returned an auth error. An example trigger would be passing in a span that was outside of the calling tenant's keyspace.

Now `NewIterator` correctly propagates auth errors up to the caller.

Release note: None

Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
@craig craig bot closed this as completed in 2353a6a Mar 22, 2023
blathers-crl bot pushed a commit that referenced this issue Mar 22, 2023
Previously, the output of the prometheus metrics via `_status/
vars` did not include any node labels. This caused challenges for
customers who want to monitor large clusters as it requires additional
configuration on the scrape- side to ensure a node ID is added to the
metrics. This can be challenging to deal with when nodes come and go
in a cluster and the scrape configuration must change as well.

This change adds a `node_id` prometheus label to the metrics we
output that matches the current node's ID. Since `_status/vars` is
output from a single node there is only ever one single value that's
appropriate here.

Secondary tenants will mark their metrics with either the nodeID of
the shared- process system tenant, or the instanceID of the tenant
process.

Resolves: #94763
Epic: None

Release note (ops change): Prometheus metrics available at the
`_status/vars` path now contain a `node_id` label that identifies the
node they were scraped from.
xinhaoz pushed a commit to xinhaoz/cockroach that referenced this issue Mar 30, 2023
Previously, the output of the prometheus metrics via `_status/
vars` did not include any node labels. This caused challenges for
customers who want to monitor large clusters as it requires additional
configuration on the scrape- side to ensure a node ID is added to the
metrics. This can be challenging to deal with when nodes come and go
in a cluster and the scrape configuration must change as well.

This change adds a `node_id` prometheus label to the metrics we
output that matches the current node's ID. Since `_status/vars` is
output from a single node there is only ever one single value that's
appropriate here.

Secondary tenants will mark their metrics with either the nodeID of
the shared- process system tenant, or the instanceID of the tenant
process.

Resolves: cockroachdb#94763
Epic: None

Release note (ops change): Prometheus metrics available at the
`_status/vars` path now contain a `node_id` label that identifies the
node they were scraped from.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants