How do I know which resource attributes are important enough to identify a Prometheus Metrics Stream? #1782
Comments
All proposed solutions seem to assume that the answer to the first question, "can I ensure these (labels) are part of the label-set used in Prometheus metrics", is yes, but perhaps it should not be yes? In other words, instead of introducing conceptual hierarchy in the resource attributes, handle the issue via Views that can apply
@yurishkuro the notion of
I support @jsuereth's option 2 above, where resources and/or attributes gain an additional property about being descriptive, non-descriptive, or identifying in nature.

I agree with @yurishkuro's notion that users should have a standard way to do this themselves: they already do. The Prometheus server does this outside the process, and the way it does this requires a pull-based architecture. I believe users should be able to attach resources to pushed data from external sources, as with "service discovery" in Prometheus. This question is connected with #1298.

I would put information about whether a resource is descriptive, non-descriptive, or identifying into a schema definition; that way it doesn't have to be carried around inside the protocol data itself. To mimic the kind of resource attachment done in Prometheus for a push-based system, we would:
Nice! +1. However, I can easily see individual deployments defining their own semantic conventions for resource properties that may not reflect/match the official OTel conventions (i.e. extend them), so it would have to be custom schemas too?
For the above, "deployments defining their own semantic conventions", I could imagine APIs to register attribute keys with their intended categorization. You would register your attribute before use, and then the SDK could emit a Data URL containing the dynamic schema in the
Ok, let's focus on requirements of such a design here then, in addition to use cases.

Use cases
Requirements
Proposal

If we agree these are the requirements, I'm not sure the Schema-based proposal will work given how resources are structured. Specifically, if the user appends an attribute to a Resource, and the schema only lists minimal identifying attributes, we'd drop those additions. Instead, what if we annotate the descriptive attributes in the schema URL? ALSO, we don't have a mechanism to append user attributes to a Resource that supports schemas (well), afaik. If folks agree, I'll look into proposing pointwise fixes to these two things via:
This was discussed at length in today's Prometheus-WG SIG. @brian-brazil pointed us at this blog post, which explains why Prometheus does not wish us to automatically upgrade resource attributes to metric labels: https://www.robustperception.io/why-cant-i-use-the-nodename-of-a-machine-as-the-instance-label

The discussion circled around whether it is correct to translate OTel resource attributes into the OpenMetrics INFO metric, specifically the "target" metric, which is intended for this sort of use. See https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#supporting-target-metadata-in-both-push-based-and-pull-based-systems

We resolved to move forward with this approach. Here's what it means for both parties:

For OpenTelemetry, this is a simple resolution to a longstanding question. To implement a Prometheus "Pull" exporter, translate Resource into target information. This can be done for SDK exporters as well as for the OTel Collector's Pull exporter.

For Prometheus, a user who is interested in probing the OTel resources from within a Prometheus server configuration may use all the available means to do so. This means recording rules must be used, as I understand it, which IMO means that it will be difficult but not impossible to get at the OTel resource information. All is well though; this is according to design.

We discussed several speculative directions for both projects if this becomes a problem for either group.

Considering Prometheus, if the user finds it difficult to broadly apply OTel resources and wants to do so, it's like we're inserting a new stage of relabeling into the Prometheus server. After target relabeling (which happens before scrape and which determines

Considering OpenTelemetry, we're already developing tools and/or have processors to upgrade resource attributes to metric attributes. Those should be optional configurations, not default behaviors.

Summary: resolution is to use target information metrics for Prometheus Pull exporters. (cc @dashpole, who I believe will work on adding this into #2017, which he is taking over from @jsuereth.)
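To make the resolution concrete, here is a rough sketch of what a Pull exporter's output could look like once the Resource is translated into target information. The metric name, resource attribute values, and dot-to-underscore label sanitization below are illustrative assumptions, not text from the SIG discussion:

```
# HELP target_info Target metadata derived from the OTel Resource
# TYPE target_info gauge
target_info{service_name="checkout",service_namespace="shop",k8s_pod_name="checkout-5d4f9"} 1

# HELP http_server_requests_total An ordinary metric; it carries only its own labels
# TYPE http_server_requests_total counter
http_server_requests_total{method="GET",status_code="200"} 1027
```

In this shape, resource attributes appear only on target_info, so the individual metric streams stay low-cardinality but can still be joined back to their resource when a user chooses to.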
Should we also translate resource attributes into
@dashpole Yes, I think the Collector should emit a target INFO metric corresponding with every distinct Resource, but we need a way to represent multiple targets in a single scrape:

    target_info{resource_hash=xxxx, first resource, ...} 1

I am not sure how a user will script the necessary relabeling on the Prometheus side, but at least we know this data can be used and joined (somehow). For an individual SDK this isn't needed, and the Prometheus side should be simpler; however, it still requires "metrics relabeling" (i.e., the second-stage relabeling done by Prometheus, which follows "target relabeling").
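A minimal sketch of that shape, with one target_info series per distinct Resource on a single scraped endpoint. The resource_hash values and attribute names here are hypothetical, only extending the one-line example above:

```
# One info series per distinct Resource; the hash only disambiguates them.
target_info{resource_hash="a1b2c3",service_name="checkout",service_instance_id="pod-a"} 1
target_info{resource_hash="d4e5f6",service_name="cart",service_instance_id="pod-b"} 1
```

How these series get re-attached to their corresponding metric streams is exactly the relabeling question raised here.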
My understanding is: target == endpoint that is scraped, so multiple targets-per-scrape doesn't really make sense to me. If we are talking about the collector's remote-write exporter (i.e. a single resource-per-target), the
This would only be problematic if two metrics with the same job and instance had different resource attributes, e.g.:

    metric1:
      resourceattributes:
        job: job1
        instance: instance1
        service.name: service1
        ...
    metric2:
      resourceattributes:
        job: job1
        instance: instance1
        service.name: service2
        ...

I don't think that should be common (or even possible?), but in that case, it seems reasonable to pick one set of resource attributes to convert to an info metric.
@dashpole That sounds right for data that originates from a Prometheus receiver. This leaves us with a question for OTLP data pushed into the collector and pulled via a Prometheus Pull exporter: how do we join those metrics with their target info? I had written: For pushed data,
If a job and instance allow variable service.names, for example, and a metrics processor batches requests across time, you might end up with multiple definitions in a batch. This is solvable, but we should be clear about our intentions, as it starts to sound like we're specifying a system for service discovery in a push-based metrics system. (Are we?)
If you have multiple

Even if there isn't a well-defined "up" metric, "job" and "instance" still enable a join across target_info and actual metrics, which I think is what matters here. What breaks if we don't have an "up" metric from the exporter in the push -> collector -> prom endpoint case?
I don't think we are doing service discovery, just passing identifying information in different formats. OTLP is simple, because resource is explicitly grouped with metrics. For a non-federated (application's) Prometheus endpoint, the resource (target_info) is applicable to everything on the endpoint. For a federated endpoint (i.e., the Prometheus pull exporter), the resource is only applicable to streams that share the same target (job+instance) information.
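To make the join concrete, here is a sketch of the PromQL a user might write; the metric name and the k8s_cluster_name label are made up for illustration. It copies a resource attribute from target_info onto a regular series by matching on job and instance:

```
# Enrich a normal counter with a resource attribute carried on target_info,
# matching series by job and instance; target_info has the value 1, so the
# multiplication leaves the rate unchanged.
rate(http_server_requests_total[5m])
  * on (job, instance) group_left (k8s_cluster_name)
  target_info
```

This only works cleanly when each (job, instance) pair maps to a single target_info series, which is why the duplicate-resource case discussed above matters.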
@dashpole your statements about "job" and "instance" sound correct. (I just wonder about the case of data arriving from OTLP.) Do you feel we have enough information to proceed?
What are you trying to achieve?
In OpenTelemetry, we consider Resource and its attributes part of a metric stream's identity. When exporting, this means resource labels help define the timeseries for that metric.

Prometheus (and many metrics databases) begins to have issues with high-cardinality metric streams (where there are many unique key-value pairs for a given metric of the same name). For example, this presentation from 2019 covers a lot of great advice for how to avoid high cardinality, when it is acceptable, and rules of thumb.
Specifically, I'm looking at the following advice:
When I look at OpenTelemetry and exporting metrics to Prometheus, I'd like to support the following scenarios:

- If users set service.name or my.custom.app.label and expect it to apply to all telemetry for correlation, can I ensure exported metrics have this?
- If I use Resource detection for various environments, should I be providing all the added labels? E.g. look at k8s vs. process:
  - k8s appears to be a minimal set of labels for various k8s resources and would be highly useful as labels.
  - process includes command line arguments, and could be highly cardinal / less important in a monitoring database vs. an event database.

Thoughts
This leads me to believe there may be TWO use cases for resource labels within opentelemetry:
I see a few ways we can work towards solving this: