
How do I know which resource attributes are important enough to identify a Prometheus Metrics Stream? #1782

Closed
jsuereth opened this issue Jun 29, 2021 · 14 comments · Fixed by #2381

@jsuereth
Contributor

What are you trying to achieve?

In OpenTelemetry, we consider Resource and its attributes part of a metric stream's identity. When exporting, this means resource labels help define the timeseries for that metric.

Prometheus (and many metrics databases) begins to have issues with high-cardinality metric streams (where there are many unique key-value pairs for a given metric of the same name). For example, this presentation from 2019 offers a lot of great advice on how to avoid high cardinality, when it is acceptable, and some rules of thumb.

Specifically, I'm looking at the following advice:

Does the label help someone woken up at 2am identify the problem?
Is there a predictable number of values for the label?
If the answer to the above questions is no, consider having the label on an event log instead of in Prometheus.

When I look at OpenTelemetry and exporting metrics to prometheus, I'd like to support the following scenarios:

  • If a user leverages OTEL_RESOURCE_ATTRIBUTES, can I ensure these are part of the label-set used in Prometheus metrics? E.g. if a user adds service.name or my.custom.app.label and expects it to apply to all telemetry for correlation, can I ensure exported metrics have this? (See the sketch after this list.)
  • When exporting to Prometheus and leveraging Resource detection for various environments, should I be providing all the added labels? E.g. look at k8s vs. process
    • k8s appears to be a minimal set of attributes for various k8s resources and would be highly useful as labels.
    • process includes command line arguments, which could be high-cardinality and less important in a monitoring database vs. an event database.
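
For concreteness, here is a minimal sketch of the first scenario, assuming OTEL_RESOURCE_ATTRIBUTES=service.name=checkout,my.custom.app.label=canary and that the resource attributes were promoted directly to Prometheus labels (metric names and values are hypothetical, for illustration only):

# every exported series would carry the sanitized resource attributes as labels
http_server_duration_count{service_name="checkout",my_custom_app_label="canary",http_method="GET"} 1027
queue_length{service_name="checkout",my_custom_app_label="canary"} 12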

Thoughts
This leads me to believe there may be TWO use cases for resource labels within opentelemetry:

  1. High-fidelity, queryable metadata
  2. Highly-identifying, targeted labels for time-series databases with high-cardinality restrictions.

I see a few ways we can work towards solving this:

  • We could limit resource attribute semantic conventions to the minimum identifying set to ease the problem TSDBs have.
  • We could add a feature to Resources where they have "identifying" and "descriptive" attributes as separate groups.
    • Identifying attributes would be the minimum set and always usable for metrics in TSDBs
    • Descriptive attributes would be something exporters can use when their backends handle high cardinality better (e.g. Tracing systems, influxdb, etc).
  • We could create a "common helper" or "SDK extension" that makes Resource -> Metric Attribute configuration similar across the various metric systems, and standardize on users solving the high-cardinality problem themselves.
@jsuereth added the area:semantic-conventions and spec:resource labels on Jun 29, 2021
@Oberon00 changed the title from "How do I know which resource attributes are important enough to idenitfy a Prometheus Metrics Stream?" to "How do I know which resource attributes are important enough to identify a Prometheus Metrics Stream?" on Jun 29, 2021
@yurishkuro
Member

All proposed solutions seem to assume that the answer to the first question, "can I ensure these (labels) are part of the label-set used in Prometheus metrics", is yes, but perhaps it should not be yes? In other words, instead of introducing a conceptual hierarchy in the resource attributes, handle the issue via Views that can apply customFilters(generalResource) -> metricsResource.

@jsuereth
Contributor Author

@yurishkuro the notion of customFilters(generalResource) -> metricsResource is my suggestion #3 (where users have a standard way to configure this themselves)

@jmacd
Contributor

jmacd commented Jun 30, 2021

I support @jsuereth's option 2 above, where resources and/or attributes gain an additional property about being descriptive, non-descriptive, or identifying in nature.

I agree with @yurishkuro's notion that users should have a standard way to do this themselves: they already do. The Prometheus server does this outside the process, and the way it does this requires a pull-based architecture. I believe users should be able to attach resources to pushed data from external sources, as with "service discovery" in Prometheus. This question is connected with #1298.

I would put information about whether a resource is descriptive, non-descriptive, or identifying into a schema definition; that way it doesn't have to be carried around inside the protocol data itself. To mimic the kind of resource attachment done in Prometheus for a push-based system, we would:

  1. Prepare a schema definition stating which attribute(s) are used as the minimum identifying set
  2. Include the minimum identifying set of attributes in the resource pushed by the SDK
  3. Use collector configuration, the schema_url, and identifying values to look up service discovery resources
  4. Allow relabeling the way Prometheus has.

@yurishkuro
Member

I would put information about whether a resource is descriptive, non-descriptive, or identifying into a schema definition

nice! +1

however, I can easily see individual deployments defining their own semantic conventions for resource properties that may not reflect/match the official OTel conventions (i.e. extend them), so it would have to support custom schemas too?

@jmacd
Contributor

jmacd commented Jul 8, 2021

For the above, "deployments defining their own semantic conventions", I could imagine APIs to register attribute keys with their intended categorization. You would register your attribute before use, and then the SDK could emit a Data URL containing the dynamic schema in the schema_url attribute, for example.

@jsuereth
Contributor Author

jsuereth commented Jul 13, 2021

Ok, let's focus on requirements of such a design here then, in addition to use cases.

Use cases

  1. A user can leverage OTEL_RESOURCE_ATTRIBUTES (or other out-of-process mechanism) to append labels to a resource. This enables a myriad of use cases including (but not limited to) canary rollouts where canary + past-release can be directly compared using resource attributes appended via the environment.
  2. An instrumentor or SDK author can provide a Resource detector that includes as many descriptive labels as desired for easy querying of telemetry. These descriptive labels can be annotated as such via some mechanism.
  3. An exporter (specifically for metrics) that has limited ability to handle attribute cardinality can easily distinguish "core" resource labels, which are included in the metric stream, from descriptive labels, which can be dropped while preserving resource identity. Users can configure this via a simple "Include resource attributes in signal attributes" boolean.

Requirements

  • Resource detection, when authored, has some location where a "minimal" set of identifying attributes can be defined.
  • There is some mechanism (via collector, environment variables, or other code external to the SDK) to add labels to a Resource. These attributes would be considered identifying/important.
  • Exporters (specifically Prometheus and other systems that fight high-cardinality issues) can easily determine the "minimal" identifying attributes, in addition to being able to preserve any user-appended attributes added via non-resource-detection mechanisms.

Proposal

If we agree these are the requirements, I'm not sure the Schema-based proposal will work given how resources are structured. Specifically, if the user appends an attribute to a Resource, and the schema only lists minimal-identifying attributes, we'd drop those additions. Instead, what if we annotate the descriptive attributes in the schema url? ALSO, we don't have a mechanism to append user attributes to a Resource that supports schemas (well), afaik. If folks agree, I'll look into proposing pointwise fixes to these two things via:

  1. Schemas defining descriptive resource labels in a way exporters can consume and use this (Including custom-schemas)
  2. A formal mechanism for users to append attributes to Resources via environment/config for various use cases that interacts well with schemas.

@jmacd
Contributor

jmacd commented Jan 12, 2022

This was discussed at length in today's Prometheus-WG SIG.

@brian-brazil pointed us at this blog post, which explains why Prometheus does not wish us to automatically upgrade resource attributes to metric labels: https://www.robustperception.io/why-cant-i-use-the-nodename-of-a-machine-as-the-instance-label

The discussion circled around whether it is correct to translate OTel resource attributes into the OpenMetrics INFO metric, specifically the "target" metric which is intended for this sort of use. See https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#supporting-target-metadata-in-both-push-based-and-pull-based-systems

We resolved to move forward with this approach. Here's what it means for both parties:

For OpenTelemetry, this is a simple resolution to a longstanding question. To implement a Prometheus "Pull" exporter, translate Resource into target information. This can be done for SDK exporters as well as for the OTel Collector's Pull exporter.
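
As a rough sketch (attribute and metric names here are illustrative, not prescriptive), "translate Resource into target information" means the exporter exposes the Resource once as a target_info series instead of stamping it onto every sample:

# resource attributes appear once, on the info metric
target_info{service_name="checkout",service_instance_id="627cc493",k8s_pod_name="checkout-7d9f"} 1
# regular metrics stay free of resource labels
http_server_duration_count{http_method="GET"} 1027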

For Prometheus, a user who is interested in probing the OTel resources from within a Prometheus server configuration may use all the available means to do so. This means recording rules must be used, as I understand it, which IMO means that it will be difficult but not impossible to get at the OTel resource information. All is well though, this is according to design.

We discussed several speculative directions for both projects if this becomes a problem for either group.

Considering Prometheus, if the user finds it difficult to broadly apply OTel resources and wants to do so, it's like we're inserting a new stage of relabeling into the Prometheus server. After target relabeling (which happens before scrape and which determines up), but before metric relabeling (where anything is possible, but you're writing queries), could we imagine a third relabeling step that only applies to info? Each "target" info label would be presented with two leading underscores, __name__ would be defined as per the convention, and relabeling would be allowed much like the original target relabeling. This is just an idea to make it easier to use info if it becomes a problem.

Considering OpenTelemetry, we're already developing tools and/or have processors to upgrade resource attributes to metric attributes. Those should be optional configurations, not default behaviors.

Summary: the resolution is to use target information metrics for Prometheus Pull exporters. (cc @dashpole, who I believe will work on adding this into #2017, which he is taking over from @jsuereth.)

@dashpole
Contributor

For OpenTelemetry, this is a simple resolution to a longstanding question. To implement a Prometheus "Pull" exporter, translate Resource into target information. This can be done for SDK exporters as well as for the OTel Collector's Pull exporter.

Should we also translate resource attributes into target_info for the collector's prometheus remote-write exporter? Currently, resource attributes other than job and instance are dropped.

@jmacd
Contributor

jmacd commented Jan 13, 2022

@dashpole Yes I think the Collector should emit a target INFO metric corresponding with every distinct Resource, but we need a way to represent multiple targets in a single scrape.

target_info{resource_hash=xxxx, first resource, ...} 1
first resource metric1{...,resource_hash=xxxx} ...
first resource metric2{...,resource_hash=xxxx} ...
target_info{resource_hash=yyyy, second resource, ...} 1
second resource metric1{...,resource_hash=yyyy} ...
second resource metric2{...,resource_hash=yyyy} ...
...

I am not sure how a user will script the necessary relabeling on the Prometheus side, but at least we know this data can be used and joined (somehow).

For an individual SDK, this isn't needed and the Prometheus-side should be simpler, however it still requires "metrics relabeling" (i.e., the second stage relabeling done by Prometheus, which follows "target relabeling").

@dashpole
Contributor

multiple targets in a single scrape

My understanding is: target == endpoint that is scraped, so multiple targets-per-scrape doesn't really make sense to me.

If we are talking about the collector's remote-write exporter (i.e. a single resource-per-target), the job and instance labels should allow joining with target info:

target_info{job=job1,instance=instance1,service_name=service1,...} 1
my_prometheus_metric1{...,job=job1,instance=instance1,...} ...
my_prometheus_metric2{...,job=job1,instance=instance1,...} ...
target_info{job=job2,instance=instance2,service_name=service2,...} 1
someone_elses_prometheus_metric1{...,job=job2,instance=instance2,...} ...
someone_elses_prometheus_metric2{...,job=job2,instance=instance2,...} ...

This would only be problematic if two metrics with the same job and instance resource labels came with conflicting resource information (sorry for bad otlp):

metric1:
  resourceattributes:
    job: job1
    instance: instance1
    service.name: service1
  ...
metric2:
  resourceattributes:
    job: job1
    instance: instance1
    service.name: service2
  ...

I don't think that should be common (or even possible?), but in that case, it seems reasonable to pick one set of resource attributes to convert to an info metric.

@jmacd
Contributor

jmacd commented Jan 13, 2022

@dashpole That sounds right for data that originates from a Prometheus receiver. This leaves us with a question for OTLP data pushed into the collector and pulled via a Prometheus Pull exporter: how do we join those metrics with their target info? I had written resource_hash above as a placeholder, but any schema will do. We could adopt "job" and "instance", but I would call that the Prometheus schema and would not use "job" and "instance" unless there is a well-defined "up" metric.

For pushed data, resource_hash could work, or another schema like service.name and service.instance.id would work.

@jmacd
Contributor

jmacd commented Jan 13, 2022

I don't think that should be common (or even possible?), but in that case, it seems reasonable to pick one set of resource attributes to convert to an info metric.

If a job and instance allow variable service.names, for example, and a metrics processor batches requests across time, you might end up with multiple definitions in a batch. This is solvable, but we should be clear about our intentions as it starts to sound like we're specifying a system for service discovery in a pushed metrics system. (Are we?)

@dashpole
Contributor

This leaves us with a question for OTLP data pushed into the collected and pulled via a Prometheus Pull exporter, how to join those metrics with their target info? I had written resource_hash above as a placeholder, but any schema will do. We could adopt "job" and "instance", but I would call that the Prometheus schema and would not use "job" and "instance" unless there is a well-defined "up" metric.

If you have multiple target_info metrics in an endpoint, I would expect the endpoint to have multiple targets (i.e. unique job+instance label combos), and to match what a federated prometheus endpoint does. Because honor_labels is specifically designed to preserve target information from scraped endpoints, using job+instance replicates what we would expect from a federated endpoint and would make it easy for downstream prom servers to scrape it correctly.

Even if there isn't a well-defined "up" metric, "job" and "instance" still enable a join across target_info and the actual metrics, which I think is what matters here. What breaks if we don't have an "up" metric from the exporter in the push -> collector -> prom endpoint case?
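
To make that join concrete, here is a hedged PromQL sketch (using the metric and label names from the example above) of how a user could copy a resource attribute from target_info onto a scraped series, matched on the shared job and instance labels:

# many-to-one match: group_left copies service_name from target_info onto each sample
my_prometheus_metric1
  * on (job, instance) group_left (service_name)
  target_info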

you might end up with multiple definitions in a batch. This is solvable, but we should be clear about our intentions as it starts to sound like we're specifying a system for service discovery in a pushed metrics system. (Are we?)

I don't think we are doing service discovery, just passing identifying information in different formats. OTLP is simple, because resource is explicitly grouped with metrics. For a non-federated (application's) prometheus endpoint, the resource (target_info) is applicable to everything on the endpoint. For a federated endpoint (i.e. prometheus pull exporter), the resource is only applicable to streams that share the same target (job+instance) information.

@jmacd
Contributor

jmacd commented Jan 18, 2022

@dashpole your statements about "job" and "instance" sound correct. (I just wonder about the case of data arriving from OTLP.) Do you feel we have enough information to proceed?
