
prefix should be consistent for internal metrics #9315

Closed
codeboten opened this issue Jan 18, 2024 · 9 comments · Fixed by #9759

Comments

@codeboten
Contributor

Currently, metrics generated by the collector include the prefix otelcol as configured here:

otelprom.WithNamespace("otelcol"),

This is not the case when configuring metrics to be emitted via OTLP, as the prefix configuration is specific to prometheus. I'd like to propose the prefix be applied to the metric names, so that they remain consistent when users switch from one mechanism to another.
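For illustration, a minimal Go sketch of why the two paths diverge today (the wiring and the exporter_sent_spans instrument name are assumptions for the example, not the collector's actual setup):

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	otelprom "go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// Prometheus path: the "otelcol" prefix is added by the exporter option,
	// so an instrument shows up as e.g. otelcol_exporter_sent_spans.
	promExp, _ := otelprom.New(otelprom.WithNamespace("otelcol")) // error handling omitted

	// OTLP path: there is no equivalent namespace option, so the same
	// instrument is exported without any prefix (e.g. exporter_sent_spans).
	otlpExp, _ := otlpmetricgrpc.New(ctx)

	mp := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(promExp),
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(otlpExp)),
	)
	meter := mp.Meter("go.opentelemetry.io/collector/exporter/exporterhelper")

	// The proposal: put the prefix into the instrument name itself
	// ("otelcol_exporter_sent_spans") so both paths agree.
	sent, _ := meter.Int64Counter("exporter_sent_spans",
		metric.WithDescription("Number of spans successfully sent."))
	sent.Add(ctx, 1)
}
```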

@codeboten
Contributor Author

Note that there's also an OTEP with specific metrics that may require a rename later (open-telemetry/oteps#238), but adding a prefix would still be useful in the short term.

@bogdandrutu
Member

bogdandrutu commented Jan 18, 2024

My vote is NOT to do this and, if needed, to use relevant names for the meter that produces these metrics.
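Roughly, that alternative would look like this (a sketch only; the scope and instrument names are illustrative, not a settled convention):

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

// recordSend is a sketch only: a generic instrument name, with the producer
// identified by the meter (scope) name rather than an "otelcol" name prefix.
func recordSend(ctx context.Context) error {
	meter := otel.GetMeterProvider().Meter(
		"go.opentelemetry.io/collector/exporter/exporterhelper", // illustrative scope name
	)
	sent, err := meter.Int64Counter("exporter_sent_spans",
		metric.WithDescription("Spans successfully sent downstream."))
	if err != nil {
		return err
	}
	sent.Add(ctx, 1)
	// OTLP carries the scope name in ScopeMetrics; the Prometheus exporter can
	// surface it as an otel_scope_name label instead of a name prefix.
	return nil
}
```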

@codeboten
Contributor Author

My vote is NOT to do this and, if needed, to use relevant names for the meter that produces these metrics.

@bogdandrutu As an end-user, wouldn't this potentially cause a problem? In the sense that it would rely on my backend to interpret the meter name and apply it as a prefix if I wanted to keep using my dashboards? Or I would end up having to generate the telemetry twice, and then cut over once I'm satisfied with the level of historical data about my collector.

Would an alternative here be to strip the prefix from the Prometheus-generated metrics via a feature gate and move towards using target_info to provide a namespace? I'm not sure if this would work, maybe @dashpole has an idea here?

I guess ultimately, what I'm trying to achieve here is a recommended path forward for end-users who would like to transition from Prometheus metrics to OTLP metrics (or maybe even use both together, if there's a case for that, I don't know).

@dashpole
Contributor

dashpole commented Jan 23, 2024

Would an alternative here be to strip the prefix from the Prometheus-generated metrics via a feature gate and move towards using target_info to provide a namespace? I'm not sure if this would work, maybe @dashpole has an idea here?

The instrumentation scope is the recommended mechanism for namespacing metrics in Prometheus going forward. It will appear as a label on Prometheus metrics, and you will get an otel_scope_info metric if you add scope attributes.

But I would encourage us to try to move away from having an additional prefix only for prometheus metrics. I'm working to allow the same names to be used in otel + prometheus in the near future, at which point we should have the exact same names.

Edit: Note that I'm not arguing for or against a prefix for the metric. I'm just saying we should use the same name for Prometheus and other exporters, rather than having special naming for Prometheus.
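For example, a sketch of scope-based namespacing with the Prometheus exporter (the scope name, attribute, and exposition lines are illustrative approximations):

```go
package main

import (
	"go.opentelemetry.io/otel/attribute"
	otelprom "go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

// newMeter builds a meter whose scope name and attributes act as the
// "namespace", instead of a name prefix added by the exporter.
func newMeter() (metric.Meter, error) {
	exp, err := otelprom.New() // no WithNamespace: no otelcol_ name prefix
	if err != nil {
		return nil, err
	}
	mp := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exp))

	return mp.Meter(
		"go.opentelemetry.io/collector", // illustrative scope name
		metric.WithInstrumentationAttributes(attribute.String("distro", "contrib")),
	), nil
}

// Roughly what the Prometheus endpoint then exposes (approximate):
//
//   exporter_sent_spans_total{otel_scope_name="go.opentelemetry.io/collector",...} 42
//   otel_scope_info{otel_scope_name="go.opentelemetry.io/collector",distro="contrib"} 1
```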

@jmacd
Contributor

jmacd commented Jan 23, 2024

I went to look for supporting evidence in the specification for metrics.
The SDK spec says that the meter name should be treated as a namespace for detecting duplicates, for example: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#meter
There's the idea that the meter name is the name of the instrumentation library, which may vary according to instrumentation setup (in my interpretation): https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#get-a-meter

The way I understand scope name, in any case, is that it is meant to distinguish alternative producers of a set of metric names. The reason I like this proposal, to have an "otelcol" prefix placed into the metric name, not the scope name, is that the OTel Collector is the one specifying the interpretation of the metrics. If I want to build my own alternate collector, let's say with a different mechanism for the obsreport functionality, I should be able to do so, and I would emit the specified metric names using a different scope name. I would not expect the consumer to interpret my alternate implementation of the standard metrics as different metrics.

The reason to support this issue, that we retain an otelcol prefix, is not only pragmatic. The most immediate source of confusion we get from not using a prefix here is that OTel SDKs require conventional metric names for all the same things that collectors do. If the collectors try to say that exporter.spans.sent means something specific, then what will the OTel SDK do for its own conventions about counting spans that were exported successfully? It's the same idea, but the meaning of SDK drops and Collector drops is sufficiently different.

That is why, in OTEP 238, I propose that we use otelcol as a prefix for Collector pipeline metrics and otelsdk as a prefix for the SDK pipeline metrics, where the interpretation of the metric names is consistent but the meaning (an SDK dropping vs a collector dropping) is different.

@bogdandrutu
Member

bogdandrutu commented Feb 5, 2024

I completely disagree with the name prefix, for the same reason we don't want a "cpu.usage" metric to be named "otelcol.cpu.usage" or "otelsdk.cpu.usage"; those are wrong names in my opinion, and we are moving backwards from an ideal goal. I still strongly believe that otelcol, in our case, should be part of the "scope/instrumentation", which signals that we have an otelcol instrumentation library that instruments this. For the same reason, if an http library (mylibrary) instruments its own metrics, we would not name them mylibrary.http.request.count but http.request.count, and the instrumentation scope would be mylibrary.
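As a sketch of that analogy (mylibrary and http.request.count are the hypothetical names from the comment above, not real semantic conventions):

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
)

// countRequest shows the library keeping a generic metric name; the scope
// ("mylibrary") identifies the instrumentation, not a name prefix.
func countRequest(ctx context.Context) {
	meter := otel.Meter("mylibrary")
	requests, err := meter.Int64Counter("http.request.count")
	if err != nil {
		return // error handling omitted for the sketch
	}
	requests.Add(ctx, 1)
}
```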

@jmacd
Contributor

jmacd commented Feb 6, 2024

@bogdandrutu I would like you to be clear about why you think cpu.usage and http.request.count are the same as the specialized metric definitions used to characterize the behavior of an OpenTelemetry Collector.

Neither cpu.usage nor http.request.count is an existing semantic convention, so they're not great to use for this hypothetical discussion.

We specify utilization instead of CPU usage because the limit is governed as a rate, i.e., it is a function of time rather than an absolute quantity. Nevertheless, a cpu.utilization metric is too generic because it can be measured in multiple ways. If we're counting system utilization from the host's perspective, we'll use system.cpu.utilization, and if we're counting process utilization from the system's perspective, we'll use process.cpu.utilization, I suppose. The qualifier here is not a namespace; it's a descriptor saying what kind of CPU utilization you are measuring, because the generic cpu is not specific enough.

http.request.count is also not specified -- as you've written it, the .count suffix indicates an up-down-counter semantic: https://opentelemetry.io/docs/specs/semconv/general/metrics/#naming-rules-for-counters-and-updowncounters. The semantic convention we do have is http.{client,server}.request.duration, a histogram that defines the count you're referring to. Anyway, if we have a semantic convention for counting http requests, it has to be a very generic counter.

In the case of metrics designed to monitor the OpenTelemetry collector, I'm not sure a generic name like "exporter_spans_sent" is specific enough, because any piece of code that exports spans could reasonably use that metric name. I'm specifically looking for metric names that an SDK can emit (how many spans did you export?) which I can compare with metrics that a Collector can emit (how many spans did you export?). In my work on open-telemetry/oteps#238 (which is closed pending further work), I came to recommend that SDKs and Collectors not use the same name, even if the task is nearly the same ("exporting spans"), for different reasons.

Still, the name exporter_spans_sent is very generic, considering all the environments in which a Collector might run. If a vendor incorporates the OTel collector code into an agent of their own, or runs it alongside their own agents, they'll have to ensure a different metric name is used for this generic concept. So then I'll have myvendor_exporter_spans_sent and the OTC will have exporter_spans_sent? I don't think that's reasonable. If you have a very specific semantic convention, the name of the metric instrument should be specific too.

Further, you didn't respond to this point:

The way I understand scope name, in any case, is that it is meant to distinguish alternative producers of a set of metric names.

We can instrument an OTC pipeline by integrating with each of the helper libraries, connecting an OTel SDK, and generating metrics using a conventional approach. We leave the scope_name field in the data model free for use so that alternative approaches can generate the same metric names. I might, for example, imagine a single event log written by the collector describing every component invocation, including additional detail not present in the conventional instrumentation. I will be able to process that log and synthesize the same metrics that conventional instrumentation would produce, but I will need to use a different scope name, since a different instrumentation library produced the metrics.
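A sketch of that idea, with a hypothetical metric name and two hypothetical scope names:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
)

// Two producers emit the same (hypothetical) metric name, distinguished only
// by scope name. A consumer treats them as the same metric; the scope says
// which implementation produced it.
func emitFromBothProducers(ctx context.Context) {
	// Conventional in-process instrumentation.
	builtin := otel.Meter("go.opentelemetry.io/collector/exporter/exporterhelper")
	a, _ := builtin.Int64Counter("otelcol_exporter_sent_spans")
	a.Add(ctx, 10)

	// Alternative producer, e.g. metrics synthesized from an event log, using
	// a different scope name but the identical metric name.
	synth := otel.Meter("example.com/myvendor/eventlog-metrics")
	b, _ := synth.Int64Counter("otelcol_exporter_sent_spans")
	b.Add(ctx, 10)
}
```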

@andrzej-stencel
Member

I'm for the short term solution of adding the otelcol prefix to the metrics. If I understand correctly, the long term solution will be discussed in open-telemetry/oteps#238 / open-telemetry/oteps#249.

@codeboten
Contributor Author

As discussed on the SIG call on 6-Mar-2024, the consensus was that prefixes were better in the short term for consistency. Additionally, using _ instead of / will allow both metric outputs to be consistent.
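A sketch of what that direction could look like (the instrument and scope names are illustrative):

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

// The "otelcol" prefix and "_" separator live in the instrument name itself,
// so the Prometheus exporter no longer needs WithNamespace and the OTLP
// output carries the same prefixed name.
func recordUptime(ctx context.Context, seconds float64) {
	meter := otel.Meter("go.opentelemetry.io/collector/service")
	uptime, err := meter.Float64Counter("otelcol_process_uptime",
		metric.WithDescription("Uptime of the process."),
		metric.WithUnit("s"))
	if err != nil {
		return // error handling omitted for the sketch
	}
	uptime.Add(ctx, seconds)
	// Prometheus exposition (with the default counter suffix): otelcol_process_uptime_total{...}
	// OTLP: otelcol_process_uptime
}
```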
