Validate that duplicate metrics across receivers are intentional #26499

mackjmr · 2023-09-07T12:32:48Z

Component(s)

cmd/mdatagen

Describe the issue you're reporting

This is a proposal to disallow having metrics with the same name across different receivers to avoid metric collisions. This will be enforced by having the mdatagen validation fail in the case of non-unique metric names defined in metadata.yaml across receivers.

That said, there is likely to be exceptions to this rule. For example, there is some discussion in #24238 as to whether we should allow using the same metric name in receivers which collect metrics that represent the same underlying data.

There should be a way for to have exceptions, and allow different receivers to emit the same metric, but the general rule should be that this is not allowed.

I propose that this should be allowed for receivers that collect data from the same underlying technology (e.g. container runtime) but different implementations (e.g. docker, containerd, cri-o). Fundamentally, this should be for receivers that act as replacements for collecting the same data, and that cannot be used in conjunction.

To address this, I think we should have a field in the metadata.yaml, e.g. group to indicate that receivers pull the same data. For instance, in the case of a dockerstats receiver + containerd receiver, they would both be part of group container runtime. For the validation, only receivers from a same group will be allowed to have non-unique metric names.

The text was updated successfully, but these errors were encountered:

github-actions · 2023-09-07T12:33:07Z

Pinging code owners:

cmd/mdatagen: @dmitryax

See Adding Labels via Comments if you do not have permissions to add labels yourself.

hughesjj · 2023-09-13T16:29:25Z

My 2c:

we should not explicitly disallow this, but a report would be very useful
We need to link this to the semantic conventions group
I could see utility in allowing two different receivers (and thus presumably two different collection mechanisms) two report the same metric to surface issues. I'm thinking boeing/nasa here where you can have a consensus between three different devices measuring the same signal, throwing out one which may disagree, and alerting on such so the users of the telemetry can debug the disagreeing source (as an example)

mackjmr · 2023-09-14T08:49:21Z

Notes from the SIG meeting:

This should act like a linter, where it alerts the user in the case of non unique metric names (by making the validation fail), to make sure that it is intentional. If it is intentional, the user can input the metric name in an allow list.

@hughesjj

we should not explicitly disallow this, but a report would be very useful

With the approach above, this is not disallowed, but more of a confirmation check.

We need to link this to the semantic conventions group

Can you clarify what you mean here ?

I could see utility in allowing two different receivers (and thus presumably two different collection mechanisms) two report the same metric to surface issues. I'm thinking boeing/nasa here where you can have a consensus between three different devices measuring the same signal, throwing out one which may disagree, and alerting on such so the users of the telemetry can debug the disagreeing source (as an example)

Agreed 👍

hughesjj · 2023-09-14T10:12:42Z

@mackjmr @ "link this to the semantic conventions"

I might be thinking too far ahead, and it's likely not a blocking prerequisite. That said, the ideal state/"north star"/eventual convergence(?) of metric and attribute namings is coupled to the opentelemetry semantic conventions. To my understanding the current plan is that only a subset of all metrics/attributes emitted would be tracked by semantic conventions, but I could see motivation for 100% coverage of all sdk instrumentation+collector metric/attribute emission being tracked by semantic conventions.

Summary:
My understanding of this proposal is to reduce conflict between namings of signals across receivers. Semantic conventions are more or less a specification for consistent naming and... semantics.. across signals. Thus, if we ever see a duplicate signal between two receivers, semantic conventions may indicate whether they should be "the same signal" or not.

Tl;Dr: if we see duplicate metric names between receivers, and said metric_name exists in semantic conventions spec, we probably wish to keep the names as-is (assuming they fulfill semantic intent). If the duplicates aren't in semconv, we may eidh to bring the collision to the semantic conventions working group.

hughesjj · 2023-09-14T10:14:10Z

Tldr:tldr: we should additionally discuss this in the semantic conventions working group if we haven't already

mackjmr · 2023-09-14T12:21:16Z

@hughesjj Thank you for clarifying.

Agreed, I think that if semantic conventions on metric_name exist, validating whether the duplicate metrics are OK should be a matter of ensuring they both fulfill semantic intent (and perhaps also making sure there is a way to differentiate them with attributes, e.g. scope name). On the other hand, if semantic conventions on metric_name do not exist, it can become difficult to validate. I will bring this issue to the Semantic Conventions SIG to discuss 🙂

This issue is not specific to receivers, and some SDKs are also affected by this, so perhaps this can also be a broader discussion.

Issue: open-telemetry#26499 If semantic conventions on the duplicate metric exist, validating is a matter of ensuring they both fulfill semantic intent. For Validating duplicate metrics for which there is no semantic conventions, additional discussion is needed with the semantic conventions WG as mentioned in the issue. For the time being, it will be best judgment.

mackjmr · 2023-09-15T12:40:10Z

Added this issue to the Semantic Convention WG on Monday: https://docs.google.com/document/d/10xG7DNKWRhxNmFGt3yYd3980a9uwS8lMl2LvQL3VNK8/edit.

mackjmr · 2023-09-19T08:51:52Z

Notes from semantic conventions WG meeting

Arguments for having Duplicate metrics:
From a metric data model perspective, duplicates are allowed to flow through. It is expected for the backend to resolve them. Additionally, one use case for doing this is being able to see how different systems measure the same value.
Arguments against having Duplicate metrics:
Duplicates can provide a bad user experience. It can create confusion for user when aggregated in the backend (aggregation will not only be spacial aggregation, but also scope aggregation). This is solved by adding scope to the aggregation, but the collector does not necessarily add scope everywhere.

JVM metrics had same issue, and decision was that for duplicates it was opt-in on 1 side to avoid collisions (e.g. only 1 enabled by default). May not be a good default experience to have duplicate metrics, but should be opt in.

This issue should be raised in the specification meeting.

mackjmr · 2023-09-21T12:40:23Z

From the Specification SIG:

There was a lot of discussion (between 4:50=>24:24), TLDR is: Two pieces of instrumentation providing the same data is allowed. That said, it can be a problem but there is not a good solution to solve this in a generic way. Should be a collector decision how this is addressed. We could have a way to detect two identical metrics flowing through the collector (this is not only a problem if there are duplicate metrics across receivers, but can also be an issue when there is a misconfiguration and the same receiver is pointed twice to the same instance).

For the collector, I think that #26687 is a good step so that we are aware that there are duplicates. In the future, we may want to extend mdatagen to document duplicates in the Generated Readme so that users are aware, and if there are reports of duplicates being a bad user experience, perhaps we can consider having a rule that only 1 should be enabled at a time by default like it was done for JVM metrics in SDKs.

cc @dmitryax and @hughesjj if there are any concerns for: #26687.

dmitryax · 2023-11-14T23:36:55Z

The approach sounds good to me. And I like the receivers group concept like container runtime metrics/

**Description:** This test alerts a user if a metric is duplicated across receivers, and instructs the user to add said metric to an allow list if this is intentional. If semantic conventions on the duplicate metric exist, validating is a matter of ensuring they both fulfill semantic intent. For validating duplicate metrics for which there is no semantic conventions, additional discussion is needed with the semantic conventions WG as mentioned in the issue. For the time being, it will be best judgment. **Link to tracking Issue:** #26499

…etry#26687) **Description:** This test alerts a user if a metric is duplicated across receivers, and instructs the user to add said metric to an allow list if this is intentional. If semantic conventions on the duplicate metric exist, validating is a matter of ensuring they both fulfill semantic intent. For validating duplicate metrics for which there is no semantic conventions, additional discussion is needed with the semantic conventions WG as mentioned in the issue. For the time being, it will be best judgment. **Link to tracking Issue:** open-telemetry#26499

github-actions · 2024-01-15T03:30:14Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

cmd/mdatagen: @dmitryax

See Adding Labels via Comments if you do not have permissions to add labels yourself.

mackjmr added the needs triage New item requiring triage label Sep 7, 2023

github-actions bot added the cmd/mdatagen mdatagen command label Sep 7, 2023

This was referenced Sep 7, 2023

Add CI testing for metadata.yaml for receivers metrics. #23375

Closed

[cmd/mdatagen] Add unit test for metric collisions. #24229

Closed

[receiver/kubeletstatsreceiver] prepend container. metrics with k8s. #24238

Closed

This was referenced Sep 7, 2023

[receiver/{dockerstatsreceiver, kubeletstatsreceiver}] Collision with metric container.memory.rss #24009

Closed

[receiver/{dockerstatsreceiver, kubeletstatsreceiver}] Collision with metric container.cpu.utilization #24008

Closed

atoulme added enhancement New feature or request and removed needs triage New item requiring triage labels Sep 9, 2023

github-actions bot mentioned this issue Sep 12, 2023

Weekly Report: 2023-09-05 - 2023-09-12 kevinslin/opentelemetry-collector-contrib#23

Open

This was referenced Sep 12, 2023

Weekly Report: 2023-09-05 - 2023-09-12 kevinslin/opentelemetry-collector-contrib#24

Open

Weekly Report: 2023-09-05 - 2023-09-12 kevinslin/opentelemetry-collector-contrib#25

Open

mackjmr mentioned this issue Sep 14, 2023

Add test to Validate if duplicate metrics are intentional #26687

Merged

mackjmr changed the title ~~Disallow non unique metric name in receivers~~ Validate that duplicate metrics across receivers are intentional Sep 18, 2023

github-actions bot added the Stale label Jan 15, 2024

mackjmr closed this as completed Jan 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate that duplicate metrics across receivers are intentional #26499

Validate that duplicate metrics across receivers are intentional #26499

mackjmr commented Sep 7, 2023 •

edited

Loading

github-actions bot commented Sep 7, 2023

hughesjj commented Sep 13, 2023

mackjmr commented Sep 14, 2023

hughesjj commented Sep 14, 2023

hughesjj commented Sep 14, 2023

mackjmr commented Sep 14, 2023

mackjmr commented Sep 15, 2023 •

edited

Loading

mackjmr commented Sep 19, 2023

mackjmr commented Sep 21, 2023

dmitryax commented Nov 14, 2023

github-actions bot commented Jan 15, 2024

Validate that duplicate metrics across receivers are intentional #26499

Validate that duplicate metrics across receivers are intentional #26499

Comments

mackjmr commented Sep 7, 2023 • edited Loading

Component(s)

Describe the issue you're reporting

github-actions bot commented Sep 7, 2023

hughesjj commented Sep 13, 2023

mackjmr commented Sep 14, 2023

hughesjj commented Sep 14, 2023

hughesjj commented Sep 14, 2023

mackjmr commented Sep 14, 2023

mackjmr commented Sep 15, 2023 • edited Loading

mackjmr commented Sep 19, 2023

mackjmr commented Sep 21, 2023

dmitryax commented Nov 14, 2023

github-actions bot commented Jan 15, 2024

mackjmr commented Sep 7, 2023 •

edited

Loading

mackjmr commented Sep 15, 2023 •

edited

Loading