Clarification needed on how values for attributes filtered by a view are used #2905

MrAlias · 2022-10-27T19:24:53Z

The SDK MUST provide the means to register Views with a MeterProvider. Here are the inputs:
...

The configuration for the resulting metrics stream:
...

A list of attribute keys (optional). If provided, the attributes that are not in the list will be ignored. If not provided, all the attribute keys will be used by default (TODO: once the Hint API is available, the default behavior should respect the Hint if it is available).

For asynchronous counter instruments (counter and up-down-counter), how should their sums be reported when this "filter" is defined?

For example, if there is an asynchronous counter that observes 1 for the attribute set version=1 and then, in the same callback, 2 for the attribute set version=2 and a list of attribute keys equaling {"foo"} is used in a view, both attribute sets (version=1 and version=2) should be ignored according to the specification. Therefore, they both become observations of the empty attribute set. How should their sums be combined? Should the SDK report 2, the last recorded value for the empty attribute set? Or, should it report 3 the combination?

I had assumed the latter would be expected, but when looking at the python implementation they use the former. This is based on the limit_num_of_attrs.py example:

import random
import time
from typing import Iterable

from opentelemetry.metrics import (
    CallbackOptions,
    Observation,
    get_meter_provider,
    set_meter_provider,
)
from opentelemetry.sdk.metrics import MeterProvider, ObservableGauge
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.metrics.view import View

view_with_attributes_limit = View(
    instrument_type=ObservableGauge,
    instrument_name="observable_gauge",
    attribute_keys={"foo"},
)

exporter = ConsoleMetricExporter()

reader = PeriodicExportingMetricReader(exporter, export_interval_millis=1_000)
provider = MeterProvider(
    metric_readers=[
        reader,
    ],
    views=[
        view_with_attributes_limit,
    ],
)
set_meter_provider(provider)

meter = get_meter_provider().get_meter("reduce-cardinality-with-view", "0.1.2")


def observable_gauge_func(options: CallbackOptions) -> Iterable[Observation]:
    yield Observation(1, {"version": 1})
    yield Observation(2, {"version": 2})


# Async gauge
observable_gauge = meter.create_observable_gauge(
    "observable_gauge",
    [observable_gauge_func],
)

while 1:
    time.sleep(1)

(caveat, my python understanding is limited so I could have missed something here)

When running this:

$ python limit_num_of_attrs.py
{"resource_metrics": [{"resource": {"attributes": {"telemetry.sdk.language": "python", "telemetry.sdk.name": "opentelemetry", "telemetry.sdk.version": "1.12.0", "service.name": "unknown_service"}, "schema_url": ""}, "scope_metrics": [{"scope": {"name": "reduce-cardinality-with-view", "version": "0.1.2", "schema_url": ""}, "metrics": [{"name": "observable_gauge", "description": "", "unit": "", "data": {"data_points": [{"attributes": {}, "start_time_unix_nano": 0, "time_unix_nano": 1666898388887952217, "value": 2}]}}], "schema_url": ""}], "schema_url": ""}]}
{"resource_metrics": [{"resource": {"attributes": {"telemetry.sdk.language": "python", "telemetry.sdk.name": "opentelemetry", "telemetry.sdk.version": "1.12.0", "service.name": "unknown_service"}, "schema_url": ""}, "scope_metrics": [{"scope": {"name": "reduce-cardinality-with-view", "version": "0.1.2", "schema_url": ""}, "metrics": [{"name": "observable_gauge", "description": "", "unit": "", "data": {"data_points": [{"attributes": {}, "start_time_unix_nano": 0, "time_unix_nano": 1666898389956287449, "value": 2}]}}], "schema_url": ""}], "schema_url": ""}]}
#...

The value reported is 2.

cc @jmacd @bogdandrutu @reyang @open-telemetry/python-approvers

The text was updated successfully, but these errors were encountered:

MrAlias · 2022-10-27T19:35:52Z

I would be interested to hear how Java, .NET, and other stable implementation behave here. Unfortunately, my ability to write examples there are non-existent.

reyang · 2022-10-27T21:52:15Z

The Python implementation sounds buggy to me.

For additive (e.g. Counter, UpDownCounter, Asynchronous Counter, Asynchronous UpDownCounter) values, removing a spatial dimension should result in sum.

MrAlias · 2022-10-27T22:07:58Z

@reyang can you comment on how .NET handles this? Does that project add them together?

cijothomas · 2022-10-27T22:32:48Z

#1874 Old issue discussing this.

I can confirm that .NET does not handle this correctly, and has the same bug as python. (I'll create an issue to track)

cijothomas · 2022-10-27T22:34:40Z

#1874 Old issue discussing this.

I can confirm that .NET does not handle this correctly, and has the same bug as python. (I'll create an issue to track)

https://github.com/open-telemetry/opentelemetry-dotnet/blob/73f8d3cb0160f633661854f41866dcdb70b81069/test/OpenTelemetry.Tests/Metrics/MetricAPITest.cs#L800-L803 UnitTest showing the same bug in .NET. (I looked at it at that time and it was not trivial to fix, so just added test to come back to this later.)

MrAlias · 2022-10-27T23:57:30Z

#1874 Old issue discussing this.

I can confirm that .NET does not handle this correctly, and has the same bug as python. (I'll create an issue to track)

Gotcha, thanks for confirming 🙏

MrAlias · 2022-10-28T15:08:02Z

@jack-berg can you confirm the behavior of async counters in Java in the presence of an "attribute reducing" view? I'm wondering if there is any implementation that sums these values.

jmacd · 2022-10-28T15:33:52Z

This is a good question. The Lightstep metrics SDK and the v0.31 and earlier OTel-Go metrics SDK did sum these instruments. As a case where this matters, potentially, is the Golang runtime/metrics package, which outputs cumulative Counter and UpDownCounter values in a form that is natural for use with asynchronous instruments.

For example, runtime/metrics outputs three metrics, one is a total of the other two:

/gc/cycles/automatic:gc-cycles
	Count of completed GC cycles generated by the Go runtime.

/gc/cycles/forced:gc-cycles
	Count of completed GC cycles forced by the application.

/gc/cycles/total:gc-cycles
	Count of all completed GC cycles.

In the instrumentation package I wrote (forked from the contrib repo), https://github.com/lightstep/otel-launcher-go/tree/main/lightstep/instrumentation/runtime, the pattern is used to discard the total, since the SDK or a downstream consumer can easily recompute the total using attribute removal and the natural merge function for the data point. So in the example above, a cumulative GC count for class=forced and class=automatic will be generated. If you remove class, you get the total back. There are three groups of metrics in the Go-1.20 runtime/metrics package that follow this pattern. For Go-1.20, we get hierarchical CPU timing measurements, so you could filter in more than one meaningful way.

reyang · 2022-10-28T16:15:42Z

A good way to think about this problem: whether the dimension is removed from the SDK (e.g. by configuring View), or it is removed by the Collector (by reaggregation), or it is removed at the backend/comsumption (via PromQL), the results should be the same.

MrAlias · 2022-10-28T16:55:54Z

For additive (e.g. Counter, UpDownCounter, Asynchronous Counter, Asynchronous UpDownCounter) values, removing a spatial dimension should result in sum.

@reyang should the Asynchronous Counter, Asynchronous UpDownCounter be added to the list of instrument at the end of that section that are treated as additive?

Also, should the API specification be updated to normatively require these instrument types be treated as additive?

lalitb · 2022-10-28T17:03:23Z

otel-cpp implementation does the sum for the sync counters matching the "attribute reducing" view, but selects the last value for the async counter. Need to be fixed, will create a issue to track this.

reyang · 2022-10-28T17:14:29Z

For additive (e.g. Counter, UpDownCounter, Asynchronous Counter, Asynchronous UpDownCounter) values, removing a spatial dimension should result in sum.

@reyang should the Asynchronous Counter, Asynchronous UpDownCounter be added to the list of instrument at the end of that section that are treated as additive?

Which list?

Also, should the API specification be updated to normatively require these instrument types be treated as additive?

I think no. Additive property applies to Sums, and is covered by the SDK specification https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#default-aggregation. It is perfectly fine for the SDK user to configure an UpDownCounter as something different (e.g. gauge) and not apply additive property.

MrAlias · 2022-10-28T17:15:43Z

For additive (e.g. Counter, UpDownCounter, Asynchronous Counter, Asynchronous UpDownCounter) values, removing a spatial dimension should result in sum.

@reyang should the Asynchronous Counter, Asynchronous UpDownCounter be added to the list of instrument at the end of that section that are treated as additive?

Which list?

The list included in the sentence at the end of the linked section:

In OpenTelemetry, each Instrument implies whether it is additive or not. For example, Counter and UpDownCounter are additive while Asynchronous Gauge is non-additive.

MrAlias · 2022-10-28T17:17:42Z

Also, should the API specification be updated to normatively require these instrument types be treated as additive?

I think no. Additive property applies to Sums, and is covered by the SDK specification https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#default-aggregation.

Where is it covered? I don't see anywhere in that linked section defining sums as additive.

It also seems like the term additive is applied to instruments, not aggregations. This would make sense as that is where attributes are also defined. i.e.

each Instrument implies whether it is additive or not

reyang · 2022-10-28T17:17:58Z

The list included in the sentence at the end of the linked section:

In OpenTelemetry, each Instrument implies whether it is additive or not. For example, Counter and UpDownCounter are additive while Asynchronous Gauge is non-additive.

I think yes it can, not necessarily as it is only providing examples (without promising that it will be a complete list).

reyang · 2022-10-28T17:22:09Z

Where is it covered? I don't see anywhere in that linked section defining sums as additive.

It also seems like the term additive is applied to instruments, not aggregations. This would make sense as that is where attributes are also defined.

Yeah, I can see the challenge here.

The data model specification covered "additive" in the temporality section, and temporality is referenced by the sums section. I wrote the supplementary guideline after discussing with @jmacd, hoping it will provide some help for folks who struggle to understand the data model and API specification. Maybe we should improve the data model spec to bring more clarity by making it more streamlined.

MrAlias · 2022-10-28T17:23:50Z

Where is it covered? I don't see anywhere in that linked section defining sums as additive.
It also seems like the term additive is applied to instruments, not aggregations. This would make sense as that is where attributes are also defined.

Yeah, I can see the challenge here.

The data model specification covered "additive" in the temporality section, and temporality is referenced by the sums section. I wrote the supplementary guideline after discussing with @jmacd, hoping it will provide some help for folks who struggle to understand the data model and API specification. Maybe we should improve the data model spec to bring more clarity by making it more streamlined.

Gotcha. Any help clarifying this in the specification would be appreciated 🙏

reyang · 2022-10-28T17:30:16Z

Just to get a sanity check: if we don't currently have normative requirements for this in the specification and stable releases of implementations don't do this, does OTel really require this behavior? Is it just considered undefined currently?

Can we add it in a backward-compatible way? Will having implementations change their behavior be backwards-compatible?

I think the spec does require it, although it is not expressed in a normative way.
Here is my understanding of the complex mental network:

One has to first read the API specification to understand Asynchronous Counter.
Then go to the SDK specification to understand that the default aggregation is Sum.
Then go to the Data Model specification to understand temporality and realize that Sum means additive.
Or by chance discovered the supplementary guideline and realized how additive property should be handled after reading the examples.

Given all these steps, 90% chance folks will run into a trap unless they asked someone who has been deeply engaged in all the three specs...

jack-berg · 2022-10-31T21:31:40Z

Java doesn't do this correctly either. Opened open-telemetry/opentelemetry-java#4901 to track.

MadVikingGod · 2022-11-01T15:06:22Z

I want to try and clarify the workings of the async instruments and their default aggregation.

In the case that was raised here, there is a filter and two observes will record to the same filtered counter, we expect the result is a sum of both observations.

What about other similar cases:

If there is no filter, and two observers happen in separate callbacks?
If there is no filter and two observers happen in the same callback?
If there is a filter, and the same attribute sets are used? E.g. case 1, but there is a filter.
What happens if there is a filter and the view says this should be a gauge?

Would it be a good idea to put other a set of edge cases, and the expected output?

reyang · 2022-11-01T21:14:28Z

the workings of the async instruments and their default aggregation.

In the case that was raised here, there is a filter and two observes will record to the same filtered counter, we expect the result is a sum of both observations.

What about other similar cases:

If there is no filter, and two observers happen in separate callbacks?

If there is no filter and two observers happen in the same callback?

If there is a filter, and the same attribute sets are used? E.g. case 1, but there is a filter.

What happens if there is a filter and the view says this should be a gauge?

@MadVikingGod could you give some concrete examples? I found that the cases you've described can be interpreted differently. Ideally something like this would help to provide clarity https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#asynchronous-example.

If folks feel this area needs more examples to facilitate the understanding, I can add a section under https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#additive-property to explain the spatial aggregation.

MadVikingGod · 2022-11-03T14:42:46Z

That is the gist of what I was asking about, just more tightly focused on edge cases we have historically found to be non-obvious. Like the behavior in this issue.

If we were to ask "What should be the result in these 4 (+ the original) cases?" I think we would see different interpretations. While I think the supplementary guidelines are useful, they are verbose enough that they can only cover a small set of behaviors. I was thinking more along the lines of a document that easily be translated into tests to help with the validation of these interactions of features. This issue might look like this: https://gist.github.com/MadVikingGod/e3e65d6217d321d5de8ded159f4abc7e

reyang · 2022-11-03T23:27:20Z

That is the gist of what I was asking about, just more tightly focused on edge cases we have historically found to be non-obvious. Like the behavior in this issue.

If we were to ask "What should be the result in these 4 (+ the original) cases?" I think we would see different interpretations. While I think the supplementary guidelines are useful, they are verbose enough that they can only cover a small set of behaviors. I was thinking more along the lines of a document that easily be translated into tests to help with the validation of these interactions of features. This issue might look like this: https://gist.github.com/MadVikingGod/e3e65d6217d321d5de8ded159f4abc7e

The gist reminded of me https://github.com/w3c/trace-context/tree/main/test, which is a big commitment to introduce and maintain. I feel it is probably a separate topic on its own.

tsloughter · 2022-11-04T16:18:24Z

Yes, I've wanted something like the tracecontext interop tests for tracing and metrics so bad. I even started on it at one point... Probably need a SIG for it :)

MrAlias added question Question for discussion spec:metrics Related to the specification/metrics directory labels Oct 27, 2022

github-actions bot assigned yurishkuro Oct 27, 2022

MrAlias mentioned this issue Oct 27, 2022

Use view AttributeFilter open-telemetry/opentelemetry-go#3390

Closed

1 task

This comment was marked as outdated.

Sign in to view

MrAlias mentioned this issue Oct 28, 2022

Add table of instrument additive property #2906

Merged

jack-berg mentioned this issue Oct 31, 2022

Async instruments don't do spatial reaggregation when attributes are dropped open-telemetry/opentelemetry-java#4901

Open

srikanthccv mentioned this issue Oct 31, 2022

Incorrect aggregation in Async instruments when attributes are dropped open-telemetry/opentelemetry-python#3008

Closed

lalitb mentioned this issue Oct 31, 2022

[Metrics SDK] Async aggregation doesn't do re-aggregation properly when the spatial dimensions are dropped open-telemetry/opentelemetry-cpp#1724

Open

MadVikingGod mentioned this issue Nov 3, 2022

Align Async Counter behavior when used with filtered attributes. open-telemetry/opentelemetry-go#3439

Closed

MrAlias mentioned this issue Dec 20, 2022

Combine precomputed values of filtered attribute sets open-telemetry/opentelemetry-go#3549

Merged

dashpole mentioned this issue Jan 17, 2023

Clarification needed: handling repeated async counter observations #3109

Open

cijothomas mentioned this issue Nov 30, 2023

Add tests to validate spatial aggregation open-telemetry/opentelemetry-rust#1416

Merged

MrAlias mentioned this issue Feb 5, 2024

[Editorial] clarify cardinality limit #3856

Merged

MrAlias closed this as completed Feb 5, 2024

MrAlias closed this as not planned Won't fix, can't repro, duplicate, stale Feb 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarification needed on how values for attributes filtered by a view are used #2905

Clarification needed on how values for attributes filtered by a view are used #2905

MrAlias commented Oct 27, 2022 •

edited

Loading

MrAlias commented Oct 27, 2022

reyang commented Oct 27, 2022

MrAlias commented Oct 27, 2022

cijothomas commented Oct 27, 2022

cijothomas commented Oct 27, 2022 •

edited by reyang

Loading

MrAlias commented Oct 27, 2022

MrAlias commented Oct 28, 2022

jmacd commented Oct 28, 2022

reyang commented Oct 28, 2022

MrAlias commented Oct 28, 2022 •

edited

Loading

lalitb commented Oct 28, 2022

reyang commented Oct 28, 2022 •

edited

Loading

MrAlias commented Oct 28, 2022

MrAlias commented Oct 28, 2022 •

edited

Loading

reyang commented Oct 28, 2022

reyang commented Oct 28, 2022

This comment was marked as outdated.

MrAlias commented Oct 28, 2022 •

edited

Loading

reyang commented Oct 28, 2022

jack-berg commented Oct 31, 2022

MadVikingGod commented Nov 1, 2022

reyang commented Nov 1, 2022

MadVikingGod commented Nov 3, 2022

reyang commented Nov 3, 2022

tsloughter commented Nov 4, 2022

Clarification needed on how values for attributes filtered by a view are used #2905

Clarification needed on how values for attributes filtered by a view are used #2905

Comments

MrAlias commented Oct 27, 2022 • edited Loading

MrAlias commented Oct 27, 2022

reyang commented Oct 27, 2022

MrAlias commented Oct 27, 2022

cijothomas commented Oct 27, 2022

cijothomas commented Oct 27, 2022 • edited by reyang Loading

MrAlias commented Oct 27, 2022

MrAlias commented Oct 28, 2022

jmacd commented Oct 28, 2022

reyang commented Oct 28, 2022

MrAlias commented Oct 28, 2022 • edited Loading

lalitb commented Oct 28, 2022

reyang commented Oct 28, 2022 • edited Loading

MrAlias commented Oct 28, 2022

MrAlias commented Oct 28, 2022 • edited Loading

reyang commented Oct 28, 2022

reyang commented Oct 28, 2022

This comment was marked as outdated.

MrAlias commented Oct 28, 2022 • edited Loading

reyang commented Oct 28, 2022

jack-berg commented Oct 31, 2022

MadVikingGod commented Nov 1, 2022

reyang commented Nov 1, 2022

MadVikingGod commented Nov 3, 2022

reyang commented Nov 3, 2022

tsloughter commented Nov 4, 2022

MrAlias commented Oct 27, 2022 •

edited

Loading

cijothomas commented Oct 27, 2022 •

edited by reyang

Loading

MrAlias commented Oct 28, 2022 •

edited

Loading

reyang commented Oct 28, 2022 •

edited

Loading

MrAlias commented Oct 28, 2022 •

edited

Loading

MrAlias commented Oct 28, 2022 •

edited

Loading