Aggregating async Counter with Views dropping some attribute #1874

cijothomas · 2021-08-18T00:42:40Z

Let's say I have an AsyncCounter (requestCountAsync).
At time T1, its callback reported the following measurements

measurements (100, success=true, verb=Get)
measurements (5, success=false, verb=Get)
we export the following:
{{requestCountAsync{Verb="Get", success=true, "Sum:100", Temporality=Cumulative, {T0}, T1}}}
{{requestCountAsync{Verb="Get", success=false, "Sum:5", Temporality=Cumulative, {T0}, T1}}}

At T1+1

measurements (120, success=true, verb=Get)
measurements (7, success=false, verb=Get)
we export the following:
{{requestCountAsync{Verb="Get", success=true, "Sum:120", Temporality=Cumulative, {T0}, T1+1}}}
{{requestCountAsync{Verb="Get", success=false, "Sum:7", Temporality=Cumulative, {T0}, T1+1}}}

At T1+2

measurements (150, success=true, verb=Get)
measurements (10, success=false, verb=Get)
we export the following:
{{requestCountAsync{Verb="Get", success=true, "Sum:150", Temporality=Cumulative, {T0}, T1+2}}}
{{requestCountAsync{Verb="Get", success=false, "Sum:10", Temporality=Cumulative, {T0}, T1+2}}}

and so on.

Now I configure my SDK with a View that says "only pick Verb as an attribute, and drop everything else."
Repeating the same application:
At time T1, its callback reported the following measurements

measurements (100, success=true, verb=Get)
measurements (5, success=false, verb=Get)

What should we export?

1. Report the last value:
{{requestCountAsync{Verb="Get", "Sum:5", Temporality=Cumulative, {T0}, T1}}}

OR

2. Do "some sort of spatial aggregation" or "some merge", and report:
{{requestCountAsync{Verb="Get", "Sum:105", Temporality=Cumulative, {T0}, T1}}}

(Note: This is the "truth", as my web server has served 105 requests in total so far.)

OR

3. This is undefined and up to each language.
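The difference between the first two options can be sketched in a few lines of Python. This is only an illustration of the question, not SDK code: `drop_attributes` and its tuple-based keys are hypothetical names, and measurements are modelled as plain `(value, attributes)` pairs.

```python
from collections import defaultdict


def drop_attributes(measurements, keep):
    """Collapse measurements onto the kept attribute keys.

    Returns two dicts keyed by the reduced attribute set:
    the sum of all collapsed values, and the last value seen.
    """
    summed = defaultdict(int)
    last = {}
    for value, attrs in measurements:
        key = tuple(sorted((k, v) for k, v in attrs.items() if k in keep))
        summed[key] += value  # spatial aggregation: merge by adding
        last[key] = value     # last value: later measurement overwrites
    return dict(summed), last


# Measurements reported by the callback at T1 (from the example above).
t1 = [
    (100, {"success": True, "verb": "Get"}),
    (5, {"success": False, "verb": "Get"}),
]

summed, last_value = drop_attributes(t1, keep={"verb"})
print(summed)      # {(('verb', 'Get'),): 105}
print(last_value)  # {(('verb', 'Get'),): 5}
```

Only the summed result matches the 105 requests the server has actually handled.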

reyang · 2021-08-19T23:38:39Z

Discussed during the 8/19 Metrics SIG meeting; the answer is option 2 (spatial aggregation).

reyang · 2021-08-19T23:45:50Z

@victlu will follow up with a concrete example. If it turns out that we need to clarify the spec, we will turn this into a spec issue (rather than a question).

victlu · 2021-08-20T17:06:37Z

Setup

Using an Asynchronous Counter instrument.

A View configured with:

Sum aggregation (Cumulative Temporality)
Select only the "verb" attribute

Report at T0:

    Nothing

Report at T1 from one callback:

    100, success=true, verb=get
    5, success=false, verb=get
    20, verb=get, location=A
    50, verb=get, location=B

Report at T2 from one callback:

    120, success=true, verb=get
    7, success=false, verb=get
    100, verb=get, location=A
    40, verb=get, location=B

Report at T3 from one callback:

    150, success=true, verb=get
    10, success=false, verb=get
    30, verb=get, location=A
    10, verb=get, location=B

Expected Answer

Collect after T1:

    T0-T1: verb=get, Sum=105 (cum:0 + success=true:100 + success=false:5)

Collect after T2:

    T0-T2: verb=get, Sum=232 (cum:105 + success=true:120 + success=false:7)

Collect after T3:

    T0-T3: verb=get, Sum=392 (cum:232 + success=true:150 + success=false:10)

Because we, as the instrumenter, know the total for "Get" is based on
success=true + success=false.

Currently, there is no way for the SDK to know this fact.

So, what do we do with the location label with respect to the verb label?

A possible approach

Collect after T1:

    T0-T1: verb=get, Sum=175 (0 + 100+5+20+50)
    T0-T1: verb=get, success=true, Sum=100 (0 + 100)
    T0-T1: verb=get, success=false, Sum=5 (0 + 5)
    T0-T1: verb=get, location=A, Sum=20 (0 + 20)
    T0-T1: verb=get, location=B, Sum=50 (0 + 50)

Collect after T2:

    T0-T2: verb=get, Sum=442 (175 + 120+7+100+40)
    T0-T2: verb=get, success=true, Sum=220 (100 + 120)
    T0-T2: verb=get, success=false, Sum=12 (5 + 7)
    T0-T2: verb=get, location=A, Sum=120 (20 + 100)
    T0-T2: verb=get, location=B, Sum=90 (50 + 40)

Collect after T3:

    T0-T3: verb=get, Sum=642 (442 + 150+10+30+10)
    T0-T3: verb=get, success=true, Sum=370 (220 + 150)
    T0-T3: verb=get, success=false, Sum=22 (12 + 10)
    T0-T3: verb=get, location=A, Sum=150 (120 + 30)
    T0-T3: verb=get, location=B, Sum=100 (90 + 10)

We now have all the data in the "slices" of the total "pie".

We can decide (SDK or downstream) what labels constitute the total,
i.e. using label "success":

    T1: using verb=get + success=*, Sum=105 (100+5)

    T2: using verb=get + success=*, Sum=232 (220+12)

    T3: using verb=get + success=*, Sum=392 (370+22)

This method can be extended to any number of label combinations.
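The bookkeeping above can be reproduced with a short Python sketch. It follows the arithmetic in this comment, where each reported value is added to the running sum of its full attribute set; the names `reports`, `running`, and `totals_by_success` are illustrative and not part of any SDK.

```python
from collections import defaultdict

# Values reported by the callback at each timestamp (from the Setup section).
reports = {
    "T1": [
        (100, {"success": "true", "verb": "get"}),
        (5, {"success": "false", "verb": "get"}),
        (20, {"verb": "get", "location": "A"}),
        (50, {"verb": "get", "location": "B"}),
    ],
    "T2": [
        (120, {"success": "true", "verb": "get"}),
        (7, {"success": "false", "verb": "get"}),
        (100, {"verb": "get", "location": "A"}),
        (40, {"verb": "get", "location": "B"}),
    ],
    "T3": [
        (150, {"success": "true", "verb": "get"}),
        (10, {"success": "false", "verb": "get"}),
        (30, {"verb": "get", "location": "A"}),
        (10, {"verb": "get", "location": "B"}),
    ],
}

running = defaultdict(int)  # one running sum per full attribute set ("slice")
totals_by_success = {}      # total rebuilt from the "success" slices only

for timestamp, measurements in reports.items():
    for value, attrs in measurements:
        running[frozenset(attrs.items())] += value
    totals_by_success[timestamp] = sum(
        total for slice_key, total in running.items() if "success" in dict(slice_key)
    )

print(totals_by_success)      # {'T1': 105, 'T2': 232, 'T3': 392}
print(sum(running.values()))  # 642, the all-slices total after T3
```

Choosing `location` instead of `success` in the filter would rebuild the total from the other set of slices.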

Problem

The current spec does not offer any clarification on this topic.

My thoughts on a solution to this problem:

- The View specifies all the labels (verb + success) that constitute the "whole" pie, and which label(s) (i.e. verb) to summarize/group by.
- All measurements reported in a callback (or otherwise for the same timestamp) are subject to spatial aggregation (i.e. dropping labels).
- We do not do spatial aggregation across measurements from different timestamps.
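A minimal Python sketch of this proposal, under the assumption that a View could carry both a "whole" label set and a group-by set. No such View option exists in the spec; `aggregate_callback`, `whole_keys`, and `group_by` are hypothetical names. Each callback is aggregated on its own, and how the per-timestamp results feed into the exported cumulative sum is left out here.

```python
from collections import defaultdict


def aggregate_callback(measurements, whole_keys, group_by):
    """Spatially aggregate the measurements of ONE callback.

    Only measurements whose attribute keys equal `whole_keys` are treated
    as slices of the pie; they are summed per `group_by` attribute set.
    """
    out = defaultdict(int)
    for value, attrs in measurements:
        if set(attrs) != set(whole_keys):
            continue  # not part of the declared "whole", e.g. location=A
        key = tuple((k, attrs[k]) for k in sorted(group_by))
        out[key] += value
    return dict(out)


t1 = [
    (100, {"success": "true", "verb": "get"}),
    (5, {"success": "false", "verb": "get"}),
    (20, {"verb": "get", "location": "A"}),
    (50, {"verb": "get", "location": "B"}),
]
t2 = [
    (120, {"success": "true", "verb": "get"}),
    (7, {"success": "false", "verb": "get"}),
    (100, {"verb": "get", "location": "A"}),
    (40, {"verb": "get", "location": "B"}),
]

view = {"whole_keys": {"verb", "success"}, "group_by": {"verb"}}
result_t1 = aggregate_callback(t1, **view)
result_t2 = aggregate_callback(t2, **view)
print(result_t1)  # {(('verb', 'get'),): 105}
print(result_t2)  # {(('verb', 'get'),): 127}
```

Nothing from T1 is carried into the T2 call, which is the "no spatial aggregation across timestamps" rule.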

victlu · 2021-08-24T17:36:06Z

Based on the SIG discussion on 8/24/2021, the guidance should be:

An instrument should record the "whole" value, with all labels that constitute the whole included.
Instrumenters should avoid instrumentation that would "double count" or otherwise lead to a nonsensical sum.
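One way an instrumenter might spot the double-counting case from the earlier example is to check whether a single callback mixes different attribute-key sets. The check below is a hypothetical helper written for this thread, not something the spec or any SDK provides, and a mixed result is only a hint that the sum may be meaningless.

```python
def attribute_key_sets(measurements):
    """Return the distinct sets of attribute keys used in one callback."""
    return {frozenset(attrs) for _, attrs in measurements}


consistent = [
    (100, {"success": "true", "verb": "get"}),
    (5, {"success": "false", "verb": "get"}),
]
# Adding a per-location breakdown of the same requests counts them twice.
mixed = consistent + [
    (20, {"verb": "get", "location": "A"}),
    (50, {"verb": "get", "location": "B"}),
]

print(len(attribute_key_sets(consistent)))  # 1
print(len(attribute_key_sets(mixed)))       # 2
```

With `mixed`, dropping everything except `verb` would sum to 175 instead of the 105 requests actually served.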

jmacd · 2021-08-24T19:48:20Z

From Prometheus docs, this is summarized with an example that I like:

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful). If it is not meaningful, split the data up into multiple metrics. For example, having the capacity of various queues in one metric is good, while mixing the capacity of a queue with the current number of elements in the queue is not.

jmacd · 2021-08-25T16:54:48Z

@cijothomas OK to resolve this issue?

cijothomas · 2021-08-25T18:07:29Z

> @cijothomas OK to resolve this issue?

Yes! thank you.

cijothomas added the question and spec:metrics labels Aug 18, 2021

github-actions bot assigned jmacd Aug 18, 2021

cijothomas closed this as completed Aug 25, 2021

jmacd mentioned this issue Dec 8, 2021

Metrics supplemental guidelines for async views #2208

Merged

cijothomas mentioned this issue Oct 27, 2022

Clarification needed on how values for attributes filtered by a view are used #2905

Closed

MrAlias mentioned this issue Jan 17, 2023

Combine precomputed values of filtered attribute sets open-telemetry/opentelemetry-go#3549

Merged

cijothomas mentioned this issue Nov 30, 2023

Add tests to validate spatial aggregation open-telemetry/opentelemetry-rust#1416

Merged
