Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregating async Counter with Views dropping some attribute #1874

Closed
cijothomas opened this issue Aug 18, 2021 · 7 comments
Closed

Aggregating async Counter with Views dropping some attribute #1874

cijothomas opened this issue Aug 18, 2021 · 7 comments
Assignees
Labels
question Question for discussion spec:metrics Related to the specification/metrics directory

Comments

@cijothomas
Copy link
Member

Lets say I have a AsyncCounter (requestCountAsync).
At time T1, its callback reported the following measurements

  • measurements (100, success=true, verb=Get)
  • measurements (5, success=false, verb=Get)
    we export the following:
    {{requestCountAsync{Verb="Get", success=true, "Sum:100", Temporality=Cumulative, {T0}, T1}}}
    {{requestCountAsync{Verb="Get", success=false, "Sum:5", Temporality=Cumulative, {T0}, T1}}}

At T1+1

  • measurements (120, success=true, verb=Get)
  • measurements (7, success=false, verb=Get)
    we export the following:
    {{requestCountAsync{Verb="Get", success=true, "Sum:120", Temporality=Cumulative, {T0}, T1+1}}}
    {{requestCountAsync{Verb="Get", success=false, "Sum:7", Temporality=Cumulative, {T0}, T1+1}}}

At T1+2

  • measurements (150, success=true, verb=Get)
  • measurements (10, success=false, verb=Get)
    we export the following:
    {{requestCountAsync{Verb="Get", success=true, "Sum:150", Temporality=Cumulative, {T0}, T1+2}}}
    {{requestCountAsync{Verb="Get", success=false, "Sum:10", Temporality=Cumulative, {T0}, T1+2}}}

and so on.

Now I configured my SDK to have a View, which says "only pick "Verb" as an attribute, and drop everything else.
Repeating the same application:
At time T1, its callback reported the following measurements

  • measurements (100, success=true, verb=Get)
  • measurements (5, success=false, verb=Get)

What should we export?

  1. Report last value:
    {{requestCountAsync{Verb="Get", "Sum:5", Temporality=Cumulative, {T0}, T1}}}

OR

  1. Do "some sort of spatial aggregation" or "some merge", and report
    {{requestCountAsync{Verb="Get", "Sum:105", Temporality=Cumulative, {T0}, T1}}}

(Note: This is the "truth", as my web server has 105 requests total until now)

OR

This is undefined and upto languages..

@cijothomas cijothomas added question Question for discussion spec:metrics Related to the specification/metrics directory labels Aug 18, 2021
@reyang
Copy link
Member

reyang commented Aug 19, 2021

Discussed during the 8/19 Metrics SIG meeting, the answer is 2.

@reyang
Copy link
Member

reyang commented Aug 19, 2021

@victlu will follow up by putting a concrete example, if it turned out that we need to clarify the spec, we will turn this into a spec issue (rather than a question).

@victlu
Copy link
Contributor

victlu commented Aug 20, 2021

Setup

Using an Asynchronous Counter instrument.

A View configured with:

  • Sum aggregation (Cumulative Temporality)
  • Select only the "verb" attribute

Report at T0:

    Nothing

Report at T1 from one callback:

    100, success=true, verb=get
    5, success=false, verb=get
    20, verb=get, location=A
    50, verb=get, location=B

Report at T2 from one callback:

    120, success=true, verb=get
    7, success=false, verb=get
    100, verb=get, location=A
    40, verb=get, location=B

Report at T3 from one callback:

    150, success=true, verb=get
    10, success=false, verb=get
    30, verb=get, location=A
    10, verb=get, location=B

Expected Answer

Collect after T1:

    T0-T1: verb=get, Sum=105 (cum:0 + success=true:100 + success=false:5)

Collect after T2:

    T0-T2: verb=get, Sum=232 (cum:105 + success=true:120 + success=false:7)

Collect after T3:

    T0-T3: verb=get, Sum=392 (cum:232 + success=true:150 + success=false:10)

Because we, as instrumenter, know the total for "Get" is based on
success=true + success=false.

Currently, there is no way for SDK to know this fact.

So, what do we do with the location label With respect to the verb label?

A possible approach

Collect after T1:

    T0-T1: verb=get, Sum=175 (0 + 100+5+20+50)
    T0-T1: verb=get, success=true, Sum=100 (0 + 100)
    T0-T1: verb=get, success=false, Sum=5 (0 + 5)
    T0-T1: verb=get, location=A, Sum=20 (0 + 20)
    T0-T1: verb=get, location=B, Sum=50 (0 + 50)

Collect after T2:

    T0-T2: verb=get, Sum=442 (175 + 120+7+100+40)
    T0-T2: verb=get, success=true, Sum=220 (100 + 120)
    T0-T2: verb=get, success=false, Sum=12 (5 + 7)
    T0-T2: verb=get, location=A, Sum=120 (20 + 100)
    T0-T2: verb=get, location=B, Sum=90 (50 + 40)

Collect after T3:

    T0-T3: verb=get, Sum=642 (442 + 150+10+30+10)
    T0-T3: verb=get, success=true, Sum=370 (220 + 150)
    T0-T3: verb=get, success=false, Sum=22 (12 + 10)
    T0-T3: verb=get, location=A, Sum=150 (120 + 30)
    T0-T3: verb=get, location=B, Sum=100 (90 + 10)

We now have all the data in the "slices" of the total "pie".

We can decide (SDK or downstream) what labels constitute the total,
i.e. using label "success":

    T1: using verb=get + success=*, Sum=105 (100+5)

    T2: using verb=get + success=*, Sum=232 (220+12)

    T3: using verb=get + success=*, Sum=392 (370+22)

This method can be extended to any number of label combination.

Problem

The current spec does not offer any clarification on this topic.

My thoughts for solution to this problem are:

  • The View specify all the labels (verb + success) that constitute the "whole" pie.
    • And which label/s (i.e. verb) to summarize/group by.
  • All measurements reported in a callback (or otherwise for same Timestamp) are subject to spatial aggregation (i.e. dropping labels).
  • We do not do spatial aggregation for different measurements in different timestamps.

@victlu
Copy link
Contributor

victlu commented Aug 24, 2021

Based on SIG discussion on 8/24/2021, the guidance should be:

  • An instrument should record the "Whole" value with all labels that constitute the whole be included
  • Instrumenters should avoid instrumentation that would "double count" or otherwise lead to a non-sensical sum.

@jmacd
Copy link
Contributor

jmacd commented Aug 24, 2021

From Prometheus docs, this is summarized with an example that I like:

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful). If it is not meaningful, split the data up into multiple metrics. For example, having the capacity of various queues in one metric is good, while mixing the capacity of a queue with the current number of elements in the queue is not.

@jmacd
Copy link
Contributor

jmacd commented Aug 25, 2021

@cijothomas OK to resolve this issue?

@cijothomas
Copy link
Member Author

@cijothomas OK to resolve this issue?

Yes! thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Question for discussion spec:metrics Related to the specification/metrics directory
Projects
None yet
Development

No branches or pull requests

4 participants