From 656de22573d700904c1b2416a8fad51fd776104c Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 21 Sep 2020 23:49:25 -0700 Subject: [PATCH] Draft: Additive: Adding; Non-*: Grouping; LastValue; TBD #636 --- specification/metrics/api.md | 127 +++++++++++++++++------------------ 1 file changed, 63 insertions(+), 64 deletions(-) diff --git a/specification/metrics/api.md b/specification/metrics/api.md index d278d109be0..76f2f93e727 100644 --- a/specification/metrics/api.md +++ b/specification/metrics/api.md @@ -18,7 +18,7 @@ - [Instrument properties](#instrument-properties) * [Instrument naming requirements](#instrument-naming-requirements) * [Synchronous and asynchronous instruments compared](#synchronous-and-asynchronous-instruments-compared) - * [Additive and non-additive instruments compared](#additive-and-non-additive-instruments-compared) + * [Adding and grouping instruments compared](#adding-and-grouping-instruments-compared) * [Monotonic and non-monotonic instruments compared](#monotonic-and-non-monotonic-instruments-compared) * [Function names](#function-names) - [The instruments](#the-instruments) @@ -110,24 +110,25 @@ has put effort into taking some kind of measurement. For both performance and semantic reasons, the API let users choose between two kinds of measurement. -The term _additive_ is used to specify a characteristic of some +The term _Adding_ is used to specify a characteristic of some measurements, meant to indicate that only the sum is considered useful information. These are measurements that you would naturally combine using arithmetic addition, usually real quantities of something (e.g., number of bytes). -Non-additive measurements are used when the set of values, also known -as the population, is presumed to have useful information. 
A -non-additive measurement is either one that you would not naturally +The term _Grouping_ is used, as opposed to _Adding_, to specify +measurements that are individually interesting, where the distribution +of values is presumably useful. A +Grouping measurement is either one that you would not naturally combine using arithmetic addition (e.g., request latency), or it is a measurement you would naturally add where the intention is to monitor the distribution of values (e.g., queue size). The median value is -considered useful information for non-additive measurements. +considered useful information for Grouping measurements. -Non-additive instruments semantically capture more information than -additive instruments. Non-additive measurements are more expensive -than additive measurements, by this definition. Users will choose -additive instruments except when they expect to get value from the +Grouping instruments semantically capture more information than +Adding instruments. Grouping measurements are more expensive +than Adding measurements, by this definition. Users will choose +Adding instruments except when they expect to get value from the additional cost of information about individual values. None of this is to prevent an SDK from re-interpreting measurements based on configuration. Anything can happen with any kind of measurement. @@ -148,14 +149,14 @@ which is the user-facing entry point to the SDK. Instruments are classified in several ways that distinguish them from one another. -1. Synchronicity: A synchronous instrument is called by the user in a distributed [Context](../context/context.md) (i.e., Span context, Baggage). An asynchronous instrument is called by the SDK once per collection interval, lacking a Context. -2. Additivity: An additive instrument is one that records additive measurements, as described above. -3. Monotonicity: A monotonic instrument is an additive instrument, where the progression of each sum is non-decreasing. 
Monotonic instruments are useful for monitoring rate information. +1. Synchronous vs asynchronous: A synchronous instrument is called by the user in a distributed [Context](../context/context.md) (i.e., Span context, Baggage). An asynchronous instrument is called by the SDK once per collection interval, lacking a Context. +2. Adding vs grouping: An Adding / Grouping instrument is one that records Adding / Grouping measurements, as described above. +3. Monotonic: A monotonic instrument is an Adding instrument, where the progression of sum values is non-decreasing. Monotonic instruments are useful for monitoring rate information. The metric instruments names are shown below along with whether they -are synchronous, additive, and/or monotonic. +are Synchronous, Adding, and/or Monotonic. -| Name | Synchronous | Additive | Monotonic | +| Name | Synchronous | Adding | Monotonic | | ---- | ----------- | -------- | --------- | | Counter | Yes | Yes | Yes | | UpDownCounter | Yes | Yes | No | @@ -170,13 +171,13 @@ useful when measurements are expensive, therefore should be gathered periodically. Read more [characteristics of synchronous and asynchronous instruments](#synchronous-and-asynchronous-instruments-compared) below. -The synchronous and asynchronous additive instruments have a +The Synchronous and Asynchronous Adding instruments have a significant difference: synchronous instruments are used to capture changes in a sum, whereas asynchronous instruments are used to capture -sums directly. Read more [characteristics of additive -instruments](#additive-and-non-additive-instruments-compared) below. +sums directly. Read more [characteristics of adding +instruments](#adding-and-grouping-instruments-compared) below. -The monotonic additive instruments are significant because they support rate +The monotonic adding instruments are significant because they support rate calculations. 
Read more information about [choosing metric instruments](#monotonic-and-non-monotonic-instruments-compared) below. @@ -242,7 +243,7 @@ give users an understanding of how it is meant to be used. Instruments, in the absence of any configuration override, can be expected to deliver a useful, economical aggregation out of the box. -The additive instruments (`Counter`, `UpDownCounter`, `SumObserver`, +The Adding instruments (`Counter`, `UpDownCounter`, `SumObserver`, `UpDownSumObserver`) use a Sum aggregation by default. Details about computing a Sum aggregation vary, but from the user's perspective this means they will be able to monitor the sum of values captured. The @@ -250,13 +251,11 @@ distinction between synchronous and asynchronous instruments is crucial to specifying how exporters work, a topic that is covered in the [SDK specification (WIP)](https://github.com/open-telemetry/opentelemetry-specification/pull/347). -The non-additive instruments (`ValueRecorder`, `ValueObserver`) use -a MinMaxSumCount aggregation, by default. This aggregation keeps track -of the minimum value, the maximum value, the sum of values, and the -count of values. These four values support monitoring the range of -values, the rate of events, and the average event value. +The Synchronous Grouping instrument (`ValueRecorder`) uses +[TODO issue 636](https://github.com/open-telemetry/opentelemetry-specification/issues/636) aggregation, by default. +The Asynchronous Grouping instrument (`ValueObserver`) uses LastValue aggregation by default. -Other standard aggregations are available, especially for non-additive +Other standard aggregations are available, especially for Grouping instruments, where we are generally interested in a variety of different summaries, such as histograms, quantile summaries, cardinality estimates, and other kinds of sketch data structure. @@ -437,35 +436,35 @@ corresponding to the instrument and label set. 
(For this reasons, SDKs SHOULD run asynchronous instrument callbacks near the end of the collection interval.) -### Additive and non-additive instruments compared +### Adding and grouping instruments compared -Additive instruments are used to capture information about a sum, +Adding instruments are used to capture information about a sum, where, by definition, only the sum is of interest. Individual events are considered not meaningful for these instruments, the event count is not computed. This means, for example, that two `Counter` events `Add(N)` and `Add(M)` are equivalent to one `Counter` event `Add(N + -M)`. This is the case because `Counter` is synchronous, and -synchronous additive instruments are used to capture changes to a sum. +M)`. This is the case because `Counter` is Synchronous, and +Synchronous Adding instruments are used to capture changes to a sum. -Asynchronous, additive instruments (e.g., `SumObserver`) are used to +Asynchronous Adding instruments (e.g., `SumObserver`) are used to capture sums directly. This means, for example, that in any sequence of `SumObserver` observations for a given instrument and label set, the Last Value defines the sum of the instrument. -In both synchronous and asynchronous cases, the additive instruments +In both synchronous and asynchronous cases, the adding instruments are inexpensively aggregated into a single number per collection interval -without loss of information. This property makes additive instruments -higher performance, in general, than non-additive instruments. +without loss of information. This property makes adding instruments +higher performance, in general, than grouping instruments. -Non-additive instruments use a relatively inexpensive aggregation -method default (MinMaxSumCount), but still more expensive than the -default for additive instruments (Sum). 
Unlike additive instruments, -where only the sum is of interest by definition, non-additive +Grouping instruments use a relatively more-expensive aggregation +(TBD #636 for `ValueRecorder`, LastValue for `ValueObserver`) compared to the +default for adding instruments, which is Sum aggregation. Unlike adding instruments, +where only the sum is of interest by definition, grouping instruments can be configured with even more expensive aggregators. ### Monotonic and non-monotonic instruments compared -Monotonicity applies only to additive instruments. `Counter` and +Monotonicity applies only to adding instruments. `Counter` and `SumObserver` instruments are defined as monotonic because the sum captured by either instrument is non-decreasing. The `UpDown-` variations of these two instruments are non-monotonic, meaning the sum @@ -481,10 +480,10 @@ Non-increasing sums are not considered a feature in the Metric API. Each instrument supports a single function, named to help convey the instrument's semantics. -Synchronous additive instruments support an `Add()` function, +Synchronous adding instruments support an `Add()` function, signifying that they add to a sum and do not directly capture a sum. -Synchronous non-additive instruments support a `Record()` function, +Synchronous grouping instruments support a `Record()` function, signifying that they capture individual events, not only a sum. Asynchronous instruments all support an `Observe()` function, @@ -497,7 +496,7 @@ signifying that they capture only one value per measurement interval. `Counter` is the most common synchronous instrument. This instrument supports an `Add(increment)` function for reporting a sum, and is restricted to non-negative increments. The default aggregation is -`Sum`, as for any additive instrument. +`Sum`, as for any adding instrument. Example uses for `Counter`: @@ -533,17 +532,17 @@ levels across a group of processes. 
### ValueRecorder -`ValueRecorder` is a non-additive synchronous instrument useful for -recording any non-additive number, positive or negative. Values +`ValueRecorder` is a grouping synchronous instrument useful for +recording any grouping number, positive or negative. Values captured by a `Record(value)` are treated as individual events belonging to a distribution that is being summarized. `ValueRecorder` should be chosen either when capturing measurements that do not contribute meaningfully to a sum, or when capturing numbers that are -additive in nature, but where the distribution of individual +adding in nature, but where the distribution of individual increments is considered interesting. One of the most common uses for `ValueRecorder` is to capture latency -measurements. Latency measurements are not additive in the sense that +measurements. Latency measurements are not adding in the sense that there is little need to know the latency-sum of all processed requests. We use a `ValueRecorder` instrument to capture latency measurements typically because we are interested in knowing mean, @@ -554,15 +553,15 @@ maximum values, the sum of event values, and the count of events, allowing the rate, the mean, and range of input values to be monitored. -Example uses for `ValueRecorder` that are non-additive: +Example uses for `ValueRecorder` that are grouping: - capture any kind of timing information - capture the acceleration experienced by a pilot - capture nozzle pressure of a fuel injector - capture the velocity of a MIDI key-press. 
-Example _additive_ uses of `ValueRecorder` capture measurements that -are additive, but where we may have an interest in the distribution of +Example _adding_ uses of `ValueRecorder` capture measurements that +are adding, but where we may have an interest in the distribution of values and not only the sum: - capture a request size @@ -570,16 +569,16 @@ values and not only the sum: - capture a queue length - capture a number of board feet of lumber. -These examples show that although they are additive in nature, +These examples show that although they are adding in nature, choosing `ValueRecorder` as opposed to `Counter` or `UpDownCounter` implies an interest in more than the sum. If you did not care to collect information about the distribution, you would have chosen one -of the additive instruments instead. Using `ValueRecorder` makes +of the adding instruments instead. Using `ValueRecorder` makes sense for capturing distributions that are likely to be important in an observability setting. Use these with caution because they naturally cost more than the use -of additive measurements. +of adding measurements. ### SumObserver @@ -629,7 +628,7 @@ would be impractical to instrument them, use a `UpDownSumObserver`. ### ValueObserver `ValueObserver` is the asynchronous instrument corresponding to -`ValueRecorder`, used to capture non-additive measurements with +`ValueRecorder`, used to capture grouping measurements with `Observe(value)`. These instruments are especially useful for capturing measurements that are expensive to compute, since it gives the SDK control over how often they are evaluated. @@ -639,9 +638,9 @@ Example uses for `ValueObserver`: - capture CPU fan speed - capture CPU temperature. -Note that these examples use non-additive measurements. In the +Note that these examples use grouping measurements. 
In the `ValueRecorder` case above, example uses were given for capturing -synchronous additive measurements during a request (e.g., +synchronous adding measurements during a request (e.g., current queue size seen by a request). In the asynchronous case, however, how should users decide whether to use `ValueObserver` as opposed to `UpDownSumObserver`? @@ -678,7 +677,7 @@ How are the instruments fundamentally different, and why are there only three? Why not one instrument? Why not ten? As we have seen, the instruments are categorized as to whether -they are synchronous, additive, and/or and monotonic. This approach +they are synchronous, adding, and/or monotonic. This approach gives each of the instruments unique semantics, in ways that meaningfully improve the performance and interpretation of metric events. @@ -688,8 +687,8 @@ most cases it allows the SDK to provide good default functionality "out of the box", without requiring alternative behaviors to be configured. The choice of instrument determines not only the meaning of the events but also the name of the function called by the user. -The function names--`Add()` for additive instruments, `Record()` for -non-additive instruments, and `Observe()` for asynchronous +The function names--`Add()` for adding instruments, `Record()` for +grouping instruments, and `Observe()` for asynchronous instruments--help convey the meaning of these actions. The properties and standard implementation described for the @@ -697,12 +696,12 @@ individual instruments is summarized in the table below.
 
 | **Name** | Instrument kind | Function(argument) | Default aggregation | Notes |
 | ----------------------- | ----- | --------- | ------------- | --- |
-| **Counter** | Synchronous additive monotonic | Add(increment) | Sum | Per-request, part of a monotonic sum |
-| **UpDownCounter** | Synchronous additive | Add(increment) | Sum | Per-request, part of a non-monotonic sum |
-| **ValueRecorder** | Synchronous | Record(value) | MinMaxSumCount | Per-request, any non-additive measurement |
-| **SumObserver** | Asynchronous additive monotonic | Observe(sum) | Sum | Per-interval, reporting a monotonic sum |
-| **UpDownSumObserver** | Asynchronous additive | Observe(sum) | Sum | Per-interval, reporting a non-monotonic sum |
-| **ValueObserver** | Asynchronous | Observe(value) | MinMaxSumCount | Per-interval, any non-additive measurement |
+| **Counter** | Synchronous adding monotonic | Add(increment) | Sum | Per-request, part of a monotonic sum |
+| **UpDownCounter** | Synchronous adding | Add(increment) | Sum | Per-request, part of a non-monotonic sum |
+| **ValueRecorder** | Synchronous | Record(value) | [TBD](https://github.com/open-telemetry/opentelemetry-specification/issues/636) | Per-request, any grouping measurement |
+| **SumObserver** | Asynchronous adding monotonic | Observe(sum) | Sum | Per-interval, reporting a monotonic sum |
+| **UpDownSumObserver** | Asynchronous adding | Observe(sum) | Sum | Per-interval, reporting a non-monotonic sum |
+| **ValueObserver** | Asynchronous | Observe(value) | LastValue | Per-interval, any grouping measurement |
 
 ### Constructors