From 81e6df2419f99ac8fc84c4a48b9648fba507fe64 Mon Sep 17 00:00:00 2001 From: jmacd Date: Sun, 23 Feb 2020 11:12:14 -0800 Subject: [PATCH 01/15] Work on new ontology --- ...-metric-instrument-optional-refinements.md | 127 ++++++++++++++++++ 1 file changed, 127 insertions(+) create mode 100644 text/0088-metric-instrument-optional-refinements.md diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md new file mode 100644 index 000000000..859acd9dd --- /dev/null +++ b/text/0088-metric-instrument-optional-refinements.md @@ -0,0 +1,127 @@ +# Metric Instruments + +Removes the optional semantic declarations `Monotonic` and `Absolute` +for metric instruments, declares the Measure and Observer instruments +as _foundational_, and introduces a process for standardizing new +instrument "refinements". + +## Motivation + +With the removal of Gauge instruments and the addition of Observer +instruments in the specification, the existing `Monotonic` and +`Absolute` options began to create confusion. For example, a Counter +instrument is used for capturing changes in a Sum, so we could say that +values which are non-negative (absolute) determine metric events +define a monotonic Counter. The confusion arises, in this case, +because `Absolute` refers to the captured values, whereas `Monotonic` +refers to the instrument or, more precisely, to a property of the +standard aggregation for Counters. + +From a different perspective, Counter instruments might be treated as +as refinements of the Measure instrument. Whereas the Measure +instrument is used for capturing all-purpose synchronous measurements, +the Counter instrument is used specifically for capturing measurements +of synchronous changes in a sum, therefore it uses `Add()` instead of +`Record()` as the action and specifies `Sum` as the standard +aggregation. + +What this illustrates is that we have modeled this space poorly. 
This +does not propose to change any existing metric APIs, only our +understanding of the three instruments currently part of the +specification: Measure, Observer, and Counter (a refinement). + +## Explanation + +The Measure and Observer instrument are defined as _foundational_ +here, in the sense that any kind of metric instrument must reduce to +one of these archetypes. The foundational instruments are +unrestricted, in the sense that metric events support any numerical +value, positive or negative, zero or infinity. + +The distinction between the two instrument archetypes is in their +synchronicity. Measure instruments are called synchronously by the +user, while Observer instruments are called asynchronously by the SDK. +Synchronous instruments (Measure and refinements) have three calling +patterns (_Bound_, _Unbound_, and _Batch_) to capture measurements. +Asynchronous instruments (Observer any refinements) use callbacks to +capture measurements. + +All measurements, synchronous or asynchronous, produce a metric event +([Context, timestamp, instrument descriptor, label set, and numerical +value](api-metrics.md#metric-event-format)), however there exists a +semantic distinction between synchronous and asynchronous events +related to the definition of "last value". This is due to the +relationship with time. + +Synchronous events happen concurrently, meaning we can only determine +whether one event happens before another by referring to timestamp +When querying events from synchronous instruments, you may find +multiple events for the same instrument, label set, and timestamp. + +Asynchronous events are captured sequentially, meaning there is always +a well-defined _last value_. When querying events from asynchronous +instruments, you cannot find more than one event for the same +instrument, label set, and timestamp. Values observed asynchronously +are referred to as the _last value_ that was observed. 
+ +### Standard implementation + +Sum and Count for both Measure and Observer. + +### First Refinement: Counter + +Captures non-negative increments (deltas). + +And it matters because Prometheus needs to know. (Duh.) + +### More refinements: Monotonic, Non-negative + +Measures +-------- +Measure unstricted, sumcount +Counter non-negative, sum +UpDownCounter unrestricted, sum +NonNegativeMeasure non-negative, sumcount + +Observers +--------- +Observer unrestricted, sumcount +MonotonicObserver unrestricted/monotonic, sum +NonNegativeObserver non-negative, sumcount +DeltaObserver unrestricted, sum + + +## Explanation + +Explain the proposed change as though it was already implemented and you were explaining it to a user. Depending on which layer the proposal addresses, the "user" may vary, or there may even be multiple. + +We encourage you to use examples, diagrams, or whatever else makes the most sense! + +## Internal details + +From a technical perspective, how do you propose accomplishing the proposal? In particular, please explain: + +* How the change would impact and interact with existing functionality +* Likely error modes (and how to handle them) +* Corner cases (and how to handle them) + +While you do not need to prescribe a particular implementation - indeed, OTEPs should be about **behaviour**, not implementation! - it may be useful to provide at least one suggestion as to how the proposal *could* be implemented. This helps reassure reviewers that implementation is at least possible, and often helps them inspire them to think more deeply about trade-offs, alternatives, etc. + +## Trade-offs and mitigations + +What are some (known!) drawbacks? What are some ways that they might be mitigated? + +Note that mitigations do not need to be complete *solutions*, and that they do not need to be accomplished directly through your proposal. A suggested mitigation may even warrant its own OTEP! 
+ +## Prior art and alternatives + +What are some prior and/or alternative approaches? For instance, is there a corresponding feature in OpenTracing or OpenCensus? What are some ideas that you have rejected? + +## Open questions + +What are some questions that you know aren't resolved yet by the OTEP? These may be questions that could be answered through further discussion, implementation experiments, or anything else that the future may bring. + +## Future possibilities + +What are some future changes that this proposal would enable? + From 65864b3b9d994a118e3a7b7e8683174609c71399 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 25 Feb 2020 00:47:27 -0800 Subject: [PATCH 02/15] More --- ...-metric-instrument-optional-refinements.md | 197 +++++++++++------- 1 file changed, 121 insertions(+), 76 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index 859acd9dd..515fae7b9 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -15,7 +15,7 @@ values which are non-negative (absolute) determine metric events define a monotonic Counter. The confusion arises, in this case, because `Absolute` refers to the captured values, whereas `Monotonic` refers to the instrument or, more precisely, to a property of the -standard aggregation for Counters. +standard aggregation for Counter instrumenrts. From a different perspective, Counter instruments might be treated as as refinements of the Measure instrument. Whereas the Measure @@ -34,94 +34,139 @@ specification: Measure, Observer, and Counter (a refinement). The Measure and Observer instrument are defined as _foundational_ here, in the sense that any kind of metric instrument must reduce to -one of these archetypes. The foundational instruments are -unrestricted, in the sense that metric events support any numerical -value, positive or negative, zero or infinity. 
- -The distinction between the two instrument archetypes is in their -synchronicity. Measure instruments are called synchronously by the -user, while Observer instruments are called asynchronously by the SDK. -Synchronous instruments (Measure and refinements) have three calling -patterns (_Bound_, _Unbound_, and _Batch_) to capture measurements. -Asynchronous instruments (Observer any refinements) use callbacks to -capture measurements. - -All measurements, synchronous or asynchronous, produce a metric event -([Context, timestamp, instrument descriptor, label set, and numerical -value](api-metrics.md#metric-event-format)), however there exists a -semantic distinction between synchronous and asynchronous events -related to the definition of "last value". This is due to the -relationship with time. - -Synchronous events happen concurrently, meaning we can only determine -whether one event happens before another by referring to timestamp -When querying events from synchronous instruments, you may find -multiple events for the same instrument, label set, and timestamp. - -Asynchronous events are captured sequentially, meaning there is always -a well-defined _last value_. When querying events from asynchronous -instruments, you cannot find more than one event for the same -instrument, label set, and timestamp. Values observed asynchronously -are referred to as the _last value_ that was observed. - -### Standard implementation - -Sum and Count for both Measure and Observer. - -### First Refinement: Counter - -Captures non-negative increments (deltas). - -And it matters because Prometheus needs to know. (Duh.) 
- -### More refinements: Monotonic, Non-negative - -Measures --------- -Measure unstricted, sumcount -Counter non-negative, sum -UpDownCounter unrestricted, sum -NonNegativeMeasure non-negative, sumcount - -Observers ---------- -Observer unrestricted, sumcount -MonotonicObserver unrestricted/monotonic, sum -NonNegativeObserver non-negative, sumcount -DeltaObserver unrestricted, sum - - -## Explanation - -Explain the proposed change as though it was already implemented and you were explaining it to a user. Depending on which layer the proposal addresses, the "user" may vary, or there may even be multiple. - -We encourage you to use examples, diagrams, or whatever else makes the most sense! +one of these. The foundational instruments are unrestricted, in the +sense that metric events support any numerical value, positive or +negative, zero or infinity. + +The distinction between the two foundational instruments is whether +they are synchronous. Measure instruments are called synchronously by +the user, while Observer instruments are called asynchronously by the +implementation. Synchronous instruments (Measure and refinements) +have three calling patterns (_Bound_, _Unbound_, and _Batch_) to +capture measurements. Asynchronous instruments (Observer and +refinements) use callbacks to capture measurements. + +All measurements produce a metric event consisting of [timestamp, +instrument descriptor, label set, and numerical +value](api-metrics.md#metric-event-format). Synchronous instrument +events additionally have [Context](api-context.md), describing +properties of the associated trace and distributed correlation values. + +Observer instruments have a well-defined _last value_ of the +measurement that can be useful in defining aggregations. To maintain +this property, we impose this requirement: two or more calls to +`Observer()` in a single Observer callback invocation are treated as +duplicates of each other, and the last call to `Observe()` wins. 
+ +Measure instruments do not define a _last value_ relationship. + +### Standard implementation of Measure and Observer + +OpenTelemetry specifies how the default SDK should treat metric +events, by default, when asked to export data from an instrument. +Usually, an aggregation is specified along with the label keys used in +the aggregation. Measure and Observer instruments use `Sum` and +`Count` aggregators by default, in the standard implementation. This +pair of measurements, of course, defines an average value. There are +no restrictions placed on the numerical value in an event by one of +the foundational instruments. + +### Refinements to Measure and Observer + +Options like `Monotonic` and `Absolute` were removed in the 0.3 +specification. Here, we propose to regain the equivalent effects +through _instrument refinements_, which declare instruments with +calling patterns like Measure and Observer, but with different +standard implementations and standard-alternative implementations. + +We have done away with options on instruments, in other words, in +favor of optional metric instruments. Here we discuss three important +capabilities used to make up instrument refinements. + +#### Non-negative + +For some instruments, such as those that measure real quantities, +negative values are meaningless. For example, it is impossible for a +person to weigh a negative amount. + +A non-negative instrument refinement accepts only non-negative values. +For instruments with this property, negative values are considered +measurement errors. + +#### Monotonic + +A monotonic instrument is one where the user promises that successive +metric events for a given instrument definition and label set will +differ by a non-negative value. This is defined in terms of the last +value relationship, therefore only applies to refinements of the +Observer instrument. 
For example, the CPU time used by a process as +read in successive collection intervals cannot change by a negative +amount, because it is impossible to use a negative amount of CPU time. + +A monotonic instrument refinement accepts only values that are greater +than or equal to the last-captured value of the instrument, for a +given label set. For instruments with this property, values that are +less than some prior value are considered a measurement error. + +#### Sum-Only + +A sum-only instrument is one where only the sum is considered of +interest. For a Sum-Only instrument refinement, we have a semantic +property that two events with numeric values `M` and `N` are +semantically equivalent to a single event with value `M+N`. For +example, in a count of users arriving by bus to an event, we are not +concerned with the number of buses that arrived. + +A sum-only instrument is one where the number of events is not +counted, by default, only the `Sum`. + +#### Language-level refinements + +OpenTelemetry implementations may wish to add instrument refinements +to accomdate built-in types. Languages with distinct integer and +floating point should offer instrument refinements for each, leading +to type names like `Int64Measure` and `Float64Measure`. + +A language with support for unsigned integer types may wish to create +dedicated instruments to report these values, leading to type names +like `UnsignedInt64Observer` and `UnsignedFloat64Observer`. + +Other uses for built-in type refinements involve the type for duration +measurements. Where there is built-in type for the difference between +two clock measurements, OpenTelemetry languages should offer a +refinement to automatically apply the correct units. + +### Counter refinement + +Counter is a non-negative, sum-only refinement of the Measure +instrument. ## Internal details -From a technical perspective, how do you propose accomplishing the proposal? 
In particular, please explain: - -* How the change would impact and interact with existing functionality -* Likely error modes (and how to handle them) -* Corner cases (and how to handle them) +This is a change of understanding. It does not request any new +instruments be created, only specifiy how we should think about adding +new instruments. -While you do not need to prescribe a particular implementation - indeed, OTEPs should be about **behaviour**, not implementation! - it may be useful to provide at least one suggestion as to how the proposal *could* be implemented. This helps reassure reviewers that implementation is at least possible, and often helps them inspire them to think more deeply about trade-offs, alternatives, etc. +No API changes are called for in this proposal. ## Trade-offs and mitigations -What are some (known!) drawbacks? What are some ways that they might be mitigated? - -Note that mitigations do not need to be complete *solutions*, and that they do not need to be accomplished directly through your proposal. A suggested mitigation may even warrant its own OTEP! +The trade-off explicitly introduced here is that we will prefer to +create new instruments for each dedicated purpose, rather than create +generic instruments with support for multiple semantic options. ## Prior art and alternatives -What are some prior and/or alternative approaches? For instance, is there a corresponding feature in OpenTracing or OpenCensus? What are some ideas that you have rejected? +The optional behaviors `Monotonic` and `Absolute` were first discussed +in the August 2019 Metrics working group meeting. ## Open questions -What are some questions that you know aren't resolved yet by the OTEP? These may be questions that could be answered through further discussion, implementation experiments, or anything else that the future may bring. +This approach allows questions about new instruments to be addressed +on a case-by-case basis. 
## Future possibilities -What are some future changes that this proposal would enable? - +A future OTEP will request the introduction of several new standard +refinements. For example, a monotonic observer instrument named +`MonotonicObserver` and a timing instrument named `TimingMeasure`. From c4fb330725eb792413a63c278545edc116018fac Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 25 Feb 2020 00:57:17 -0800 Subject: [PATCH 03/15] Typos --- text/0088-metric-instrument-optional-refinements.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index 515fae7b9..873d69daa 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -123,7 +123,7 @@ counted, by default, only the `Sum`. #### Language-level refinements OpenTelemetry implementations may wish to add instrument refinements -to accomdate built-in types. Languages with distinct integer and +to accommodate built-in types. Languages with distinct integer and floating point should offer instrument refinements for each, leading to type names like `Int64Measure` and `Float64Measure`. @@ -144,7 +144,7 @@ instrument. ## Internal details This is a change of understanding. It does not request any new -instruments be created, only specifiy how we should think about adding +instruments be created, only specify how we should think about adding new instruments. No API changes are called for in this proposal. 
From ddd64830d5cb612724b3524b757e82946f4eb43c Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 25 Feb 2020 15:53:26 -0800 Subject: [PATCH 04/15] Rewording and typos --- ...-metric-instrument-optional-refinements.md | 135 +++++++++++------- 1 file changed, 84 insertions(+), 51 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index 873d69daa..d26caba93 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -3,32 +3,30 @@ Removes the optional semantic declarations `Monotonic` and `Absolute` for metric instruments, declares the Measure and Observer instruments as _foundational_, and introduces a process for standardizing new -instrument "refinements". +instrument _refinements_. ## Motivation With the removal of Gauge instruments and the addition of Observer instruments in the specification, the existing `Monotonic` and `Absolute` options began to create confusion. For example, a Counter -instrument is used for capturing changes in a Sum, so we could say that -values which are non-negative (absolute) determine metric events -define a monotonic Counter. The confusion arises, in this case, -because `Absolute` refers to the captured values, whereas `Monotonic` -refers to the instrument or, more precisely, to a property of the -standard aggregation for Counter instrumenrts. +instrument is used for capturing changes in a Sum, and we could say +that non-negative-valued metric events define a monotonic Counter, in +the sense that its Sum is monotonic. The confusion arises, in this +case, because `Absolute` refers to the captured values, whereas +`Monotonic` refers to the semantic output. From a different perspective, Counter instruments might be treated as -as refinements of the Measure instrument. 
Whereas the Measure -instrument is used for capturing all-purpose synchronous measurements, -the Counter instrument is used specifically for capturing measurements -of synchronous changes in a sum, therefore it uses `Add()` instead of -`Record()` as the action and specifies `Sum` as the standard -aggregation. +refinements of the Measure instrument. Whereas the Measure instrument +is used for capturing all-purpose synchronous measurements, the +Counter instrument is used specifically for synchronously capturing +measurements of changes in a sum, therefore it uses `Add()` instead of +`Record()`, and it specifies `Sum` as the standard aggregation. What this illustrates is that we have modeled this space poorly. This -does not propose to change any existing metric APIs, only our -understanding of the three instruments currently part of the -specification: Measure, Observer, and Counter (a refinement). +does not propose to change any existing metrics APIs, only our +understanding of the three instruments currently included in the +specification: Measure, Observer, and Counter. ## Explanation @@ -41,21 +39,21 @@ negative, zero or infinity. The distinction between the two foundational instruments is whether they are synchronous. Measure instruments are called synchronously by the user, while Observer instruments are called asynchronously by the -implementation. Synchronous instruments (Measure and refinements) +implementation. Synchronous instruments (Measure and its refinements) have three calling patterns (_Bound_, _Unbound_, and _Batch_) to -capture measurements. Asynchronous instruments (Observer and +capture measurements. Asynchronous instruments (Observer and its refinements) use callbacks to capture measurements. -All measurements produce a metric event consisting of [timestamp, +All measurement APIs produce metric events consisting of [timestamp, instrument descriptor, label set, and numerical value](api-metrics.md#metric-event-format). 
Synchronous instrument events additionally have [Context](api-context.md), describing properties of the associated trace and distributed correlation values. -Observer instruments have a well-defined _last value_ of the -measurement that can be useful in defining aggregations. To maintain +Observer instruments have a well-defined _last value_ measured by the +instrument that can be useful in defining aggregations. To maintain this property, we impose this requirement: two or more calls to -`Observer()` in a single Observer callback invocation are treated as +`Observe()` in a single Observer callback invocation are treated as duplicates of each other, and the last call to `Observe()` wins. Measure instruments do not define a _last value_ relationship. @@ -64,24 +62,24 @@ Measure instruments do not define a _last value_ relationship. OpenTelemetry specifies how the default SDK should treat metric events, by default, when asked to export data from an instrument. -Usually, an aggregation is specified along with the label keys used in -the aggregation. Measure and Observer instruments use `Sum` and -`Count` aggregators by default, in the standard implementation. This -pair of measurements, of course, defines an average value. There are -no restrictions placed on the numerical value in an event by one of -the foundational instruments. +Measure and Observer instruments compute `Sum` and `Count` +aggregations, by default, in the standard implementation. This pair +of measurements, of course, defines an average value. There are no +restrictions placed on the numerical value in an event for the two +foundational instruments. ### Refinements to Measure and Observer -Options like `Monotonic` and `Absolute` were removed in the 0.3 +The `Monotonic` and `Absolute` options were removed in the 0.3 specification. 
Here, we propose to regain the equivalent effects -through _instrument refinements_, which declare instruments with -calling patterns like Measure and Observer, but with different -standard implementations and standard-alternative implementations. +through instrument refinements. Instrument refinements have the same +calling patterns as the foundational instrument they refine, adding +either a different standard implementation or a restriction of the +input domain. -We have done away with options on instruments, in other words, in -favor of optional metric instruments. Here we discuss three important -capabilities used to make up instrument refinements. +We have done away with instrument options, in other words, in favor of +optional metric instruments. Here we discuss three important +capabilities achievable using instrument refinements. #### Non-negative @@ -106,11 +104,11 @@ amount, because it is impossible to use a negative amount of CPU time. A monotonic instrument refinement accepts only values that are greater than or equal to the last-captured value of the instrument, for a given label set. For instruments with this property, values that are -less than some prior value are considered a measurement error. +less than some prior value are considered measurement errors. #### Sum-Only -A sum-only instrument is one where only the sum is considered of +A sum-only instrument is one where only the sum is considered to be of interest. For a Sum-Only instrument refinement, we have a semantic property that two events with numeric values `M` and `N` are semantically equivalent to a single event with value `M+N`. For @@ -129,31 +127,64 @@ to type names like `Int64Measure` and `Float64Measure`. A language with support for unsigned integer types may wish to create dedicated instruments to report these values, leading to type names -like `UnsignedInt64Observer` and `UnsignedFloat64Observer`. +like `UnsignedInt64Observer` and `UnsignedFloat64Observer`. 
These +would naturally apply a non-negative refinement. Other uses for built-in type refinements involve the type for duration -measurements. Where there is built-in type for the difference between -two clock measurements, OpenTelemetry languages should offer a -refinement to automatically apply the correct units. +measurements. For example, where there is a built-in type for the +difference between two clock measurements, OpenTelemetry APIs should +offer a refinement to automatically apply the correct unit of time to +the measurement. ### Counter refinement Counter is a non-negative, sum-only refinement of the Measure instrument. +### Future refinements + +This leaves the potential to include other refinements by combining +the above elements. A `Counter`-like instrument that permits +non-negative updates could be called an `UpDownCounter`, for example. +An `Observer`-like instrument with non-descending values could be +called a `MonotonicObserver` instrument. An `Observer`-like instrument +that reports non-negative updates to a sum could be called a +`DeltaObserver` instrument. + ## Internal details This is a change of understanding. It does not request any new -instruments be created, only specify how we should think about adding -new instruments. +instruments be created or APIs be changed, but it does specify how we +should think about adding new instruments. No API changes are called for in this proposal. +### Note for Cumulative Exporters + +The Prometheus system collects cumulative data for its counter +instruments, meaning it exports the lifetime sum of a counter at each +collection interval, not the change in that sum over the collection +interval. + +It is important, therefore, to know when the output of an instrument +should be reflected as a cumulative value in the exporter. The +`Counter` instrument automatically has this property, but why? 
+ +We can infer that a value is cumulative in the following circumstances: + +- Non-negative and Sum-only: This is called `Counter`, if synchronous, and can be `DeltaObserver`, potentially, if asynchronous +- Monotonic: This can be `MonotonicObserver`, potentially. + +We see that these refinements satisfy their intended purpose, which is +to convey additional semantics that exporters use to convert data into +their system. + ## Trade-offs and mitigations -The trade-off explicitly introduced here is that we will prefer to -create new instruments for each dedicated purpose, rather than create -generic instruments with support for multiple semantic options. +The trade-off explicitly introduced here is that we should prefer to +create new instrument refinements, each for a dedicated purpose, +rather than create generic instruments with support for multiple +semantic options. ## Prior art and alternatives @@ -162,11 +193,13 @@ in the August 2019 Metrics working group meeting. ## Open questions -This approach allows questions about new instruments to be addressed -on a case-by-case basis. +This approach allows new instrument refinements to be considered on a +case-by-case basis. For example, is a `MonotonicObserver` sufficient, +or do we also need a `DeltaObserver`? ## Future possibilities -A future OTEP will request the introduction of several new standard -refinements. For example, a monotonic observer instrument named -`MonotonicObserver` and a timing instrument named `TimingMeasure`. +A future OTEP will request the introduction of two new standard +refinements for the 0.4 API specification. These will be a monotonic +Observer instrument named `MonotonicObserver` and a synchronous timing +instrument named `TimingMeasure`. 
From f4c6698177d5adde9e0a21312f5abb6c9d81008c Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 10 Mar 2020 00:45:27 -0700 Subject: [PATCH 05/15] Updates --- ...-metric-instrument-optional-refinements.md | 163 +++++++++++------- 1 file changed, 100 insertions(+), 63 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index d26caba93..e6acad74d 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -50,6 +50,8 @@ value](api-metrics.md#metric-event-format). Synchronous instrument events additionally have [Context](api-context.md), describing properties of the associated trace and distributed correlation values. +### Last-value relationship + Observer instruments have a well-defined _last value_ measured by the instrument that can be useful in defining aggregations. To maintain this property, we impose this requirement: two or more calls to @@ -58,6 +60,37 @@ duplicates of each other, and the last call to `Observe()` wins. Measure instruments do not define a _last value_ relationship. +### Aggregating changes to a sum: Rate calculation + +The former `Monotonic` option had been introduced in order to support +reporting of a current sum, such that a rate calculation is implied. +Here we define _Rate_ as an aggregation, defined for a subset of +instruments, that may be calculated differently depending on how the +instrument is defined. The rate aggregation outputs the amount of +change in a quantity divided by the amount of change in time. + +A rate can be computed from values that are reported as differences, +referred to as _delta_ reporting here, or as sums, referred to as +_cumulative_ reporting here. The primary goal of the instrument +refinements introduced in this proposal is to facilitate rate +calculations in more than one way. 
+ +When delta reporting, a rate is calculated by summing individual +measurements or observations. For Measure instruments, these values +fall into a range of time, as indicated by the event timestamp. For +Observer instruments, these values fall into a range of collection +intervals. + +When cumulative reporting, a rate is calculated by computing a +difference between individual values. For an Observer instrument, we +compute rate over a range of collection intervals, and for a Measure +instrument we compute rate over a range of timestamps. In either +case, we are interested in subtracting the prior value from the final +value measured or observed on the instrument. + +Note that rate aggregation, as illustrated above, treats the time +dimension differently than the other dimensions used for aggregation. + ### Standard implementation of Measure and Observer OpenTelemetry specifies how the default SDK should treat metric @@ -72,14 +105,15 @@ foundational instruments. The `Monotonic` and `Absolute` options were removed in the 0.3 specification. Here, we propose to regain the equivalent effects -through instrument refinements. Instrument refinements have the same -calling patterns as the foundational instrument they refine, adding -either a different standard implementation or a restriction of the -input domain. +through instrument refinements. Instrument refinements are added to +the foundational instruments, yielding new instruments with the same +calling patterns as the foundational instrument they refine. These +refinements support adding either a different standard implementation +or a restriction of the input domain to the instrument. We have done away with instrument options, in other words, in favor of -optional metric instruments. Here we discuss three important -capabilities achievable using instrument refinements. +optional metric instruments. Here we discuss four significant +instrument refinements. 
#### Non-negative @@ -89,34 +123,57 @@ person to weigh a negative amount. A non-negative instrument refinement accepts only non-negative values. For instruments with this property, negative values are considered -measurement errors. - -#### Monotonic +measurement errors. Both Measure and Observer instruments support +non-negative refinements. -A monotonic instrument is one where the user promises that successive -metric events for a given instrument definition and label set will -differ by a non-negative value. This is defined in terms of the last -value relationship, therefore only applies to refinements of the -Observer instrument. For example, the CPU time used by a process as -read in successive collection intervals cannot change by a negative -amount, because it is impossible to use a negative amount of CPU time. - -A monotonic instrument refinement accepts only values that are greater -than or equal to the last-captured value of the instrument, for a -given label set. For instruments with this property, values that are -less than some prior value are considered measurement errors. - -#### Sum-Only +#### Sum-only A sum-only instrument is one where only the sum is considered to be of -interest. For a Sum-Only instrument refinement, we have a semantic +interest. For a sum-only instrument refinement, we have a semantic property that two events with numeric values `M` and `N` are semantically equivalent to a single event with value `M+N`. For -example, in a count of users arriving by bus to an event, we are not -concerned with the number of buses that arrived. +example, in a sum-only count of users arriving by bus to an event, we +are not concerned with the number of buses that arrived. A sum-only instrument is one where the number of events is not -counted, by default, only the `Sum`. +counted, only the `Sum`. A key property of sum-only instruments is +that they always support a Rate aggregation, whether reporting delta- +or cumulative-values. 
Both Measure and Observer instruments support +sum-only refinements. + +#### Precomputed-sum + +A precomputed-sum refinement indicates that values reported through an +instrument are observed or measured in terms of a sum that changes +over time. Pre-computed sum instruments support cumulative reporting, +meaning the rate aggregation is defined by computing a difference +across timestamps or collection intervals. + +A precomputed sum refinement implies a sum-only refinement. Note that +values assocaited with a precomputed sum are still sums. Precomputed +sum values are combined using addition, when aggregating over the +spatial dimensions; only the time dimension receives special treatment. + +#### Non-negative-rate + +A non-negative-rate instrument refinement states that rate aggregation +produces only non-negative results. There are non-negative-rate cases +of interest for delta reporting and cumulative reporting, as follows. + +For delta reporting, any non-negative and sum-only instrument is also +a non-negative-rate instrument. + +For cumulative reporting, a sum-only and pre-computed sum instrument +does not necessarily have a non-negative rate, but adding an explicit +non-negative-rate refinement makes it the equivalent of `Monotonic` in +the 0.2 specification. + +For example, the CPU time used by a process, as read in successive +collection intervals, cannot change by a negative amount, because it +is impossible to use a negative amount of CPU time. CPU time is a +typical value to report through an Observer instrument, so the rate +for a specific set of labels is defined by subtracting the prior +observation from the current observation. #### Language-level refinements @@ -144,12 +201,18 @@ instrument. ### Future refinements This leaves the potential to include other refinements by combining -the above elements. A `Counter`-like instrument that permits -non-negative updates could be called an `UpDownCounter`, for example. 
-An `Observer`-like instrument with non-descending values could be -called a `MonotonicObserver` instrument. An `Observer`-like instrument -that reports non-negative updates to a sum could be called a -`DeltaObserver` instrument. +the above elements. The following are current and proposed names for +three instruments that support non-negative rate reporting: + +| Foundation | Refinements | Name | +|--|--|--| +| Measure | non-negative, sum-only, non-negative-rate | Counter | +| Observer | non-negative, sum-only, non-negative-rate | DeltaObserver | +| Observer | sum-only, precomputed-sum, non-negative-rate | CumulativeObserver | + +The Counter instrument is already part of the specification. A +proposal to introduce DeltaObserver and CumulativeObserver will follow +in a future OTEP. ## Internal details @@ -159,26 +222,6 @@ should think about adding new instruments. No API changes are called for in this proposal. -### Note for Cumulative Exporters - -The Prometheus system collects cumulative data for its counter -instruments, meaning it exports the lifetime sum of a counter at each -collection interval, not the change in that sum over the collection -interval. - -It is important, therefore, to know when the output of an instrument -should be reflected as a cumulative value in the exporter. The -`Counter` instrument automatically has this property, but why? - -We can infer that a value is cumulative in the following circumstances: - -- Non-negative and Sum-only: This is called `Counter`, if synchronous, and can be `DeltaObserver`, potentially, if asynchronous -- Monotonic: This can be `MonotonicObserver`, potentially. - -We see that these refinements satisfy their intended purpose, which is -to convey additional semantics that exporters use to convert data into -their system. - ## Trade-offs and mitigations The trade-off explicitly introduced here is that we should prefer to @@ -191,15 +234,9 @@ semantic options. 
The optional behaviors `Monotonic` and `Absolute` were first discussed in the August 2019 Metrics working group meeting. -## Open questions - -This approach allows new instrument refinements to be considered on a -case-by-case basis. For example, is a `MonotonicObserver` sufficient, -or do we also need a `DeltaObserver`? - ## Future possibilities -A future OTEP will request the introduction of two new standard -refinements for the 0.4 API specification. These will be a monotonic -Observer instrument named `MonotonicObserver` and a synchronous timing -instrument named `TimingMeasure`. +A future OTEP will request the introduction of several standard +refinements for the 0.4 API specification. These will be the +`DeltaObserver` and `CumulativeObserver` instruments described above +plus a synchronous timing instrument named `TimingMeasure`. From 36565f61f13a1623dda89d94bd2e961e2547314e Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 10 Mar 2020 12:03:49 -0700 Subject: [PATCH 06/15] More table --- ...-metric-instrument-optional-refinements.md | 82 +++++++++++++++---- 1 file changed, 64 insertions(+), 18 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index e6acad74d..4bfa168ba 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -195,24 +195,40 @@ the measurement. ### Counter refinement -Counter is a non-negative, sum-only refinement of the Measure -instrument. - -### Future refinements - -This leaves the potential to include other refinements by combining -the above elements. 
The following are current and proposed names for -three instruments that support non-negative rate reporting: - -| Foundation | Refinements | Name | -|--|--|--| -| Measure | non-negative, sum-only, non-negative-rate | Counter | -| Observer | non-negative, sum-only, non-negative-rate | DeltaObserver | -| Observer | sum-only, precomputed-sum, non-negative-rate | CumulativeObserver | - -The Counter instrument is already part of the specification. A -proposal to introduce DeltaObserver and CumulativeObserver will follow -in a future OTEP. +Counter is a sum-only, non-negative, thus non-negative-rate refinement +of the Measure instrument. + +### Standardizing new instruments + +With these refinements we can exhaustively list each distinct kind of +instrument. There are a total of twelve hypothetical instruments +listed in the table below, of which only one has been standardized. +Hypothetical future instrument names are _italicized_. + +| Foundation instrument | Sum-only? | Precomputed-sum? | Non-negative? | Non-negative-rate? | Instrument name _(hypothetical)_ | +|--|--|--|--|--|--| +| Measure | sum-only | | non-negative | non-negative-rate | Counter | +| Measure | sum-only | precomputed-sum | | non-negative-rate | _CumulativeCounter_ | +| Measure | sum-only | | | | _UpDownCounter_ | +| Measure | sum-only | precomputed-sum | | | _UpDownCumulativeCounter_ | +| Measure | | | non-negative | | _AbsoluteMeasure_ | +| Measure | | | | | _NonAbsoluteMeasure_ | +| Observer | sum-only | | non-negative | non-negative-rate | _DeltaObserver_ | +| Observer | sum-only | precomputed-sum | | non-negative-rate | _CumulativeObserver_ | +| Observer | sum-only | | | | _UpDownDeltaObserver_ | +| Observer | sum-only | precomputed-sum | | | _UpDownCumulativeObserver_ | +| Observer | | | non-negative | | _AbsoluteObserver_ | +| Observer | | | | | _NonAbsoluteObserver_ | + +To arrive at this listing, several assumptions have been made. 
For +example, the precomputed-sum and non-negative-rate refeinments are only +applicable in conjunction with a sum-only refinement. + +For the precomputed-sum instruments, we technically do not care +whether the inputs are non-negative, because rate aggregation computes +differences. However, it is useful for other aggregations to assume +that precomputed sums start at zero, and we will ignore the case where +a precomputed sum has an initial value other than zero. ## Internal details @@ -222,6 +238,31 @@ should think about adding new instruments. No API changes are called for in this proposal. +## Open question + +Eleven instruments have been given hyptothetical names in the table +above, but only a subset of these should be included in the +specification. + +An open question is whether the foundational instruments should be +considered "abstract", meaning that users can only create refined +instruments. + +An argument in favor of treating the foundation instruments as +abstract goes like this: users will be confused because sometimes the +documentation and specification discusses Measure and Observer +instruments generally, and sometimes it discusses them specifically. +If the foundation instruments are abstract, this confusion is +eliminated. + +An argument against treating the foundation instruments as abstract +goes like this: by excluding these short, well-understood names from +use in the API, we force long, less-well understood names on the user, +which will leave them confused. For example, _NonAbsoluteObserver_ is +a completely unrefined Observer, and wouldn't you rather read and +write "Observer" in code? (Likewise for _NonAbsoluteMeasure_ vs +Measure.) + ## Trade-offs and mitigations The trade-off explicitly introduced here is that we should prefer to @@ -240,3 +281,8 @@ A future OTEP will request the introduction of several standard refinements for the 0.4 API specification. 
These will be the `DeltaObserver` and `CumulativeObserver` instruments described above plus a synchronous timing instrument named `TimingMeasure`. + +If the above open question is decided in favor of treating the +foundational instruments as abstract, instrument names like +_NonAbsoluteMeasure_ and _NonAbsoluteCounter_ will need to be +standardized. From 6643fa40868cdc64f1466b2477e87557c8cde605 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 10 Mar 2020 12:04:39 -0700 Subject: [PATCH 07/15] Typo --- text/0088-metric-instrument-optional-refinements.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index 4bfa168ba..f4b3a914d 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -150,7 +150,7 @@ meaning the rate aggregation is defined by computing a difference across timestamps or collection intervals. A precomputed sum refinement implies a sum-only refinement. Note that -values assocaited with a precomputed sum are still sums. Precomputed +values associated with a precomputed sum are still sums. Precomputed sum values are combined using addition, when aggregating over the spatial dimensions; only the time dimension receives special treatment. From 6ddbf370ad39250fe025e4d425f83c8d092c2a34 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 10 Mar 2020 12:19:13 -0700 Subject: [PATCH 08/15] Add an example --- ...-metric-instrument-optional-refinements.md | 26 ++++++++++++++++--- 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index f4b3a914d..aeba71207 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -238,6 +238,22 @@ should think about adding new instruments. 
No API changes are called for in this proposal. +## Example + +Suppose you wish to capture the CPU usage of a process broken down by +the CPU core ID. The operating system provides a mechanism to read +the current usage from the `/proc` file system, which will be reported +once per collection interval using an Observer instrument. Because +this is a precomputed sum with a non-negative rate, use a +_CumulativeObserver_ to report this quantity with a metric label +indicating the CPU core ID. + +It will be common to compute a rate of CPU usage over this data. The +rate can be calculated for an individual CPU core by computing a +difference between the value of two metric events. To compute the +aggregate rate across all cores–a spatial aggregation–these +differences are added together. + ## Open question Eleven instruments have been given hyptothetical names in the table @@ -277,10 +293,12 @@ in the August 2019 Metrics working group meeting. ## Future possibilities -A future OTEP will request the introduction of several standard -refinements for the 0.4 API specification. These will be the -`DeltaObserver` and `CumulativeObserver` instruments described above -plus a synchronous timing instrument named `TimingMeasure`. +A future OTEP will request the introduction of two standard +refinements for the 0.4 API specification. This will be the +`CumulativeObserver` instrument described above plus a synchronous +timing instrument named `TimingMeasure` that is equivalent to +_AbsoluteMeasure_ with the correct unit and a language-specific +duration type for measuring time. 
If the above open question is decided in favor of treating the foundational instruments as abstract, instrument names like From 419a8e01d568e933354f168d6d430dc4c710d29b Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 23 Mar 2020 22:18:27 -0700 Subject: [PATCH 09/15] Add detail and translation guidance --- ...-metric-instrument-optional-refinements.md | 232 +++++++++++++++--- 1 file changed, 197 insertions(+), 35 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index aeba71207..be68b91bb 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -52,13 +52,21 @@ properties of the associated trace and distributed correlation values. ### Last-value relationship -Observer instruments have a well-defined _last value_ measured by the -instrument that can be useful in defining aggregations. To maintain -this property, we impose this requirement: two or more calls to -`Observe()` in a single Observer callback invocation are treated as -duplicates of each other, and the last call to `Observe()` wins. - -Measure instruments do not define a _last value_ relationship. +Observer instruments have a well-defined _Last Value_ measured by the +instrument, that can be useful in defining aggregations. To maintain +this property, we impose this requirement: two or more `Observe()` +calls with an identical LabelSet during a single Observer callback +invocation are treated as duplicates of each other, where the last +call to `Observe()` wins. (This is also the specified way that +`Labels()` treats duplicates for a given label key.) + +Unlike Observer instruments, Measure instruments do not define the +Last Value relationship. One reason is that [synchronous events can +happen +simultaneously](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-metrics.md#time). 
+The Last Value relationship refers to values read in a single +collection period, whereas with Measure instruments we can define a +last-value aggregation. ### Aggregating changes to a sum: Rate calculation @@ -175,6 +183,27 @@ typical value to report through an Observer instrument, so the rate for a specific set of labels is defined by subtracting the prior observation from the current observation. +#### Discussion: Additive vs. Non-Additive numbers + +The refinements proposed above may leave us wondering about the +distinction between an unrefined Measure and the +_UpDownCumulativeCounter_. Both values are unrestricted, in terms of +their range, so why should they be treated differently? + +The _UpDownCumulativeCounter_ has sum-only, precomputed-sum +refinements, which indicate that the numbers being observed are the +result of addition. These instruments have the additive property that +observing `N` and `M` separately is equivalent to observing `N+M`. +When performing spatial aggregation over data with these additive +properties, it is natural to compute the sum. + +When performing spatial aggregation over data without additive +properties, it is natural to combine the distributions. The +distinction is about how we interpret the values when aggregating. +Use one of the sum-only refinements to report a sum in the default +configuration, otherwise use one of the non-sum-only instruments to +report a distribution. + #### Language-level refinements OpenTelemetry implementations may wish to add instrument refinements @@ -221,8 +250,8 @@ Hypothetical future instrument names are _italicized_. | Observer | | | | | _NonAbsoluteObserver_ | To arrive at this listing, several assumptions have been made. For -example, the precomputed-sum and non-negative-rate refeinments are only -applicable in conjunction with a sum-only refinement. +example, the precomputed-sum and non-negative-rate refinements are +only applicable in conjunction with a sum-only refinement. 
For the precomputed-sum instruments, we technically do not care whether the inputs are non-negative, because rate aggregation computes @@ -230,6 +259,19 @@ differences. However, it is useful for other aggregations to assume that precomputed sums start at zero, and we will ignore the case where a precomputed sum has an initial value other than zero. +#### Gauge instrument + +A Measure instrument with a default Last Value aggregation could be +defined, hypothetically named a _Gauge_ instrument. This would offer +convenience for users that want this behavior, for there is otherwise +no standard Measure refinement with Last Value aggregation. + +Sum-only uses for this hypothetical instrument should instead use +either _CumulativeCounter_ or _UpDownCumulativeCounter_, since they +are reporting a sum. This (hypothetical) _Gauge_ instrument would be +useful when a value is time-dependent and the average value is not of +interest. + ## Internal details This is a change of understanding. It does not request any new @@ -238,7 +280,152 @@ should think about adding new instruments. No API changes are called for in this proposal. -## Example +### Translation into well-known systems + +#### Prometheus + +The Prometheus system defines four kinds of [synchronous metric +instrument](https://prometheus.io/docs/concepts/metric_types). 
+ +| System | Metric Kind | Operation | Aggregation | Notes | +| ---------- | ------------ | ------------------- | -------------------- | ------------------- | +| Prometheus | Counter | Inc() | Sum | Sum of positive deltas | +| Prometheus | Counter | Add() | Sum | Sum of positive deltas | +| Prometheus | Gauge | Set() | Last Value | Non-additive or monotonic cumulative | +| Prometheus | Gauge | Inc()/Dec() | Sum | Sum of deltas | +| Prometheus | Gauge | Add()/Sub() | Sum | Sum of deltas | +| Prometheus | Histogram | Observe() | Histogram | | +| Prometheus | Summary | Observe() | Summary | Aggregation does not merge | + +Note that the Prometheus Gauge supports five methods (`Set`, `Inc`, +`Dec`, `Add`, and `Sub`), one of which sets the last value while the +others modify the last value. This interface is not compatible with +OpenTelemetry, because it requires the SDK to maintain long-lived +state about Gauge values in order to compute the last value following +one of the additive methods (`Inc`, `Dec`, `Add`, and `Sub`). + +If we restrict Prometheus Gauges to support only a `Set` method, or to +support only the additive methods, then we can model these two +instruments separately, in a way that is compatible with OpenTelemetry. +A Prometheus Gauge that is used exclusively with `Set()` can be +modeled as a Measure instrument with Last Value aggregation. A +Prometheus Gauge that is used exclusively with the additive methods can be +modeled as an `UpDownCounter`. + +Prometheus has support for asynchronous reporting via the "Collector" +interface, but this is a low-level API to support directly exporting +encoded metric data. The Prometheus "Collector" interface could be +used to implement Observer-like instruments, but they are not natively +supported in Prometheus. + +#### Statsd + +The Statsd system supports only synchronous reporting. 
+ +| System | Metric Event | Operation | Aggregation | Notes | +| ------ | ------------ | ------------------- | -------------------- | ------------------- | +| Statsd | Count | Count() | Sum | Sum of deltas | +| Statsd | Gauge | Gauge() | Last Value | | +| Statsd | Histogram | Histogram() | Histogram | | +| Statsd | Distribution | Distribution() | _Not specified_ | A distribution summary | +| Statsd | Timing | Timing() | _Not specified_ | A distribution summary | +| Statsd | Set | Set() | Cardinality | Unique value count | + +The Statsd Count operation translates into either a Counter, if +increments are non-negative, or an _UpDownCounter_ if values may be +negative. The Statsd Gauge operation translates into a Measure +instrument configured with Last Value aggregation. + +The Histogram, Distribution, and Timing operations are semantically +identical, but have different units and default behavior in statsd +systems. Each of these distribution-valued instruments can be +replaced using a Measure with a distribution-valued aggregation such +as MinMaxSumCount, Histogram, Exact, or Summary. + +The Set operation does not have a direct replacement in OpenTelemetry, +however one can be constructed using a Measure or Observer instrument +and a dummy value. Each distinct label set is naturally output each +collection interval, whether reported synchronously or asynchronously, +so the set size can be computed by using a metric label as the unique +element and no aggregation operator. 
+ +#### OpenCensus + +The OpenCensus system defines three kinds of instrument: + +| System | Metric Kind | Operation | Aggregation | Notes | +| ------ | ---------------- | -------------- | ----------------- | ------------------- | +| OpenCensus | Cumulative | Inc() | Sum | Positive deltas | +| OpenCensus | Gauge | Set() | LastValue | | +| OpenCensus | Gauge | Add() | Sum | Deltas | +| OpenCensus | Raw-Stats | Record() | Sum, Count, Mean, or Distribution | | + +OpenCensus departed from convention with the introduction of a Views +API, which makes it possible to support fewer kinds of instrument +directly, since they can be configured in multiple ways. + +Like Prometheus, the combination of multiple APIs in the Gauge +instrument is not compatible with OpenTelemetry. A Gauge used with +Set() generally implies last-value aggregation, whereas a Gauge used +with Add() is additive and uses Sum aggregation. + +Raw statistics can be aggregated using any aggregation, and all the +OpenCensus aggregations have equivalents in OpenTelemetry. + +OpenCensus supported callback-oriented asynchronous forms of both +Cumulative and Gauge instruments. An asynchronous Cumulative +instrument would be replaced by a CumulativeObserver in OpenTelemetry. +An asynchronous Last-value Gauge would be replaced by AbsoluteObserver +or just the unrestricted Observer. An asynchronous Additive Gauge +would be replaced by a DeltaObserver. + +### Proposals + +Using the information above, we can propose a set of refinements for +both synchronous and asynchronous instruments. + +#### Synchronous instruments + +The foundational `Measure` instrument without refinements or +restrictions shall continue to be called a `Measure` instrument. It is +not an abstract kind of instrument, therefore the hypothetical +`NonAbsoluteMeasure` instrument is not needed. 
+ +Along with `Counter` and `Measure`, we recognize several less-common +but still important cases and reasons why they should be standardized: + +- _UpDownCounter_: Support Prometheus additive Gauge instrument use +- _TimingMeasure_: Support Prometheus and Statsd timing measurements. + +Instruments that are not standardized but may be in the future (and why): + +- _CumulativeCounter_: Support a synchronous monotonic cumulative instrument +- _AbsoluteMeasure_: Support non-negative valued distributions +- _LastValueMeasure_: Support a last value aggregation without configuring a view. + +Instruments that are probably not seen as widely useful: + +- _UpDownCumulativeCounter_: We believe this is better handled asynchronously + +#### Observer instruments + +The foundational `Observer` instrument without refinements or +restrictions shall continue to be called an `Observer` instrument. It +is not an abstract kind of instrument, therefore the hypothetical +`NonAbsoluteObserver` instrument is not needed. + +We have identified important cases that should be standardized: + +- _CumulativeObserver_: Support a cumulative monotone counter +- _UpDownDeltaObserver_: Support an asynchronous delta counter. + +Observer refinements that could be standardized in the future: + +- _UpDownCumulativeObserver_: Observe a non-monotonic cumulative counter +- _DeltaObserver_: Observe a sum of non-negative deltas +- _AbsoluteObserver_: Observe non-negative current values. + +## Example: Observer aggregation Suppose you wish to capture the CPU usage of a process broken down by the CPU core ID. The operating system provides a mechanism to read @@ -254,31 +441,6 @@ difference between the value of two metric events. To compute the aggregate rate across all cores–a spatial aggregation–these differences are added together. -## Open question - -Eleven instruments have been given hyptothetical names in the table -above, but only a subset of these should be included in the -specification. 
- -An open question is whether the foundational instruments should be -considered "abstract", meaning that users can only create refined -instruments. - -An argument in favor of treating the foundation instruments as -abstract goes like this: users will be confused because sometimes the -documentation and specification discusses Measure and Observer -instruments generally, and sometimes it discusses them specifically. -If the foundation instruments are abstract, this confusion is -eliminated. - -An argument against treating the foundation instruments as abstract -goes like this: by excluding these short, well-understood names from -use in the API, we force long, less-well understood names on the user, -which will leave them confused. For example, _NonAbsoluteObserver_ is -a completely unrefined Observer, and wouldn't you rather read and -write "Observer" in code? (Likewise for _NonAbsoluteMeasure_ vs -Measure.) - ## Trade-offs and mitigations The trade-off explicitly introduced here is that we should prefer to From ca0ae57b2d66793e59066aacce2a14fbcd5960a8 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 23 Mar 2020 23:22:55 -0700 Subject: [PATCH 10/15] About monotonicity --- ...88-metric-instrument-optional-refinements.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index be68b91bb..e409e1aac 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -401,7 +401,7 @@ Instruments that are not standardized but may be in the future (and why): - _CumulativeCounter_: Support a synchronous monotonic cumulative instrument - _AbsoluteMeasure_: Support non-negative valued distributions -- _LastValueMeasure_: Support a last value aggregation without configuring a view. 
+- _LastValueMeasure_: Support a last value aggregation without configuring a view (could be "Gauge"). Instruments that are probably not seen as widely useful: @@ -441,6 +441,21 @@ difference between the value of two metric events. To compute the aggregate rate across all cores–a spatial aggregation–these differences are added together. +## Open Questions + +Are there still questions surrounding the former Monotonic refinement? + +Should the _CumulativeObserver_ instrument be named +_MonotonicObserver_? In this proposal, we prefer _Cumulative_ and +_UpDownCumulative_. _Cumulative_ is a good descriptive term in this +setting (i.e., some additive values are _cumulative_, some are +_delta_). Being _Cumulative_ and not _UpDownCumulative_ implies +monotonicity in this proposal. + +For synchronous instruments, this proposal does not standardize +_CumulativeCounter_. Such an instrument might be named +_MonotonicCounter_. + ## Trade-offs and mitigations The trade-off explicitly introduced here is that we should prefer to From d4b43affcf62f5873d61a0ecc68cbafc6f39e6b3 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 6 Apr 2020 22:17:49 -0700 Subject: [PATCH 11/15] More on Last Value relationship --- ...-metric-instrument-optional-refinements.md | 50 +++++++++++++------ 1 file changed, 35 insertions(+), 15 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index e409e1aac..7f26d166c 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -24,8 +24,8 @@ measurements of changes in a sum, therefore it uses `Add()` instead of `Record()`, and it specifies `Sum` as the standard aggregation. What this illustrates is that we have modeled this space poorly. 
This -does not propose to change any existing metrics APIs, only our -understanding of the three instruments currently included in the +proposal does not propose to change any existing metrics APIs, only +our understanding of the three instruments currently included in the specification: Measure, Observer, and Counter. ## Explanation @@ -53,20 +53,40 @@ properties of the associated trace and distributed correlation values. ### Last-value relationship Observer instruments have a well-defined _Last Value_ measured by the -instrument, that can be useful in defining aggregations. To maintain -this property, we impose this requirement: two or more `Observe()` -calls with an identical LabelSet during a single Observer callback -invocation are treated as duplicates of each other, where the last -call to `Observe()` wins. (This is also the specified way that -`Labels()` treats duplicates for a given label key.) - -Unlike Observer instruments, Measure instruments do not define the -Last Value relationship. One reason is that [synchronous events can -happen +instrument, that can be useful in defining aggregations. The Last +Value of an Observer instrument is the value that was captured during +the last-completed collection interval, and it is a useful +relationship because it is defined without relation to collection +interval timing. The Last Value of an Observer is determined by the +single most-recently completed collection interval--it is not +necessary to consider prior collection intervals. The Last Value of +an Observer is undefined when it is not observed during a collection +interval. + +To maintain this property, we impose a requirement: two or more +`Observe()` calls with an identical LabelSet during a single Observer +callback invocation are treated as duplicates of each other, where the +last call to `Observe()` wins. 
+ +Based on the Last Value relationship, we can ask and answer questions +such as "what is the average last value of a metric at a point in +time?". Observer instruments define the Last Value relationship +without referring to the collection interval and without ambiguity. + +#### Last-value and Measure instruments + +Measure instruments do not define a Last Value relationship. One +reason is that [synchronous events can happen simultaneously](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-metrics.md#time). -The Last Value relationship refers to values read in a single -collection period, whereas with Measure instruments we can define a -last-value aggregation. + +For Measure instruments, it is possible to compute an aggregation that +computes the last-captured value in a collection interval, but it is +potentially not unique and the result will vary depending on the +timing of the collection interval. For example, a synchronous metric +event that last took place one minute ago will appear as the last +value for collection intervals one minute or longer, but the last +value will be undefined if the collection interval is shorter than one +minute. ### Aggregating changes to a sum: Rate calculation From 56765db0cdfb8a520cbe186192ba459068e19926 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 7 Apr 2020 00:33:21 -0700 Subject: [PATCH 12/15] Clarify _aggregation_ --- ...-metric-instrument-optional-refinements.md | 52 +++++++++++++------ 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index 7f26d166c..ce14ba391 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -50,6 +50,29 @@ value](api-metrics.md#metric-event-format). 
Synchronous instrument events additionally have [Context](api-context.md), describing properties of the associated trace and distributed correlation values. +#### Terminology: Kinds of Aggregation + +_Aggregation_ refers to the technique used to summarize many +measurements and/or observations into _some_ kind of summary of the +data. As detailed in the [metric SDK specification (TODO: +WIP)](https://github.com/open-telemetry/opentelemetry-specification/pull/347/files?short_path=5b01bbf#diff-5b01bbf3430dde7fc5789b5919d03001), +there are generally two relevant modes of aggregation: + +1. Within one collection interval, for one label set, the SDK's +`Aggregator.Add()` interface method incorporates one new measurement +value into the current aggregation value. This happens at run time, +therefore is referred to as _temporal aggregation_. This mode applies +only to Measure instruments. +2. Within one collection interval, when combining label sets, the +SDK's `Aggregator.Merge()` interface method incorporates two +aggregation values into one aggregation value. This is referred to as +_spatial aggregation_. This mode applies to both Measure and Observer +instruments. + +As discussed below, we are especially interested in aggregating rate +information, which sometimes requires that temporal and spatial +aggregation be treated differently. + ### Last-value relationship Observer instruments have a well-defined _Last Value_ measured by the @@ -86,7 +109,7 @@ timing of the collection interval. For example, a synchronous metric event that last took place one minute ago will appear as the last value for collection intervals one minute or longer, but the last value will be undefined if the collection interval is shorter than one -minute. +minute. ### Aggregating changes to a sum: Rate calculation @@ -104,20 +127,13 @@ refinements introduced in this proposal is to facilitate rate calculations in more than one way. 
When delta reporting, a rate is calculated by summing individual -measurements or observations. For Measure instruments, these values -fall into a range of time, as indicated by the event timestamp. For -Observer instruments, these values fall into a range of collection -intervals. - -When cumulative reporting, a rate is calculated by computing a -difference between individual values. For an Observer instrument, we -compute rate over a range of collection intervals, and for a Measure -instrument we compute rate over a range of timestamps. In either -case, we are interested in subtracting the final value from the prior -value measured or observed on the instrument. +measurements or observations. When cumulative reporting, a rate is +calculated by computing a difference between individual values. -Note that rate aggregation, as illustrated above, treats the time -dimension differently than the other dimensions used for aggregation. +Note that cumulative-reported metric data requires special treatment +of the time dimension when computing rates. When aggregating across +the time dimension, the difference should be computed. When +aggregating across spatial dimensions, the sum should be computed. ### Standard implementation of Measure and Observer @@ -201,16 +217,18 @@ collection intervals, cannot change by a negative amount, because it is impossible to use a negative amount of CPU time. CPU time a typical value to report through an Observer instrument, so the rate for a specific set of labels is defined by subtracting the prior -observation from the current observation. +observation from the current observation. Using a non-negative-rate +refinement asserts that the value increases by a non-negative amount +on subsequent collection intervals. #### Discussion: Additive vs. Non-Additive numbers The refinements proposed above may leave us wondering about the distinction between an unrefined Measure and the _UpDownCumulativeCounter_. 
Both values are unrestricted, in terms of -their range, so why should they be treated differntly? +range, so why should they be treated differently? -The _UpDownCumulativeCounter_ has sum-only, precomputed-sum +The _UpDownCumulativeCounter_ has sum-only and precomputed-sum refinements, which indicate that the numbers being observed are the result of addition. These instruments have the additive property that observing `N` and `M` separately is equivalent to observing `N+M`. From 293b9f8d4ba9bc5d0f8cbdbe4f16782992e03f8b Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 7 Apr 2020 00:50:58 -0700 Subject: [PATCH 13/15] For Bogdan --- ...-metric-instrument-optional-refinements.md | 45 +++++++++---------- 1 file changed, 21 insertions(+), 24 deletions(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index ce14ba391..255146bf7 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -278,14 +278,14 @@ Hypothetical future instrument names are _italicized_. 
| Measure | sum-only | precomputed-sum | | non-negative-rate | _CumulativeCounter_ | | Measure | sum-only | | | | _UpDownCounter_ | | Measure | sum-only | precomputed-sum | | | _UpDownCumulativeCounter_ | -| Measure | | | non-negative | | _AbsoluteMeasure_ | -| Measure | | | | | _NonAbsoluteMeasure_ | +| Measure | | | non-negative | | _AbsoluteDistribution_ | +| Measure | | | | | _Distribution_ | | Observer | sum-only | | non-negative | non-negative-rate | _DeltaObserver_ | | Observer | sum-only | precomputed-sum | | non-negative-rate | _CumulativeObserver_ | | Observer | sum-only | | | | _UpDownDeltaObserver_ | | Observer | sum-only | precomputed-sum | | | _UpDownCumulativeObserver_ | -| Observer | | | non-negative | | _AbsoluteObserver_ | -| Observer | | | | | _NonAbsoluteObserver_ | +| Observer | | | non-negative | | _AbsoluteLastValueObserver_ | +| Observer | | | | | _LastValueObserver_ | To arrive at this listing, several assumptions have been made. For example, the precomputed-sum and non-negative-rate refeinments are @@ -332,7 +332,7 @@ instrument](https://prometheus.io/docs/concepts/metric_types). | Prometheus | Gauge | Set() | Last Value | Non-additive or monotonic cumulative | | Prometheus | Gauge | Inc()/Dec() | Sum | Sum of deltas | | Prometheus | Gauge | Add()/Sub() | Sum | Sum of deltas | -| Prometheus | Histogram | Observe() | Histogram | | +| Prometheus | Histogram | Observe() | Histogram | Non-negative values | | Prometheus | Summary | Observe() | Summary | Aggregation does not merge | Note that the Prometheus Gauge supports five methods (`Set`, `Inc`, @@ -366,7 +366,7 @@ The Statsd system supports only synchronous reporting. 
| Statsd | Gauge | Gauge() | Last Value | | | Statsd | Histogram | Histogram() | Histogram | | | Statsd | Distribution | Distribution() | _Not specified_ | A distribution summary | -| Statsd | Timing | Timing() | _Not specified_ | A distribution summary | +| Statsd | Timing | Timing() | _Not specified_ | Non-negative, distribution summary, Millisecond units | | Statsd | Set | Set() | Cardinality | Unique value count | The Statsd Count operation translates into either a Counter, if @@ -417,51 +417,48 @@ An asynchronous Last-value Gauge would be replaced by AbsoluteObserver or just the unrestricted Observer. An asynchronous Additive Gauge would be replaced by a DeltaObserver. -### Proposals +### Sample Proposal -Using the information above, we can propose a set of refinements for -both synchronous and asynchronous instruments. +The information above will be used to propose a set of refinements +for both synchronous and asynchronous instruments in a follow-on OTEP. +What follows is a sample of the forthcoming proposal, to motivate the +discussion here. #### Synchronous instruments The foundational `Measure` instrument without refinements or -restrictions shall continue to be called a `Measure` instrument. It is -not an abstract kind of instrument, therefore the hypothetical -`NonAbsoluteMeasure` instrument is not needed. +restrictions will be called a `Distribution` instrument. -Along with `Counter` and `Measure`, we recognize several less-common +Along with `Counter` and `Distribution`, we recognize several less-common but still important cases and reasons why they should be standardized: - _UpDownCounter_: Support Prometheus additive Gauge instrument use -- _TimingMeasure_: Support Prometheus and Statsd timing measurements. +- _Timing_: Support Prometheus and Statsd timing measurements. 
Instruments that are not standardized but may be in the future (and why): - _CumulativeCounter_: Support a synchronous monotonic cumulative instrument -- _AbsoluteMeasure_: Support non-negative valued distributions -- _LastValueMeasure_: Support a last value aggregation without configuring a view (could be "Gauge"). +- _AbsoluteDistribution_: Support non-negative valued distributions Instruments that are probably not seen as widely useful: -- _UpDownCumulativeCounter_: We believe this is better handled asynchronously +- _UpDownCumulativeCounter_: We believe this is better handled asynchronously. #### Observer instruments The foundational `Observer` instrument without refinements or -restrictions shall continue to be called an `Observer` instrument. It -is not an abstract kind of instrument, therefore the hypothetical -`NonAbsoluteObserver` instrument is not needed. +restrictions shall be called a `LastValueObserver` instrument. We have identified important cases that should be standardized: - _CumulativeObserver_: Support a cumulative monotone counter -- _UpDownDeltaObserver_: Support an asynchronous delta counter. +- _DeltaObserver_: Support an asynchronous delta counter. Observer refinements that could be standardized in the future: -- _UpDownCumulativeObserver_: Observer a non-monotonic cumluative counter -- _DeltaObserver_: Observe a sum of non-negative deltas -- _AbsoluteObserver_: Observe non-negative current values. +- _UpDownCumulativeObserver_: Observe a non-monotonic cumulative counter +- _UpDownDeltaObserver_: Observe positive and negative deltas +- _AbsoluteLastValueObserver_: Observe non-negative current values. 
## Example: Observer aggregation From bef8192c1b9e678389f1cd485cf54709b890fadc Mon Sep 17 00:00:00 2001 From: jmacd Date: Fri, 10 Apr 2020 11:56:00 -0700 Subject: [PATCH 14/15] Note about OTEP 93 and OTEP 96 --- text/0088-metric-instrument-optional-refinements.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index 255146bf7..e7060e02d 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -5,6 +5,15 @@ for metric instruments, declares the Measure and Observer instruments as _foundational_, and introduces a process for standardizing new instrument _refinements_. +Note that [OTEP 93](https://github.com/open-telemetry/oteps/pull/93) +contains a final proposal for the set of instruments, of which there +are seven. Note that [OTEP +96](https://github.com/open-telemetry/oteps/pull/96) contains a final +proposal for the names of the seven standard instruments. These three +OTEPs will be applied as a group to the specification, using the names +finalized in OTEP 96. + + ## Motivation With the removal of Gauge instruments and the addition of Observer From 2735cf0129a3fd311dd4ca3d9f076ce5f192d9f5 Mon Sep 17 00:00:00 2001 From: jmacd Date: Fri, 10 Apr 2020 12:07:10 -0700 Subject: [PATCH 15/15] Formatting --- text/0088-metric-instrument-optional-refinements.md | 1 - 1 file changed, 1 deletion(-) diff --git a/text/0088-metric-instrument-optional-refinements.md b/text/0088-metric-instrument-optional-refinements.md index e7060e02d..b5f7a4598 100644 --- a/text/0088-metric-instrument-optional-refinements.md +++ b/text/0088-metric-instrument-optional-refinements.md @@ -13,7 +13,6 @@ proposal for the names of the seven standard instruments. These three OTEPs will be applied as a group to the specification, using the names finalized in OTEP 96. 
- ## Motivation With the removal of Gauge instruments and the addition of Observer