Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent empty OTLP's Int/DoubleSum from being sent #1976

Closed
carlosalberto opened this issue Nov 2, 2020 · 17 comments
Closed

Prevent empty OTLP's Int/DoubleSum from being sent #1976

carlosalberto opened this issue Nov 2, 2020 · 17 comments
Labels

Comments

@carlosalberto
Copy link
Contributor

Currently, if a given counter has no reported values in a given interval (that is, MetricData.getPoints().isEmpty == true), it will be sent anyway (was OTLP's IntSum/DoubleSum without any data point at all).

Is this needed at all? Maybe we can simply ignore them in such a case? Or is there any reason why we keep on sending them, even if empty?

@jkwatson
Copy link
Contributor

jkwatson commented Nov 2, 2020

This sounds like a bug, but I'm not sure what the collector expects in these cases. @bogdandrutu do you know?

@jrcamp
Copy link

jrcamp commented Nov 4, 2020

@carlosalberto I'm not aware of anything in the collector that would make use of metricdata if there are zero datapoints. However I was wondering why is it generating empty metricdata instead of having a datapoint with value of zero?

@jkwatson
Copy link
Contributor

jkwatson commented Nov 4, 2020

@carlosalberto I'm not aware of anything in the collector that would make use of metricdata if there are zero datapoints. However I was wondering why is it generating empty metricdata instead of having a datapoint with value of zero?

Currently our Sum aggregations are all cumulative, so a zero report would mean that the counter had been reset, which would not be correct.

@jrcamp
Copy link

jrcamp commented Nov 4, 2020

If it's a cumulative shouldn't a datapoint value always be reported each interval then? Even if the value is unchanged it should still be sent as datapoints can be dropped at the collector or further upstream.

@keitwb
Copy link

keitwb commented Nov 4, 2020

Yes, it is generally more robust to report something on a regular interval, even if it hasn't changed. It might not be as optimal but it will be much more robust for the backends that ultimately consume this data.

If there are any delta values where it reports only the difference since the last report, it would be good to emit 0s as well.

@jmacd
Copy link

jmacd commented Nov 5, 2020

I see that we need to add some specification to clarify this matter. In the OTel-Go SDK the Processor component has two distinct behaviors:

  1. It can keep state needed for DELTA to CUMULATIVE translation
  2. it can keep state needed purely for memory

Both of these mechanisms are needed for a Prometheus exporter that doesn't otherwise manage its own state. When we discuss what the behavior should be for OTLP export, both (1) and (2) are valid behaviors.

open-telemetry/opentelemetry-specification#731 states that (1) should be on by default, but we haven't specified what to do about (2). I'm afraid this has to be tackled in the specification, but's tangential to the question here.

One thing we can say, I think, is that we should never send empty data points. Either send the current value because it has changed or because of a memory configuration (2), or do not send a metric at all, but never send a metric without points. We expect to not Sum data points when reporting DELTAs and the value is zero, I would say that SDKs SHOULD not send 0-valued DELTAs, but this implicitly says something about memory behavior.

We might simply state for the metrics SDK specification that memory behavior should be enabled when reporting CUMULATIVE temporality, but this is a stretch: the question is when should an OTLP exporter report old values for Gauges that were previously reported. Note that Gauge points do not have temporality, so the rule "memory when cumulative" does not literally apply.

@carlosalberto would be happy for you to file a spec issue on this. 😀

@jmacd
Copy link

jmacd commented Nov 5, 2020

Just to clarify: I am not aware of any reason ever to send an empty data point.

@jkwatson I would not interpret an "empty" point as a timestamp reset. The StartTimeUnixNano field should be used for the start timestamp and is independent of whether there are points.

There is a legitimate behavior, which I call "Memory", that tells the SDK to continue to export values for metrics that are not updated during an interval and remember them indefinitely. This behavior is generally paired with cumulative reporting, and it is required for Prometheus. When configuring the SDK for cumulative-reporting-with-memory, it should continue to report the last value (for gauges) and cumulative values (for counters and histograms)

I would argue that this "Memory" behavior is a choice: whether to report no metric or to report an old value, but in either case we should not report an empty Metric.

Although this is not written into the SDK specification (there should be a spec issue on this topic), but I'll point you to the optional behavior in OTel-Go, where it is built in to the basic Processor component: https://github.com/open-telemetry/opentelemetry-go/blob/9ac3a08eef14d96687edb260e2c27114a658d6b7/sdk/metric/processor/basic/config.go#L34

@jkwatson
Copy link
Contributor

jkwatson commented Nov 5, 2020

@jkwatson I would not interpret an "empty" point as a timestamp reset. The StartTimeUnixNano field should be used for the start timestamp and is independent of whether there are points.

I wasn't saying that an empty point would work that way, but one with an explicit '0' sent as the point's value.

@jkwatson
Copy link
Contributor

jkwatson commented Nov 6, 2020

BTW, I believe this "view" PR would let us decide to always send the last value recorded, for last-value aggregations, even if nothing had been reported in an interval, by specifying the "CUMULATIVE" temporality for whatever you need: #2037

(It wouldn't prevent empty records, but that's a separate issue)

@jrcamp
Copy link

jrcamp commented Nov 6, 2020

I would say that SDKs SHOULD not send 0-valued DELTAs, but this implicitly says something about memory behavior.

@jmacd Are you saying if a delta time series only reports 0 it would never send any datapoints? Or if it reports 10, then 0, then 20, it only reports the 10 and 20 but not the 0 between?

@bogdandrutu
Copy link
Member

@jrcamp I think it is not about reporting "0" it is about reporting or not anything at all, I think this is a sequence of events and what to export:

-> t0
report 1
report 1
report -1
-> t1 -> export delta "1"
// nothing reported
-> t2 -> nothing exported
report 0
report 0
-> t3 -> export delta "0"
report -1
-> t4 -> export delta "-1"

@jmacd
Copy link

jmacd commented Nov 12, 2020

I agree with @bogdandrutu's example and would like us to specify this behavior. When there is a metric event, a value should be exported regardless of whether it matches the empty state. We will say SDKs MUST (or SHOULD) emit 0-valued DELTA-AggregationTemporality points and SHOULD/MUST NOT emit an empty state in these cases? I.e., in this sequence we must export delta "0":

report 0
report 0
-> t3 -> export delta "0"

@jkwatson
Copy link
Contributor

If we specify that all delta-temporality aggregations must report "zero valued" values every report cycle if no recordings are made during that cycle, that will impact all exporters, not just OTLP. Are we ok with that? Also, this does mean that the SDK must hold on to all references to all instruments for the lifetime of the process. Is that ok?

@jmacd
Copy link

jmacd commented Nov 12, 2020

Are we ok with that?

No, I don't think so. I believe the point being made is that zero-valued deltas SHOULD be reported IFF a recording is made. In open-telemetry/opentelemetry-specification#1198 I am trying to make this requirement clear: if "no recording is made" after some time the Accumulator SHOULD erase its memory of this instrument. The discussion here refines this requirement to assert that the Accumulator MUST output an Accumulation for any instrument/labelset that is used (i.e., "recorder") during an interval and that the Accumulator MUST NOT output an Accumulation for any instrument/labelset that is not used during an interval.

The point of open-telemetry/opentelemetry-specification#1198 is that the Processor is the place that MAY be configured to keep indefinite state, for one of two reasons: (1) because DELTA->CUMULATIVE was requested, (2) because Memory was requested.

Another way to consider this requirement: the SDK SHOULD be factored into Accumulator/Processor/Exporter such that an implementation of Prometheus export is achieved where both the Accumulator and the Exporter are stateless.

@jkwatson
Copy link
Contributor

Ah, ok. I misunderstood what was being proposed. Totally fine with reporting something is something is recorded, and not if not (for delta temporality in the Processor aggregator).

@jkwatson
Copy link
Contributor

jkwatson commented Mar 5, 2021

@carlosalberto can we close this now?

@jkwatson
Copy link
Contributor

closing this due to inactivity, and having had a related PR merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants