-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HistogramExemplarReservoir and performance impact #3953
Comments
Regarding "last-seen" this was for compatibility with existing prometheus histogram behavior (as is the reservoir-matches-bucketing). We did a lot of experimentation around exemplars and I actually think we MAY be able to alter the default to use the FixedSizeReservoir, as we get decent performance/coverage out of that. We expect to spend some time on new algorithms. In this investigation, though, we saw pretty good performance from FixedSizedReservoir. It has less guarantees about capturing "edges" but given its nature, may get better coverage of split-means, e.g. Regarding performance - in Java we pre-allocate cells for the reservoir, so no allocations occur in the hot path. Is this ugly in a GC language, yes. Does it matter for performance? yes. See: https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/ReservoirCell.java.
FYI - in Java we're using a special clock designed for trace that only hits one of the system clocks to diff duration from start-of-span. It helps dramatically here. Unfortunately, I don't think we should relax this requirement, it's one of the key values of an exemplar vs. a metric in that it can help you time-align other signals with an incident. |
Regarding the suggestions -
|
got it! Are you looking for feedback from end-users or from the people who implement Exemplars in OTel sdks? Most of end-users would not know (or care) if the SDK defaults pick first-seen vs last-seen in |
@cijothomas This issue seems to be asking for a new exemplar reservoir specific to histograms, and to consider using that new reservoir as the default for histograms. This shouldn't be a blocker to exemplar stabilization because:
Can you either change the title of this request / framing of this issue, or open a new issue? Thanks. |
@cijothomas please take a look at @jack-berg's comment, can you provide the required details? Thank you! |
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#alignedhistogrambucketexemplarreservoir has the wording "This implementation MUST keep the last seen measurement that falls within a histogram bucket". Consider the example where Delta temporarily is used.
What is the reasoning behind preferring the "last-seen" measurement within a histogram bucket? In other words, would "first-seen" be as good? (I am only talking about Delta temporality).
From a performance standpoint, "last-seen" causes additional work, without any additional value. Specifically, "last-seen" means the Reservoir must create a new Exemplar and overwrite it in the existing Exemplar slot. Overwriting itself might be okay, but creating a new Exemplar requires memory allocation for value,traceid,spanid,timestamp/dropped-attributes. It is entirely possible that value,traceid,spanid already exists in memory, so less of a concern, but timestamp has to be generated (system call + memory allocation most likely), and possibly dropped-attributes.
Also, "first-seen", allows the possibility of further perf optimizations in sdk implementations. For example, if an Exemplar is already recorded in an interval, the SDK can short-circuit a lot, knowing the slot is already filled in, and no longer have to worry about Exemplar until next collection.
Proposal: Could we change the algorithm to use "first-seen" for Delta Aggregation, and keep current (last-seen) for Cumulative.
Alternate: If timestamp is the only perf concern, could we relax the wording on the
time
. It currently says "The time the API call was made to record a Measurement", but could we relax it, so that implementations can pick the endtime instead, and avoid the cost of obtaining time in hot path?Related discussion (Cumulative focused): #3891
Because of #3943, I don't think this should block #3870, but I'd love to see if we can change defaults before declaring stable.!
The text was updated successfully, but these errors were encountered: