Question: Exemplar + SimpleFixedSizeExemplarReservoir + Cumulative aggregation #3891
The spec for SimpleFixedSizeExemplarReservoir says:

> Any stateful portion of sampling computation SHOULD be reset every collection cycle.

The spec for ExemplarReservoir says: …

I'm working on exemplars for .NET and I want to make sure I understand this correctly. It makes total sense to me for delta aggregation. But let's say we are in cumulative aggregation.

I have a SimpleFixedSizeExemplarReservoir with 3 slots. I have recorded 3 exemplars: A, B, & C. A collection occurs and I export A, B, & C. I reset num_measurements_seen after that collection, following what the spec says to do. Now a new measurement (D) happens and it goes into the first slot due to the reset.

On the next collection do I send D, B, & C or …?

The first case seems a little odd (to me) given we're in cumulative mode. But so does the second. Why would we give unfair priority to the first slot? If I don't reset num_measurements_seen then D may or may not end up in any slot and we would see either A, B, C again or one of (D, B, C / A, D, C / A, B, D). This seems more reasonable (to me) for cumulative mode.

Apologies in advance if this has already been discussed.
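A small sketch in Go of the two behaviors being asked about, assuming the naive reservoir sampling algorithm the spec references (all names here are illustrative, not from any SDK):

```go
package main

import (
	"fmt"
	"math/rand"
)

// offer implements naive reservoir sampling: the first len(storage) offers
// fill the slots in order; afterwards each offer replaces a random slot
// with probability len(storage)/(seen+1).
func offer(storage []string, seen *int, v string) {
	if *seen < len(storage) {
		storage[*seen] = v
	} else if i := rand.Intn(*seen + 1); i < len(storage) {
		storage[i] = v
	}
	*seen++
}

func main() {
	// Three slots already holding A, B, C after three offers.
	withReset := []string{"A", "B", "C"}
	seenReset := 0 // num_measurements_seen reset by the collection

	offer(withReset, &seenReset, "D")
	fmt.Println(withReset) // always [D B C]: D deterministically takes slot 0

	withoutReset := []string{"A", "B", "C"}
	seenNoReset := 3 // count carried across the collection

	offer(withoutReset, &seenNoReset, "D")
	// 1/4 chance of [A B C] (D dropped), otherwise one of
	// [D B C], [A D C], [A B D], each with probability 1/4.
	fmt.Println(withoutReset)
}
```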
Comments

On the next collection, if there are no other measurements offered, you will send D, B, and C. The order of the exemplars in the reservoir, or the export, has no significance. Remember, they should be randomly replaced once the reservoir is full anyway. Users are not expected to understand or use the order they are exported in any way. The only important part here is to be consistent and complete in your algorithm.

Given you have reset your count and need to replace all of the previous exemplars before randomly replacing the new ones, it is likely easier to structure your algorithm in a predictable way so you don't have to track these replacements. For example, when you make a new measurement after the reset, the example algorithm in the spec makes the replacements by starting at the front of the storage and working to the end before starting random replacements. You are free to, say, start at the back and work to the front, or maybe some middle-out approach, but the required thing is that you are consistent in your complete replacement of prior measurements before you start random evictions.
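That "complete replacement before random eviction" property falls out naturally when the offer logic reuses the reset count as the write index. A minimal Go sketch of that shape of algorithm (illustrative names; not any SDK's actual implementation):

```go
package main

import (
	"fmt"
	"math/rand"
)

type reservoir struct {
	storage []string
	seen    int // num_measurements_seen, reset on every collection
}

// Offer uses naive reservoir sampling. Because seen is reset at collection,
// the first len(storage) offers after a reset overwrite slots 0, 1, 2, ...
// in order, guaranteeing complete replacement of the prior cycle's
// exemplars before random eviction starts again.
func (r *reservoir) Offer(v string) {
	if r.seen < len(r.storage) {
		r.storage[r.seen] = v
	} else if i := rand.Intn(r.seen + 1); i < len(r.storage) {
		r.storage[i] = v
	}
	r.seen++
}

// Collect exports the current exemplars and resets the sampling state.
func (r *reservoir) Collect() []string {
	out := append([]string(nil), r.storage...)
	r.seen = 0
	return out
}

func main() {
	r := &reservoir{storage: make([]string, 3)}
	for _, v := range []string{"A", "B", "C"} {
		r.Offer(v)
	}
	fmt.Println(r.Collect()) // [A B C]

	r.Offer("D")
	fmt.Println(r.Collect()) // [D B C]: nothing else was offered

	for _, v := range []string{"E", "F", "G", "H"} {
		r.Offer(v)
	}
	// E, F, G replaced D, B, C front to back; H was then offered randomly.
	fmt.Println(r.Collect()) // one of [E F G], [H F G], [E H G], [E F H]
}
```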
I agree with most of this. The part I don't agree with is that in cumulative we should ever reset num_measurements_seen. It might help in the spec to split the reset behavior out by temporality. Something like this...

SimpleFixedSizeExemplarReservoir

...existing content goes here minus the reset bit...

Aggregation behavior and collection

In delta temporal aggregation any stateful portion of sampling computation SHOULD be reset every collection cycle. For the above example, that would mean the num_measurements_seen count is reset every time the reservoir is collected.

Aggregation example

Given a …
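In SDK terms, the proposed split might look something like this sketch (Go, invented names; only the reset-on-collect behavior differs by temporality):

```go
package reservoir

// Temporality mirrors the two aggregation temporalities under discussion.
type Temporality int

const (
	Delta Temporality = iota
	Cumulative
)

// SimpleFixedSizeReservoir holds sampled exemplars plus the stateful
// portion of the sampling computation (seen).
type SimpleFixedSizeReservoir struct {
	storage []string
	seen    int
}

// Collect exports the current exemplars. Under the proposed wording, the
// stateful sampling count is reset every collection cycle only for delta
// temporal aggregation; for cumulative it is carried forward.
func (r *SimpleFixedSizeReservoir) Collect(t Temporality) []string {
	out := append([]string(nil), r.storage...)
	if t == Delta {
		r.seen = 0
	}
	return out
}
```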
You will prioritize exemplars from the start of the application instead of the most recent collection cycle, given that the probability of inclusion diminishes the more observations you see. That is not what the user wants. Users do not want to see measurements from the first collection cycle in the Nth collection cycle, unless there haven't been any measurements.
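To make the diminishing-probability claim concrete, under naive reservoir sampling with a reservoir of size k:

```latex
% After n total offers, each offered measurement ends up retained with
% probability
\Pr[\text{measurement retained after } n \text{ offers}] = \frac{k}{n}
% e.g. with k = 3: the 4th measurement enters the reservoir with
% probability 3/4, but once ~1000 measurements have been seen a new one
% enters with probability only ~3/1000 = 0.3%, so without a reset the
% reservoir stays dominated by the application's earlier history.
```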
I don't see how your logic solves anything. Let's say my app gets very busy, hundreds of measurements are offered, and I fill up a reservoir with exemplars A through J. A collection fires. Now my app is less busy and I see only two new exemplars: K, L. A collection fires. My app stays less busy and I see only two exemplars: M, N. A collection fires.

We are still sending things which were part of the first cycle, except we've given unfair weight to new things. Is that what users want? In cumulative they want to see everything which has happened between the start & end times, which are the lifetime of the process, no?
How is this unfair?
In your example, what if the next collection cycle is just as busy as before, with hundreds of measurements offered? Based on random probability you would still return exemplars from the first cycle. Is this really what they want? They want measurements from previous collections even though there have been hundreds more?
TBH, I don't know 😄 The spec does say...
Collection cycle doesn't seem relevant, only start/stop time. For cumulative our start/stop is the lifetime of the process so why wouldn't I see a mix of exemplars recorded over the lifetime of the process for any given collect?
Agree. This is currently ensured by the spec.
The TC discussed this during the spec issue triage; here is the suggestion: check what Prometheus does, and if Prometheus is doing the same (only having exemplars from the latest collection cycle), we can close the issue as it aligns with the current spec. FYI @dashpole if you could help to confirm.
Prometheus has exemplar storage ...
This can be verified here. Prometheus does match our behavior in that they only show the latest collection cycle if there were enough exemplars offered. They explicitly evict the oldest from the last collection cycle first, like the example in our spec suggests.
@MrAlias I think that is about how exemplars are stored in the Prometheus server. For clients:

For histograms, Prometheus stores one exemplar per bucket. When a new observation is made with an exemplar, it always overwrites the exemplar for that bucket: https://github.com/prometheus/client_golang/blob/14259fa70cfb69f1262f69fdfe58ed5e6318d441/prometheus/histogram.go#L745.

For counters, it only keeps a single exemplar. When a new observation is made with an exemplar, it always overwrites the existing exemplar for the counter: https://github.com/prometheus/client_golang/blob/14259fa70cfb69f1262f69fdfe58ed5e6318d441/prometheus/counter.go#L172.

If there haven't been any observations for the counter during a scrape interval, it would still export the exemplar from the previous interval. Overall, Prometheus will always return the most recent exemplar recorded for a series regardless of whether it has been exported/scraped.
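The counter behavior described there amounts to "keep only the latest exemplar, never reset on scrape." A compressed sketch of that behavior (simplified, with invented names; not the actual client_golang code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type exemplar struct {
	value   float64
	traceID string
}

// counterExemplar keeps exactly one exemplar: every new observation with an
// exemplar unconditionally replaces the previous one, and collection never
// clears it, so an idle scrape interval re-exports the prior exemplar.
type counterExemplar struct {
	latest atomic.Value // holds exemplar
}

func (c *counterExemplar) observe(e exemplar) { c.latest.Store(e) }

func (c *counterExemplar) collect() (exemplar, bool) {
	e, ok := c.latest.Load().(exemplar)
	return e, ok // note: no reset on collect
}

func main() {
	var c counterExemplar
	c.observe(exemplar{1, "trace-a"})
	fmt.Println(c.collect()) // {1 trace-a} true
	// No observations during the next scrape interval:
	fmt.Println(c.collect()) // still {1 trace-a} true
}
```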
I support the statement by @reyang -- we should ensure the default behavior with cumulative aggregation matches Prometheus. I agree with @MrAlias that the "unfairness" is intentional, that the specified behavior as written ensures that recent sampled traces will be exposed while they are still recent. I agree with @CodeBlanch that the concept of "collection time" is awkward when talking about cumulative data.
I want to make sure this concept is well-defined, as it stems from an algorithmic problem with simple random sampling. When you have two populations of different sizes, and you take a simple random sample of equal size from each, the resulting samples cannot be recombined fairly using a simple algorithm, such that the items are equally representative (*). What we can do, instead, is combine the two samples into unequal-probability or weighted samples. My preferred algorithm for this task is VarOpt. Here's a Golang reference implementation.

Here's how this can work for cumulative collection: say we define a semantic-convention attribute, to be included in the exemplar, stating the adjusted count of the exemplar (e.g. …). In order to compute a single, fair sample over the lifetime of all cumulative collection cycles, create a VarOpt sampler of the intended size. Next, enter every exemplar that has (ever) been collected into the VarOpt sampler, using the adjusted count as its weight.

This is a powerful technique because it applies to all subsets of collected exemplars. Choose any subset of exemplars and sum their adjusted counts (or recombine them and sum their recombined adjusted counts), and the resulting total is expected to match the cumulative counter or histogram counter from which it was derived. This means you can actually dispense with the timeseries data entirely: a set of exemplars with accurate adjusted counts can estimate any timeseries you might have otherwise computed, and this works with any aggregation temporality.

@CodeBlanch If this interests you, I would be glad to support development on this semantic convention. I have this prototype for the Lightstep metrics SDK.

(*) This is not to say there are no equal-representativity algorithms -- see "Conditional Poisson Sampling" schemes.
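The following is not VarOpt itself (see the linked reference implementation for that), but a minimal Go sketch of the adjusted-count bookkeeping the proposal relies on: each exported exemplar carries an adjusted count of n/k, so summing adjusted counts over a collection recovers the number of measurements offered.

```go
package main

import (
	"fmt"
	"math/rand"
)

// exemplar carries an adjusted count: an estimate of how many original
// measurements this one sampled exemplar represents.
type exemplar struct {
	value         float64
	adjustedCount float64
}

type reservoir struct {
	values []float64
	seen   int
}

// offer is naive reservoir sampling of the measurement values.
func (r *reservoir) offer(v float64) {
	if r.seen < len(r.values) {
		r.values[r.seen] = v
	} else if i := rand.Intn(r.seen + 1); i < len(r.values) {
		r.values[i] = v
	}
	r.seen++
}

// collect exports each held exemplar with adjustedCount = seen/held, so
// the adjusted counts of one collection sum to the number of measurements
// offered during it.
func (r *reservoir) collect() []exemplar {
	held := r.seen
	if held > len(r.values) {
		held = len(r.values)
	}
	out := make([]exemplar, held)
	for i := 0; i < held; i++ {
		out[i] = exemplar{r.values[i], float64(r.seen) / float64(held)}
	}
	r.seen = 0
	return out
}

func main() {
	r := &reservoir{values: make([]float64, 3)}
	for i := 0; i < 100; i++ {
		r.offer(float64(i))
	}
	var total float64
	for _, e := range r.collect() {
		total += e.adjustedCount // 3 exemplars, each weighted 100/3
	}
	fmt.Println(total) // ≈ 100: matches the counter the exemplars came from
}
```

A weighted sampler can then recombine exemplars from cycles of very different sizes using these adjusted counts as weights, which is the recombination step VarOpt handles.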
@jmacd I'm happy to work on this with you but I'll admit, a lot of that went over my head 🤣 Let me put down what I think you are saying: For …