-
Notifications
You must be signed in to change notification settings - Fork 888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ability to capture all metric raw data without aggregation #617
Comments
I believe this would be part of SDK spec. I am also interested in getting a hook in the Metric SDK to get raw metric events and also aggregated metric values. My use case for this is different: Having hooks to get raw events( similar to aggregator) and aggregated events (MetricProcessor) will help me achieve my scenario as well. |
The Aggregator interface that is currently described in the draft SDK specification and in the OTel-Go library does receive raw events. An aggregator that handles raw events and does not compute checkpoints could be referred to as a "pass-through" aggregator. This has been requested, but no prototypes exist. The metrics SDK describes an "Integrator" which is like the span processor. The integrator receives checkpoints output by the aggregators (one per distinct label set). I believe it is possible to compute multiple aggregations as described without a pass-through aggregator. To achieve multiple aggregations for different label sets, one checkpoint (for one label set) will be merged into multiple output aggregations according to the labels. To achieve multiple intervals, combine shorter aggregations into longer aggregations. Example: counter x.y.z has labels "a" and "b". x.y.z{a=1,b=1} = 250 The collection interval has four aggregators, each with a count of 250 and a distinct label set. To expose an aggregation over "a", dropping "b", combine two pairs: Likewise, the aggregation over "b", dropping "a", combine two pairs: Now suppose this covers a 1 second interval. You'd like to expose a 1 minute interval. Now take the output from the previous step, and combine 60 consecutive intervals together. As demonstrated (I hope), you shouldn't need a pass-through aggregator to perform multiple aggregations--your goals can be accomplished by merging aggregator checkpoints I think. The request for a pass-through aggregator would make more sense in an aggregator that computes something derived from time and is not window-oriented, like computing an exponential moving average over the last 5 minutes (regardless of interval). |
@cijothomas Mostly my response above was for your statement. |
Thanks @jmacd. If we want to have it work with OTel Collector, I assume there is no proper way that we can send the metrics with structured logs in the current metrics data model until Logs data model spec is ready for Collector. I looked Metrics SDK Draft Spec, if I get it right will the new aggregator
|
It will be very nice If we can get the synchronous instruments for raw measurements support in GA. So it will help us to fully support EMF metrics with logs format in the collector and it is also possible the solution for us to support AWS Lambda customers with OT integration |
@jmacd are you ok to support this for GA? |
I am conflicted about a few things here. We have @cnnradams working on an exemplars prototype, and I encouraged him to go a little further than simply capturing trace/span IDs in histogram buckets. As far as exemplars go, we can assert that there are sensible reasons to include exemplars of any measurement when there has been dimensionality reduction, and that we can include statistical weight to make such exemplars more useful. That said, not many consumers of this statistical sample data are asking for this, and there is not a widespread belief that we need this. What's the minimum viable product here? If we just fall back to trace_id and span_id, would we include these in the data points directly? Should we be limited to only one trace_id/span_id per data point? Should we try to include the removed labels when exporting exemplars? Do we care about statistical exemplars? All of these questions come up when I think about how to represent raw values, since exemplars are practically the same where the statistical count of each exemplar is 1. There has been some debate about how such values should appear in OTLP: there is a concern that raw values will tend to repeat their label sets and that the current shape of the Metric data point does not offer a way to repeatedly use label sets. Does it matter if every raw data point copies the effective label set, or is it important to combine raw values into a group and list their label set only once? If we don't care about this performance/cost concern, then it's stragithforward to represent raw values and exemplars as a repeated RawValue in the Metric, alongside the scalar, Histogram, and Summary repeated values. In this case, if the data point is actually raw data, then the value @bogdandrutu and @open-telemetry/specs-metrics-approvers please consider. I don't want this issue to block OTLP, but I want to give @cnnradams a target he can hit at the same time. |
I think this is the important question here.
For an MVP, I don't think so. It seems like from what @mxiamxia is asking for it would be satisfied with what you described:
It seems to me that if you are asking for "all the data" from the SDK you likely already assume it to be sent in a non-optimized format. My guess is this will be seen as a benefit as the users is looking for full control in all statistical sampling decisions here. As for trace info:
Could this be added afterwards in a backwards compatible way? I think it would be useful, but if it blocks the MVP I don't see it being that useful. |
@MrAlias in the protocol we can add any new data type in a backwards compatible way (there is a need to deploy the collector first, but that is true for any distributed system). In terms of the SDK, this is another aggregation type "AllRawValues" so I don't see why cannot be added later. |
These are all good points. IIRC AWS engineers were asking for this. I feel like they should probably be a part of the conversation. cc @alolita (not sure who else to ping here). |
From metrics data model sig:
|
The SDK spec allows processors to have access to the raw data with this PR #1673. |
There is an interesting discussion in CNCF community slack. Some folks have shared some potential workarounds for now. Linking it here in case if may be useful to readers of this issue. |
The majority of discussion around this happened long before the stabilization of the metrics APIs and the OTLP protocol for metrics. Is there an appetite now for instating something like the Exact (the old name, i don't mind what it's called) aggregator and equivalent OTLP format now and how would we go about it? At it's core our metrics system doesn't do bucketted histograms and so we're blocked on exporting OTLP from the SDK today which is very limiting. Exporting the complete measurement array is the best way to get the full data into the collector which allows us to write the export in only one place. |
I feel this is interesting, I do understand the need, but I don't see a high demand/priority. @jmacd @jsuereth I know you're busy with other workstreams, just curious if you think this can be achieved with either one of the following approaches?
|
I would really love to see this feature come to fruition! My team maintains a service with a very bursty usage pattern and we loose a lot of fidelity in our metrics because there is no way to turn off aggregation. Similar to AWS CloudWatch, Azure Application Insights has powerful tools to query and visualize raw metric data. I just wish I could get all my metric data there more easily. |
In OT Metric API Instruments Spec, we defined that each type of metric instrument should be binding with a default aggregation behavior / aggregator. In AWS CloudWatch, we support to capture all raw metric measurement events and assemble it as Embedded Metric Format(EMF) in a structured log and so CloudWatch backend can process all raw metric data and provide all metrics visualization capabilities.
We'd like to implement such Exporters for SDK and Collector to support EMF metric format but it requires that we can capture all the raw metric events without aggregating. I see we have Views API WIP to allow customers customizing aggregator. We do want to start the work on this ASAP and looking for the suggestions here if there is a way we can capture all the raw metrics event data(especially not SummaryDataPoint) and pass it to OT collector so we can bring in AWS exporters to support it? Thanks.
The text was updated successfully, but these errors were encountered: