[Rollup] Support for data-structure based metrics (Cardinality, Percentiles, etc) #33214
Comments
Pinging @elastic/es-search-aggs
Small update.
Relates: #24468
Hi @polyfractal, do you know when this is slated to go into production?
Hi @painslie, I'm afraid I do not have an update. We'll update this issue when there's more information, or link to it from a PR.
@polyfractal I'm curious how well the Prometheus histogram would line up with what you're thinking?
HDRHistogram is essentially just a clever layout of different-sized intervals: a set of exponentially-sized intervals, with a fixed number of linear intervals inside each exponential "level". But at its heart, it's still just a histogram of counts like Prometheus histos (and unlike algos like TDigest, which use weighted centroids, etc). So it should be possible to translate a Prometheus histogram into an HDRHisto. Prometheus histos have user-definable intervals, which means the accuracy of the translation will depend on how nicely the Prometheus intervals line up with the HDRHisto intervals. I think any Prometheus histo should be convertible, and the accuracy of that conversion depends on the exact layout.
Prometheus Summaries are an implementation of Targeted Quantiles and will be much harder to use. The output of a summary is just a percentile estimation at that point in time, which is mostly useless to us. It might be possible to convert the underlying Targeted Quantiles sketch into a TDigest since the algos share some similarities, but I suspect it won't give great accuracy. I've been told summaries aren't as common as histos either, so probably not a priority.
With all that said, it's still not entirely clear how a user would convert a Prometheus (or any other system's) histogram output into our data structure. I'm kinda thinking an ingest processor would make the most sense, slurping up a Prometheus histo and emitting a compatible HDRHisto field. But I haven't spent a lot of time thinking about the ergonomics of that yet. :)
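A rough sketch of that translation, assuming the org.HdrHistogram Java library and Prometheus-style cumulative bucket counts; the bucket boundaries, microsecond scaling, and precision setting below are illustrative, not a committed design:

```java
import org.HdrHistogram.Histogram;

public class PrometheusToHdr {
    /**
     * Translates Prometheus-style cumulative histogram buckets into an HdrHistogram.
     * upperBounds are the bucket "le" boundaries (seconds, scaled to microseconds here);
     * cumulativeCounts are the matching cumulative counts. Both are assumed sorted and
     * to exclude the +Inf bucket.
     */
    static Histogram fromPrometheusBuckets(double[] upperBounds, long[] cumulativeCounts) {
        Histogram hdr = new Histogram(3); // 3 significant digits; precision is configurable
        long previousCumulative = 0;
        for (int i = 0; i < upperBounds.length; i++) {
            long countInBucket = cumulativeCounts[i] - previousCumulative;
            previousCumulative = cumulativeCounts[i];
            if (countInBucket <= 0) {
                continue;
            }
            // Record the bucket's upper bound (as whole microseconds) for every observation
            // in the bucket. Accuracy depends on how well the Prometheus boundaries line up
            // with HdrHistogram's internal intervals.
            long representativeValue = Math.max(1L, (long) (upperBounds[i] * 1_000_000));
            hdr.recordValueWithCount(representativeValue, countInBucket);
        }
        return hdr;
    }

    public static void main(String[] args) {
        double[] le = {0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0};
        long[] cumulative = {24, 33, 100, 250, 400, 480, 490, 500};
        Histogram hdr = fromPrometheusBuckets(le, cumulative);
        System.out.println("p95 (µs): " + hdr.getValueAtPercentile(95.0));
    }
}
```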
Hi @polyfractal, is there any ticket for adding weighted average support in pack rollups?
@polyfractal A quick update here. @kbourgoin and I have implemented a custom field type for serialized HLL rollups in the ES index, along with a corresponding aggregation query that works much like the existing `cardinality` aggregation.
Excellent. We have had some discussions on our end as well on what the API and implementation could look like for a histogram field for percentile aggregations and an HLL++ field for cardinality aggregations. I suspect both impls will end up looking similar. :) cc @iverase
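As a rough illustration of how such a field and aggregation could fit together (not how the commenters above actually implemented it), here is a sketch using the Apache DataSketches HLL library; the sketch parameters and serialization choices are assumptions:

```java
import java.util.List;
import org.apache.datasketches.hll.HllSketch;
import org.apache.datasketches.hll.TgtHllType;
import org.apache.datasketches.hll.Union;

public class RollupCardinalitySketch {
    // Rollup time: build one sketch per rollup bucket from the raw values.
    static byte[] buildSketch(List<String> rawValues) {
        HllSketch sketch = new HllSketch(14, TgtHllType.HLL_4); // lgK = 14
        for (String value : rawValues) {
            sketch.update(value);
        }
        // The serialized bytes are what a "cardinality sketch" field would store per rollup doc.
        return sketch.toCompactByteArray();
    }

    // Query time: merge the serialized sketches from the matching rollup documents
    // and report the estimated distinct count, much like the cardinality aggregation does.
    static double estimateCardinality(List<byte[]> serializedSketches) {
        Union union = new Union(14);
        for (byte[] bytes : serializedSketches) {
            union.update(HllSketch.heapify(bytes));
        }
        return union.getEstimate();
    }
}
```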
We plan to build this support in downsampling. Support for histograms in downsampling is pending: the design is in place and ready to be prioritized as soon as we have availability.
With the 8.7 release of Elasticsearch, we have made a new downsampling capability, associated with the new time series data streams functionality, generally available (GA). This capability had been in tech preview in ILM since 8.5. Downsampling provides a method to reduce the footprint of your time series data by storing it at reduced granularity. The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the min, max, sum, value_count, and average for each metric. Data stream time series dimensions are stored unchanged. Downsampling is superior to rollup in several respects.
Because of the introduction of this new capability, we are deprecating the rollup functionality, which never left Tech Preview/Experimental status, in favor of downsampling, and thus we are closing this issue. We encourage you to migrate your solution to downsampling and take advantage of the new TSDB functionality.
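For intuition only, a minimal sketch of what a downsampled summary computes per metric and time bucket; this is not the actual downsampling implementation, and the class and field names are made up:

```java
import java.util.List;

public class DownsampleSummary {
    // One summary per metric per fixed time interval, mirroring the statistical
    // representations listed above: min, max, sum, value_count, and average.
    final double min;
    final double max;
    final double sum;
    final long valueCount;

    DownsampleSummary(List<Double> rawValuesInInterval) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum = 0;
        for (double v : rawValuesInInterval) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        this.min = min;
        this.max = max;
        this.sum = sum;
        this.valueCount = rawValuesInInterval.size();
    }

    double average() {
        return valueCount == 0 ? Double.NaN : sum / valueCount;
    }
}
```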
@wchaparro the new downsampling feature looks great, but it still doesn't support percentiles. Downsampling a fixed set of percentiles, such as the median, 75th, 90th, 95th, and 99th, is a very common use case for reporting latencies, so I bet a lot of Elasticsearch users could benefit from having percentiles in the downsample feature.
We would like to support more complex metrics in Rollup such as cardinality, percentiles and percentile ranks. These are trickier since they are calculated from data sketches rather than simple numerics.
They also introduce issues with backwards compatibility. If the algorithm powering the sketch changes in the future (improvements, bug-fixes, etc) we will likely have to continue supporting the old versions of the algorithm. It's unlikely that these sketches will be "upgradable" to the new version since they are lossy by nature.
I see two approaches to implementing these types of metrics:
New data types
In the first approach, we implement new data types in the Rollup plugin. Similar to the hash, geo or completion data types, these would expect input data to adhere to some kind of complex format. Internally it would be stored as a compressed representation that could be used to build the sketch (e.g. a `long[]` which could be used to build an HLL sketch).
The pros are strong validation and making it easier for aggregations to work with the data. Another large positive is that it allows external clients to provide pre-built sketches as long as they follow the correct format. For example, edge nodes may be collecting and aggregating data locally and just want to send the sketch.
The cons are considerably more work implementing the data types. It may also not be ideal to expose these data structures outside Rollup, since they carry the aforementioned BWC baggage.
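As a rough illustration of the "external clients provide pre-built sketches" point, a sketch of an edge node pre-aggregating locally with Apache DataSketches and shipping only the serialized sketch; the document field names and Base64 encoding are assumptions, not a proposed wire format:

```java
import java.util.Base64;
import java.util.Map;
import org.apache.datasketches.hll.HllSketch;

public class EdgeNodeRollupClient {
    // An edge node aggregates locally and sends only the pre-built sketch,
    // rather than the raw values, to the rollup index.
    public static void main(String[] args) {
        HllSketch userIds = new HllSketch(12);
        for (String userId : new String[] {"u1", "u2", "u3", "u2"}) {
            userIds.update(userId);
        }

        // Hypothetical document for a rollup index with a dedicated sketch data type.
        Map<String, Object> rollupDoc = Map.of(
            "timestamp.date_histogram.timestamp", "2018-08-28T00:00:00Z",
            "user_id.cardinality._sketch", Base64.getEncoder().encodeToString(userIds.toCompactByteArray())
        );
        System.out.println(rollupDoc);
    }
}
```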
Convention-only types
Alternatively, we could implement these entirely by convention (like the rest of Rollup). E.g. a `binary` field can be used to hold the appropriate data sketch, and we just use field naming to convey the meaning. Versioning can be done with a secondary field.
The advantage is much less upfront work... we can just serialize into fields and we're off. It also limits the impact of these data types, since only Rollup will be equipped to deal with the convention (less likely for a user to accidentally use one and then run into trouble later).
The big downside is that external clients will have a more difficult time providing pre-built sketches, since the format is just a convention and won't be validated until search time. It also feels a bit more fragile since it is another convention to maintain.
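A sketch of what the convention could look like for a percentiles metric, with the serialized sketch in a `binary`-style field and the version in a sibling field; all field names and the version scheme here are hypothetical:

```java
import java.util.Base64;
import java.util.Map;

public class ConventionOnlyRollupDoc {
    // Convention-only layout: the field *names* carry the meaning, a sibling field
    // carries the sketch version, and the binary field holds the serialized sketch.
    static Map<String, Object> percentilesRollupFields(String metricField, byte[] serializedSketch) {
        return Map.of(
            metricField + ".percentiles._sketch", Base64.getEncoder().encodeToString(serializedSketch),
            metricField + ".percentiles._sketch_version", 1
        );
    }
}
```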
BWC
In both cases, Rollup will probably have to maintain a catalog of "old" algorithms so that historical rollup indices can continue to function. Not ideal, but given that these algos don't change super often it's probably an ok burden to bear.
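One way such a catalog of "old" algorithms could be organized, sketched as a hypothetical version-to-decoder registry; the interface, versions, and decoder names are assumptions:

```java
import java.util.Map;
import java.util.function.Function;

public class SketchDecoderRegistry {
    /** Hypothetical decoded form that query-time aggregations can consume. */
    interface CardinalitySketch {
        double estimate();
    }

    // Each historical on-disk format keeps its decoder around so that old rollup
    // indices continue to work even after the "current" algorithm moves on.
    private static final Map<Integer, Function<byte[], CardinalitySketch>> DECODERS = Map.of(
        1, SketchDecoderRegistry::decodeV1Hll,          // original HLL layout
        2, SketchDecoderRegistry::decodeV2HllPlusPlus   // later HLL++ layout
    );

    static CardinalitySketch decode(int version, byte[] bytes) {
        Function<byte[], CardinalitySketch> decoder = DECODERS.get(version);
        if (decoder == null) {
            throw new IllegalArgumentException("Unknown sketch version: " + version);
        }
        return decoder.apply(bytes);
    }

    private static CardinalitySketch decodeV1Hll(byte[] bytes) {
        throw new UnsupportedOperationException("placeholder for the v1 decoder");
    }

    private static CardinalitySketch decodeV2HllPlusPlus(byte[] bytes) {
        throw new UnsupportedOperationException("placeholder for the v2 decoder");
    }
}
```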