Give a weight for documents in aggregations #8094

rnonnon-ebiz · 2014-10-15T13:02:40Z

Hi all,

I have a question about aggregations, for example "Percentiles" (although the same thing would be useful for other aggregations too).
When a percentile aggregation runs, it uses a specific field as its reference. If the 50th percentile for field 'f' is 10, it means that 50% of the documents have 'f' below 10.
=> Each document has the same weight in the aggregation (a weight of 1).
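To make the "each document has weight 1" point concrete, here is a minimal Python sketch of a nearest-rank percentile over per-document values. Nearest-rank is an assumption chosen for simplicity: Elasticsearch's percentiles aggregation uses t-digest and interpolates, so its numbers can differ. The documents are hypothetical, modelled on the example later in this post.

```python
import math

def percentile_per_document(values, p):
    """Nearest-rank percentile in which every document counts exactly once."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    # Smallest rank that covers at least p% of the documents.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical documents: one document per age group, with a "count" field.
docs = [{"age": 10, "count": 5}, {"age": 20, "count": 1}, {"age": 30, "count": 1}]

# Only "age" is used; "count" is ignored, so each document has weight 1.
ages = [doc["age"] for doc in docs]
print(percentile_per_document(ages, 50))  # 20
```

Five of the seven people are 10 years old, yet the median comes out as 20 because the "count" field plays no part in the calculation.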

I'm wondering whether it would be possible to give each document a different weight, taken from another field in the document.
The following Gist gives an example of what I'd like to do: https://gist.github.com/rnonnon/093c111014bd14a46efe

I'd like to compute some percentiles on the "age" field, but each document also has an associated "count" field.
For example, there are 5 people who are 10 years old, 1 who is 20 years old...
When the percentile aggregation runs, it won't use my factor (the number of people) to compute the percentile; it just counts documents...
I don't think this feature is natively supported, but do you think it could be supported easily? Does it make sense to implement it?
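What is being asked for can be sketched as a weighted nearest-rank percentile: walk the values in sorted order and accumulate the weight (here, the "count" field) until p% of the total weight is covered. This is a minimal sketch under the same assumptions as above (nearest-rank definition, hypothetical data), not how Elasticsearch implements percentiles.

```python
def percentile_weighted(pairs, p):
    """Nearest-rank percentile over (value, weight) pairs: a value with
    weight w counts as w identical documents."""
    pairs = sorted((v, w) for v, w in pairs if w > 0)
    total = sum(w for _, w in pairs)
    if total == 0:
        raise ValueError("no positive weights")
    target = p / 100 * total
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]  # only reached for p > 100 or floating-point rounding

# Hypothetical documents: "count" is used as the weight of "age".
docs = [{"age": 10, "count": 5}, {"age": 20, "count": 1}, {"age": 30, "count": 1}]
pairs = [(doc["age"], doc["count"]) for doc in docs]
print(percentile_weighted(pairs, 50))  # 10
print(percentile_weighted(pairs, 90))  # 30
```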

Why am I asking this?
I'm running percentile (and other) aggregations over around 70,000,000 documents on a single node. ES uses all 8 of my cores at 100% for a while :s... So I tried to reduce the number of documents by grouping them, but then I can't use the aggregations in the same way...
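Grouping is exactly the situation a weight field would handle: collapsing one-document-per-person data into (age, count) pairs shrinks the number of documents without changing the percentiles, as long as the aggregation honours the weights. A self-contained Python sketch checking that claim on hypothetical data, again assuming the nearest-rank definition for simplicity:

```python
import math
from collections import Counter

def percentile_per_document(values, p):
    # Nearest-rank percentile, every value counts once.
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def percentile_weighted(pairs, p):
    # Nearest-rank percentile, each value counts `weight` times.
    total = sum(w for _, w in pairs)
    target = p / 100 * total
    cumulative = 0
    for value, weight in sorted(pairs):
        cumulative += weight
        if cumulative >= target:
            return value
    return max(pairs)[0]

# Hypothetical raw data: one document per person.
raw_ages = [10, 10, 10, 10, 10, 20, 30]

# Grouped data: one document per distinct age, with a "count" weight.
grouped = sorted(Counter(raw_ages).items())  # [(10, 5), (20, 1), (30, 1)]

for p in (1, 25, 50, 75, 99):
    assert percentile_weighted(grouped, p) == percentile_per_document(raw_ages, p)
print(len(raw_ages), "documents reduced to", len(grouped))  # 7 documents reduced to 3
```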

Thanks.

jpountz · 2015-02-20T10:53:34Z

Your suggestion is achievable with a script that creates an array containing the age value repeated count times, but I'm afraid it would not make things faster.
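The script workaround can be pictured as expanding each grouped document back into count copies of its age before the percentile is computed. A minimal Python sketch (field names taken from the original post, nearest-rank percentile assumed, no actual Elasticsearch scripting involved) shows why it gives the right answer but saves nothing: the number of values fed to the percentile algorithm is the sum of the counts, i.e. the size of the ungrouped data.

```python
import math

def expand(docs):
    """Repeat each document's age `count` times, like the script-based workaround."""
    return [doc["age"] for doc in docs for _ in range(doc["count"])]

def percentile_per_document(values, p):
    # Nearest-rank percentile, every value counts once.
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical grouped documents.
docs = [{"age": 10, "count": 5}, {"age": 20, "count": 1}, {"age": 30, "count": 1}]
values = expand(docs)

print(len(docs), len(values))  # 3 7
print(percentile_per_document(values, 50))  # 10
```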

The algorithm that we use for percentiles (t-digest) is not especially fast, because it is designed to work on all kinds of data.

We have another issue open to add support for HdrHistogram: #8324. It is faster but has relative accuracy: percentiles are more accurate for values close to 0 and less accurate for large values. This typically works very well with e.g. response times (you care about microsecond precision for millisecond response times, but usually only about second precision for hour-long response times). Would that work in your case too?
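To illustrate what relative accuracy means, here is a simplified histogram with logarithmically sized buckets. It is not HdrHistogram's actual bucket layout (HdrHistogram is configured by a number of significant value digits); it is only a sketch with the same property: the error on a reported value is bounded by a fraction alpha of that value, so the absolute error grows with magnitude. The `record` method also takes a weight, the kind of per-document count the original post asks about. The class name, `alpha`, and the lower-edge reporting rule are all assumptions made for this sketch.

```python
import math
from collections import Counter

class LogHistogram:
    """Bucket i holds values in [gamma**i, gamma**(i + 1)), gamma = 1 + alpha.
    Reporting a bucket's lower edge keeps the relative error below alpha."""

    def __init__(self, alpha=0.01):
        self.gamma = 1.0 + alpha
        self.counts = Counter()
        self.total = 0

    def record(self, value, weight=1):
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        self.counts[math.floor(math.log(value, self.gamma))] += weight
        self.total += weight

    def percentile(self, p):
        if not self.counts:
            raise ValueError("empty histogram")
        target = p / 100 * self.total
        cumulative = 0
        for index in sorted(self.counts):
            cumulative += self.counts[index]
            if cumulative >= target:
                return self.gamma ** index  # lower edge of the bucket
        return self.gamma ** max(self.counts)

# Absolute error grows with the value; relative error stays below alpha.
for value in (1.5, 1500.0, 1_500_000.0):
    h = LogHistogram(alpha=0.01)
    h.record(value)
    estimate = h.percentile(50)
    print(value, round(value - estimate, 4), round((value - estimate) / value, 4))

# Weighted recording, using the age/count example from the post.
ages = LogHistogram(alpha=0.01)
for age, count in [(10, 5), (20, 1), (30, 1)]:
    ages.record(age, weight=count)
print(ages.percentile(50))  # just under 10 (within 1%)
```

With alpha = 0.01, a value of 1.5 is reported about a hundredth too low, while 1,500,000 may be off by as much as roughly 15,000; that is the trade-off between speed and accuracy on large values described above.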

clintongormley · 2015-11-21T19:48:45Z

No further feedback. Closing

clintongormley added the discuss label Oct 16, 2014

clintongormley closed this as completed Nov 21, 2015

romainneutron mentioned this issue Apr 24, 2019

Add a weight for documents in percentiles aggregations #41479 (closed)
