Your suggestion is achievable with a script that creates an array containing "count" occurrences of the age value, but I'm afraid it would not make things faster.
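To illustrate the expansion idea outside Elasticsearch, here is a minimal Python sketch (not an Elasticsearch script). The helper names `expand` and `nearest_rank` are made up for this example, the rows are the two age groups from the question, and the percentile uses a simple nearest-rank definition rather than t-digest.

```python
import math

def expand(rows):
    """Turn (value, count) rows into a flat list with `count` copies of each value."""
    values = []
    for value, count in rows:
        values.extend([value] * count)
    return values

def nearest_rank(values, q):
    """Nearest-rank percentile: smallest value with at least q% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

# Rows from the question: 5 people aged 10, 1 person aged 20.
rows = [(10, 5), (20, 1)]
expanded = expand(rows)          # six values instead of two rows
p75 = nearest_rank(expanded, 75)
```

The expanded list has as many entries as the sum of all counts, so the percentile computation sees the same volume of values as the ungrouped data would produce, which is why this approach does not save any work.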
The algorithm that we use for percentiles (t-digest) is not so fast because it tries to work on all kinds of data.
We have another issue open to add support for HdrHistogram: #8324. It is faster, but its accuracy is relative: percentiles are more accurate when values are close to 0 and less accurate as values grow. This typically works very well with e.g. response times (since you care about microsecond precision for millisecond response times, but usually only about second precision for hour-long response times). Would it work in your case too?
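As a rough illustration of what "relative accuracy" means, the following Python sketch rounds integer values to a fixed number of significant digits. This is only an analogy and not how HdrHistogram actually buckets values; the function name `quantize`, the choice of 2 significant digits, and the sample values in microseconds are assumptions made for the example.

```python
def quantize(value_us, significant_digits=2):
    """Round a non-negative integer to the given number of significant decimal digits."""
    digits = len(str(value_us))
    step = 10 ** max(0, digits - significant_digits)
    # Integer arithmetic keeps the result exact (round half up to a multiple of step).
    return (value_us + step // 2) // step * step

small = 1_234              # about 1.2 ms, expressed in microseconds
large = 3_600_123_456      # about 1 hour, expressed in microseconds

small_error = abs(quantize(small) - small)   # absolute error in microseconds
large_error = abs(quantize(large) - large)   # much larger absolute error
```

The absolute error grows with the magnitude of the value while the relative error stays within the same bound, which matches the trade-off described above for response times.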
Hi all,
I'm wondering something about aggregations, say "Percentiles" (although it could be fine to get it with other aggregations).
When a percentile aggregation is processed, it uses a specific field as a reference. If the 50th percentile for field 'f' is 10, it means that 50% of the documents have 'f' below 10.
=> Each document has the same weight in the aggregation (=> 1)
I'm wondering whether it would be possible to give each document a different weight, taken from another field in the document.
The following Gist gives an example of what I'd like to do: https://gist.github.com/rnonnon/093c111014bd14a46efe
I'd like to compute some percentiles on the "age" field, but each document also has an associated "count" field.
For example, there are 5 people who are 10 years old; 1 who is 20 years old...
When the percentile agg runs, it won't use my factor (the number of people) to compute the percentile; it will only count the number of documents...
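To make the request concrete, here is a minimal Python sketch of the weighted behaviour being asked for. It is not Elasticsearch code: the function name `weighted_percentile` is made up for this example, and it uses a simple nearest-rank definition, whereas the percentiles aggregation computes an approximation with interpolation.

```python
def weighted_percentile(rows, q):
    """Return the smallest value whose cumulative weight reaches q% of the total.

    rows: iterable of (value, weight) pairs, e.g. (age, count).
    q: percentile in the range (0, 100].
    """
    ordered = sorted(rows)
    total = sum(weight for _, weight in ordered)
    if total <= 0:
        raise ValueError("total weight must be positive")
    threshold = q / 100 * total
    running = 0
    for value, weight in ordered:
        running += weight
        if running >= threshold:
            return value
    return ordered[-1][0]

# Grouped documents from the example: (age, count).
grouped = [(10, 5), (20, 1)]

# What is wanted: each document weighted by its "count" field.
weighted_p75 = weighted_percentile(grouped, 75)

# What happens today: every document counts once, whatever its "count" is.
per_document_p75 = weighted_percentile([(age, 1) for age, _ in grouped], 75)
```

With the counts taken into account, the 75th percentile falls in the group of 10-year-olds (5 of the 6 people), while counting each document once places it at 20.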
I don't think this feature is natively supported, but do you think it could be supported easily? Do you think it makes sense to implement it?
Why am I asking this?
I'm running percentile (and other) aggregations over around 70 000 000 documents on a single node. ES keeps my 8 cores at 100% for a while :s... So I tried to reduce the number of documents by grouping them, but then I can't use the aggregations in the same way...
Thanks.