New type of aggregation #9791
Comments
Let me know if you need more information. I can provide a real example.
@colings86 this sounds like it would fit into the new aggs features?
@clintongormley yep, this would fit into the new agg features in #9876, although it would probably need to be done through the scriptable implementation. Closing in favour of that issue (#9876)
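For anyone landing here later: the aggregation features tracked in #9876 shipped as pipeline aggregations in Elasticsearch 2.0, and the avg_bucket sibling pipeline aggregation can express the averaging requested below without scripting. A sketch against the example data in this issue (agg names are illustrative; assumes Elasticsearch 2.0+):

```
GET /example/_search
{
  "size": 0,
  "aggs": {
    "products": {
      "terms": { "field": "Product" },
      "aggs": {
        "end_data": {
          "date_histogram": {
            "field": "time",
            "interval": "1080000ms",
            "min_doc_count": 1
          },
          "aggs": {
            "times": {
              "terms": { "field": "time" },
              "aggs": {
                "total_jobs": { "sum": { "field": "quantity" } }
              }
            },
            "avg_total_jobs": {
              "avg_bucket": { "buckets_path": "times>total_jobs" }
            }
          }
        }
      }
    }
  }
}
```

Each date_histogram bucket then carries an avg_total_jobs value: the average of the per-timestamp sums, e.g. 2.5 for product "a" in the 09:00 bucket of the example below.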
Thank you very much for adding this to the new aggregation framework!
@zmathe I may be totally missing something, but in your aggs examples I see that the Kibana version does a date histogram first, then a Product terms agg, and then sums the quantity. The request below is different: it does the terms agg first, then the date histogram, then sums the quantity. I wonder if just changing the agg order in Kibana will also give you what you want?
@bleskes Changing the order does not work in Kibana, because there is an extra step (calculating the averages) which cannot be done in ES. I had to do this extra step on the client side (in the JS).
POST /example/transactions/_bulk
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time": "2015-02-09T09:00:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "b", "time": "2015-02-09T16:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "b", "time": "2015-02-09T16:30:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time":"2015-02-09T09:00:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time": "2015-02-09T09:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 2, "Product": "b", "time": "2015-02-09T16:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time":"2015-02-09T09:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 2, "Product": "b", "time": "2015-02-09T16:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time": "2015-02-09T09:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 2, "Product": "b", "time": "2015-02-09T16:15:00Z"} Kibana is using the following query to plot the quantity over the time, group by product: GET /example/_search/?search_type=count
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*"
}
},
"filter": {
"bool": {
"must": [{
"range": {
"time": {
"gte": 1423399451544,
"lte": 1423631917911
}
}
}],
"must_not": []
}
}
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "time",
"interval": "3600000ms",
"min_doc_count": 1,
"extended_bounds": {
"min": 1423399451544,
"max": 1423631917911
}
},
"aggs": {
"4": {
"terms": {
"field": "Product",
"size": 0,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"sum": {
"field": "quantity"
}
}
}
}
}
}
}
}
The result of this query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0,
"hits": []
},
"aggregations": {
"3": {
"buckets": [
{
"4": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"1": {
"value": 5
},
"key": "a",
"doc_count": 5
}
]
},
"key_as_string": "2015-02-09T09:00:00.000Z",
"key": 1423472400000,
"doc_count": 5
},
{
"4": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"1": {
"value": 8
},
"key": "b",
"doc_count": 5
}
]
},
"key_as_string": "2015-02-09T16:00:00.000Z",
"key": 1423497600000,
"doc_count": 5
}
]
}
}
}
Changing the aggs order, the query becomes:
GET /example/_search/?search_type=count
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"gte": 1423147775940,
"lte": 1423752575940
}
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"2": {
"terms": {
"size": 0,
"field": "Product"
},
"aggs": {
"end_data": {
"date_histogram": {
"field": "time",
"interval": "1080000ms",
"min_doc_count": 1,
"extended_bounds": {
"min":1423147775940,
"max":1423752575940
}
},
"aggs": {
"times": {
"terms": {
"field": "time"
},
"aggs": {
"total_jobs": {
"sum": {
"field": "quantity"
}
}
}
}
}
}
}
}
}
}
The result of this query is:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0,
"hits": []
},
"aggregations": {
"2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 5,
"end_data": {
"buckets": [
{
"key_as_string": "2015-02-09T09:00:00.000Z",
"key": 1423472400000,
"doc_count": 5,
"times": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1423473300000,
"key_as_string": "2015-02-09T09:15:00.000Z",
"doc_count": 3,
"total_jobs": {
"value": 3
}
},
{
"key": 1423472400000,
"key_as_string": "2015-02-09T09:00:00.000Z",
"doc_count": 2,
"total_jobs": {
"value": 2
}
}
]
}
}
]
}
},
{
"key": "b",
"doc_count": 5,
"end_data": {
"buckets": [
{
"key_as_string": "2015-02-09T16:12:00.000Z",
"key": 1423498320000,
"doc_count": 4,
"times": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1423498500000,
"key_as_string": "2015-02-09T16:15:00.000Z",
"doc_count": 4,
"total_jobs": {
"value": 7
}
}
]
}
},
{
"key_as_string": "2015-02-09T16:30:00.000Z",
"key": 1423499400000,
"doc_count": 1,
"times": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1423499400000,
"key_as_string": "2015-02-09T16:30:00.000Z",
"doc_count": 1,
"total_jobs": {
"value": 1
}
}
]
}
}
]
}
}
]
}
}
}
As you can see, in this result the top-level buckets are 'a' and 'b', while in the other example they are 2015-02-09T09:00:00.000Z and 2015-02-09T16:00:00.000Z.
In the last example, for bucket 'a' I had to compute the averages myself (because this cannot be calculated by ES):
{
"key": "a",
"doc_count": 5,
"end_data": {
"buckets": [
{
"key_as_string": "2015-02-09T09:00:00.000Z",
"key": 1423472400000,
"doc_count": 5,
"times": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1423473300000,
"key_as_string": "2015-02-09T09:15:00.000Z",
"doc_count": 3,
"total_jobs": {
"value": 3
}
},
{
"key": 1423472400000,
"key_as_string": "2015-02-09T09:00:00.000Z",
"doc_count": 2,
"total_jobs": {
"value": 2
}
}
]
}
}
]
}
}
I have: a -> 2015-02-09T09:00:00.000Z, and I take the average of:
"buckets": [
{
"key": 1423473300000,
"key_as_string": "2015-02-09T09:15:00.000Z",
"doc_count": 3,
"total_jobs": {
"value": 3
}
},
{
"key": 1423472400000,
"key_as_string": "2015-02-09T09:00:00.000Z",
"doc_count": 2,
"total_jobs": {
"value": 2
}
}
]
(3 + 2) / 2 = 2.5.
After computing the averages I built the correct buckets by changing the order of the rows:
2015-02-09T09:00:00.000Z, a, 2.5.
However, in the first example (the current ES implementation) the row is:
2015-02-09T09:00:00.000Z, a, 5
You can see the difference:
![screen shot 2015-03-03 at 14 27 48](https://cloud.githubusercontent.com/assets/981448/6463479/bca0959e-c1b2-11e4-939b-6fe434bf6563.png)
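For concreteness, the client-side step might look roughly like this (a minimal JavaScript sketch against the response shape above; `averageBuckets` is my name for it, not an existing API):

```js
// aggs = response.aggregations from the terms -> date_histogram -> times query.
// For each (product, histogram slot), average the per-timestamp sums.
function averageBuckets(aggs) {
  const rows = [];
  for (const product of aggs["2"].buckets) {
    for (const slot of product.end_data.buckets) {
      const sums = slot.times.buckets.map(b => b.total_jobs.value);
      const avg = sums.reduce((a, b) => a + b, 0) / sums.length;
      rows.push([slot.key_as_string, product.key, avg]);
    }
  }
  return rows; // e.g. ["2015-02-09T09:00:00.000Z", "a", 2.5]
}
```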
Because of the avg calculation I cannot use Kibana. I think this would be useful not only for us. I am sorry for the long message.
Thanks,
Zoltan
We've been running into this exact same problem with summed averages.
Hello,
I am using Elasticsearch 1.4.2 and Kibana 4. I am not able to get the correct plots, because the following aggregation does not exist in Elasticsearch or in Kibana.
Given time periods T1, T2, T3, ..., Tn, each period contains elements of the form {T1, d1:v1, d2:v2, ..., dn:vn}, where the d:v entries are key-value pairs. For example: {"09:00", "Product":"a", "Color":"red", "quantity":1}, {"09:01", "Product":"c", "Color":"blue", "quantity":5}.
I wanted to plot quantity (Y axis) over time (X axis). The current Elasticsearch and Kibana return the "wrong" values in my case.
The system automatically splits the T1..Tn time interval into bucket[1], bucket[2], ..., bucket[n] (bucket[1] >= T1 and bucket[n] <= Tn):
bucket[1] = SUM(T1) + SUM(T2)
bucket[2] = SUM(T1) + SUM(T2) + SUM(T3)
...
bucket[n] = SUM(Tn-1) + SUM(Tn)
This aggregation is wrong for me. I would like to have the following instead:
bucket[1] = AVG(SUM(T1), SUM(T2))
bucket[2] = AVG(SUM(T1), SUM(T2), SUM(T3))
...
bucket[n] = AVG(SUM(Tn-1), SUM(Tn))
For example:
The bucket length is 18 minutes: we get the buckets 09:00, 16:12 and 16:30.
09:00: SUM(T1) + SUM(T2) = 1 + 1 + 1 + 1 + 1 = 5
16:12: SUM(T3) = 1 + 2 + 2 + 2 = 7
16:30: SUM(T4) = 1
What I want to have:
09:00: AVG(SUM(T1), SUM(T2)) = (2 + 3) / 2 = 2.5
16:12: AVG(SUM(T3)) = 7 / 1 = 7
16:30: AVG(SUM(T4)) = 1
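To make the arithmetic concrete, here is the same calculation as a tiny JavaScript sketch (the per-period sums are taken from the sample data below):

```js
// Per-period sums that fall into each 18-minute bucket (from the data below).
const perPeriodSums = {
  "09:00": [2, 3], // SUM(T1) = 2, SUM(T2) = 3
  "16:12": [7],    // SUM(T3) = 7
  "16:30": [1],    // SUM(T4) = 1
};

for (const [bucket, sums] of Object.entries(perPeriodSums)) {
  const total = sums.reduce((a, b) => a + b, 0);
  const avg = total / sums.length;
  console.log(bucket, "current ES gives:", total, "I want:", avg);
}
// 09:00 current ES gives: 5 I want: 2.5
// 16:12 current ES gives: 7 I want: 7
// 16:30 current ES gives: 1 I want: 1
```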
I provide an example which can easily be reproduced using Kibana 4 and the latest Elasticsearch:
- Insert the following data into Elasticsearch (I used Sense from Marvel):
POST /example/transactions/_bulk
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time": "2015-02-09T09:00:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "b", "time": "2015-02-09T16:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "b", "time": "2015-02-09T16:30:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time":"2015-02-09T09:00:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time": "2015-02-09T09:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 2, "Product": "b", "time": "2015-02-09T16:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time":"2015-02-09T09:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 2, "Product": "b", "time": "2015-02-09T16:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 1, "Product": "a", "time": "2015-02-09T09:15:00Z"}
{ "index": {}}
{"Color": "red", "quantity": 2, "Product": "b", "time": "2015-02-09T16:15:00Z"}
The query made to Elasticsearch is:
I made a simple web page where I can run queries against ES and plot the result. I changed the query as follows in order to make the "correct" plot.
I computed the averages in JS and got the plot that I want:
Is it possible to support this kind of aggregation?
Thank you,
Zoltan