range_histogram and date_range_histogram aggregations to help analyse "session" duration type data #23182
Comments
Discussed in FixItFriday and it was generally considered that this would be good to implement, but we would need to use Binary DocValues for the columnar data for range fields. We expect that the number of range fields should be small, so we don't think this would affect compression too much. In keeping with the other field types that support doc values, we would enable doc_values by default on range fields.
I also had this problem and resolved it by adding a Painless script. The complete query, along with the script, follows:
I do not know whether it is a good solution, because it has affected the time Kibana takes to generate the graph. Example document:
Stalled until #24823 is merged
@colings86 just as a note: #24823 was merged.
@colings86 any news regarding this?
@gmoskovicz this is no longer stalled, but it is not currently being worked on either. It should be possible to implement now, though a ValuesSource implementation will need to be created for Range fields so they can be used with the new aggregation.
Sounds good. So until this is implemented, the only way to aggregate these fields is probably to reindex into a regular date field (for example) and create multiple values for the field, or to use a script.
Yes, for now the way to do this is to have separate date fields for the start and end date (or start date and duration) and use a script to calculate the histogram data (as in the discuss issue linked in the description of this issue).
cc @elastic/es-search-aggs
The fact that range fields don't work in aggregations should probably be documented as a warning in the description of the range types, in the aggregation documentation, or both. Without documentation to the contrary, a user would naturally expect a date histogram to work on data that has been mapped as a date range.
Organizing all the agg range issues into a central ticket, closing in favor of #34644
Now that (since 5.2) we support range field types I am wondering if we can use them to help users with the concurrent sessions problem (e.g. https://discuss.elastic.co/t/display-concurrency-in-data-on-kibana/26006/3)
The problem detailed in the post above is that the user is trying to determine, for each 30 second period, how many concurrent phone calls are occurring. This problem can be generalised to wanting to analyse how many concurrent 'sessions' are occurring over fixed intervals of time (or potentially some other unit for this axis). By 'session' here I mean something that has a start time and an end time; this could be phone calls, web sessions, or calendar meetings/appointments.
The aggregation would work by adding each collected document to all the histogram buckets which fall into the range given by the value of the range field. Currently the range field does not write doc_values when indexing so we will either need to write doc_values or have a different way to retrieve the field values in a columnar way.
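The bucketing rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Elasticsearch code: the function name `range_histogram`, the inclusive `(start, end)` tuples, and the choice to key each bucket by its lower bound are all assumptions made for the example.

```python
import math
from collections import Counter

def range_histogram(ranges, interval):
    """Count, per fixed-width bucket, the documents whose range overlaps it.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    `ranges` holds one inclusive (start, end) pair per document.
    """
    counts = Counter()
    for start, end in ranges:
        first = math.floor(start / interval)  # index of the bucket holding start
        last = math.floor(end / interval)     # index of the bucket holding end
        # Add the document once to every bucket between first and last.
        for bucket in range(first, last + 1):
            counts[bucket * interval] += 1    # key = lower bound of the bucket
    return dict(sorted(counts.items()))

# Three phone calls as (start_second, end_second), bucketed every 30 seconds.
calls = [(0, 45), (20, 70), (65, 80)]
print(range_histogram(calls, 30))  # {0: 2, 30: 2, 60: 2}
```

Each document contributes to as many buckets as its range spans, so the bucket counts can sum to more than the number of documents.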
The following should be interpreted as thinking out loud and may or may not be useful:
For the non-date applications of this, one (possibly contrived) use-case could be in aggregated metric data. If I was taking temperature data for every weather station in the UK, I might have a document per day that would probably contain the mean and median temperature for the day, but also the minimum and maximum temperature for the day, which I could store in a range field containing the range of temperatures reported that day. When I come to analyse the data, one useful thing to see would be how many days the temperature was between -10C and 0C, 0C and 10C, 10C and 20C, etc. I could use the range_histogram aggregation to answer this question, as it would tell me, for each 10C interval, on how many days the temperature was recorded in that interval at some point in the day. Analysing the max and min temperature independently would only tell me the days when the maximum or minimum was in each interval, which answers a slightly different question.
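To make the difference between the two questions concrete, here is a small Python sketch with made-up data for three days. The helper names (`bucket_key`, `overlap_histogram`, `point_histogram`) and the temperature values are hypothetical and exist only for this illustration; they are not part of any Elasticsearch API.

```python
import math
from collections import Counter

def bucket_key(value, interval):
    # Lower bound of the fixed-width bucket that contains value.
    return math.floor(value / interval) * interval

def overlap_histogram(ranges, interval):
    # Count each (low, high) range once in every bucket it touches.
    counts = Counter()
    for low, high in ranges:
        key = bucket_key(low, interval)
        while key <= high:
            counts[key] += 1
            key += interval
    return dict(sorted(counts.items()))

def point_histogram(values, interval):
    # Ordinary histogram: each value lands in exactly one bucket.
    counts = Counter(bucket_key(v, interval) for v in values)
    return dict(sorted(counts.items()))

# Made-up daily (min, max) temperatures in degrees C.
days = [(-3, 4), (2, 12), (8, 19)]

by_range = overlap_histogram(days, 10)
by_min = point_histogram([low for low, _ in days], 10)
by_max = point_histogram([high for _, high in days], 10)

print(by_range)  # {-10: 1, 0: 3, 10: 2}
print(by_min)    # {-10: 1, 0: 2}
print(by_max)    # {0: 1, 10: 2}
```

All three days spent some time in the 0C to 10C interval, which only the range-based count reports; the min and max histograms each see a subset of those days.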