Bucket script aggregation returns invalid value for missing docs #27377
Comments
@shaharmor Did you try setting a `missing` value on the `date_histogram` aggregation? GitHub issues are meant for bugs and feature requests, and this sounds more like a question that should be asked on the forum first.
I'm not sure how a missing value on the `date_histogram` would help, as there are no docs to add the missing field to. Anyway, I created a forum thread as well: https://discuss.elastic.co/t/bucket-script-fails-when-some-docs-are-missing/107592/1
This is actually a bug. It seems that the bucket_script aggregation is not executed on buckets that have a doc count of zero. The following recreation script highlights this:
The response from the search request is:
In the response above, the buckets with keys `2017-01-01T00:01:00.000Z` to `2017-01-01T00:04:00.000Z` should have a sub-aggregation whose value is 100 (the same as the `cumulative_sum` aggregation, since the script just outputs that value).
Hi @colings86, could you please provide some instructions on how to fix this? Thanks in advance!
I just took a look at the code around this, and I think it's going to need some thought about how we can fix this bug without adversely affecting other pipeline aggregations. The problem arises when we resolve the value for the `buckets_path` for the empty bucket in:

```java
if (Double.isInfinite(value) || Double.isNaN(value) || (bucket.getDocCount() == 0 && !isDocCountProperty)) {
    switch (gapPolicy) {
        case INSERT_ZEROS:
            return 0.0;
        case SKIP:
        default:
            return Double.NaN;
    }
} else {
    return value;
}
```

That `if` statement returns `Double.NaN` (or `0.0` under `insert_zeros`) for any empty bucket whose `buckets_path` is not `_count`, no matter what value was actually resolved. In …
We could potentially change the `bucket_script` aggregation so that it executes on all buckets regardless of the retrieved value, and also pass the doc count of the current bucket to the script so it can determine what value to output. However, I'm worried that this might make the aggregation too unwieldy for the user. I'll mark this issue as
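To make the failure mode concrete, here is a simplified Java model of the gap handling quoted above, plus a toy `bucket_script` that produces no output for a bucket whose input resolves to `NaN`. The `Bucket` record, `GapPolicy` enum, and `bucketScriptKeys` helper are hypothetical stand-ins, not Elasticsearch classes; the skipping behavior is modeled on what this thread describes, not copied from the real aggregator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of buckets_path resolution; not Elasticsearch code. */
public class GapPolicyDemo {

    enum GapPolicy { INSERT_ZEROS, SKIP }

    // A bucket with its doc count and the value resolved from buckets_path.
    record Bucket(String key, long docCount, double value) {}

    // Mirrors the quoted if statement: an empty bucket is treated as a gap
    // regardless of the value that was actually resolved.
    static double resolveBucketValue(Bucket bucket, GapPolicy gapPolicy, boolean isDocCountProperty) {
        double value = bucket.value();
        if (Double.isInfinite(value) || Double.isNaN(value)
                || (bucket.docCount() == 0 && !isDocCountProperty)) {
            switch (gapPolicy) {
                case INSERT_ZEROS:
                    return 0.0;
                case SKIP:
                default:
                    return Double.NaN;
            }
        }
        return value;
    }

    // Toy bucket_script under SKIP: buckets whose input resolves to NaN get no output.
    static List<String> bucketScriptKeys(List<Bucket> buckets) {
        List<String> keysWithOutput = new ArrayList<>();
        for (Bucket b : buckets) {
            double x = resolveBucketValue(b, GapPolicy.SKIP, false);
            if (!Double.isNaN(x)) {
                keysWithOutput.add(b.key());
            }
        }
        return keysWithOutput;
    }

    public static void main(String[] args) {
        // Illustrative data: the cumulative value is 100 in every bucket,
        // but only the first bucket contains documents.
        List<Bucket> buckets = List.of(
                new Bucket("00:00", 5, 100.0),
                new Bucket("00:01", 0, 100.0),
                new Bucket("00:02", 0, 100.0));
        System.out.println(bucketScriptKeys(buckets)); // prints [00:00]
    }
}
```

With this illustrative data, the two empty buckets carry a cumulative value of 100 but still produce no script output, which matches the symptom reported in this issue.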
Discussed in FixItFriday, and we decided that this requires a more in-depth discussion within the Search and Aggregations team.
Discussed with the search and aggs team, and we decided that we should pass the actual value to the script (instead of converting it to `NaN` or `0.0` using the gap policy) and also pass the doc count of the bucket to the script, so the user writing the script can know whether the bucket is empty and decide how to interpret the value.
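A rough Java sketch of the approach agreed above: the script runs on every bucket and receives both the raw resolved value (no gap-policy conversion) and the bucket's doc count. `BucketScript`, `Bucket`, and `run` are hypothetical names for illustration only; in Elasticsearch the values would reach the script as script parameters, and this thread does not specify how the doc count would be exposed there.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a doc-count-aware bucket script; not Elasticsearch code. */
public class DocCountAwareScriptDemo {

    record Bucket(String key, long docCount, double value) {}

    // Hypothetical script signature: raw value plus the bucket's doc count.
    @FunctionalInterface
    interface BucketScript {
        double execute(double value, long docCount);
    }

    // Runs the script on every bucket, empty or not, preserving bucket order.
    static Map<String, Double> run(List<Bucket> buckets, BucketScript script) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Bucket b : buckets) {
            out.put(b.key(), script.execute(b.value(), b.docCount()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(
                new Bucket("00:00", 5, 100.0),
                new Bucket("00:01", 0, 100.0));

        // A script that trusts the carried-forward value (e.g. a cumulative sum).
        System.out.println(run(buckets, (value, docCount) -> value));
        // prints {00:00=100.0, 00:01=100.0}

        // A script that treats empty buckets as zero.
        System.out.println(run(buckets, (value, docCount) -> docCount == 0 ? 0.0 : value));
        // prints {00:00=100.0, 00:01=0.0}
    }
}
```

The second call shows the trade-off raised earlier in the thread: every script now has to decide explicitly how empty buckets should be treated.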
We should ensure that the solution to this issue also fixes the issue in #27544.
@colings86 Is this something that is being actively worked on? Any ETA?
@shaharmor It's not being actively worked on right now, but since we now have a way forward, it is available to be picked up and worked on. There is no ETA for this currently.
@elastic/es-search-aggs
For anyone stumbling upon the same issue with I will use the example from the docs:
The issue: the script will select only non-empty buckets. If you try something like
At least in 6.3, this works for
Is this being fixed in 7?
Hi @shaharmor, I'm afraid there is still no progress on this issue. It's still on our radar, but we don't currently have anyone working on it. We'll update this ticket when there's movement.
Adds a new `keep_values` gap policy that works like `skip`, except that if the metric calculated on an empty bucket provides a non-null, non-NaN value, this value is used for the bucket. Fixes #27377. Co-authored-by: Mark Tozzi
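Based only on the commit message, a `keep_values` gap policy could be modeled like this in Java: it behaves like `skip`, except that an empty bucket whose metric is non-null and non-NaN keeps that value. The enum and `resolve` method are a simplified, hypothetical sketch, not the code from the actual change; in particular, the commit message does not say how infinite values are treated, so this sketch simply follows its "non-null, non-NaN" wording.

```java
/**
 * Hypothetical sketch of gap handling with a KEEP_VALUES policy, modeled on the
 * commit message; not the actual Elasticsearch implementation.
 */
public class KeepValuesDemo {

    enum GapPolicy { INSERT_ZEROS, SKIP, KEEP_VALUES }

    // value is null when the metric produced nothing for the bucket.
    static Double resolve(Double value, long docCount, GapPolicy gapPolicy) {
        boolean invalid = value == null || value.isNaN() || value.isInfinite();
        if (!invalid && docCount > 0) {
            return value; // normal, non-empty bucket: no gap handling needed
        }
        switch (gapPolicy) {
            case INSERT_ZEROS:
                return 0.0;
            case KEEP_VALUES:
                // Like SKIP, except an empty bucket with a non-null, non-NaN value keeps it.
                if (docCount == 0 && value != null && !value.isNaN()) {
                    return value;
                }
                return Double.NaN;
            case SKIP:
            default:
                return Double.NaN;
        }
    }

    public static void main(String[] args) {
        // An empty bucket whose cumulative value is 100.
        System.out.println(resolve(100.0, 0, GapPolicy.SKIP));        // prints NaN
        System.out.println(resolve(100.0, 0, GapPolicy.KEEP_VALUES)); // prints 100.0
        // An empty bucket with no value at all is still skipped.
        System.out.println(resolve(null, 0, GapPolicy.KEEP_VALUES));  // prints NaN
    }
}
```

Under this reading, the empty buckets from the original report would keep their cumulative value of 100 instead of being dropped, while buckets with no usable value behave exactly as they do under `skip`.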
Elasticsearch version (`bin/elasticsearch --version`): 6.0.0-rc1
JVM version (`java -version`):
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
OS version (`uname -a` if on a Unix-like system): Linux elasticsearch-data-hot-003 4.11.0-1013-azure #13-Ubuntu SMP Mon Oct 2 17:59:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
When running a `bucket_script` aggregation that depends on a `cumulative_sum` aggregation over a `sum` aggregation, if the `sum` aggregation returns null values (because there are no docs in that time-interval bucket), the `bucket_script` aggregation also returns null instead of using the cumulative sum value gathered so far.

Steps to reproduce:
The `bucket` aggregation does not show the value that it's supposed to (the `cumulative_bytes` value).