Sum aggregation not ignoring missing values #71582
Comments
Pinging @elastic/es-analytics-geo (Team:Analytics) |
I'm not sure how long it's been like this. I can look into it. If we decide this is the right behavior we should at least document it. Folks don't always expect it. |
Elasticsearch 1.7's |
It'd be super reasonable if the `sum` aggregation returned `null` when run on an unmapped field. Or if the query filtered out all results. But it doesn't. It returns `0`. Which is also reasonable! It's just different from what other reasonable systems like PostgreSQL do. This adds a note with an anchor we can link to. Folks ask about this a fair bit. Closes elastic#71582
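To make the difference concrete, here is a minimal Python sketch of the two conventions. It does not call Elasticsearch or PostgreSQL; `es_style_sum` and `sql_style_sum` are hypothetical helper names, and `None` stands in for a missing value or `null`.

```python
from typing import Iterable, Optional


def es_style_sum(values: Iterable[Optional[float]]) -> float:
    """Sum the non-missing values; with nothing to sum, return 0."""
    return sum(v for v in values if v is not None)


def sql_style_sum(values: Iterable[Optional[float]]) -> Optional[float]:
    """Sum the non-missing values; with nothing to sum, return None (SQL NULL)."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None
```

Both helpers agree whenever at least one value is present. They only diverge when every value is missing or the input is empty, which is the case this issue is about.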
With that being the case, is there any way to have a Kibana line or bar type visualization aggregate data without plotting a point at 0 when all of the values being aggregated over the bucket are null? |
@nik9000 I agree that it doesn't make sense to change it now if it's been this way for so long. That said, per @kspurrier, can we have a setting (or value for the existing |
From the PR associated, Postgres is referenced which is a darn good example. Postgres can be reasonable in getting away with something like this because one can easily just use COALESCE. If there's an equivalent workaround (in the other direction) with ES / Kibana, that'd be awesome in the docs. Example: Returns:
|
I recall that being trickier to do than it sounded the last time folks talked about it. Could you link to an issue this is trying to solve?
There are kind of three cases where we do a different thing than postgresql with sum. Probably more, but three I can think of. But here goes:
|
Wait, that isn't right. That'll get you |
You'd need to run another aggregation to count docs that have missing values. It'd be pretty terrible, frankly. It looks like:
I suspect folks don't typically encounter this because they treat our fields like they are non-nullable and single valued. Do you bump into this sort of thing frequently? Documents with uneven fields that encode meaning in the absence of that field? |
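One way to sketch the "extra aggregation" idea in Python is below. This is not the request that was elided above; it is an assumption that pairs `sum` with a `value_count` aggregation on the same field and treats a count of zero as "no data". The field name `baz` comes from the issue description; the aggregation names `baz_sum` and `baz_count` and the helper `sum_or_none` are hypothetical.

```python
from typing import Any, Dict, Optional

# Search request body (assumed shape): sum the field and count its values.
REQUEST_BODY: Dict[str, Any] = {
    "size": 0,
    "aggs": {
        "baz_sum": {"sum": {"field": "baz"}},
        "baz_count": {"value_count": {"field": "baz"}},
    },
}


def sum_or_none(aggregations: Dict[str, Any]) -> Optional[float]:
    """Return the sum, or None when no document had a value for the field."""
    if aggregations["baz_count"]["value"] == 0:
        return None
    return aggregations["baz_sum"]["value"]
```

The helper would be applied to the `aggregations` object of a search response, client side. It does not change what Elasticsearch itself returns, so it would not help a Kibana visualization directly.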
Thank you. That explanation definitely helps me, and I think it may give me a way around some of the issues that may more specifically just be limitations around how we're using Kibana out of the box for more general visualizations of data that aren't the classic use case. For us lately, the absence of data tends to be real and we don't want to see it plotted in Kibana visualizations when doing things like applying filters with Sum aggregations. I've recently been working with visualizing some agile metrics from tasking data where we want to see things like burn down with projections where we ideally want the line to start beyond x-axis item 0+ where we have data. If I handle all aggregations prior to Kibana, it isn't too tough to make the visualization look nice, but as soon as users ask about multiple filters using sums, I don't currently have a good answer for them and they dislike the line starting from 0 and ramping up to the true starting point every time. Something like the Jira visualization here isn't a bad example |
It seems like there's some interest in changing this to |
Since the SQL `SUM` function behaves as expected, elastic#45251 can be closed. As soon as elastic#71582 is resolved, we can go back to using the `sum` aggregation instead of `stats`.
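As a rough illustration of why `stats` can stand in for `sum` here: a `stats` result carries a `count` next to the `sum`, so the caller can tell an empty bucket from a real total of zero. The Python sketch below assumes a result object with `count` and `sum` keys; the helper name `sum_from_stats` is hypothetical.

```python
from typing import Any, Dict, Optional


def sum_from_stats(stats: Dict[str, Any]) -> Optional[float]:
    """Map a stats result to a SQL-style sum: None when nothing was counted."""
    if stats["count"] == 0:
        return None
    return stats["sum"]
```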
Elasticsearch version (`bin/elasticsearch --version`): 7.12.0
OS version (`uname -a` if on a Unix-like system): OSX
Description of the problem including expected versus actual behavior:
The sum aggregation by default should ignore documents that are missing values for the field being summed. Instead, it looks like it defaults to a `0` value. The min and max aggregations have the same default behavior, but correctly ignore missing fields and return `null`.
Steps to reproduce:
Results showing correct handling of the missing `baz` field by min (i.e. a `null`), but incorrect handling by sum (`0`).
Provide logs (if relevant):