Invalid variance computed in extended_stats aggregation #37303
Comments
Pinging @elastic/es-analytics-geo
Ah yeah, I think you're probably right. Would you like to work on a PR to fix this? If not, I'm marking this as adopt-me and we'll get it fixed when we have a few spare cycles. Thanks for the bug report!
I would like to work on a PR to fix this.
@vishnugt 👍 feel free!
As mentioned in a previous issue, this is due to catastrophic cancellation. The sum-of-squares method of calculating variance is not appropriate in floating-point arithmetic; Welford's algorithm should be used instead to get proper variance values: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm
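For reference, a minimal sketch of Welford's online algorithm in Java. The class and method names (`WelfordVariance`, `add`, `populationVariance`) are made up for illustration and are not taken from the Elasticsearch code base; this only shows the shape of the update, not how a fix would have to be wired into the aggregation.

```java
// Hypothetical helper, not Elasticsearch code: single-pass variance
// using Welford's online update.
final class WelfordVariance {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    void add(double value) {
        count++;
        double delta = value - mean;   // deviation from the old mean
        mean += delta / count;         // update the running mean
        m2 += delta * (value - mean);  // old deviation times new deviation
    }

    // Population variance (divide by count); NaN when no value was added.
    double populationVariance() {
        return count == 0 ? Double.NaN : m2 / count;
    }
}
```

Because the update accumulates deviations from the running mean rather than subtracting two large, nearly equal totals, it avoids the cancellation described in this issue.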
Elasticsearch version (bin/elasticsearch --version): 6.1.4
Plugins installed: n/a
JVM version (java -version): 1.8.191
OS version (uname -a if on a Unix-like system): CentOS 7.5
Description of the problem including expected versus actual behavior:
In some conditions, the variance computed by an extended_stats aggregation is negative, which should never happen.
The variance is an average of squared deviations, hence cannot be negative. What makes it negative here is the way it is computed (probably as the difference of two positive numbers: "sum_of_squares / count" and "avg * avg"). Because floating-point numbers have finite precision, when both numbers are almost the same, rounding error can make their difference slightly negative.
Proposed solution:
At a minimum, prevent negative values from appearing in the variance: add a "Math.max(0.0, ...)" to the existing computation formula.
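A rough sketch of what this would look like, assuming the variance really is computed as "sum_of_squares / count" minus "avg * avg" (that is a guess from the observed behavior, not a copy of the actual implementation). The names `NaiveVariance`, `naive` and `clamped` are hypothetical.

```java
// Hypothetical illustration, not Elasticsearch code.
final class NaiveVariance {
    // Assumed formula: E[x^2] - (E[x])^2. Prone to cancellation when the
    // two terms are large and nearly equal.
    static double naive(double[] values) {
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        double avg = sum / values.length;
        return sumOfSquares / values.length - avg * avg;
    }

    // Proposed guard: never report a negative variance.
    static double clamped(double[] values) {
        return Math.max(0.0, naive(values));
    }
}
```

Note that the clamp only hides the sign of the error. The magnitude of the rounding error is unchanged, so a result that should be a small positive variance can still come out as 0.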
Steps to reproduce:
Using the attached zip file, do the following :
What does it do?
Provide logs (if relevant):
File used to reproduce:
to-reproduce-it.zip