Dynamically map all numerics to floats by default? #16018
Comments
I think you'd make half the people happy, and the other half unhappy. It's so easy to add a dynamic mapping rule that allows you to add all numeric fields as |
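For illustration, a dynamic mapping rule of the kind mentioned in this comment could look roughly like the sketch below. It is not taken from the thread: the template name `longs_as_doubles` and the choice of `double` as the target type are assumptions, and the exact nesting of `mappings` depends on the Elasticsearch version (older versions nest it under a mapping type name).

```python
import json

# Hypothetical index body: every dynamically detected whole number
# (detected as "long") is mapped to a floating-point field instead.
index_body = {
    "mappings": {
        "dynamic_templates": [
            {
                # The template name is arbitrary and chosen for this example.
                "longs_as_doubles": {
                    "match_mapping_type": "long",
                    "mapping": {"type": "double"},
                }
            }
        ]
    }
}

# Print the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```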
I think this should be done because the defaults should try to favor usability over performance (or storage in this case) |
This might become more necessary as we are considering rejecting numbers that have a decimal part on integer types: #25861. |
+1, this is very trappy. |
@elastic/es-search-aggs |
We chatted about this in the search/aggs meeting a little while ago (forgot to update, sorry). We decided that the breaking change + potential confusion around floating point error made this less than ideal. In our experience, floating point errors are difficult to understand for even relatively savvy users. Especially if we were to map to floats instead of doubles, it was feared many users could be bitten by rounding without understanding what was happening and start seeing strange search results because of it. Ranges can look very strange when fp rounding errors happen. And while a We felt it would be at least as tricky as truncation errors, so breaking for a different set of hard-to-understand semantics wasn't worth it. I just realized we didn't discuss the decision made in #25861 to remove |
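The rounding concern can be seen with a small sketch that rounds values to 32-bit floats using only the Python standard library. The helper name `to_float32` is made up for this example, and the snippet says nothing about how Elasticsearch itself compares values; it only shows that the stored number can differ from the number that was sent.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


# Above 2**24, not every whole number has an exact 32-bit float representation.
big = 2**24 + 1
print(big, "->", to_float32(big))

# A decimal such as 0.1 is stored as a slightly different value in 32 bits
# than in 64 bits, so the two no longer compare equal.
stored = to_float32(0.1)
print(stored == 0.1, stored)
```

Values such as 0.5 that are exact in binary survive the round trip unchanged, which is part of why the effect is hard to predict for users.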
@polyfractal would you be able to clarify how removing |
@jtibshirani I think it was related to Adrien's earlier comment in #16018 (comment) E.g. if But I'm not positive, that's just my guess based on Adrien's comment. :) |
Thanks for the additional context! To me, even with the |
Agreed! To be sure we are on the same page, the behavior you are describing is our current default behavior.
This argument convinces me that we should not map all numbers as floats by default, so I'll close this issue. |
Elasticsearch assumes that if a number contains a dot, then it should be mapped as a floating-point number (double in 2.x and float in master) and otherwise as a long. But this is quite trappy, as it means we expect floating-point numbers to be consistently serialized with a dot (see e.g. https://twitter.com/bitemyapp/status/687415657651154944 or #15961).
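The trap can be modelled with a toy sketch. This is not Elasticsearch's detection code: `guess_dynamic_type` and `mapping_after` are hypothetical helpers that only encode the rule as stated above (a dot means floating point, otherwise long) and the fact that the first value seen for a field fixes its dynamic mapping. Exponent notation is ignored.

```python
def guess_dynamic_type(number_token: str) -> str:
    """Toy version of the rule: a dot in the JSON number means floating point."""
    return "float" if "." in number_token else "long"


def mapping_after(number_tokens: list[str]) -> str:
    """The first value indexed for a field decides its dynamic mapping."""
    return guess_dynamic_type(number_tokens[0])


# A client that serializes 10.0 as "10" makes the field a long, so the
# later value 10.5 no longer fits the mapped type.
print(mapping_after(["10", "10.5"]))

# The same data with a consistently serialized dot is mapped as a float.
print(mapping_after(["10.0", "10.5"]))
```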
Instead, we could map all numerics to floats by default (but you could still use dynamic templates to override it if you want). This would have two drawbacks:
I ran some simulations to see how much worse it would be to store integers as floating-point numbers. The good news is that since most bits will be zeros on the right side of the mantissa, gcd compression will help save some bits:
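The simulation output itself is not reproduced here. As a separate toy illustration of why the trailing zeros matter, the sketch below encodes small integers as 32-bit floats and computes the greatest common divisor of their bit patterns. The helper names are invented for this example, and it assumes an encoder that can divide every stored value by a shared divisor; the real doc-values encoding may differ in detail.

```python
import math
import struct
from functools import reduce


def float32_bits(x: float) -> int:
    """Return the raw 32-bit pattern of x encoded as a 32-bit float."""
    return struct.unpack("<I", struct.pack("<f", float(x)))[0]


def shared_divisor(values) -> int:
    """GCD of the float bit patterns of all values."""
    return reduce(math.gcd, (float32_bits(v) for v in values))


common = shared_divisor(range(1, 101))

# Number of low-order bits that are zero in every pattern; a GCD-based
# encoder would not need to store these bits for each value.
always_zero_bits = (common & -common).bit_length() - 1
print(common, always_zero_bits)
```

Small integers use only the top few mantissa bits, so every pattern shares a large power-of-two factor. Values with a real fractional part would not share it.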
I'm not sold yet on what we should do, but I thought we should have this discussion. Again, note that this would only apply to dynamically mapped fields; integers that are mapped as integers would remain as efficient as they are today.