XContentParser shouldn't lose data from floating-point numbers. #46531
base: main
Conversation
Currently the JSON and YAML parsers parse all floating-point numbers as doubles. This is problematic for numbers with higher precision than a double can represent. This change proposes to parse as a BigDecimal instead when the decimal number in the JSON is not equal to the string representation of any double. Closes elastic#46261
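The proposed strategy can be sketched roughly as follows. This is a minimal standalone illustration, not the PR's actual code; the `parseNumber` helper name is assumed, and in the real change the logic is wired into XContentParser's number handling:

```java
import java.math.BigDecimal;

public class LosslessNumberParse {
    // Sketch of the proposed strategy (helper name assumed, not the PR's API):
    // parse as a double first, and keep it only when the decimal text is
    // numerically equal to that double's string representation.
    static Number parseNumber(String text) {
        double d = Double.parseDouble(text);
        // Double.toString(d) is the shortest decimal string that round-trips
        // back to d, so numeric equality here means no precision was lost.
        if (!Double.isInfinite(d)
                && new BigDecimal(text).compareTo(new BigDecimal(Double.toString(d))) == 0) {
            return d;
        }
        return new BigDecimal(text);
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("0.1").getClass().getSimpleName());                // Double
        System.out.println(parseNumber("328.33333333333333").getClass().getSimpleName()); // BigDecimal
    }
}
```

Note the use of `compareTo` rather than `equals`: BigDecimal's `equals` also compares scale, so `0.10` and `0.1` would spuriously fail the check.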
Pinging @elastic/es-core-infra
Quick note: this change doesn't mean that Elasticsearch won't alter the source when applying filtering; for instance, leading zeroes would still be removed. However, it would no longer alter the accuracy of floating-point numbers. I did my best to prevent this change from hurting performance too much, but there might be cases that are significantly affected. For this reason I'm considering only merging into master (8.0).
```diff
@@ -105,7 +105,7 @@ And the following may be the response:
       ]
     },
     "avg_monthly_sales": {
-      "value": 328.33333333333333
+      "value": 328.3333333333333
```
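For context on why this documentation value changed: both decimal strings map to the very same nearest double, as this small standalone check (not from the PR) illustrates:

```java
public class NearestDouble {
    public static void main(String[] args) {
        // 17 threes vs 16 threes: the difference is below half an ulp at this
        // magnitude, so both literals round to the exact same double.
        double a = Double.parseDouble("328.33333333333333");
        double b = Double.parseDouble("328.3333333333333");
        System.out.println(a == b);              // true
        System.out.println(Double.toString(a));  // shortest round-trip form
    }
}
```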
note to reviewers: this only passed before because parsing was lossy
Thank you for working on this! Looks good IMHO :)
I've discussed this change with other people at Elastic; here are some thoughts:
I ran a microbenchmark on the two functions.
I tried adding one optimization in isDoubleSlow, which improves the case for scientific notation:
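The exact JMH setup isn't shown in the thread. As a rough standalone stand-in (not the benchmark above, whose harness and inputs are unknown), the relative cost of the fast and slow paths can be eyeballed like this; absolute numbers are machine-dependent:

```java
import java.math.BigDecimal;

public class ParseTiming {
    // Crude wall-clock timing; a real comparison should use JMH to account
    // for warmup and dead-code elimination.
    static long time(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String input = "328.3333333333333";
        int iters = 1_000_000;
        long fast = time(() -> {
            for (int i = 0; i < iters; i++) Double.parseDouble(input);   // fast path
        });
        long slow = time(() -> {
            for (int i = 0; i < iters; i++) new BigDecimal(input);       // fallback
        });
        System.out.println("double ns/op:     " + fast / iters);
        System.out.println("BigDecimal ns/op: " + slow / iters);
    }
}
```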
@evangoetzman Thanks for running these benchmarks. I reviewed usage of …

My reading of the benchmark is that it's unlikely that these operations get dramatically slow if we remain on the fast path. For instance a …

Your idea for the optimization is good, but don't we need to check the scale too? If the exponent is extremely high, then it would parse to an infinity as a double?
@jpountz Right, good catch about the optimization. Yes, we would need to check scale() as well.
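A sketch of the guard being discussed here (hypothetical helper name, not the PR's code): a number with an extreme exponent overflows to infinity when converted to a double, so it must be rejected before any string-based equality check.

```java
import java.math.BigDecimal;

public class DoubleRangeGuard {
    // Hypothetical helper: true only when the conversion to double is lossless.
    // The infinity check matters because e.g. 1e400 has an exponent far outside
    // the double range and would otherwise slip through as Infinity.
    static boolean fitsInDouble(BigDecimal value) {
        double d = value.doubleValue();
        return !Double.isInfinite(d)
            && new BigDecimal(Double.toString(d)).compareTo(value) == 0;
    }

    public static void main(String[] args) {
        System.out.println(fitsInDouble(new BigDecimal("0.5")));    // true
        System.out.println(fitsInDouble(new BigDecimal("1e400")));  // false: overflows
    }
}
```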
LGTM!
Sooooo ... can we get this merged/fixed please?
Hey guys, hope you are fine in this tricky time! Would it be possible to schedule a fix for this? It seems others are struggling too but don't really search for existing bug reports first ;-) https://discuss.elastic.co/t/bug-big-decimal-value-losed-precision/253105
Hi @jpountz, I've created a changelog YAML for you. |
Pinging @elastic/es-core-infra (Team:Core/Infra)
Can we make this happen? Our customers need full precision now, and we would basically be forced to do our own implementation of this :/ I'm guessing the probability of a quick fix is low, so how would you suggest working around it?