ARROW-17995: [C++] Fix json decimals not being rescaled based on the explicit schema #14380
Conversation
Thanks for this @stiga-huang. I think we should be stricter and not let precision loss happen silently.
cpp/src/arrow/json/converter_test.cc (outdated diff)
{"" : "02.0000000000"}
{"" : "30.0000000000"}
{"" : "30.01"}
{"" : "30.0000000000123"}
Do we want truncation to happen silently or would we rather get an error here?
For example, our CSV reader would emit an error instead of dropping some digits like this.
Yeah, I think users would like different behaviors. E.g. in Impala, we'd like Arrow to continue reading the remaining rows instead of throwing an error and stopping.
I think we can add an option to choose the behavior.
I realized that the error handling in the other converters is also to stop and return an error, e.g.
arrow/cpp/src/arrow/json/converter.cc
Lines 132 to 134 in d67a210
if (!arrow::internal::ParseValue(numeric_type_, repr.data(), repr.size(), &value)) {
  return GenericConversionError(*out_type_, ", couldn't parse:", repr);
}
So I tend toward the same behavior now. I think we can add an option for configurable error-handling behavior in a new JIRA.
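The stricter behavior settled on here could look roughly like this: a hedged sketch, again with int64 in place of arrow::Decimal128 and a bool in place of arrow::Status, and `RescaleWithCheck` as a hypothetical name:

```cpp
#include <cstdint>

// Rescale to the schema's scale, reporting failure instead of silently
// dropping nonzero digits, similar to what the CSV reader does for decimals.
bool RescaleWithCheck(int64_t value, int32_t parsed_scale, int32_t target_scale,
                      int64_t* out) {
  while (parsed_scale < target_scale) { value *= 10; ++parsed_scale; }
  while (parsed_scale > target_scale) {
    if (value % 10 != 0) return false;  // precision loss: surface an error
    value /= 10;
    --parsed_scale;
  }
  *out = value;
  return true;
}
```

For example, rescaling 30.01 (3001 at scale 2) to scale 10 succeeds, while rescaling 30.0000000000123 (scale 13) down to scale 10 fails because nonzero digits would be dropped.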
Thanks. Feel free to ping once that is done. Hopefully we can get this fixed in time for 10.0.0.
Force-pushed from 907c69f to eef9f58.
+1, thanks a lot @stiga-huang
Thanks for your review! @pitrou
…explicit schema (apache#14380) arrow::json::DecimalConverter::Convert() currently reads the decimal values using the parsed precision and scale. This produces wrong results if the parsed scale doesn't match the output scale (specified by the explicit schema). More details on how to reproduce the issue are in the JIRA description. This patch fixes json::DecimalConverter::Convert() to rescale the values based on the output scale. Unit tests are added as well. Lead-authored-by: stiga-huang <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Benchmark runs are scheduled for baseline = aeba616 and contender = 289e0c9. 289e0c9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks show a high level of regressions.
arrow::json::DecimalConverter::Convert() currently reads the decimal values using the parsed precision and scale. This produces wrong results if the parsed scale doesn't match the output scale (specified by the explicit schema).
More details on how to reproduce the issue are in the JIRA description. This patch fixes json::DecimalConverter::Convert() to rescale the values based on the output scale. Unit tests are added as well.
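The fixed conversion path described above (parse at whatever scale the text has, then rescale to the schema's scale) might be sketched end-to-end like so; the helpers are hypothetical, with int64 instead of arrow::Decimal128, non-negative values only, and no validation:

```cpp
#include <cstdint>
#include <string>

struct Parsed {
  int64_t unscaled;  // digits with the decimal point removed
  int32_t scale;     // number of fractional digits present in the text
};

// Parse a non-negative decimal literal such as "30.01" into (3001, 2).
Parsed ParseDecimal(const std::string& repr) {
  Parsed p{0, 0};
  bool fractional = false;
  for (char c : repr) {
    if (c == '.') { fractional = true; continue; }
    p.unscaled = p.unscaled * 10 + (c - '0');
    if (fractional) ++p.scale;
  }
  return p;
}

// Convert using the schema's scale, not the parsed one: parse, then rescale.
int64_t ConvertToScale(const std::string& repr, int32_t out_scale) {
  Parsed p = ParseDecimal(repr);
  while (p.scale < out_scale) { p.unscaled *= 10; ++p.scale; }
  while (p.scale > out_scale) { p.unscaled /= 10; --p.scale; }
  return p.unscaled;
}
```

With a scale-10 output column, `ConvertToScale("02.0000000000", 10)` and `ConvertToScale("30.01", 10)` both land on the correct unscaled representations (20000000000 and 300100000000) regardless of how many fractional digits the JSON text happened to carry.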