[SPARK-8052][SQL] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble #6645
Conversation
Regression test?
@@ -323,7 +323,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
   private[this] def castToDecimal(from: DataType, target: DecimalType): Any => Any = from match {
     case StringType =>
       buildCast[UTF8String](_, s => try {
-        changePrecision(Decimal(s.toString.toDouble), target)
+        changePrecision(Decimal(new java.math.BigDecimal(s.toString)), target)
Just import this?
ok.
I will update the test later.
Test build #34191 has finished for PR 6645 at commit
Test build #34240 has finished for PR 6645 at commit
ping @srowen
I'm not qualified to review this, but I'm wondering why this query involves a conversion to a decimal type at all? The target type is bigint. While this may patch the issue, there are deeper implications to converting a bunch of stuff to BigDecimal instead of double, so I'm not sure this is the source of the problem?
Also per the JIRA, is this even a Spark issue?
The problematic case here is when the given string representation is beyond what a double can represent exactly.
Yeah, that's a problem then. The conversion to floating-point could be lossy, as your test case indicates. I don't know that this is the fix though, since it has non-trivial side effects. Converting to a decimal type seems wrong. But is this specific to Hive on Spark?
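To make the lossiness concrete, here is a minimal standalone Java sketch (not the Spark cast path itself; the 20-digit literal is chosen purely for illustration). A 20-digit integral string exceeds the roughly 15-17 significant decimal digits a double can hold, so round-tripping it through `Double.parseDouble` silently changes the value, while `new java.math.BigDecimal(String)` preserves it exactly:

```java
import java.math.BigDecimal;

public class CastLossDemo {
    public static void main(String[] args) {
        // 20 significant digits: more than a double can represent exactly.
        String s = "12345678901234567890";

        // Old path: String -> double. The nearest representable double
        // is not the original value, so precision is silently lost.
        double viaDouble = Double.parseDouble(s);

        // New path: String -> BigDecimal. A decimal string literal is
        // stored exactly, with no rounding.
        BigDecimal viaBigDecimal = new BigDecimal(s);

        System.out.println("original:       " + s);
        // new BigDecimal(double) shows the exact value the double holds.
        System.out.println("via double:     " + new BigDecimal(viaDouble).toPlainString());
        System.out.println("via BigDecimal: " + viaBigDecimal.toPlainString());
    }
}
```

The two printed values differ in the low-order digits, which is exactly the kind of corruption the original test case exposed.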
As we can't know whether the string represents a fractional number or not, casting to decimal seems to be the most feasible approach.
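A hedged illustration of that point in plain Java (again, not the Spark cast path itself): parsing the string directly as an integral type rejects fractional input outright, whereas a decimal intermediate accepts both integral and fractional strings and can then be narrowed to an integral value:

```java
import java.math.BigDecimal;

public class FractionalCastDemo {
    public static void main(String[] args) {
        // A direct integral parse rejects a fractional representation.
        try {
            Long.parseLong("1.5");
        } catch (NumberFormatException e) {
            System.out.println("Long.parseLong(\"1.5\") throws NumberFormatException");
        }

        // A decimal intermediate handles both forms; longValue() discards
        // any fractional part when narrowing.
        System.out.println(new BigDecimal("1.5").longValue());  // prints 1
        System.out.println(new BigDecimal("42").longValue());   // prints 42
    }
}
```

This is why an intermediate decimal works for a query whose target type is bigint: the same path covers `"42"` and `"1.5"` without a separate integral parser.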
@viirya This change makes sense to me.
Thanks! Merging to master.
…imal instead of using toDouble

JIRA: https://issues.apache.org/jira/browse/SPARK-8052

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#6645 from viirya/cast_string_integraltype and squashes the following commits:

e19c6a3 [Liang-Chi Hsieh] For comment.
c3e472a [Liang-Chi Hsieh] Add test.
7ced9b0 [Liang-Chi Hsieh] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble.
…imal instead of using toDouble

JIRA: https://issues.apache.org/jira/browse/SPARK-8052

Author: Liang-Chi Hsieh <[email protected]>

Closes #6645 from viirya/cast_string_integraltype and squashes the following commits:

e19c6a3 [Liang-Chi Hsieh] For comment.
c3e472a [Liang-Chi Hsieh] Add test.
7ced9b0 [Liang-Chi Hsieh] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble.

(cherry picked from commit ddec452)
Signed-off-by: Reynold Xin <[email protected]>