[SPARK-8359][SQL] Fix incorrect decimal precision after multiplication #6814
Conversation
Test build #34882 has finished for PR 6814 at commit
@@ -261,7 +261,7 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   def - (that: Decimal): Decimal = Decimal(toBigDecimal - that.toBigDecimal)

-  def * (that: Decimal): Decimal = Decimal(toBigDecimal * that.toBigDecimal)
+  def * (that: Decimal): Decimal = Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal))
This is kind of a workaround; it doesn't fix the root cause. The root cause should be in toBigDecimal, which doesn't carry the precision information over to BigDecimal. Could you fix that?
cc @mateiz
toBigDecimal just creates a Scala BigDecimal with this Decimal's information. I think that part is correct. The problem looks like the Scala BigDecimal produces a different (wrong) result than the Java BigDecimal when doing this multiplication.
To show that, we can create a Scala BigDecimal and see that its precision is the same as its underlying Java BigDecimal's:
scala> val d = BigDecimal(Long.MaxValue, 0)
d: scala.math.BigDecimal = 9223372036854775807
scala> d.precision
res16: Int = 19
scala> d.underlying.precision
res17: Int = 19
scala> d.scale
res18: Int = 0
scala> d.underlying.scale
res19: Int = 0
When we multiply two Scala BigDecimals carrying Long.MaxValue, we get the wrong result:
scala> val t = BigDecimal(Long.MaxValue, 0) * BigDecimal(Long.MaxValue, 0)
t: scala.math.BigDecimal = 8.507059173023461584739690778423250E+37
scala> t.precision
res20: Int = 34
scala> t.scale
res21: Int = -4
scala> t.underlying.unscaledValue.toString
res22: String = 8507059173023461584739690778423250
When we multiply two Java BigDecimals carrying Long.MaxValue, the result is different:
scala> val j = d.underlying.multiply(d.underlying)
j: java.math.BigDecimal = 85070591730234615847396907784232501249
scala> j.precision
res23: Int = 38
scala> j.scale
res24: Int = 0
scala> j.toString
res25: String = 85070591730234615847396907784232501249
OK. I just found that their results are different due to the MathContext. As @davies said, its precision is 34. Since we asked to print the unscaledValue, the scale of -4 was not applied to the value. So this should not be a bug. Maybe we don't need this modification, as we should not directly use the unscaledValue?
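To make that concrete, here is a sketch of what the REPL would show when the scale is applied (an editor's illustration, reusing t from the session above):

scala> t.underlying.toPlainString
res26: String = 85070591730234615847396907784232500000

That is the Scala result rounded to 34 significant digits, with the scale of -4 expanded back into trailing zeros.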
However, the decimal type in Hive is based on Java's BigDecimal. As I tested, Hive can output the accurate result of select CAST(9223372036854775807 as DECIMAL(38,0)) * CAST(9223372036854775807 as DECIMAL(38,0));
Can Hive support precision higher than 38? I think we should match the behavior in Hive.
It can't. But 85070591730234615847396907784232501249 can be handled by precision 38, since it has exactly 38 digits. Because the MathContext only has precision 34, the product is rounded to 34 significant digits, we end up with scale -4, and we get a different result.
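To illustrate (an editor's REPL sketch, reusing j from the session above): rounding the exact 38-digit Java result to 34 significant digits reproduces the Scala value:

scala> j.round(new java.math.MathContext(34))
res27: java.math.BigDecimal = 8.507059173023461584739690778423250E+37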
I think this can be fixed in toBigDecimal:
BigDecimal(longVal, _scale, new MathContext(precision, RoundingMode.HALF_EVEN))
Also, in order to have the same behavior as other databases or Hive, we should throw an exception if the precision is higher than 38; that could be another PR.
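A minimal sketch of how that suggestion might look inside Decimal.scala (an editor's illustration, assuming Spark's field names longVal, _scale, and decimalVal, and a Decimal.MAX_PRECISION constant of 38; the merged patch may differ):

import java.math.{MathContext, RoundingMode}

def toBigDecimal: BigDecimal = {
  if (decimalVal.ne(null)) {
    // Already backed by a full BigDecimal; keep it as-is.
    decimalVal
  } else {
    // Attach a MathContext wide enough that later arithmetic
    // (e.g. multiplication) is not silently rounded to 34 digits.
    BigDecimal(longVal, _scale)(new MathContext(Decimal.MAX_PRECISION, RoundingMode.HALF_EVEN))
  }
}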
Hive doesn't actually support BigDecimals with precision above 38. Why did you want to add these? It may be okay to add them, but I think the current code works fine for decimals with lower precision.
@mateiz We use DECIMAL128 as the MathContext to create BigDecimal, which has precision 34; that's lower than the 38 in Hive.
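For reference, Scala's default MathContext is indeed DECIMAL128, with precision 34 (a quick check in a fresh REPL):

scala> BigDecimal.defaultMathContext
res0: java.math.MathContext = precision=34 roundingMode=HALF_EVEN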
Ah, okay. We should make sure we do exactly the same thing as Hive -- it's possible that Hive also uses this context internally.
Test build #35300 has finished for PR 6814 at commit
The style error is from #3347.
retest this please
Test build #35308 has finished for PR 6814 at commit
   } else {
-    BigDecimal(longVal, _scale)
+    BigDecimal(longVal, _scale)(new MathContext(Decimal.MAX_PRECISION, RoundingMode.HALF_EVEN))
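To see why attaching the wider MathContext helps (an editor's REPL sketch, not part of the patch): with precision 38, the multiplication from the earlier example becomes exact:

scala> import java.math.{MathContext, RoundingMode}
import java.math.{MathContext, RoundingMode}

scala> val mc = new MathContext(38, RoundingMode.HALF_EVEN)
mc: java.math.MathContext = precision=38 roundingMode=HALF_EVEN

scala> BigDecimal(Long.MaxValue, 0)(mc) * BigDecimal(Long.MaxValue, 0)(mc)
res0: scala.math.BigDecimal = 85070591730234615847396907784232501249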
I think we should use the _precision in the Decimal object.
_precision is the current precision of this decimal. But here we want to assign a maximum precision that takes effect in later decimal operations like multiplication.
Test build #35362 has finished for PR 6814 at commit
LGTM, merging this into master, thanks!
JIRA: https://issues.apache.org/jira/browse/SPARK-8359