
[SPARK-8359][SQL] Fix incorrect decimal precision after multiplication #6814

Closed
wants to merge 5 commits

Conversation

viirya
Member

@viirya viirya commented Jun 14, 2015

@SparkQA

SparkQA commented Jun 14, 2015

Test build #34882 has finished for PR 6814 at commit 44c9348.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JoinedRow extends InternalRow
    • class JoinedRow2 extends InternalRow
    • class JoinedRow3 extends InternalRow
    • class JoinedRow4 extends InternalRow
    • class JoinedRow5 extends InternalRow
    • class JoinedRow6 extends InternalRow
    • class BaseOrdering extends Ordering[InternalRow]

@@ -261,7 +261,7 @@ final class Decimal extends Ordered[Decimal] with Serializable {

def - (that: Decimal): Decimal = Decimal(toBigDecimal - that.toBigDecimal)

def * (that: Decimal): Decimal = Decimal(toBigDecimal * that.toBigDecimal)
def * (that: Decimal): Decimal = Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal))
Contributor

This is kind of a workaround; it doesn't fix the root cause.

The root cause should be in toBigDecimal, which doesn't carry the precision information over to BigDecimal. Could you fix that?

cc @mateiz

Member Author

toBigDecimal just creates a Scala BigDecimal with the Decimal's information. I think it is correct. The problem looks like the Scala BigDecimal produces a different (wrong) result than the Java BigDecimal when doing this multiplication.

To show that, we can create a Scala BigDecimal and find that it has the correct precision, the same as its underlying Java BigDecimal:

 scala> val d = BigDecimal(Long.MaxValue, 0)
 d: scala.math.BigDecimal = 9223372036854775807

 scala> d.precision
 res16: Int = 19

 scala> d.underlying.precision
 res17: Int = 19

 scala> d.scale
 res18: Int = 0

 scala> d.underlying.scale
 res19: Int = 0

When we multiply two Scala BigDecimals carrying Long.MaxValue, we get the wrong result:

 scala> val t = BigDecimal(Long.MaxValue, 0) * BigDecimal(Long.MaxValue, 0)
 t: scala.math.BigDecimal = 8.507059173023461584739690778423250E+37

 scala> t.precision
 res20: Int = 34

 scala> t.scale
 res21: Int = -4

 scala> t.underlying.unscaledValue.toString
 res22: String = 8507059173023461584739690778423250

When we multiply two Java BigDecimals carrying Long.MaxValue, the result is different:

 scala> val j = d.underlying.multiply(d.underlying)
 j: java.math.BigDecimal = 85070591730234615847396907784232501249

 scala> j.precision
 res23: Int = 38

 scala> j.scale
 res24: Int = 0

 scala> j.toString
 res25: String = 85070591730234615847396907784232501249

Member Author

OK. I just found that their results differ because of the MathContext. As @davies said, its precision is 34. Since we ask to print the unscaledValue, the scale of -4 is not applied to the value. So this should not be a bug. Maybe we don't need this modification, as we should not use the unscaledValue directly?
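
A quick REPL check confirms this: scala.math.BigDecimal's default MathContext is DECIMAL128, i.e. precision 34 with HALF_EVEN rounding, which is why the product above is rounded to 34 significant digits:

 scala> BigDecimal.defaultMathContext
 res0: java.math.MathContext = precision=34 roundingMode=HALF_EVEN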

Member Author

However, the decimal type in Hive is based on Java's BigDecimal. As I tested, Hive can output the accurate result of select CAST(9223372036854775807 as DECIMAL(38,0)) * CAST(9223372036854775807 as DECIMAL(38,0));

Contributor

Can Hive support precision higher than 38? I think we should match the behavior in Hive.

Member Author

It can't. But 85070591730234615847396907784232501249 can be handled by precision 38. Since the MathContext only has precision 34, we get scale -4 and a different result.

Contributor

I think this can be fixed in toBigDecimal:

BigDecimal(longVal, _scale, new MathContext(precision, RoundingMode.HALF_EVEN))

Also, in order to have the same behavior as other databases or Hive, we should throw an exception if the precision is higher than 38; this could be another PR.
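
As a minimal REPL sketch of that suggestion (assuming precision 38, matching Hive's upper bound), attaching the wider MathContext keeps the exact 38-digit product that DECIMAL128 rounds away:

 scala> import java.math.{MathContext, RoundingMode}
 scala> val mc = new MathContext(38, RoundingMode.HALF_EVEN)
 scala> val d = BigDecimal(Long.MaxValue, 0, mc)
 scala> d * d
 res0: scala.math.BigDecimal = 85070591730234615847396907784232501249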

@mateiz
Contributor

mateiz commented Jun 18, 2015

Hive doesn't actually support BigDecimals with precision above 38. Why did you want to add these? It may be okay to add them, but I think the current code works fine for decimals with lower precision.

@davies
Contributor

davies commented Jun 18, 2015

@mateiz We use DECIMAL128 as the MathContext to create BigDecimal, which has precision 34; that's lower than the 38 in Hive.

scala> val d = Decimal(2L<<60, 38, 0)
d: org.apache.spark.sql.types.Decimal = 2305843009213693952
scala> d * d
res0: org.apache.spark.sql.types.Decimal = 5.316911983139663491615228241121378E+36
scala> (d * d).toJavaBigDecimal.unscaledValue
res5: java.math.BigInteger = 5316911983139663491615228241121378  // this is wrong

@mateiz
Contributor

mateiz commented Jun 18, 2015

Ah, okay. We should make sure we do exactly the same thing as Hive -- it's possible that Hive also uses this context internally.

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35300 has finished for PR 6814 at commit a43bfc3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jun 19, 2015

The style error is from #3347.

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35308 has finished for PR 6814 at commit a43bfc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
BigDecimal(longVal, _scale)
BigDecimal(longVal, _scale)(new MathContext(Decimal.MAX_PRECISION, RoundingMode.HALF_EVEN))
Contributor

I think we should use the _precision in the Decimal object.

Member Author

_precision is the current precision of this decimal. But here we want to assign a maximum precision that takes effect in later decimal operations like multiplication.
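
To illustrate the difference (a sketch only; 19 is the current precision of Long.MaxValue, while 38 stands in for Decimal.MAX_PRECISION): rounding with the value's own precision loses the product's low digits, while the maximum precision keeps the exact result:

 scala> import java.math.{MathContext, RoundingMode}
 scala> val narrow = BigDecimal(Long.MaxValue, 0, new MathContext(19, RoundingMode.HALF_EVEN))
 scala> val wide = BigDecimal(Long.MaxValue, 0, new MathContext(38, RoundingMode.HALF_EVEN))
 scala> narrow * narrow
 res0: scala.math.BigDecimal = 8.507059173023461585E+37
 scala> wide * wide
 res1: scala.math.BigDecimal = 85070591730234615847396907784232501249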

@SparkQA

SparkQA commented Jun 20, 2015

Test build #35362 has finished for PR 6814 at commit 071a757.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Jun 23, 2015

LGTM, merging this into master, thanks!
