
[SPARK-8359][SQL] Fix incorrect decimal precision after multiplication #6814

Closed
wants to merge 5 commits

Conversation

viirya
Member

@viirya viirya commented Jun 14, 2015

@SparkQA

SparkQA commented Jun 14, 2015

Test build #34882 has finished for PR 6814 at commit 44c9348.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JoinedRow extends InternalRow
    • class JoinedRow2 extends InternalRow
    • class JoinedRow3 extends InternalRow
    • class JoinedRow4 extends InternalRow
    • class JoinedRow5 extends InternalRow
    • class JoinedRow6 extends InternalRow
    • class BaseOrdering extends Ordering[InternalRow]

@@ -261,7 +261,7 @@ final class Decimal extends Ordered[Decimal] with Serializable {

def - (that: Decimal): Decimal = Decimal(toBigDecimal - that.toBigDecimal)

def * (that: Decimal): Decimal = Decimal(toBigDecimal * that.toBigDecimal)
def * (that: Decimal): Decimal = Decimal(toJavaBigDecimal.multiply(that.toJavaBigDecimal))
Contributor

This is kind of a workaround; it doesn't fix the root cause.

The root cause should be in toBigDecimal, which doesn't carry the precision information over to BigDecimal. Could you fix that?

cc @mateiz

Member Author

toBigDecimal just creates a Scala BigDecimal with the Decimal's information. I think it is correct. The problem looks like the Scala BigDecimal produces a different (wrong) result than the Java BigDecimal when doing this multiplication.

To show that, we can create a Scala BigDecimal and find that it has the correct precision, the same as its underlying Java BigDecimal:

 scala> val d = BigDecimal(Long.MaxValue, 0)
 d: scala.math.BigDecimal = 9223372036854775807

 scala> d.precision
 res16: Int = 19

 scala> d.underlying.precision
 res17: Int = 19

 scala> d.scale
 res18: Int = 0

 scala> d.underlying.scale
 res19: Int = 0

When we multiply two Scala BigDecimals carrying Long.MaxValue, we get the wrong result:

 scala> val t = BigDecimal(Long.MaxValue, 0) * BigDecimal(Long.MaxValue, 0)
 t: scala.math.BigDecimal = 8.507059173023461584739690778423250E+37

 scala> t.precision
 res20: Int = 34

 scala> t.scale
 res21: Int = -4

 scala> t.underlying.unscaledValue.toString
 res22: String = 8507059173023461584739690778423250

When we multiply two Java BigDecimals carrying Long.MaxValue, the result is different:

 scala> val j = d.underlying.multiply(d.underlying)
 j: java.math.BigDecimal = 85070591730234615847396907784232501249

 scala> j.precision
 res23: Int = 38

 scala> j.scale
 res24: Int = 0

 scala> j.toString
 res25: String = 85070591730234615847396907784232501249

Member Author

OK. I just found that their results differ because of the MathContext. As @davies said, its precision is 34. Since we ask to print the unscaledValue, the scale of -4 is not applied to the value. So this should not be a bug. Maybe we don't need this modification, as we should not use the unscaledValue directly?
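
A quick REPL check confirms this: scala.math.BigDecimal's default MathContext is DECIMAL128, i.e. precision 34 with HALF_EVEN rounding, which is why the product above is rounded to 34 significant digits:

 scala> BigDecimal.defaultMathContext
 res0: java.math.MathContext = precision=34 roundingMode=HALF_EVEN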

Member Author

However, the decimal type in Hive is based on Java's BigDecimal. As I tested, Hive can output the accurate result of select CAST(9223372036854775807 as DECIMAL(38,0)) * CAST(9223372036854775807 as DECIMAL(38,0));

Contributor

Can Hive support precision higher than 38? I think we should match the behavior in Hive.

Member Author

It can't. But 85070591730234615847396907784232501249 can be handled by precision 38. Since the MathContext only has precision 34, we get scale -4 and a different result.

Contributor

I think this can be fixed in toBigDecimal:

BigDecimal(longVal, _scale, new MathContext(precision, RoundingMode.HALF_EVEN))

Also, in order to have the same behavior as other databases or Hive, we should throw an exception if the precision is higher than 38; this could be another PR.
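
As a minimal REPL sketch of that suggestion (assuming precision 38, matching Hive's upper bound), attaching the wider MathContext keeps the exact 38-digit product that DECIMAL128 rounds away:

 scala> import java.math.{MathContext, RoundingMode}
 scala> val mc = new MathContext(38, RoundingMode.HALF_EVEN)
 scala> val d = BigDecimal(Long.MaxValue, 0, mc)
 scala> d * d
 res0: scala.math.BigDecimal = 85070591730234615847396907784232501249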

@mateiz
Contributor

mateiz commented Jun 18, 2015

Hive doesn't actually support BigDecimals with precision above 38. Why did you want to add these? It may be okay to add them, but I think the current code works fine for decimals with lower precision.

@davies
Contributor

davies commented Jun 18, 2015

@mateiz We use DECIMAL128 as the MathContext to create BigDecimal, which has precision 34; that's lower than the 38 in Hive.

scala> val d = Decimal(2L<<60, 38, 0)
d: org.apache.spark.sql.types.Decimal = 2305843009213693952
scala> d * d
res0: org.apache.spark.sql.types.Decimal = 5.316911983139663491615228241121378E+36
scala> (d * d).toJavaBigDecimal.unscaledValue
res5: java.math.BigInteger = 5316911983139663491615228241121378  // this is wrong

@mateiz
Contributor

mateiz commented Jun 18, 2015

Ah, okay. We should make sure we do exactly the same thing as Hive -- it's possible that Hive also uses this context internally.

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35300 has finished for PR 6814 at commit a43bfc3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jun 19, 2015

The style error is from #3347.

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Jun 19, 2015

Test build #35308 has finished for PR 6814 at commit a43bfc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
BigDecimal(longVal, _scale)
BigDecimal(longVal, _scale)(new MathContext(Decimal.MAX_PRECISION, RoundingMode.HALF_EVEN))
Contributor

I think we should use the _precision in the Decimal object.

Member Author

_precision is the current precision of this decimal. But here we want to assign a maximum precision that takes effect in later decimal operations like multiplication.
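
To illustrate the difference (a sketch only; 19 is the current precision of Long.MaxValue, while 38 stands in for Decimal.MAX_PRECISION): rounding with the value's own precision loses the product's low digits, while the maximum precision keeps the exact result:

 scala> import java.math.{MathContext, RoundingMode}
 scala> val narrow = BigDecimal(Long.MaxValue, 0, new MathContext(19, RoundingMode.HALF_EVEN))
 scala> val wide = BigDecimal(Long.MaxValue, 0, new MathContext(38, RoundingMode.HALF_EVEN))
 scala> narrow * narrow
 res0: scala.math.BigDecimal = 8.507059173023461585E+37
 scala> wide * wide
 res1: scala.math.BigDecimal = 85070591730234615847396907784232501249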

@SparkQA

SparkQA commented Jun 20, 2015

Test build #35362 has finished for PR 6814 at commit 071a757.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Jun 23, 2015

LGTM, merging this into master, thanks!
