Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21552][SQL] Add DecimalType support to ArrowWriter. #18754

Closed
wants to merge 6 commits into from

Conversation

ueshin
Copy link
Member

@ueshin ueshin commented Jul 28, 2017

What changes were proposed in this pull request?

Decimal type is not yet supported in ArrowWriter.
This is adding the decimal type support.

How was this patch tested?

Added a test to ArrowConvertersSuite.

@SparkQA
Copy link

SparkQA commented Jul 28, 2017

Test build #80014 has finished for PR 18754 at commit 9e60762.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

valueMutator.setIndexDefined(count)
val decimal = input.getDecimal(ordinal, precision, scale)
decimal.changePrecision(precision, scale)
DecimalUtility.writeBigDecimalToArrowBuf(decimal.toJavaBigDecimal, valueVector.getBuffer, count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be easier in Arrow 0.7. There is now APIs for directly setting values with setSafe for BigDecimal.

Copy link
Member Author

@ueshin ueshin Sep 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler Thanks, I'll update it to use setSafe after upgrading Arrow to 0.7.

Btw, when I tested upgrading to 0.7 locally, ArrowConvertersSuite.string type conversion came to fail. Do you have any ideas about that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried running those tests for 0.7 yet, let me see if I can reproduce

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the same error, let me dig into it a little

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an issue with StringWriter, I put the fix in #19284 please take a look, thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've confirmed it fixes the failure. Thanks!

@@ -1617,7 +1617,7 @@ def to_arrow_type(dt):
elif type(dt) == DoubleType:
arrow_type = pa.float64()
elif type(dt) == DecimalType:
arrow_type = pa.decimal(dt.precision, dt.scale)
arrow_type = pa.decimal128(dt.precision, dt.scale)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm @BryanCutler Is this a right way to define decimal type for Arrow?
I also wonder if there is a limit for precision and scale?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the right way - it is now fixed at 128 bits internally. I believe the Arrow Java limit is the same as Spark 38/38, not sure if pyarrow is the same but I think so.

@SparkQA
Copy link

SparkQA commented Dec 22, 2017

Test build #85293 has finished for PR 18754 at commit e29f833.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin ueshin changed the title [WIP][SPARK-21552][SQL] Add DecimalType support to ArrowWriter. [SPARK-21552][SQL] Add DecimalType support to ArrowWriter. Dec 22, 2017
Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just had one minor question about a call to Decimal.changePrecision


override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
val decimal = input.getDecimal(ordinal, precision, scale)
decimal.changePrecision(precision, scale)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to call changePrecision even though getDecimal already takes the precision/scale as input - is it not guaranteed to return a decimal with that scale?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it depends on the implementation of getDecimal for now.
Btw, I guess we need to check the return value of changePrecision() and set null if the value is false, which means overflow.

@@ -1617,7 +1617,7 @@ def to_arrow_type(dt):
elif type(dt) == DoubleType:
arrow_type = pa.float64()
elif type(dt) == DecimalType:
arrow_type = pa.decimal(dt.precision, dt.scale)
arrow_type = pa.decimal128(dt.precision, dt.scale)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the right way - it is now fixed at 128 bits internally. I believe the Arrow Java limit is the same as Spark 38/38, not sure if pyarrow is the same but I think so.

@HyukjinKwon
Copy link
Member

Just took a quick look and looks fine to me too.

@SparkQA
Copy link

SparkQA commented Dec 25, 2017

Test build #85365 has finished for PR 18754 at commit 025f298.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Dec 25, 2017

Test build #85371 has finished for PR 18754 at commit 025f298.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HyukjinKwon
Copy link
Member

Merged to master.

@asfgit asfgit closed this in eb386be Dec 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants