-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21552][SQL] Add DecimalType support to ArrowWriter. #18754
Conversation
Test build #80014 has finished for PR 18754 at commit
|
valueMutator.setIndexDefined(count) | ||
val decimal = input.getDecimal(ordinal, precision, scale) | ||
decimal.changePrecision(precision, scale) | ||
DecimalUtility.writeBigDecimalToArrowBuf(decimal.toJavaBigDecimal, valueVector.getBuffer, count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be easier in Arrow 0.7. There is now APIs for directly setting values with setSafe
for BigDecimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler Thanks, I'll update it to use setSafe
after upgrading Arrow to 0.7.
Btw, when I tested upgrading to 0.7 locally, ArrowConvertersSuite.string type conversion
came to fail. Do you have any ideas about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tried running those tests for 0.7 yet, let me see if I can reproduce
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got the same error, let me dig into it a little
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an issue with StringWriter, I put the fix in #19284 please take a look, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've confirmed it fixes the failure. Thanks!
@@ -1617,7 +1617,7 @@ def to_arrow_type(dt): | |||
elif type(dt) == DoubleType: | |||
arrow_type = pa.float64() | |||
elif type(dt) == DecimalType: | |||
arrow_type = pa.decimal(dt.precision, dt.scale) | |||
arrow_type = pa.decimal128(dt.precision, dt.scale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wesm @BryanCutler Is this a right way to define decimal type for Arrow?
I also wonder if there is a limit for precision and scale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's the right way - it is now fixed at 128 bits internally. I believe the Arrow Java limit is the same as Spark 38/38, not sure if pyarrow is the same but I think so.
Test build #85293 has finished for PR 18754 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I just had one minor question about a call to Decimal.changePrecision
|
||
override def setValue(input: SpecializedGetters, ordinal: Int): Unit = { | ||
val decimal = input.getDecimal(ordinal, precision, scale) | ||
decimal.changePrecision(precision, scale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to call changePrecision
even though getDecimal
already takes the precision/scale as input - is it not guaranteed to return a decimal with that scale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it depends on the implementation of getDecimal
for now.
Btw, I guess we need to check the return value of changePrecision()
and set null
if the value is false
, which means overflow.
@@ -1617,7 +1617,7 @@ def to_arrow_type(dt): | |||
elif type(dt) == DoubleType: | |||
arrow_type = pa.float64() | |||
elif type(dt) == DecimalType: | |||
arrow_type = pa.decimal(dt.precision, dt.scale) | |||
arrow_type = pa.decimal128(dt.precision, dt.scale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's the right way - it is now fixed at 128 bits internally. I believe the Arrow Java limit is the same as Spark 38/38, not sure if pyarrow is the same but I think so.
Just took a quick look and looks fine to me too. |
Test build #85365 has finished for PR 18754 at commit
|
retest this please |
Test build #85371 has finished for PR 18754 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merged to master. |
What changes were proposed in this pull request?
Decimal type is not yet supported in
ArrowWriter
.This is adding the decimal type support.
How was this patch tested?
Added a test to
ArrowConvertersSuite
.