ARROW-9861: [Java] Support big-endian in DecimalVector #8056
@kiszk you'll probably need to make some fixes for Decimal256 as well, I think. @BryanCutler you mentioned you could devote some time to reviewing Big Endian changes? Would you mind taking a look through this one and @kiszk's other Java changes?
Sure, I can take a look. It might be a day or two before I can, though, @kiszk.
@emkornfield Sure, I will work on supporting Decimal256.
LGTM, just a couple minor things
PlatformDependent.putLong(addressOfValue + Long.BYTES, padValue);
if (LITTLE_ENDIAN) {
  PlatformDependent.putLong(addressOfValue, value);
  PlatformDependent.putLong(addressOfValue + Long.BYTES, padValue);
This seems a bit odd that it's writing long values at 2 different indices, but I guess that was here before. Do you know what it's trying to do here?
Hmm, I guess it's to write a long value to be used as BigDecimal? So it writes the long in 8 bytes and then pads the remaining 8 bytes. It would be nice if the doc was a little better, but no big deal.
This is used for DecimalVector, which is a fixed-width (16-byte) vector. This routine extends the sign bit into a new long. I will document this in a comment.
In addition, this PR will support writing it to a 256-bit entry in the future.
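To make the two `putLong` calls concrete, here is a minimal sketch of sign-extending a `long` into a 16-byte decimal slot. It is not the code from this PR: it uses `java.nio.ByteBuffer` in place of `PlatformDependent`, the class and method names (`LongToDecimal128Sketch`, `writeLong`, `read`) are hypothetical, and the big-endian branch (pad first, then value) is an assumption inferred from the little-endian branch quoted above.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class LongToDecimal128Sketch {
  // Width in bytes of one 128-bit decimal slot.
  static final int TYPE_WIDTH = 16;

  /** Writes value as a sign-extended 128-bit integer in the given byte order. */
  static byte[] writeLong(long value, ByteOrder order) {
    // All 64 bits of the upper half repeat the sign bit of value.
    long padValue = value < 0 ? -1L : 0L;
    ByteBuffer buf = ByteBuffer.allocate(TYPE_WIDTH).order(order);
    if (order == ByteOrder.LITTLE_ENDIAN) {
      buf.putLong(0, value);              // low half at the lower address
      buf.putLong(Long.BYTES, padValue);  // sign padding in the high half
    } else {
      buf.putLong(0, padValue);           // assumed: sign padding comes first
      buf.putLong(Long.BYTES, value);     // low half at the higher address
    }
    return buf.array();
  }

  /** Reads a slot back as a BigInteger, which expects big-endian bytes. */
  static BigInteger read(byte[] slot, ByteOrder order) {
    byte[] bigEndian = slot.clone();
    if (order == ByteOrder.LITTLE_ENDIAN) {
      for (int i = 0, j = bigEndian.length - 1; i < j; i++, j--) {
        byte tmp = bigEndian[i];
        bigEndian[i] = bigEndian[j];
        bigEndian[j] = tmp;
      }
    }
    return new BigInteger(bigEndian);
  }

  public static void main(String[] args) {
    for (ByteOrder order : new ByteOrder[] {ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN}) {
      byte[] slot = writeLong(-2L, order);
      System.out.println(order + " " + Arrays.toString(slot) + " -> " + read(slot, order));
    }
  }
}
```

Both byte orders decode to the same number; only the position of the padding half differs, which is why the write goes to two different offsets.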
} else {
  throw new IllegalArgumentException(
      "Invalid decimal value length. Valid length in [1 - 16], got " + length);
if (length <= TYPE_WIDTH) {
Why not return if length == TYPE_WIDTH?
If we add if (length == TYPE_WIDTH) return before line 244, no data is copied from value to outAddress. I will add a comment, "copy data from value to outAddress", after line 244.
OK, I see above it is already set during the swap. I guess there is no harm in calling PlatformDependent.setMemory(outAddress, DecimalVector.TYPE_WIDTH - length, pad) if length == TYPE_WIDTH.
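The point of this thread (the copy must always happen, and the padding call is harmless at full width) can be illustrated with a small sketch. It is an illustration under assumptions, not the PR's code: it works on plain `byte[]` arrays instead of memory addresses, and `BigEndianPadSketch` and `toSlot` are hypothetical names. The input is a big-endian two's-complement byte array such as `BigInteger.toByteArray()` produces.

```java
import java.math.BigInteger;
import java.nio.ByteOrder;
import java.util.Arrays;

class BigEndianPadSketch {
  // Width in bytes of one 128-bit decimal slot.
  static final int TYPE_WIDTH = 16;

  /** Copies a big-endian value of 1..16 bytes into a 16-byte slot in the given order. */
  static byte[] toSlot(byte[] value, ByteOrder order) {
    int length = value.length;
    if (length < 1 || length > TYPE_WIDTH) {
      throw new IllegalArgumentException(
          "Invalid decimal value length. Valid length in [1 - 16], got " + length);
    }
    // The pad byte repeats the sign bit of the most significant input byte.
    byte pad = (byte) (value[0] < 0 ? 0xFF : 0x00);
    byte[] slot = new byte[TYPE_WIDTH];
    if (order == ByteOrder.LITTLE_ENDIAN) {
      // Swap bytes so the least significant byte lands at index 0.
      for (int i = 0; i < length; i++) {
        slot[i] = value[length - 1 - i];
      }
      // With length == TYPE_WIDTH this range is empty and nothing is written.
      Arrays.fill(slot, length, TYPE_WIDTH, pad);
    } else {
      // Pad the high-order bytes first; the range is empty at full width.
      Arrays.fill(slot, 0, TYPE_WIDTH - length, pad);
      // The copy still has to run even when no padding is needed.
      System.arraycopy(value, 0, slot, TYPE_WIDTH - length, length);
    }
    return slot;
  }

  public static void main(String[] args) {
    byte[] value = BigInteger.valueOf(-255).toByteArray();
    System.out.println(Arrays.toString(toSlot(value, ByteOrder.BIG_ENDIAN)));
    System.out.println(Arrays.toString(toSlot(value, ByteOrder.LITTLE_ENDIAN)));
  }
}
```

An early return at `length == TYPE_WIDTH` placed before the copy would leave the slot unwritten on the big-endian path, which is the concern raised above.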
}
throw new IllegalArgumentException(
    "Invalid decimal value length. Valid length in [1 - 16], got " + length);
This should probably be in a Preconditions.checkArgument, but we are not trying to change things in this PR, so we don't need to do that here.
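For reference, a precondition-style version of the length check might look like the sketch below. The local `checkArgument` helper is a hypothetical stand-in written for this example so the block is self-contained; it is not the project's `Preconditions` class, and `LengthCheckSketch` and `validateLength` are illustrative names.

```java
class LengthCheckSketch {
  // Width in bytes of one 128-bit decimal slot.
  static final int TYPE_WIDTH = 16;

  // Hypothetical stand-in for a Preconditions.checkArgument-style helper.
  static void checkArgument(boolean expression, String message) {
    if (!expression) {
      throw new IllegalArgumentException(message);
    }
  }

  /** Returns length unchanged if it fits the slot, otherwise throws. */
  static int validateLength(int length) {
    checkArgument(length >= 1 && length <= TYPE_WIDTH,
        "Invalid decimal value length. Valid length in [1 - 16], got " + length);
    return length;
  }

  public static void main(String[] args) {
    System.out.println(validateLength(16));
  }
}
```

Stating the guard once at the top of the method keeps the valid path free of a trailing `else { throw ... }` branch.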
}

// Write LE data
byte [] padByes = bytes[0] < 0 ? minus_one : zeroes;
You could move padBytes out of the if statements.
Assert.assertEquals(expected, actual);
}

long [] longValues = new long[] {Long.MIN_VALUE, 0 , Long.MAX_VALUE};
I don't think you need 2 loops here, just move the Integer.MAX_VALUE and MIN_VALUE here.
Sure, I will drop lines 49-57.
I will update the Decimal256Vector class later today or tomorrow.
Looks good, just a few minor nits
PlatformDependent.putLong(addressOfValue, value);
public static void writeLongToArrowBuf(long value, ArrowBuf bytebuf, int index, int byteWidth) {
  if (byteWidth != 16 && byteWidth != 32) {
    throw new UnsupportedOperationException("DeciimalUtility.writeLongToArrowBuf() currently supports " +
typo: DeciimalUtility -> DecimalUtility
@@ -526,7 +526,7 @@ public void testCopyFixedSizedListOfDecimalsVector() {
 to.addOrGetVector(FieldType.nullable(new ArrowType.Decimal(3, 0, 128)));

 DecimalHolder holder = new DecimalHolder();
-holder.buffer = allocator.buffer(DecimalUtility.DECIMAL_BYTE_LENGTH);
+holder.buffer = allocator.buffer(16);
Could you use DecimalVector.TYPE_WIDTH here?
@@ -310,7 +310,7 @@ public void listDecimalType() {
 listVector.allocateNew();
 UnionListWriter listWriter = new UnionListWriter(listVector);
 DecimalHolder holder = new DecimalHolder();
-holder.buffer = allocator.buffer(DecimalUtility.DECIMAL_BYTE_LENGTH);
+holder.buffer = allocator.buffer(16);
Same here
LGTM. Looks like a flaky test, will try again.
The test error with Java JNI looks unrelated and seems to be an env issue with ORC, so I'll go ahead with merging this.
Merged to master, thanks @kiszk!
@BryanCutler Thank you. One comment.
I am working on a benchmark bot for Java here. It would be good to merge new features after this bot becomes available. cc @emkornfield
I did not see anything that looks like it would affect performance here, but I agree we should get some benchmarks going to be sure. I will look at your other PR next.
This PR fixes failures in DecimalVectorTest on a big-endian platform Closes apache#8056 from kiszk/ARROW-9861 Authored-by: Kazuaki Ishizaki <[email protected]> Signed-off-by: Bryan Cutler <[email protected]>