Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9861: [Java] Support big-endian in DecimalVector #8056

Closed
wants to merge 6 commits into from

Conversation

kiszk
Copy link
Member

@kiszk kiszk commented Aug 26, 2020

This PR fixes failures in DecimalVectorTest on a big-endian platform

@kiszk kiszk changed the title ARROW-9861: [java] Support big-endian in DecimalVector ARROW-9861: [Java] Support big-endian in DecimalVector Aug 26, 2020
@github-actions
Copy link

@emkornfield
Copy link
Contributor

@kiszk you'll probably need to make some fixes for Decimal256 as well, I think.

@BryanCutler you mentioned you could devote some time to reviewing Big Endian changes? Would you mind taking a look through this one and @kiszk other Java changes?

@BryanCutler
Copy link
Member

Sure, I can take a look. It might a day or two before I can though @kiszk .

@kiszk
Copy link
Member Author

kiszk commented Oct 28, 2020

@emkornfield Sure, I will work for supporting Decimal256.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple minor things

PlatformDependent.putLong(addressOfValue + Long.BYTES, padValue);
if (LITTLE_ENDIAN) {
PlatformDependent.putLong(addressOfValue, value);
PlatformDependent.putLong(addressOfValue + Long.BYTES, padValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit odd that it's writing long values at 2 different indices, but I guess that was here before. Do you know what it's trying to do here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess it's to write a long value to be used as BigDecimal? So it writes the long in 8-bytes and then pads the remaining 8-bytes. It would be nice if the doc was a little better, but no big deal..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used for DecialVector that is a fixed-width (16-byte) vector. This routine extends the signed bit to a new long. I will write a document in the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, this PR will support to write it to a 256-bit entry for the future.

} else {
throw new IllegalArgumentException(
"Invalid decimal value length. Valid length in [1 - 16], got " + length);
if (length <= TYPE_WIDTH) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return if length == TYPE_WIDTH?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add if (length == TYPE_WIDTH) return before line 244, no data is copied from value to outAddress. I will add a comment copy data from value to outAddress after line 244.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see above it is already set during the swap. I guess there is no harm in calling PlatformDependent.setMemory(outAddress, DecimalVector.TYPE_WIDTH - length, pad) if length == TYPE_WIDTH

}
throw new IllegalArgumentException(
"Invalid decimal value length. Valid length in [1 - 16], got " + length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in a Preconditions.checkArgument, but we are not trying to change things in this PR so don't need to do that here

}

// Write LE data
byte [] padByes = bytes[0] < 0 ? minus_one : zeroes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could move padBytes out of the if statements

Assert.assertEquals(expected, actual);
}

long [] longValues = new long[] {Long.MIN_VALUE, 0 , Long.MAX_VALUE};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need 2 loops here, just move the Integer.MAX_VALUE and MIN_VALUE here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will drop lines 49-57.

@kiszk
Copy link
Member Author

kiszk commented Nov 2, 2020

I will update the Decimal256Vector class late today or tomorrow.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few minor nits

} else {
throw new IllegalArgumentException(
"Invalid decimal value length. Valid length in [1 - 16], got " + length);
if (length <= TYPE_WIDTH) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see above it is already set during the swap. I guess there is no harm in calling PlatformDependent.setMemory(outAddress, DecimalVector.TYPE_WIDTH - length, pad) if length == TYPE_WIDTH

PlatformDependent.putLong(addressOfValue, value);
public static void writeLongToArrowBuf(long value, ArrowBuf bytebuf, int index, int byteWidth) {
if (byteWidth != 16 && byteWidth != 32) {
throw new UnsupportedOperationException("DeciimalUtility.writeLongToArrowBuf() currently supports " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: DeciimalUtility -> DecimalUtility

@@ -526,7 +526,7 @@ public void testCopyFixedSizedListOfDecimalsVector() {
to.addOrGetVector(FieldType.nullable(new ArrowType.Decimal(3, 0, 128)));

DecimalHolder holder = new DecimalHolder();
holder.buffer = allocator.buffer(DecimalUtility.DECIMAL_BYTE_LENGTH);
holder.buffer = allocator.buffer(16);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use DecimalVector.TYPE_WIDTH here?

@@ -310,7 +310,7 @@ public void listDecimalType() {
listVector.allocateNew();
UnionListWriter listWriter = new UnionListWriter(listVector);
DecimalHolder holder = new DecimalHolder();
holder.buffer = allocator.buffer(DecimalUtility.DECIMAL_BYTE_LENGTH);
holder.buffer = allocator.buffer(16);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Looks like a flaky test, will try again.

@BryanCutler
Copy link
Member

The test error with Java JNI looks unrelated and seems to be an env issue with ORC, I'll go ahead with merging this.

@BryanCutler
Copy link
Member

merged to master, thanks @kiszk !

@kiszk
Copy link
Member Author

kiszk commented Nov 5, 2020

@BryanCutler Thank you. One comment.
According to the document, benchmarking is necessary before merging features (I think that the test code does not affect the performance)

Benchmarks for performance critical parts of the code to demonstrate no regression.

I am working for benchmark bot for Java here. It would be good to merge new features after this bot will be available.

cc @emkornfield

@BryanCutler
Copy link
Member

I did not see anything that looks like it would affect performance here, but I agree we should get some benchmarks going to be sure. I will look at you other PR next.

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
This PR fixes failures in DecimalVectorTest on a big-endian platform

Closes apache#8056 from kiszk/ARROW-9861

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Bryan Cutler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants