ARROW-5583: [Java] When the isSet of a NullableValueHolder is 0, the buffer field should not be used #4543
Codecov Report
@@ Coverage Diff @@
## master #4543 +/- ##
==========================================
+ Coverage 88.55% 89.47% +0.92%
==========================================
Files 796 651 -145
Lines 103239 92152 -11087
Branches 1253 0 -1253
==========================================
- Hits 91425 82456 -8969
+ Misses 11569 9696 -1873
+ Partials 245 0 -245
@@ -242,10 +242,12 @@ public void set(int index, NullableVarBinaryHolder holder) {
   assert index >= 0;
   fillHoles(index);
   BitVectorHelper.setValidityBit(validityBuffer, index, holder.isSet);
-  final int dataLength = holder.end - holder.start;
+  final int dataLength = holder.isSet == 0 ? 0 : holder.end - holder.start;
According to the Arrow spec, when a variable-width slot is null its length should be zero. What symptom were you seeing that this is trying to fix?
Oh, I see this is on the write path.
I think it is clearer if you handle this the other way:
if (holder.isSet != 0) {
  // set the bytes
}
lastSet = index;
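A minimal sketch of the suggested structure, assuming a toy vector backed by plain Java arrays. `ToyNullableHolder` and `ToyVarCharVector` are hypothetical stand-ins, not Arrow classes (the real `VarCharVector` writes into `ArrowBuf` validity, offset, and data buffers), and the sketch only supports writing indices in increasing order.

```java
import java.util.Arrays;

// Hypothetical stand-in for NullableVarCharHolder: isSet == 0 marks a null
// value, in which case buffer may be null and must not be read.
class ToyNullableHolder {
  int isSet;
  int start;
  int end;
  byte[] buffer;
}

// Hypothetical variable-width vector: validity flags, capacity + 1 offsets,
// and one flat byte array holding all values back to back.
class ToyVarCharVector {
  final boolean[] validity;
  final int[] offsets;
  final byte[] values;
  int lastSet = -1;

  ToyVarCharVector(int capacity, int valueBytes) {
    validity = new boolean[capacity];
    offsets = new int[capacity + 1];
    values = new byte[valueBytes];
  }

  // Slots skipped since lastSet become zero-length and stay invalid.
  private void fillHoles(int index) {
    for (int i = lastSet + 1; i < index; i++) {
      offsets[i + 1] = offsets[i];
    }
  }

  void set(int index, ToyNullableHolder holder) {
    if (index <= lastSet) {
      throw new IllegalArgumentException("sketch assumes increasing indices");
    }
    fillHoles(index);
    validity[index] = holder.isSet != 0;
    int startOffset = offsets[index];
    if (holder.isSet != 0) {
      // Only a non-null holder carries a usable buffer.
      int dataLength = holder.end - holder.start;
      System.arraycopy(holder.buffer, holder.start, values, startOffset, dataLength);
      offsets[index + 1] = startOffset + dataLength;
    } else {
      // Null slot: zero length, and holder.buffer is never touched.
      offsets[index + 1] = startOffset;
    }
    lastSet = index;
  }

  byte[] get(int index) {
    return validity[index]
        ? Arrays.copyOfRange(values, offsets[index], offsets[index + 1])
        : null;
  }
}
```

Putting the copy inside the `isSet` check keeps the null path trivially safe and leaves `lastSet = index` as the single shared tail, which is the readability point of the suggestion.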
Thanks for the suggestion.
I have revised the code. Please see if it looks better.
I'm not sure this is fixing the right thing; can you explain this some more?
Sorry, I should have made this clearer. In our scenario, we often set a null value in the VarCharVector with the following code snippet:
NullableVarCharHolder holder = new NullableVarCharHolder();
Note that in the code above, holder.buffer is never set, so it is null. However, the current VarCharVector#set method copies bytes from holder.buffer even when holder.isSet equals 0, which leads to an exception.
This change looks OK to me, but I'm not sure whether introducing a branch here raises performance concerns. @pravindra?
@emkornfield, thanks for considering this. I think there will be some performance penalty either way: to remove the branch, the user would need to set NullableVarCharHolder#buffer to a valid ArrowBuf even when the value is null, which does not make much sense from the user's point of view. If the user is sure that the value is not null, they can avoid the penalty by calling the set method that takes a VarCharHolder, which has no branch. @pravindra, would you please share your thoughts?
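One way to picture this trade-off is two overloads on a hypothetical toy vector: `set(int, ToyHolder)` mirrors the non-nullable `VarCharHolder` path and copies unconditionally, while `set(int, ToyNullableHolder)` pays for one `isSet` check so that a null holder's `buffer` is never dereferenced. All class names are illustrative assumptions, and the sketch assumes slots are written in order 0, 1, 2, ...

```java
// Hypothetical mirror of VarCharHolder: always holds a value.
class ToyHolder {
  int start;
  int end;
  byte[] buffer;
}

// Hypothetical mirror of NullableVarCharHolder: isSet == 0 means null.
class ToyNullableHolder {
  int isSet;
  int start;
  int end;
  byte[] buffer;
}

class ToyVector {
  final boolean[] validity;
  final int[] offsets;
  final byte[] values;

  ToyVector(int capacity, int valueBytes) {
    validity = new boolean[capacity];
    offsets = new int[capacity + 1];
    values = new byte[valueBytes];
  }

  // Non-nullable path: no branch, the caller guarantees buffer is valid.
  void set(int index, ToyHolder holder) {
    int startOffset = offsets[index];
    int dataLength = holder.end - holder.start;
    validity[index] = true;
    System.arraycopy(holder.buffer, holder.start, values, startOffset, dataLength);
    offsets[index + 1] = startOffset + dataLength;
  }

  // Nullable path: one extra branch keeps a null holder's buffer untouched.
  void set(int index, ToyNullableHolder holder) {
    int startOffset = offsets[index];
    int dataLength = 0;
    validity[index] = holder.isSet != 0;
    if (holder.isSet != 0) {
      dataLength = holder.end - holder.start;
      System.arraycopy(holder.buffer, holder.start, values, startOffset, dataLength);
    }
    offsets[index + 1] = startOffset + dataLength;
  }
}
```

Callers that know every value is present pick the branch-free overload; callers that may hold nulls accept one predictable branch instead of fabricating a dummy buffer.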
The change LGTM. I see the additional branch, but I think it's useful in case holder.buffer is null or the null holder has a non-zero length. @liyafan82, isn't the same change required in setSafe() too?
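A sketch of why `setSafe()` needs the same guard, assuming (as with Arrow's `setSafe` methods) that the safe variant first grows the buffers as needed and then performs the same write as `set`. If the growth step used `end - start` from a null holder, stale bounds could trigger a pointless reallocation. `ToySafeVector`, `ensureCapacity`, and `dataLength` are hypothetical names, and writes are assumed to arrive in index order.

```java
import java.util.Arrays;

// Hypothetical mirror of NullableVarCharHolder.
class ToyNullableHolder {
  int isSet;
  int start;
  int end;
  byte[] buffer;
}

// Hypothetical growable variable-width vector (writes assumed in index order).
class ToySafeVector {
  boolean[] validity = new boolean[1];
  int[] offsets = new int[2];
  byte[] values = new byte[1];

  // Null-aware length shared by set and setSafe, so neither trusts a null holder.
  private static int dataLength(ToyNullableHolder holder) {
    return holder.isSet == 0 ? 0 : holder.end - holder.start;
  }

  // Grow validity/offsets to cover index, and values to fit `needed` more bytes.
  private void ensureCapacity(int index, int needed) {
    while (index >= validity.length) {
      validity = Arrays.copyOf(validity, validity.length * 2);
      offsets = Arrays.copyOf(offsets, validity.length + 1);
    }
    int required = offsets[index] + needed;
    if (required > values.length) {
      values = Arrays.copyOf(values, Math.max(required, values.length * 2));
    }
  }

  void set(int index, ToyNullableHolder holder) {
    int startOffset = offsets[index];
    int len = dataLength(holder);
    validity[index] = holder.isSet != 0;
    if (holder.isSet != 0) {
      System.arraycopy(holder.buffer, holder.start, values, startOffset, len);
    }
    offsets[index + 1] = startOffset + len;
  }

  void setSafe(int index, ToyNullableHolder holder) {
    ensureCapacity(index, dataLength(holder));
    set(index, holder);
  }
}
```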
@pravindra thanks a lot for your kind reminder.
+1, LGTM
Each variable-width vector, such as VarCharVector, has a set method that takes a NullableValueHolder as its parameter. When the holder's isSet field is 0, the value being set is null, so the holder's buffer field is invalid and should not be used.
For example, the user may set a null value in the VarCharVector with the following code snippet:
NullableVarCharHolder holder = new NullableVarCharHolder();
holder.isSet = 0;
...
varCharVector.set(i, holder);
Note that in the code above, holder.buffer is never set, so it is null. However, the current VarCharVector#set method copies bytes from holder.buffer even when holder.isSet equals 0, which leads to an exception.
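The failure can be reproduced in miniature with a hypothetical toy vector: `setUnguarded` models the pre-fix behaviour (copy from `holder.buffer` regardless of `isSet`), and `set` models the fixed behaviour. In this toy, `System.arraycopy` throws `NullPointerException` for a null source array even when the length is 0; the exact exception raised inside Arrow's buffer code may differ. Class and method names are illustrative only.

```java
// Hypothetical mirror of NullableVarCharHolder; a fresh instance has
// isSet == 0 and buffer == null, like the snippet above.
class ToyNullableHolder {
  int isSet;
  int start;
  int end;
  byte[] buffer;
}

class ToyVarCharVector {
  final boolean[] validity = new boolean[8];
  final int[] offsets = new int[9];
  final byte[] values = new byte[64];

  // Pre-fix behaviour: reads holder.buffer even when isSet == 0.
  void setUnguarded(int index, ToyNullableHolder holder) {
    validity[index] = holder.isSet != 0;
    int startOffset = offsets[index];
    int dataLength = holder.end - holder.start;
    System.arraycopy(holder.buffer, holder.start, values, startOffset, dataLength);
    offsets[index + 1] = startOffset + dataLength;
  }

  // Fixed behaviour: a null holder only records validity and a zero length.
  void set(int index, ToyNullableHolder holder) {
    validity[index] = holder.isSet != 0;
    int startOffset = offsets[index];
    int dataLength = 0;
    if (holder.isSet != 0) {
      dataLength = holder.end - holder.start;
      System.arraycopy(holder.buffer, holder.start, values, startOffset, dataLength);
    }
    offsets[index + 1] = startOffset + dataLength;
  }
}
```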