-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-14795: [C++] Fix issue on replace with mask for null values #11759
ARROW-14795: [C++] Fix issue on replace with mask for null values #11759
Conversation
Thanks for the pull request. The issue should be ARROW-14795 not ARROW-14748 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me. I requested few changes, once is ready lets see what CI shows.
const uint8_t* GetInBitmap() const { | ||
return in_bitmap; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this method, this is a struct so properties are public by default. I recommend to check this reference: https://www.educative.io/collection/page/10370001/5111386763952128/6533026123087872
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add this because the CopyBitmap is a template with typename for this struct and can't acces to the in_bitmap property.
void SetBitTrue(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const { | ||
BitUtil::SetBitTo(out_bitmap, out_offset, true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to change method name to SetBitToTrue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And just for reference, BitUtil calls this SetBit
:
arrow/cpp/src/arrow/util/bit_util.h
Line 309 in eb2fa2d
static inline void SetBit(uint8_t* bits, int64_t i) { bits[i / 8] |= kBitmask[i % 8]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solved
void SetBitTrue(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const { | ||
BitUtil::SetBitTo(out_bitmap, out_offset, true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use same method name here as suggested before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solved
const uint8_t* GetInBitmap() const { | ||
return nullptr; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe instead of creating this weird method, you can create a property like this:
const static uint8_t* in_bitmap = nullptr;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solved
this->Assert(ReplaceWithMask, this->array("[1, null, 1]"), | ||
this->mask("[false, true, false]"), | ||
this->array("[7]"), | ||
this->array("[1, 7, 1]")); | ||
this->Assert(ReplaceWithMask, this->array("[1, null, 1, 7]"), | ||
this->mask("[false, true, false, true]"), | ||
this->array("[7, 20]"), | ||
this->array("[1, 7, 1, 20]")); | ||
this->Assert(ReplaceWithMask, this->array("[1, 2, 3, 4]"), | ||
this->mask("[false, true, false, true]"), | ||
this->array("[null, null]"), | ||
this->array("[1, null, 3, null]")); | ||
this->Assert(ReplaceWithMask, this->array("[null, 2, 3, 4]"), | ||
this->mask("[true, true, false, true]"), | ||
this->array("[1, null, null]"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to move all these new tests to a new test function to clarify the purpose of this PR.
TYPED_TEST(TestReplaceNumeric, ReplaceWithMaskForNullValuesAndMaskEnabled
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solved
Please @lidavidm enable to run the CI on this PR. |
It looks like there's some lint failures; there's documentation on how to run the linter locally if that helps: https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci |
5dd5dc9
to
b059ba0
Compare
@AlvinJ15 Please change your title to "ARROW-14795: [C++] Fix issue on replace with mask for null values" so that the correct JIRA issue gets picked up. |
|
The following test case fails:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fix can be made simpler. This is the gist of the issue:
diff --git a/cpp/src/arrow/compute/kernels/vector_replace.cc b/cpp/src/arrow/compute/kernels/vector_replace.cc
index 7f204b529..1ed50919d 100644
--- a/cpp/src/arrow/compute/kernels/vector_replace.cc
+++ b/cpp/src/arrow/compute/kernels/vector_replace.cc
@@ -81,7 +81,7 @@ struct CopyArrayBitmap {
void SetBit(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const {
BitUtil::SetBitTo(out_bitmap, out_offset,
- BitUtil::GetBit(in_bitmap, in_offset + offset));
+ in_bitmap ? BitUtil::GetBit(in_bitmap, in_offset + offset) : true);
}
};
@@ -133,10 +133,8 @@ void ReplaceWithArrayMaskImpl(const ArrayData& array, const ArrayData& mask,
BitUtil::GetBit(mask_bitmap, write_offset + mask.offset + i))) {
Functor::CopyData(*array.type, out_values, out_offset + write_offset + i,
replacements, replacements_offset, /*length=*/1);
- if (replacements_bitmap) {
- copy_bitmap.SetBit(out_bitmap, out_offset + write_offset + i,
- replacements_offset);
- }
+ copy_bitmap.SetBit(out_bitmap, out_offset + write_offset + i,
+ replacements_offset);
replacements_offset++;
}
}
That is, in one case, we should always update the output bitmap.
We can try these additional test cases:
+ this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
+ this->mask("[true, true, true]"), this->array("[7, 7, 7]"),
+ this->array("[7, 7, 7]"));
+ this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
+ this->mask("[true, true, true]"), this->array("[null, null, null]"),
+ this->array("[null, null, null]"));
+ this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
+ this->mask("[false, true, false]"), this->scalar("null"),
+ this->array("[1, null, 1]"));
+ this->Assert(ReplaceWithMask, this->array("[1, null, 1]"),
+ this->mask("[true, true, true]"), this->scalar("null"),
+ this->array("[null, null, null]"));
b059ba0
to
3ebc2e2
Compare
I updated this PR addressing latest feedback comments. I think this PR is ready for merging. Let me know if anything needs to be taken care of from my end. cc @lidavidm |
3ebc2e2
to
242646c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's actually an unspotted bug on line 169 that I think would replace most changes here:
const bool replacements_bitmap = replacements.is_array()
? replacements.array()->MayHaveNulls()
: !replacements.scalar()->is_valid;
should instead read
const bool replacements_bitmap = replacements.is_array()
? replacements.array()->MayHaveNulls()
: true;
@@ -96,6 +105,10 @@ struct CopyScalarBitmap { | |||
void SetBit(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const { | |||
BitUtil::SetBitTo(out_bitmap, out_offset, is_valid); | |||
} | |||
|
|||
void SetBitToTrue(uint8_t* out_bitmap, int64_t out_offset, int64_t offset) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nor is this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidavidm all comments has been solved
242646c
to
15d4b11
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. Just a couple small things.
15d4b11
to
04b7a83
Compare
@lidavidm all comments solved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for working through all the updates.
Benchmark runs are scheduled for baseline = 236362a and contender = 72f7fbe. 72f7fbe is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
ARROW-14795: [C++] Fix issue on vector replace with mask for null values, which weren't updated on null bitmaps Closes apache#11759 from AlvinJ15/achunga/14795-fix_vector_replace_with_mask Authored-by: alvinj15 <Alvin258461@> Signed-off-by: David Li <[email protected]>
ARROW-14795: [C++] Fix issue on vector replace with mask for null values, which weren't updated on null bitmaps