-
Notifications
You must be signed in to change notification settings - Fork 837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve take
kernel performance on primitive arrays, fix bad null index handling (#4404)
#4405
Conversation
@@ -128,6 +128,7 @@ impl BooleanBuffer { | |||
/// # Panics | |||
/// | |||
/// Panics if `i >= self.len()` | |||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is vital for the performance of take_bits
let output_slice = output_buffer.as_slice_mut(); | ||
|
||
let indices_has_nulls = indices.null_count() > 0; | ||
#[inline(never)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had issues with LLVM inlining and then not optimising correctly, this just forces it to not be stupid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise this may be worth a comment in the code as well
@@ -2155,4 +1977,19 @@ mod tests { | |||
UInt32Array::from(vec![9, 10, 11, 6, 7, 8, 3, 4, 5, 6, 7, 8, 0, 1, 2]) | |||
); | |||
} | |||
|
|||
#[test] | |||
fn test_take_null_indices() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for #4404
@@ -140,6 +140,12 @@ impl<T: ArrowNativeType> From<Vec<T>> for ScalarBuffer<T> { | |||
} | |||
} | |||
|
|||
impl<T: ArrowNativeType> FromIterator<T> for ScalarBuffer<T> { | |||
fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self { | |||
iter.into_iter().collect::<Vec<_>>().into() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An important thing to note is that Vec: FromIterator
has a specialization for TrustedLen
iterators, such as those from slices. This allows us to not need Buffer::try_from_trusted_len_iter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add this (very interesting) information as a comment inline?
} | ||
}), | ||
None => indices.values().iter().enumerate().for_each(|(i, index)| { | ||
if values.value(index.as_usize()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't there a faster method to create a bitmap rather than doing set_bit
in sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory BooleanBuffer::collect_bool should be faster, in this case it turned out to be slower for some reason - likely something to do with what LLVM is doing with the bound checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah collect_bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks @tustvold
take
kernel performance on primitive arrays, fix bad null index handling (#4404)
@@ -140,6 +140,12 @@ impl<T: ArrowNativeType> From<Vec<T>> for ScalarBuffer<T> { | |||
} | |||
} | |||
|
|||
impl<T: ArrowNativeType> FromIterator<T> for ScalarBuffer<T> { | |||
fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self { | |||
iter.into_iter().collect::<Vec<_>>().into() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add this (very interesting) information as a comment inline?
let output_slice = output_buffer.as_slice_mut(); | ||
|
||
let indices_has_nulls = indices.null_count() > 0; | ||
#[inline(never)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise this may be worth a comment in the code as well
values: Option<&NullBuffer>, | ||
indices: &PrimitiveArray<I>, | ||
) -> Option<NullBuffer> { | ||
match values.filter(|n| n.null_count() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is certainly neat to see nice Rust code like this and then know rustc
/ LLVM
did the right thing to make it fast
Which issue does this PR close?
Closes #4404
Rationale for this change
Fixes #4404 and improves performance significantly
Using the benchmarks in #4403
What changes are included in this PR?
Are there any user-facing changes?
Previously a negative index would return an error even when
TakeOptions::check_bound
wasfalse
. The code will now consistently panic on out of bounds errors, regardless of if that is the result of a wrapping conversion of a negative numbers. This yields a non-trivial speedup, as the additional branch seemed to cause LLVM some issues.