Unary and Binary functions trait #726
Conversation
Force-pushed from 3f2adda to 2079385.
vortex-array/benches/iter.rs (outdated)

@@ -58,6 +61,30 @@ fn vortex_iter_flat(c: &mut Criterion) {
    });
}

fn vortex_binary_add(c: &mut Criterion) {
As of dc09067d, the results are (reproducible by running cargo bench --bench iter unary_add):

vortex_unary_add time: [662.83 µs 665.39 µs 668.35 µs]
arrow_unary_add  time: [199.29 µs 203.01 µs 207.45 µs]

So Arrow is faster, but it's a much smaller difference than with the previous iterators. I tried some approaches that used BytesMut and built everything through a buffer, but that was much slower (1-2 ms). I'm still trying to restructure things, so this isn't the final performance number.
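For context, a minimal sketch of the shape of a Criterion benchmark like this, using a plain-slice kernel as a hypothetical stand-in for the actual vortex and arrow code:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Stand-in for the unary add kernel under test; the real benchmark
// operates on vortex and arrow arrays, which are not shown here.
fn unary_add(data: &[u32]) -> Vec<u32> {
    data.iter().map(|x| x + 1).collect()
}

fn unary_add_bench(c: &mut Criterion) {
    let data: Vec<u32> = (0..65_536).collect();
    c.bench_function("unary_add", |b| b.iter(|| unary_add(black_box(&data))));
}

criterion_group!(benches, unary_add_bench);
criterion_main!(benches);
```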
Couple of things: handling known fixed sizes gives us a nice boost, and I also split the benchmarks into a new file. Seems like just removing all the bounds checks yields a pretty nice boost (see the sketch below).
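A sketch of the fixed-size-chunk idea, using plain slices rather than the actual vortex-array types; `add_scalar` and `CHUNK` are illustrative names:

```rust
// When both slices are walked with chunks_exact, every chunk has the
// compile-time-known length CHUNK, so the optimizer can prove the inner
// indexing is in range and drop the per-element bounds checks.
fn add_scalar(input: &[u32], rhs: u32) -> Vec<u32> {
    const CHUNK: usize = 1024;
    let mut out = vec![0u32; input.len()];
    for (src, dst) in input.chunks_exact(CHUNK).zip(out.chunks_exact_mut(CHUNK)) {
        for i in 0..CHUNK {
            dst[i] = src[i] + rhs;
        }
    }
    // Handle the leftover tail of fewer than CHUNK elements.
    let tail = input.chunks_exact(CHUNK).remainder();
    let tail_start = input.len() - tail.len();
    for (i, &v) in tail.iter().enumerate() {
        out[tail_start + i] = v + rhs;
    }
    out
}
```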
Force-pushed from da5c130 to c63d246.
Nice, you got rid of the Vec push; that was the one other thing I noticed initially that could have been better.
Pushed a version with macro-based dispatch for the different iterators.
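Roughly, that pattern looks like the following; `AnyIter`, its variants, and `dispatch!` are illustrative names, not the actual types in this PR:

```rust
// One statically typed match arm per iterator variant, instead of erasing
// everything behind a Box<dyn Iterator>. Each arm monomorphizes the body
// for its concrete iterator type.
enum AnyIter {
    Flat(std::vec::IntoIter<u32>),
    Chunked(std::iter::Flatten<std::vec::IntoIter<Vec<u32>>>),
}

macro_rules! dispatch {
    ($iter:expr, |$it:ident| $body:expr) => {
        match $iter {
            AnyIter::Flat($it) => $body,
            AnyIter::Chunked($it) => $body,
        }
    };
}

fn sum(iter: AnyIter) -> u32 {
    dispatch!(iter, |it| it.sum::<u32>())
}
```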
Interesting, I thought not boxing the iterator would be an improvement, but maybe not… If there's no difference in perf, the simpler version might be better.
Agreed. I'll try to find some additional optimizations here, but if I can't find anything significant I would rather maintain a large …
I guess we already have one indirection when loading the data to decompress, so another indirection to the iterator doesn't fundamentally change the performance profile; i.e., a completely stack-allocated iterator isn't where all the difference is. Let me know if I can help, but I doubt we'll find anything here.
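For clarity, the two iterator shapes being compared are roughly these (with `u32` as a stand-in item type); the boxed one pays an indirect call per `next()`, while the generic one monomorphizes and can live entirely on the stack:

```rust
// Dynamic dispatch: one virtual call per item.
fn sum_boxed(iter: Box<dyn Iterator<Item = u32>>) -> u32 {
    iter.sum()
}

// Static dispatch: the iterator is a concrete, stack-allocated type.
fn sum_generic<I: Iterator<Item = u32>>(iter: I) -> u32 {
    iter.sum()
}
```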
no reason …
@@ -314,8 +316,141 @@ impl Array {
    }
}

// This is an arbitrary value; of the few values tried, it performed better than
// smaller ones. I assume there's some hardware dependency here, but this seems
// to be good enough.
const CHUNK_SIZE: usize = 1024;
FWIW, given that fastlanes works in blocks of 1024, this will likely be the best value overall.
Just a note about code to remove
Introducing specialized traits for unary (like map) and binary (like &) operations on arrays. This will allow us to have specialized and ergonomic implementations for cases where we know the encoding of an array; a sketch of the idea follows below.
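A minimal sketch of what such traits could look like, over a plain Vec-backed stand-in; the names and signatures are illustrative, not the actual API introduced here:

```rust
// A unary function maps each element; a binary function combines an array
// element-wise with a second operand.
trait UnaryFn<T, R> {
    fn unary<F: Fn(T) -> R>(&self, f: F) -> Vec<R>;
}

trait BinaryFn<T, U, R> {
    fn binary<F: Fn(T, U) -> R>(&self, rhs: &[U], f: F) -> Vec<R>;
}

// Stand-in for a concrete, known-encoding array.
struct PrimitiveArray(Vec<u32>);

impl UnaryFn<u32, u32> for PrimitiveArray {
    fn unary<F: Fn(u32) -> u32>(&self, f: F) -> Vec<u32> {
        self.0.iter().copied().map(f).collect()
    }
}

impl BinaryFn<u32, u32, u32> for PrimitiveArray {
    fn binary<F: Fn(u32, u32) -> u32>(&self, rhs: &[u32], f: F) -> Vec<u32> {
        self.0.iter().zip(rhs).map(|(&a, &b)| f(a, b)).collect()
    }
}
```

Knowing the concrete encoding lets each impl iterate its own storage directly instead of going through a generic, dynamically dispatched path.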
Benchmarks
As of 08bbd8d, running on my M3 Max MacBook.