-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BitArray performance #2360
Comments
I managed to get the same performance as |
Thanks, Carlo! |
This still seems to be a problem: macro timeit(ex, n)
quote
t = Inf
for i = 1:$n
t = min(t, @elapsed $ex)
end
if print_output
n = $n
println("best of $n: ", t, " seconds")
end
end
end
function gt{T}(x::Matrix{T}, y::T)
m, n = size(x)
out = Array(Bool, m, n)
for i = 1:m, j = 1:n
out[i, j] = x[i, j] > y
end
out
end
x = rand(2_000_000, 16)
# BitArray version
@timeit x .> 0.5 10
# best of 10: best of 10: 1.965562105178833 seconds
# Array{Bool} version
@timeit gt(x, 0.5) 10
# best of 10: best of 10: 0.0854039192199707 seconds |
@pao Thanks! Wasn't aware of that package. I usually pull and build at least once per day. |
Definitely agree something is wrong here, using the same
|
Some ideas: Looks like in iterations |
Wouldn't bool arrays also call the abstract array methods as well? |
I playing with a uncheck
|
This last can be related to #1392 |
I don't fully understand BitArray yet. Tell me if I'm wrong with something here. Would be faster some kind of double iteration when looping over BitArray ? I imagine something like, maybe is going to be more cache friendly:
I read about some BitVector64/32 (a single Int) available on other languages: https://github.com/gingi/fastbit/blob/master/src/bitvector64.h They are only a single Int value and only can store 64/32 bits. But Is faster than a BitArray for this kind of cases as is declared here http://msdn.microsoft.com/en-us/library/system.collections.specialized.bitvector32.aspx Now Julia have immutable types, I imagine that something like BitVector64 can be relatively easy to add. Maybe ImmutableArrays.jl can be the right place for something like that? A BitVector of 256 bits [ 4 chunks ] can gain speed up using SIMD instructions JuliaGeometry/OldImmutableArrays.jl#6 Maybe defining and abstract type over BitArray can make easy the creation of this kind of type. People who is still learning what happens at bit level (like me) are going to be happy with that :) Maybe related with performance for BitArray... Have a lot of bits in only one number can give some extra information to compiler for branch prediction ? I can only imagine a perfect branch prediction for the extremes cases 0xffffffffffffffff and 0x0000000000000000 |
So, it took a while, but I just pushed another fix, this time for the element-wise functions (
With respect to this, the previous Please keep reopening in case more performance issues show up. |
Might be better to open new performance issues in the future? Reopening the same one repeatedly seems weird. |
Another idea: BitArray could have a custom immutable iteration state holding the mask and index. |
I took a brief look at doing this and it seemed like it wasn't really a performance win. |
Yeah, I realized later that |
How can I do that (holding the mask and index) ? The 2 bit DNA sequence are two BitArrays, and I was iterating and indexing both for example on convertion from |
I've been experimenting with "manually" removing the boundary checks on Arrays and I get some substantial performance gains. This could be done with regular Array methods too. Example:
performance on a 1 million long
In some more complex cases, it may be even better to use BTW Jeff, thanks for fixing #3240 so quickly ;) |
Oh and I forgot to mention: about the |
Improvements to scalar getindex and setindex here: f3f6fa2 |
Only 1.3x slowdown for bitarrays is excellent. |
Only 1.3x is great!!! :) I going to test it in the following days. |
Carlo, you've moved some mountains here :-). |
BitArrays are substantially slower for many operations than and
Array{Bool}
would be (see JuliaData/DataFrames.jl#186). It would be a big win if we could speed them up.The text was updated successfully, but these errors were encountered: