Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boolean operators don't have specialized versions for sparse matrices #13024

Closed
sbromberger opened this issue Sep 9, 2015 · 18 comments
Closed
Assignees
Labels
performance Must go faster sparse Sparse arrays

Comments

@sbromberger
Copy link
Contributor

Ref https://groups.google.com/d/msg/julia-users/bo6YMXzWPdA/jKEWz_rfAQAJ

The following code runs in ~15 seconds:

_column(a::AbstractSparseArray, i::Integer) = sub(a.rowval, a.colptr[i]:a.colptr[i+1]-1)

# material nonimplication
(p::Bool, q::Bool) = p & !q 

function (a::SparseMatrixCSC, b::SparseMatrixCSC) 
    (m,n) = size(a) 
    resultmx = spzeros(Bool,m,n) 
    for c = 1:n 
        for r in _column(a,n) 
            # info("row $r, col $c") 
            resultmx[r,c] = (a[r,c], b[r,c]) 
        end 
    end 
    return resultmx 
end

a = sprandbool(1000000,1000000,0.0001);
b = sprandbool(1000000,1000000,0.0001);

julia> @time z = a  b;
 15.272250 seconds (1.10 M allocations: 73.870 MB)

Replacing @time z = a ⊅ b with @time z = a & b results in a > 10 minute execution (I stopped it after 10 minutes or so).

Interestingly enough, substituting the ⊅-equivalent resultmx[r,c] = !b[r,c] within the function itself also causes poor performance relative to the above code.

@sbromberger sbromberger changed the title possible performance issues with sparse matrices and booleans possible performance issues with sparse matrices and boolean operators Sep 9, 2015
@tkelman tkelman added performance Must go faster sparse Sparse arrays labels Sep 9, 2015
@IainNZ
Copy link
Member

IainNZ commented Sep 9, 2015

EDIT: disregard, issue was updated

I tried

function other(a::SparseMatrixCSC, b::SparseMatrixCSC) 
    (m,n) = size(a) 
    resultmx = spzeros(Bool,m,n) 
    for c = 1:n 
        for r in _column(a,n) 
            # info("row $r, col $c") 
            resultmx[r,c] = a[r,c] & !b[r,c]
        end 
    end 
    return resultmx 
end

and got

julia> @time z = a ⊅ b;
 12.544197 seconds (1.12 M allocations: 74.449 MB)

julia> @time z = other(a,b);
 12.134954 seconds (1.01 M allocations: 69.430 MB, 0.27% gc time)

@sbromberger
Copy link
Contributor Author

This is on

Version 0.4.0-pre+7107 (2015-08-31 16:51 UTC)
Commit 4e44a1c (8 days old master)

@sbromberger
Copy link
Contributor Author

@IainNZ ah - that seems to imply that the problem with & is when it's used with sparse matrices, not with bools.

@IainNZ
Copy link
Member

IainNZ commented Sep 9, 2015

julia> @which (&)(sprand(5,5,0.1), sprand(5,5,0.1))
&{S,T}(A::AbstractArray{S,N}, B::AbstractArray{T,N}) at arraymath.jl:96

@IainNZ IainNZ changed the title possible performance issues with sparse matrices and boolean operators Boolean operators don't have specialized versions for sparse matrices Sep 9, 2015
@sbromberger
Copy link
Contributor Author

so it looks like there's no method on & optimized for sparse matrices, and so it's iterating over all r,c pairs. That's... silly. Should I make a PR?

@tkelman
Copy link
Contributor

tkelman commented Sep 9, 2015

Same as in #12118, can probably just add & and | to this loop

for op in (+, -, min, max)
, add tests and done. Since & is zero-preserving (result is zero if either input is zero) slightly fancier things could be done for it, but we don't have distinct scalar & vs broadcasting .& like we do for most other operators so probably not worth dealing with that right now.

edit: actually I think eltype_plus should be changed to something different for max, min, & and |

@jiahao
Copy link
Member

jiahao commented Sep 9, 2015

so it looks like there's no method on & optimized for sparse matrices

Yes, this is true for much of the sparse matrix functionality, simply because no one has had the time or use case to warrant implementing them. PR welcomed although it probably will go into 0.5.

It might be as simple as extending this loop to include &, |, $, etc.

@tkelman
Copy link
Contributor

tkelman commented Sep 9, 2015

There's actually a bug there,

julia> A = sprandbool(5,5,0.3)
5x5 sparse matrix with 7 Bool entries:
        [1, 1]  =  true
        [2, 1]  =  true
        [2, 2]  =  true
        [5, 2]  =  true
        [1, 4]  =  true
        [1, 5]  =  true
        [2, 5]  =  true

julia> B = sprandbool(5,5,0.3)
5x5 sparse matrix with 5 Bool entries:
        [3, 1]  =  true
        [1, 3]  =  true
        [2, 3]  =  true
        [5, 4]  =  true
        [5, 5]  =  true

julia> max(A, B)
5x5 sparse matrix with 12 Int64 entries:
        [1, 1]  =  1
        [2, 1]  =  1
        [3, 1]  =  1
        [2, 2]  =  1
        [5, 2]  =  1
        [1, 3]  =  1
        [2, 3]  =  1
        [1, 4]  =  1
        [5, 4]  =  1
        [1, 5]  =  1
        [2, 5]  =  1
        [5, 5]  =  1

May as well fix both issues now. If anyone were relying on the present behavior they probably would have complained and asked to change it already.

@sbromberger
Copy link
Contributor Author

@tkelman - Sorry - it's late. Why is that a bug? (is it because it's being converted to Int, when max(x::Bool, y::Bool) returns a boolean?)

@simonster
Copy link
Member

I think because max(true, true) is true, not 1.

@sbromberger
Copy link
Contributor Author

I opened up #13029 to track the performance of ! on an element separately, since I don't think it's related.

@ViralBShah
Copy link
Member

My preference is to have these things go to 0.5. However, it is not a problem to get them into 0.4.x if someone really needs it (perhaps @sbromberger does), since this will not break the 0.4 API.

@sbromberger
Copy link
Contributor Author

I'm happy to work on a PR. Obviously it won't go into -release. I've been looking at the underlying code and it will take time for me to understand it fully. :)

Edit to add: I do really need this functionality, actually. It puzzles me that nobody's brought this issue up before: are sparse matrices not widely used?

@ViralBShah
Copy link
Member

Sparse matrices are increasingly getting widely used. :-)

@IainNZ
Copy link
Member

IainNZ commented Sep 9, 2015

Sparse matrices of Float64s with Int64 indices get used extensively, but anything else, not as often. Hence #12984 and this

@tkelman
Copy link
Contributor

tkelman commented Sep 9, 2015

I'll open a PR that fixes this today. Won't be hard, and no reason to wait on fixing this since the current behavior is slow and incorrect.

@tkelman tkelman self-assigned this Sep 9, 2015
tkelman added a commit that referenced this issue Sep 9, 2015
also fix eltype promotion in sparse max and min
@sbromberger
Copy link
Contributor Author

Thanks. Glad to see it uncovered a few things despite the bugs in my code that prompted the issue.

jakebolewski added a commit that referenced this issue Sep 10, 2015
@sbromberger
Copy link
Contributor Author

Thank you (again) very much - this results in orders of magnitude more efficient random graph generation.

tkelman added a commit that referenced this issue Sep 11, 2015
also fix eltype promotion in sparse max and min

Also add $ because Jiahao asked nicely

(cherry picked from commit cdb6496)
ref #13024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

6 participants