Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve zero checks in sparse and reduce #9325

Closed
wants to merge 1 commit into from
Closed

Conversation

IainNZ
Copy link
Member

@IainNZ IainNZ commented Dec 12, 2014

Just noticed some checks with 0 that don't play nice with custom types. There are probably more of these, but I'm not sure about a systematic way to find them.

This is only my second "real" commit I think, so not sure my test code is kosher style-wise.

@tkelman
Copy link
Contributor

tkelman commented Dec 12, 2014

I guess constructing a zero of a type should be comparable in performance to comparing to an integer 0 literal? And there might be some custom types where the former makes sense but the latter doesn't? I've had types where I wanted to put them in sparse matrices and I had to say comparison to zero (or any number) is always false in order for things to work.

The appveyor failure is due to #9189 until we can get a new nightly built, not this PR's fault.

@rfourquet
Copy link
Member

Comparing x!=0 seems much faster than x!=zero(x) for x::BigInt. Maybe put zero(x) outside of loops?

@ivarne
Copy link
Member

ivarne commented Dec 12, 2014

@JeffBezanson has required that we use a != 0 rather than a!=zero(a) in many instances. If proper promotion rules are in place, it should work exactly the same with the fallbacks for Number.

When would it be wrong to implement ==(::MyType, Integer) and ==(::Integer, ::MyType) for this to work?

As @rfourquet points out: Some types , like BigInt and BigFloat are much more efficient to compare to 0 than to construct a zero() instance for comparison.

@toivoh
Copy link
Contributor

toivoh commented Dec 12, 2014

For sparse arrays, I think that we should consider if we want them to work
with any element type, not just numbers, and what that would mean.

@ivarne
Copy link
Member

ivarne commented Dec 12, 2014

If sparse arrays should work with other types than immutable Numbers, it seems like it really need a customizable definition of zeroness. Neither x == 0 nor x == zero(x), (with zero meaning additive identity when generalized to matrices), seems like a perfect definition.

@StefanKarpinski
Copy link
Member

Hmm. This is a bit of a problem. Maybe we need iszero that is functionally equivalent to x == zero(x) but can be made much more efficient? For containers making x == 0 work but no other scalar comparisons seems like a bad idea.

@ViralBShah
Copy link
Member

As noted above, the reason it uses 0 and not zero(T) is because of what @JeffBezanson had once suggested. That is certainly problematic for non-zero types, but much of the sparse matrix code hasn't been tested with anything but Number.

@StefanKarpinski
Copy link
Member

The real issue is that you may want a zero matrix to cound and zeros(3,3) == 0 is a method error.

@toivoh
Copy link
Contributor

toivoh commented Dec 12, 2014

iszero seems reasonable. Should it default to always false for non-number types? What should a sparse matrix return when you ask for an absent entry? Or should that be an error, and you would have to ask if there is something present first? Or should it work like some kind of NullableArray?

@johnmyleswhite
Copy link
Member

I wouldn't be excited to treat sparsity as equivalent to missingness. In my mind, asking for an element of a container with eltype T should always return something of type T.

@toivoh
Copy link
Contributor

toivoh commented Dec 12, 2014

Maybe it makes little sense to have sparse arrays for non-numbers that
conform to the same interface add with numbers, but it would make sense to
reuse the underlying implementation for something similar.

Or maybe the generalisation is just to supply a default value for the
absent elements. I also seems to remember there has been some talk about
non-zero default values for numeric sparse matrices.

@jiahao
Copy link
Member

jiahao commented Dec 12, 2014

Note that neither zeros(3,3) == 0 nor zeros(3,3) == zero(Matrix) works, since the latter method is not defined. (zero(Matrix) doesn't know what dimensions it should have, since the Matrix type does not carry that information.) So this code would still fail to work for containers of containers, say SparseMatrixCSC{Matrix}.

As @ivarne said, the only such comparison that makes sense to write for matrices currently is M == zero(M).

The other pitfall one can run into is breaking transitivity. Imagine defining zeros(3,3) == 0 and zeros(4,4) == 0, and then erroneously concluding that zeros(3,3) == zeros(4,4).

@IainNZ
Copy link
Member Author

IainNZ commented Dec 12, 2014

Wow this was more controversial than I thought it'd be, given it was already done partially but not consistently

I guess constructing a zero of a type should be comparable in performance to comparing to an integer 0 literal?

It should be free for bitstypes, but I don't know what happens for other types (whether it allocates a fresh zero every time round the loop or it is identifies that it is an invariant)...

Comparing x!=0 seems much faster than x!=zero(x) for x::BigInt. Maybe put zero(x) outside of loops?

... But this suggests it is a problem. Putting it outside the loop is depressingly ugly, but still preferable to the old cold IMO

When would it be wrong to implement ==(::MyType, Integer) and ==(::Integer, ::MyType) for this to work?

When that comparison doesn't make sense, or already has another meaning. e.g. JuMP defines ==(AffExpr, Integer) to return a LinearConstraint.

For sparse arrays, I think that we should consider if we want them to work with any element type, not just numbers, and what that would mean.

I mean, with this PR we have (probable) complete support for custom types for anything that makes sense. I don't see much point in restricting it to <: Number because I can still make custom types that are Numbers (or that I claim are numbers) but don't have a well defined comparison with 0. If a Number must support comparison with 0, that shouts "we need interfaces!" because that is hella subtle.

Hmm. This is a bit of a problem. Maybe we need iszero that is functionally equivalent to x == zero(x) but can be made much more efficient? For containers making x == 0 work but no other scalar comparisons seems like a bad idea.

Maybe, but then you'll need to define Base.zero(::Type{MyType}), Base.zero(x::MyType), AND Base.iszero(x::MyType) - we are shifting the burden to the user and making the standard library lazier, which doesn't feel good to me at all.

As noted above, the reason it uses 0 and not zero(T) is because of what @JeffBezanson had once suggested. That is certainly problematic for non-zero types, but much of the sparse matrix code hasn't been tested with anything but Number.

It seems like it mostly works though - these things I fixed are mainly second-order stuff, and there is already chunks of the code that are designed to work with any type (mainly for type stability reasons though rather than a desire to be generic)

Or maybe the generalisation is just to supply a default value for the
absent elements. I also seems to remember there has been some talk about
non-zero default values for numeric sparse matrices.

I will always push back (as will others) to this idea as I've never seen it in the wild, and it adds even more conceptual overhead to the "structural zeros" situation we are already in.

Note that neither zeros(3,3) == 0 nor zeros(3,3) == zero(Matrix) works, since the latter method is not defined. (zero(Matrix) doesn't know what dimensions it should have, since the Matrix type does not carry that information.) So this code would still fail to work for containers of containers, say SparseMatrixCSC{Matrix}. As @ivarne said, the only such comparison that makes sense to write for matrices currently is M == zero(M).

Thats a good point! I could change the code and add that as test.

@ivarne
Copy link
Member

ivarne commented Dec 12, 2014

@IainNZ: Maybe, but then you'll need to define Base.zero(::Type{MyType}), Base.zero(x::MyType), AND Base.iszero(x::MyType) - we are shifting the burden to the user and making the standard library lazier, which doesn't feel good to me at all.

I can't really see how the standard library can be fully generic without having a well defined set of functions that have one single meaning each. iszero would have a fallback implementation for Number and AbstractArray, and will allow strange structures to implement enough of the "spare interface" without having to define comparison to Integer

@jiahao
Copy link
Member

jiahao commented Dec 12, 2014

The iszero question is yet another example of the general issue of whether X<:Any is by default a scalar unless X<:AbstractArray. Closely related:

@tkelman
Copy link
Contributor

tkelman commented Dec 12, 2014

that shouts "we need interfaces!"

Doing sparse matrices generically really demands this. What we're doing now isn't great, we get into unclear quagmires like this. That reminds me that I have an email from @jiahao to respond to.

There are different levels of how you want to deal with sparse matrices, structurally and numerically. At the structural stage, if something is given a value, any value, you want it stored explicitly. At the numerical stage, sometimes you want to compare the values against zero and remove them - but that doesn't always make sense.

@IainNZ
Copy link
Member Author

IainNZ commented Feb 26, 2015

I'm going to close this in the interests of tidiness, and delete the branch. Here are the tests that (IMO) should pass but don't, for posterities sake. Maybe we can revisit in the future.

# Test proper handling of zeros for user types
immutable SpTestVal
    value::Float64
end
(==)(x::SpTestVal,y::SpTestVal) = (x.value == y.value)
Base.zero(x::SpTestVal) = SpTestVal(0)
Base.zero(::Type{SpTestVal}) = SpTestVal(0)
A = sparse([1,2,3],[1,2,3],[SpTestVal(1),SpTestVal(0),SpTestVal(3)])
@test nnz(A) == 2  # zeros should be stripped by sparse
A[2,2] = SpTestVal(0)
@test nnz(A) == 2  # zeros should be stripped by setindex
A = SparseMatrixCSC(3,3,[1,2,3,4],[1,2,3],
        [SpTestVal(1.0),SpTestVal(0.0),SpTestVal(3.0)])
@test countnz(A) == 2
r,c,v = findnz(A)
@test length(r) == length(c) == length(v) == 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays test This change adds or pertains to unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants