
RFC: Refactor LinearAlgebra to use isa instead of method dispatch to avoid ambiguity hell #33072

Open

tkf wants to merge 2 commits into base: master

Conversation

@tkf (Member) commented Aug 26, 2019

I think there are too many method definitions in LinearAlgebra that exist just to resolve ambiguities. One simple solution might be to use "type-level" if branches via isa. That is to say, remove definitions like these

-(*)(x::AdjointAbsVec,   A::AbstractMatrix) = (A'*x')'
-(*)(x::TransposeAbsVec, A::AbstractMatrix) = transpose(transpose(A)*transpose(x))

and inline them into the more generic definition:

 function (*)(A::AbstractMatrix, B::AbstractMatrix)
+    if A isa AdjointAbsVec
+        return (B' * A')'
+    elseif A isa TransposeAbsVec
+        return transpose(transpose(B) * transpose(A))
+    end
     TS = promote_op(matprod, eltype(A), eltype(B))
     mul!(similar(B, TS, (size(A,1), size(B,2))), A, B)
 end

An obvious benefit is that this lets us remove many method definitions (7 by the change above; c7fadec) that existed just for ambiguity resolution. Also, I think it would make it easier to understand when and how a certain combination of matrices is computed in a certain way.

A disadvantage may be that it would make @edit A * B useless when the given combination is handled inside the if block. But I think this effect is small now that we have debuggers.

Admittedly, this is not a super elegant solution. If anyone has a more elegant plan to resolve it, I'll be happy to wait for it. But I'm also guessing that the if branches might not be so bad; they would likely be better than the very fragile state of the current method definitions. What do you think?

@KristofferC (Member)

> A disadvantage may be that it would make @edit A * B useless when the given combination is handled inside the if block. But I think this effect is small now that we have debuggers.

Agreed.

@timholy (Member) commented Aug 26, 2019

This special-cases Base types at the expense of extensibility in packages. The way @mbauman and I handled this for getindex and AbstractArray types was to construct a layered method hierarchy of non-exported functions. See https://docs.julialang.org/en/latest/manual/methods/#man-method-design-ambiguities-1 for some design ideas.
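Concretely, such a layered scheme might look like this (a rough sketch; mymul and the helper names are made up, not the ones used for getindex). The entry point dispatches only on the first argument, and each non-exported helper dispatches on the second, so no method ever constrains both arguments at once:

using LinearAlgebra: Adjoint

# Layer 1: dispatch on the first argument only.
mymul(A::Adjoint, B::AbstractMatrix) = _mul_adjoint(A, B)
mymul(A::AbstractMatrix, B::AbstractMatrix) = _mul_generic(A, B)

# Layer 2: each helper is free to dispatch on the second argument.
_mul_adjoint(A, B) = (B' * A')'               # unwrap via the adjoint identity
_mul_generic(A, B) = collect(A) * collect(B)  # stand-in for the dense path

A package that owns a new wrapper type then extends only the layer that concerns it, never the two-argument entry point.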

@tkf (Member, Author) commented Aug 26, 2019

Could you elaborate on when it would be a problem with *? In fact, I don't see a problem even if only *(::AbstractMatrix, ::AbstractMatrix) (and similar methods with vectors and numbers) is defined in LinearAlgebra; i.e., doing something like this:

module LinearAlgebra

function * end

# definitions of "local" * here

Base.:*(A::AbstractMatrix, B::AbstractMatrix) = A * B

end

In the current LinearAlgebra (say, Julia 1.2), you can only define methods that are more specific than the ones defined in LinearAlgebra and Base (without type piracy). So the extreme version of LinearAlgebra above seems to have the same property. Or am I missing something?

@mbauman (Member) commented Aug 26, 2019

I'm not sure the work on getindex is directly relevant here as that's a case where there's a clear hierarchy in the "most important" argument. That is, the way we avoid ambiguities is by demanding that the first argument be specialized before specializing the remainder — the first argument effectively creates its own independent method table that is totally available for specializations. * is significantly harder in that there is no clear "most important" argument. Even worse, there's no "canonical" endpoint to which we can re-compute the arguments.

I think this approach of privileging transposes to auto-unwrap in the abstract/generic implementation makes a lot of sense. Sure, it's not fully extensible (in that others cannot insert similarly privileged types into the abstract implementation), but it addresses one of the biggest pain points here.

One of the best ways to handle this sort of symmetric dispatch problem in general is through the sort of promotion/conversion numeric arithmetic uses. The challenge with arrays is twofold:

  • conversions are expensive.
  • we can implement fully generic behaviors, but they are also expensive.

If we're not careful we might pay the cost for the conversions and still end up with a generic slow implementation. We don't know if the promoted method exists at all, and even if it does, we don't know if it's advantageous to do the conversions.

In other words, the ambiguities are real, but we may want to paper over them.
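To make the trade-off concrete, here is a minimal sketch of what promotion-style dispatch could look like (mulpromote and mul_promoting are hypothetical names). The default promotion materializes both operands, which is exactly the failure mode described above: we pay for the conversions without any guarantee that a faster method exists for the promoted types.

# Hypothetical promotion hook: rewrite both operands to a common,
# hopefully fast, representation. The fallback materializes everything.
mulpromote(A, B) = (collect(A), collect(B))

function mul_promoting(A::AbstractMatrix, B::AbstractMatrix)
    A2, B2 = mulpromote(A, B)
    return A2 * B2   # hope the promoted types hit a fast method
end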

@mbauman added the "linear algebra" label on Aug 26, 2019
@tkf (Member, Author) commented Aug 26, 2019

> • conversions are expensive.

What kind of conversion/promotion are you thinking of? Can it be done lazily so that it doesn't have to be expensive at run time?

It's not based on promotion, but I can imagine doing something based on cost estimates (while still being possible to compile away):

"""
    MulImpl{Cost, OtherType}(impl)

`Cost` is a pair `(p, c)` s.t. `impl(A, B)` does about `c * n^p` multiplications
when all the (non-unit) sizes of `A` and `B` are `n`.
"""
struct MulImpl{Cost, OtherType, F}
    impl::F
end
MulImpl{Cost, OtherType}(impl::F) where {Cost, OtherType, F} =
    MulImpl{Cost, OtherType, F}(impl)
(m::MulImpl)(A, B) = m.impl(A, B)  # make the wrapper callable
cost(::MulImpl{Cost}) where Cost = Cost
othertype(::MulImpl{<:Any, OtherType}) where OtherType = OtherType

"""
    lmulimpls(A) :: Tuple{Vararg{MulImpl}}
    rmulimpls(B) :: Tuple{Vararg{MulImpl}}
"""
lmulimpls(A) = ()
rmulimpls(B) = ()

# `_mulfallback` stands for the generic implementation shown at the top.
function (*)(A::AbstractMatrix, B::AbstractMatrix)
    candidates = (
        filter(impl -> B isa othertype(impl), lmulimpls(A))...,
        filter(impl -> A isa othertype(impl), rmulimpls(B))...,
        MulImpl{(Inf,), Any}(_mulfallback),
    )
    impl = first(sort(collect(candidates); by = cost))
    return impl(A, B)
end

lmulimpls(::UpperTriangular) = (
    MulImpl{(2, 0.5), Number}(...), # UpperTriangular * Number needs about 0.5 * n^2 multiplications
    # and so on...
)

@mbauman (Member) commented Aug 26, 2019

Here's an example: rand(1000,1000) * (1:1000). It's cheaper to "convert" the range to a Vector and then do the Matrix * Vector than it is to do the Matrix * UnitRange since, of course, the former hits BLAS. But like I said, I'm not sure that's totally relevant here because there are lots of permutations of wrappers and dimensions and element types that you'd want to support.
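That conversion is easy to write down concretely; a hedged sketch (mul_via_collect is a made-up name):

# Materializing the range costs O(n), but lets the product use the dense
# kernels; collecting to the promoted eltype makes it a BLAS gemv call.
mul_via_collect(A::AbstractMatrix, r::AbstractRange) =
    A * collect(promote_type(eltype(A), eltype(r)), r)

# e.g. mul_via_collect(rand(1000, 1000), 1:1000)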

The other alternative, of course, is to request a coarser "memory layout" that specifies a more abstract API and then we dispatch on that — and while it'd be very powerful I think it just effectively pushes the ambiguity problem down one layer. Ref. #25558.
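A rough sketch of what that could look like (names hypothetical; the actual proposal is in #25558, and the ArrayLayouts.jl package later implemented a variant): the multiply dispatches on layout descriptors rather than on the array types, so the wrapper ambiguities move into the layout computation.

abstract type MemoryLayout end
struct StridedLayout <: MemoryLayout end   # plain dense storage
struct UnknownLayout <: MemoryLayout end   # anything we cannot classify

MemoryLayout(::Array) = StridedLayout()
MemoryLayout(::Any)   = UnknownLayout()

# All the type-pair ambiguity now lives in (layout, layout) space.
layoutmul(A, B) = layoutmul(MemoryLayout(A), MemoryLayout(B), A, B)
layoutmul(::StridedLayout, ::StridedLayout, A, B) = A * B   # BLAS-able path
layoutmul(::MemoryLayout,  ::MemoryLayout,  A, B) = collect(A) * collect(B)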

@tkf (Member, Author) commented Aug 26, 2019

Thanks, that's a very clear case.

> But like I said, I'm not sure that's totally relevant here because there are lots of permutations of wrappers and dimensions and element types that you'd want to support.

I see. I guess I misread your last comment because I thought the conversion idea could be used for complicated nested types like Adjoint-of-Hermitian-of-something.

> The other alternative, of course, is to request a coarser "memory layout"

I was vaguely aware of #25558 but I thought it was mostly about in-place implementations. If so, while it is very relevant, it does not directly solve the problem with binary operations like * that are "allowed" to allocate more proactively than mul! (i.e., * can change the memory layout). But I haven't read the whole thread so I'm not sure.

I wonder if it makes sense to implement the layered dispatch idea only for "algebraic" wrappers like Adjoint, Transpose, Hermitian, and Symmetric. My impression is that these are the major source of ambiguities. Maybe we can make it a bit extensible by using traits like isadjointinvariant and istransposeinvariant.
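For concreteness, a sketch of those traits (the names follow the comment above and are hypothetical; the two non-default methods encode identities that do hold in LinearAlgebra):

using LinearAlgebra: Hermitian, Symmetric

isadjointinvariant(::Type) = false
isadjointinvariant(::Type{<:Hermitian}) = true    # adjoint(H::Hermitian) === H

istransposeinvariant(::Type) = false
istransposeinvariant(::Type{<:Symmetric}) = true  # transpose(S::Symmetric) === S

A generic method could then consult the trait to decide whether unwrapping an Adjoint or Transpose is a no-op for a given wrapper, letting packages opt their own types into the same layer.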

@timholy (Member) commented Aug 27, 2019

Agreed that here there isn't a privileged item, and I don't have time now to think this through carefully, but wouldn't

*(A, B) = _mul(MulTrait(A), MulTrait(B), A, B)             # single entry point
_mul(trait1, trait2, A, B) = _mul(MulTrait(trait1, trait2), A, B)  # merge the two traits
_mul(::HandleByReordering, A, B) = _permutedmul(A, B)

MulTrait(::Adjoint) = HandleByReordering()
...
MulTrait(::T, ::T) where T = T()                           # matching traits stand
MulTrait(::HandleByReordering, ::Any) = HandleByReordering()
MulTrait(::Any, ::HandleByReordering) = HandleByReordering()

go a long way toward "splitting off" these special cases?

@timholy (Member) commented Aug 27, 2019

The redesign of broadcasting (#23939) would probably have been a better analogy.
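The analogy would look roughly like this (a sketch with made-up names, mirroring how Broadcast.BroadcastStyle combines two styles pairwise): each array type maps to a "mul style", and pairwise rules pick a winner.

using LinearAlgebra: Adjoint, Transpose

abstract type MulStyle end
struct ReorderStyle <: MulStyle end   # Adjoint/Transpose wrappers
struct DenseStyle   <: MulStyle end   # everything else

MulStyle(::Union{Adjoint,Transpose}) = ReorderStyle()
MulStyle(::AbstractArray)            = DenseStyle()

# Pairwise combination, like BroadcastStyle(s1, s2):
combine(s::MulStyle, ::MulStyle)       = s   # default: first style wins
combine(::DenseStyle, s::ReorderStyle) = s   # reordering outranks dense

stylemul(A, B) = _stylemul(combine(MulStyle(A), MulStyle(B)), A, B)
_stylemul(::ReorderStyle, A, B) = transpose(transpose(B) * transpose(A))
_stylemul(::DenseStyle,   A, B) = collect(A) * collect(B)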

@tkf (Member, Author) commented Aug 28, 2019

MulTrait is interesting. I guess an important question here is how to prevent infinite recursion. _permutedmul would likely call _mul with different types of A and B, so I guess we can get infinite recursion if we are not careful. Can we guarantee some kind of convergence property? Hard-coding a recursion limit may not be so bad?
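For instance, a hard-coded limit could look like this (a rough sketch; all names are hypothetical, and the depth rides along as a Val so the guard can constant-fold):

_mulviatrait(A, B) = _mulviatrait(A, B, Val(0))
function _mulviatrait(A, B, ::Val{depth}) where depth
    depth >= 3 && return _mulfallback(A, B)  # bail out of any rewrite cycle
    A2, B2 = _rewrite(A, B)                  # the _permutedmul-style step
    return _mulviatrait(A2, B2, Val(depth + 1))
end

# Placeholders so the sketch is self-contained; a real _rewrite would do
# the Adjoint/Transpose reordering discussed above.
_rewrite(A, B) = (A, B)
_mulfallback(A, B) = collect(A) * collect(B)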

But because of this unpredictability, I still think the cost-based dispatch (#33072 (comment)) is easier to handle than MulTrait if we were to go with a trait-based *. I also still don't think the if-based plumbing is so bad (provided there is no loss in extensibility; #33072 (comment)). We can of course factor out the branches as methods; it is then equivalent to the "dispatch on one argument at a time" style.
