Supporting Linear Algebraic Primitives #10

jrevels · 2018-02-20T14:13:52Z

(continued from invenia/Nabla.jl#81, cc @willtebbutt)

This means that we would need a slightly more general interface for linear algebra, and would certainly require different forward- and reverse- mode expressions, than is currently provided by DiffRules.

Agreed, DiffRules only properly handles scalar kernels now. To support linear algebra, we need to add a notion of tensor/scalar, allowing in-place methods, marking adjoint variables, etc. to DiffRules.

Regarding the where possible statement above, there are certainly operations (such as the Cholesky factorisation) for which symbolic expressions might be a bit unweildy (see here). Now that I think about this a bit more, I'm not sure whether these larger implementations are going to be problematic or not.

Anyway, my point is that given symbolic expressions for the linear algebra operations I agree that it's reasonable to hope that compiler optimisations can eliminate redundant code when compiling custom kernel implementations, and that this is a significantly better idea than hand-coding lots of optimisations. (The issue I linked in my previous comment is a good example of this. I would definitely prefer to be able to just forget about this). However, I would contend that you simply can't handle linear algebra properly without a number of hand-coded symbolic expressions for the forward- and reverse-mode sensitivities because they aren't written in Julia. If at some point in the future we have native Julia implementation of (for example) LAPACK, then it would be a really good idea to try and produce an AD tool which is able to produce reasonably-well optimised kernels for each operation. To the best of my knowledge, we shouldn't expect this to happen any time soon (and almost certainly never for BLAS), so a symbolic version of the current implementation of DiffLinearAlgebra will be necessary for Capstan to be able to differentiate arbitrary Julia code even reasonably efficiently.

I think there might've been a misunderstanding with my previous post 😛I definitely am not arguing that we should express e.g. complex LAPACK kernels symbolically, and I didn't mean to imply that DiffRules/DiffLinearAlgebra were directly competing approaches. On the contrary, I think they're quite complementary - if DiffLinearAlgebra didn't exist, I eventually would need to make a "DiffKernels.jl" anyway. DiffRules is useful for mapping primal functions to derivative functions, and is thus useful when generating e.g. instruction primitives/computation graphs within downstream tools (i.e. it solves the problem "what kernels should I call and how should I call them?"). DiffLinearAlgebra (as it stands) is useful for providing implementations of these kernels (i.e. solves the problem "how do I execute the kernels that I'm calling?"). They're both necessary components of the AD ecosystem.

As for deciding what computations should be primitives, I think we're already on the same page; a computation should be defined as a primitive if either/both of the following applies:

it is difficult to express the computation as a composition of existing primitives
a hand-optimized kernel for the computation is sufficiently more performant than the equivalent composition of existing primitives, even after taking into account potential compiler-level optimizations.

jrevels · 2018-02-20T14:21:37Z

Just to be clear, let's address your cholfact example explicitly:

there are certainly operations (such as the Cholesky factorisation) for which symbolic expressions might be a bit unweildy (see here).

I would never argue that cholfact/∇cholfact would be composed of other primitives; cholfact is a primitive itself. By declaring it as such in DiffRules (and pointing to the DiffLinearAlgebra kernel as the actual implementation), downstream tools could then support cholfact as a primitive automatically, without requiring any alteration in the downstream tools' code.

In order to make such a declaration in DiffRules, however, we still need to add API support for linear algebraic primitives (as I mentioned earlier).

willtebbutt · 2018-02-25T21:07:50Z

Great, it looks like we're on pretty much the same page then.

Possibly our only difference of opinion is that I now can't see why you would ever want to have a library of "eager" kernels, as opposed to one that provides the code you need to automatically compile your own in a downstream package. I can't think of a situation in which exposing objects that contain

function name
argument types
argument names
code to implement the reverse pass for each argument

isn't strictly better than providing the first three things + the corresponding implemented method. A downstream package can clearly reconstruct the method given the code (it doesn't really matter how long the code for any particular sensitivity is), and as you pointed out it may be possible to perform optimisations given a symbolic representation of the sensitivity that you can't when you only have access to a method (i.e. it might be useful to perform a CSE optimisation when the sensitivities w.r.t. multiple arguments are required - if you compile a custom sensitivity on the fly using symbolic representations of the sensitivity w.r.t each argument, then you can do such optimisations).

What are your thoughts on this? I may be missing something obvious.

(On a related note, it might be an idea to replace the things in the last two bullet points with a function which accepts the argument names the downstream package wants to use, and returns code using those argument names)

jrevels · 2018-02-27T18:52:53Z

A downstream package can clearly reconstruct the method given the code (it doesn't really matter how long the code for any particular sensitivity is), and as you pointed out it may be possible to perform optimisations given a symbolic representation of the sensitivity that you can't when you only have access to a method

We're in full agreement here.

it might be useful to perform a CSE optimisation when the sensitivities w.r.t. multiple arguments are required

Definitely. Actually, this made me think of a nice API change for helping with manually-optimized "multiple sensitivities" cases (e.g. where CSE etc. can't/doesn't suffice). Currently, DiffRules requires that the rule author provide a standalone derivative expression for each argument. Instead, we could require that rule authors mark differentiated variables explicitly, for example:

@define_diffrule M.f(wrt(x), wrt(y)) = expr_for_dfdx_and_dfdy($x, $y)
@define_diffrule M.f(wrt(x), y) = expr_for_dfdx($x, $y)
@define_diffrule M.f(x, wrt(y)) = expr_for_dfdy($x, $y)

Possibly our only difference of opinion is that I now can't see why you would ever want to have a library of "eager" kernels

Well, it's up to the rule author to decide the level of granularity of the function calls present in the derivative expression. On one extreme end of the spectrum, the rule author can inline as much as possible (i.e. compose the derivative expression using only Core.Builtins), while on the other end, the derivative expression can just contain a single call to an eager kernel. I imagine most cases naturally fall somewhere in between, and IMO, the line should just be drawn on a case-by-case basis by the rule author. There are advantages to both sides of the spectrum; as you mentioned, aggressive inlining provides downstream compilation tools with more raw material, but eager kernels can take advantage of multiple dispatch/method overloading earlier in the computation, and are (obviously) necessary for calling non-Julia code.

(On a related note, it might be an idea to replace the things in the last two bullet points with a function which accepts the argument names the downstream package wants to use, and returns code using those argument names)

Yup, that's the way DiffRules currently works (you can interpolate Exprs as well).

willtebbutt · 2018-03-03T12:26:19Z

I agree with all of the above.

I like your wrt proposal a lot - it solves the problem in a much cleaner way than we're currently allowing (although not properly exploiting) in Nabla / DiffLinearAlgebra.

Running with this, a reverse-mode rule could be something like:

@define_reverse z::Tz z̄::Tz̄ M.f(wrt(x::Tx), y::Ty) = expr_dOdx($z, $z̄, $x, $y)
@define_reverse z::Tz z̄::Tz̄ M.f(wrt(x::Tx), y::Ty) = expr_dOdx!($x̄, $z, $z̄, $x, $y) x̄::Tx̄

where I've just given the macro a different name and added a couple of extra terms at the front end to pass in the , and the second rule is for in-place updates for if x already has a value. Similarly a forward-rule could be something like:

@define_forward ẋ::Tẋ ẏ::Tẏ M.f(x::Tx, y::Ty) = expr_dfdI($x, $y, $ẋ, $ẏ)

Does this sound reasonable?

The above doesn't directly address more complicated method definitions (e.g. involving diagonal dispatch), but I can't see any reason in principle that it couldn't be extended to handle that kind of thing. Also, I'm not sure about the ordering of the arguments for the in-place @define_reverse is optimal.

willtebbutt mentioned this issue Aug 11, 2018

WIP: AD stuff #22

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Supporting Linear Algebraic Primitives #10

Supporting Linear Algebraic Primitives #10

jrevels commented Feb 20, 2018 •

edited

Loading

jrevels commented Feb 20, 2018

willtebbutt commented Feb 25, 2018 •

edited

Loading

jrevels commented Feb 27, 2018

willtebbutt commented Mar 3, 2018

Supporting Linear Algebraic Primitives #10

Supporting Linear Algebraic Primitives #10

Comments

jrevels commented Feb 20, 2018 • edited Loading

jrevels commented Feb 20, 2018

willtebbutt commented Feb 25, 2018 • edited Loading

jrevels commented Feb 27, 2018

willtebbutt commented Mar 3, 2018

jrevels commented Feb 20, 2018 •

edited

Loading

willtebbutt commented Feb 25, 2018 •

edited

Loading