Future Development #81
I'd love it if we could get DiffRules to support both - it shouldn't be that hard; I actually had a half-working implementation of it a long time ago, but the branch has since gone the way of the dinosaur... AFAICT the only thing that blocks DiffRules from being used for mixed scalar/tensor derivatives right now is simply that the rules don't have dispatch metadata attached to clarify which arguments are scalars and which are tensors. It shouldn't be too bad to support, for example:

```julia
@define_diffrule M.f(x::Tensor, y::Scalar) = ...
@define_diffrule M.f(x::Tensor, y::Tensor) = ...
```
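For reference, a scalar rule in today's DiffRules looks like the first definition below; the tensor-annotated variants are left as comments because the `Tensor`/`Scalar` dispatch metadata is exactly the part that doesn't exist yet (a sketch, assuming DiffRules' documented `@define_diffrule` macro):

```julia
using DiffRules

# Existing scalar-only form: the derivative of sin(x) is the expression cos(x).
DiffRules.@define_diffrule Base.sin(x) = :(cos($x))

# Hypothetical dispatch-annotated forms discussed above (not valid DiffRules today):
# @define_diffrule LinearAlgebra.dot(x::Tensor, y::Tensor) = :($y), :($x)
# @define_diffrule M.f(x::Tensor, y::Scalar) = ...
```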
⋮
Yes please! It would be so nice to have a package to replace Calculus for finite differencing.
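Something roughly along these lines, assuming FDM's `central_fdm(order, deriv)` constructor (a quick sketch, not a committed API):

```julia
using FDM

# 5th-order central finite-difference estimate of the 1st derivative.
fdm = central_fdm(5, 1)
fdm(sin, 1.0)   # ≈ cos(1.0), with no need for Calculus.jl
```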
Apologies for taking ages to properly respond to this. We've now separated out the finite differencing functionality and the linear algebra ops. FDM is already registered on METADATA, and DiffLinearAlgebra will be shortly.

I'm definitely open to getting the linear algebra operations from DiffLinearAlgebra into DiffRules, but there are going to be a few things to think about in terms of the API; there are obviously more performance considerations to make when linear algebra is involved than there are in the scalar case. For example, this issue highlights that there are cases when it is possible and desirable to avoid recomputing certain intermediate quantities if the sensitivities w.r.t. multiple outputs are requested. Similarly, there are certain mutating implementations of sensitivities that (I think) are only safe to use in particular situations. I don't have workable solutions for either of these things at the minute, and I'm sure there are more, so I don't want to accidentally tie us in to a sub-optimal solution that could easily be avoided.

@jrevels what are your thoughts regarding the best way forward here? It would be great if we could get your thoughts on DiffLinearAlgebra at some point to determine what needs to happen before it would be useful for Capstan.
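To make the intermediate-reuse point concrete, here is an illustrative reverse-mode rule for `Y = A \ B` (illustrative only, not DiffLinearAlgebra's actual code, and reusing an intermediate across the two input sensitivities rather than across multiple outputs):

```julia
# Reverse-mode sensitivities for Y = A \ B. Both Ā and B̄ need A' \ Ȳ, so
# computing it once avoids a second solve against A'.
function backsolve_sensitivities(Ȳ, A, B, Y)
    B̄ = A' \ Ȳ     # shared intermediate
    Ā = -B̄ * Y'    # reuses B̄ rather than recomputing A' \ Ȳ
    return Ā, B̄
end
```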
Hmm...after looking at DiffLinearAlgebra (great stuff, BTW!), it seems to be solving a very different set of problems than DiffRules. I probably wouldn't categorize it as "DiffRules for linear algebra."

IIUC, DiffLinearAlgebra is a library of hand-optimized, adjoint-based derivative kernels intended to be eagerly executed by downstream reverse-mode AD tools (for example, as part of tape evaluation). In contrast, DiffRules provides a library of symbolic descriptions of derivative kernels. DiffRules doesn't provide any eager kernel implementations, and is in fact totally agnostic to, e.g., automatic differentiation (and the choice of forward mode, reverse mode, etc.). The idea is that downstream tools can use DiffRules to "compile" their own eager kernel implementations, in whatever manner best suits the tool. To facilitate this approach, DiffRules' rules are written out quite naively.

My vision for Capstan is that it will tackle issues like intermediate state caching/recomputation, argument selection, and dependency analysis as compilation problems when generating individual kernel implementations. TBH, it's a pretty idealistic plan, but we'll see how far I get 😄 In that vein, DiffLinearAlgebra will certainly be useful to benchmark against.
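For example, a downstream tool might consume a symbolic rule roughly like this (a minimal sketch, assuming DiffRules' documented `diffrule` lookup):

```julia
using DiffRules

# Look up the symbolic derivative of sin w.r.t. x: returns the expression :(cos(x)).
dfdx = DiffRules.diffrule(:Base, :sin, :x)

# "Compile" an eager kernel from the symbolic rule in whatever way suits the tool;
# here we simply splice it into a plain function definition.
@eval d_sin(x) = $dfdx
d_sin(0.0)   # 1.0
```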
I agree with your first three paragraphs: the "DiffRules for linear algebra" statement is a bit misleading. The reason that DiffLinearAlgebra currently doesn't provide a collection of symbolic expressions is historical; it's a minimal port from Nabla, and I have no issue whatsoever transitioning over to symbolic expressions (where possible) if that would make this library more useful (doing so would probably actually make a number of implementation details cleaner and make testing easier). That being said, whereas forward- and reverse-mode sensitivities of scalar operations can always be expressed in the form …

Regarding the …

Anyway, my point is that, given symbolic expressions for the linear algebra operations, I agree that it's reasonable to hope that compiler optimisations can eliminate redundant code when compiling custom kernel implementations, and that this is a significantly better idea than hand-coding lots of optimisations. (The issue I linked in my previous comment is a good example of this; I would definitely prefer to be able to just forget about it.) However, I would contend that you simply can't handle linear algebra properly without a number of hand-coded symbolic expressions for the forward- and reverse-mode sensitivities, because the underlying operations aren't written in Julia. If at some point in the future we have a native Julia implementation of (for example) LAPACK, then it would be a really good idea to try and produce an AD tool which is able to produce reasonably well-optimised kernels for each operation. To the best of my knowledge, we shouldn't expect this to happen any time soon (and almost certainly never for BLAS), so a symbolic version of the current implementation of DiffLinearAlgebra will be necessary for Capstan to be able to differentiate arbitrary Julia code even reasonably efficiently.

Anyway, if you're interested, perhaps we could move this type of discussion over to either the DiffRules or DiffLinearAlgebra repo and figure out what kind of changes are necessary to make the code here useful for other AD packages (i.e. Capstan)?
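As a concrete instance of the kind of hand-coded rule meant here, the reverse-mode sensitivities of matrix multiplication are themselves BLAS (gemm) calls, so they have to be written down rather than derived from Julia source (a sketch; `mul_sensitivities` is just an illustrative name, not DiffLinearAlgebra's API):

```julia
# Reverse-mode sensitivities for C = A * B, given the output sensitivity C̄.
mul_sensitivities(C̄, A, B) = (C̄ * B', A' * C̄)   # (Ā, B̄)
```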
In the medium-long term, it makes little sense to continue to support a tape-based AD package when there is excellent work going on elsewhere in the Julia community; see Capstan, which is built on Cassette, which is intended to be the place to go for AD in Julia, and is something like an extension of the functionality offered separately in ForwardDiff and ReverseDiff.
Nabla makes sense from our perspective in a world where ReverseDiff.jl is the other serious option, primarily because ReverseDiff.jl doesn't support the linear algebra optimisations that we require for our GP work and it's a pain to add them. However, the interface for defining custom sensitivities in Capstan is planned to be quite straightforward. This change in the Julia AD landscape can allow us to get the best of both worlds: a fully-featured tape-based AD package (the nuts-and-bolts of which we don't have to maintain) to which we can contribute the linear algebra optimisations that we require.
To this end I propose the following:
The code for points 2 and 3 is, by design, already very well separated from the rest of the code in Nabla, so creating the two new packages is a quick job if someone sets up the repos (they will obviously need to be public immediately). The small extension of @wesselb's finite differencing should only take an afternoon. There is an unbounded amount of work that could go into point 3, as there is a lot of scope for linear algebra optimisation in reverse-mode AD. The intention is to communicate with other people in the community to ensure that the library is useful for their purposes; as soon as it's up and running in a basic form I'll whack it on the Julia Slack and get a conversation going about it. It's important that this happens sooner rather than later.
@omus as discussed previously, I will separate out the code associated with points 2 and 3 into separate submodules within Nabla once I've merged #78 and #79, and then we can sort out repos for the new packages.