Use DI for non-implemented ADTypes #39
Conversation
@tpapp @devmotion thoughts?
I think the test errors are due to the breaking version of Enzyme, which is why #38 might have higher priority.
Very nice, thank you. The PR should also mention the support of DI in the README.
Thanks for the ping, I was a bit busy last week to review this. This looks like a very lightweight addition that at the same time enables the use of DifferentiationInterface (for all supported backends), which extends the functionality of the package, and in the long run also allows replacing existing backends with DI as the code matures. Tests currently do not run; I think the Enzyme compat needs to be broadened.
No worries, thanks for the review, I'll take your remarks into account.
Not possible: the Enzyme v0.13 change was very breaking, and DI cannot afford to support every past version, so I used their breaking change as an opportunity to tag mine as well.
@willtebbutt does this clash with the Mooncake extension for LogDensityProblemsAD?
I have no idea -- if it does, I'm more than happy to remove my extension and rely on the contents of this PR. Me having to look after less code is never a problem.
@gdalle: I am wondering if a wrapper like `with_preparation(ADgradient(backend, ℓ), zeros(3))` could provide a reasonable API, without keywords. Would not even need a separate …
My idea here was to mimic the existing API as closely as possible. Some constructors in the existing extensions already take a typical input `x` (see `LogDensityProblemsAD.jl/ext/LogDensityProblemsADForwardDiffExt.jl`, lines 96 to 99 at 2ce49ce, and `LogDensityProblemsAD.jl/ext/LogDensityProblemsADReverseDiffExt.jl`, lines 45 to 47 at 2ce49ce). They also take other kwargs like config or compile information, but with ADTypes this is stored in the backend object itself, so we no longer need to pass it.
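As an aside, this is what "stored in the backend object itself" looks like with ADTypes (the specific settings below are my own examples, not from the PR):

```julia
using ADTypes

# With ADTypes, per-backend configuration travels inside the backend object:
fd = AutoForwardDiff(chunksize=4)   # chunk size, formerly a constructor keyword
rd = AutoReverseDiff(compile=true)  # tape compilation flag, likewise
```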
Tests pass locally.
@devmotion @tpapp is this better with the latest changes?
@gdalle: Thanks for the recent updates. I understand and appreciate that you want to keep the API consistent with the existing one. However, that API predates the AD-unification libraries (like DI) and is not particularly well designed, because it does not reify the AD process. Specifically, I now believe that the ideal API would be something like `ADgradient(how, P)`, where `how` would fully specify the AD setup. In contrast, currently we have `ADgradient(how_backend, P; how_details...)`, and your PR (in its current state) extends the existing API in this direction. In fact, DI does not reify this either. So there are two questions: …
I appreciate your work on DI and your PRs here, and please understand that I am not pushing back on changes. I think DI is a great idea, but I want to do it right so that this package breaks its own API the fewest times possible (eventually, I want to encourage users to move on to the new API, and deprecate & remove the existing one). @devmotion, what do you think?
Thanks for your kind answer @tpapp, and for your work on this part of the ecosystem.
In my view, the AD extensions of LogDensityProblemsAD filled a crucial void when DI did not exist. Now that DI is in a pretty good state, I don't know if this is still necessary. Of course, to get peak performance or avoid some bugs, you still want to tune the bindings for every backend. But if every Julia package does that separately, it is a huge waste of time and LOCs. My hope is that this tuning can be done in a single place and fit 99% of use cases, which is what DI is for. I'm always open to suggestions for performance or design improvements.
The DI interface with preparation looks like this:

```julia
gradient(f, prep, backend, x, contexts...)
```

where `prep` is the result of

```julia
prepare_gradient(f, backend, typical_x, typical_contexts...)
```

In your terms: …
This shows that there are two sides to the `how`: the backend object and the preparation result.
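To make this concrete, here is a minimal runnable sketch of that preparation workflow (the toy function, the `AutoForwardDiff` backend choice, and the input sizes are my own illustration, not from this PR):

```julia
using DifferentiationInterface
import ForwardDiff  # provides the implementation behind AutoForwardDiff

f(x) = sum(abs2, x)          # toy function standing in for a log density
backend = AutoForwardDiff()  # backend types are re-exported from ADTypes

# Pay the setup cost once, for a typical input...
prep = prepare_gradient(f, backend, zeros(3))

# ...then reuse it for every input of the same type and size.
g = gradient(f, prep, backend, [1.0, 2.0, 3.0])  # == [2.0, 4.0, 6.0]
```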
So where do you wanna go from here?
Possibly, but that is speculation. At the moment, there is no generic AD wrapper interface that provides what this package does. Preparation, as you explained above, is one example.
I want to reflect a bit on this, and also hear comments from the users. Currently I am leaning towards cleaning up the interface the following way: …
We could of course merge your PR as is, then later deprecate this.
Well, I would love for DI to provide this. What do you think is missing then?
The idea of this PR was to be minimally invasive, so that you can gradually drop extensions in favor of a better-maintained and tested DI implementation. Therefore, I think it is a good idea to merge it before a possible breaking revamp of LogDensityProblemsAD, especially if you want to use more of DI in the revamp.
A way to pack everything in a single object.
As I said above, that is a possibility I am considering. I will wait for thoughts from @devmotion.
If you want to use only DI, this is as simple as something like

```julia
struct How{B,P}
    backend::B
    prep::P
end
```

But if you want to also adapt this to your existing extensions, then of course it's a bit more work. I'll let you weigh the pros and cons.
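For illustration, a hedged sketch of how such a wrapper could be filled in with DI (the `make_how` helper and the toy function are hypothetical, and assume the `How` struct above is defined):

```julia
using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)  # toy stand-in for a log density

# Hypothetical convenience constructor: prepare once for a typical input.
make_how(backend, f, typical_x) = How(backend, prepare_gradient(f, backend, typical_x))

how = make_how(AutoForwardDiff(), f, zeros(3))

# Reuse the stored preparation for every same-shaped input:
g = gradient(f, how.prep, how.backend, [1.0, 2.0, 3.0])
```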
Hmm... I think conceptually keeping … What I dislike about the … I think deprecating or directly replacing the …
@devmotion I agree that ADTypes are overall more expressive than symbols, which is why they were introduced. But even deprecating the …
In my previous attempt #29, the main obstacles to full DI adoption were: …
The first one has been resolved, the second one is much more a Tracker issue than a DI one.
Sure, your own AD interface has already been written, but it still needs to be updated whenever any backend changes (e.g. #37 and #38 for the latest Enzyme). Since DI is set to become the standard (it is already used in Optimization.jl, NonlinearSolve.jl and more), it will remain actively maintained and react to evolutions of the ecosystem (like the new Mooncake.jl package). The way things work at the moment, you also need to perform the same adaptations in parallel, or give up on the latest features, both of which are a bit of a waste.
I think it would. I think the only keyword left should be a typical input `x`.
Yes, you're right, … So if Tamas agrees, I guess the question is whether you want to deprecate …
@gdalle: I would prefer to do it like this: …
So the only use case for this is preparation? I will need some time to look into DI code to see what it does exactly: does it need a type (like a …)? I need some time to read up on this; I will be away from my computer for the weekend, but I will get back to this topic on Tuesday. Suggestions welcome. @gdalle, I appreciate your work a lot on DI and want to move forward with this, but I need to understand the details so that we can make a smooth transition, and for that I need time. I expect that this package is not fully replaceable by DI, as it does a few extra things (again, a "problem" defined through this API knows about its dimension and AD capabilities), but I agree that we should remove redundancies.
Fair enough! How about the following toggle?

```julia
ADgradient(backend::AbstractADType, l, ::Val{DI}=Val(false); kwargs...) where {DI}
```
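A self-contained sketch of how such a toggle could dispatch (the lowercase `adgradient` and both wrapper functions are hypothetical stand-ins, not the package's actual internals):

```julia
using ADTypes: AbstractADType, AutoForwardDiff

# Hypothetical stand-ins for the two implementations:
custom_wrapper(backend, l) = (:custom_extension, backend, l)
di_wrapper(backend, l) = (:differentiation_interface, backend, l)

# Default Val(false): prefer the hand-written extension.
adgradient(backend::AbstractADType, l, ::Val{false}=Val(false)) = custom_wrapper(backend, l)

# Val(true): opt in to the DI-based implementation.
adgradient(backend::AbstractADType, l, ::Val{true}) = di_wrapper(backend, l)

adgradient(AutoForwardDiff(), nothing)             # custom path
adgradient(AutoForwardDiff(), nothing, Val(true))  # DI path
```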
DI's design is interesting because preparation is effectively unlimited. We can put whatever we want in the `prep` object.
It needs an actual value, because things like the size of the vector are also important (and they are usually not part of the type). You can read more about the preparation system in the DI docs.
@gdalle: I have read the DI docs and skimmed the source code. First, kudos on trying to organize all AD approaches into a coherent interface; it is a huge undertaking but should have a large payoff for the ecosystem in the long run. I have some preliminary thoughts regarding the interface of LogDensityProblems and DI. First, in LogDensityProblems, the interface is focused on being functional: …
The interface has no API to handle situations when the caller promises to use the same argument types, or values, in exchange for a potential speed benefit. I am entertaining the idea that we should expose "preparation" in the API (as defined in the main interface package, LogDensityProblems.jl), where the caller promises to call the problem with the same argument type over and over, in exchange for speedups, and maybe preallocate stuff. The API should allow for querying the argument type above and whether the object is mutable (thread safety). Once we implement that, we can flesh out the AD interface using DI and that API. That is to say, preparation would not be exposed via DI, but via our own API that forwards to DI. I am still thinking about the details, but this is the general direction I am considering; I also need to understand sparse coloring and its relation to preparation.
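A purely hypothetical sketch of what such a preparation API could look like (every name here is mine; nothing like this exists in LogDensityProblems.jl yet):

```julia
# Hypothetical: the caller promises to evaluate with arguments like `typical_x`
# repeatedly, in exchange for preallocated workspace stored in the wrapper.
struct PreparedProblem{P,X,B}
    problem::P
    typical_x::X   # the argument type/size the caller commits to
    buffers::B     # preallocated workspace reused across calls
end

prepare(problem, typical_x) = PreparedProblem(problem, typical_x, similar(typical_x))

# The queries mentioned above:
argument_type(p::PreparedProblem) = typeof(p.typical_x)
# A mutable preparation (reused buffers) must not be shared across threads:
ismutable_preparation(::PreparedProblem) = true
```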
This is a big difference indeed, and I understand why you would want to change your interface to accommodate it. Note that, at the moment, some backends already perform preparation when you pass a typical input `x` to the constructor.
Coloring is not relevant for gradients, because a gradient is always dense (or you have some useless inputs) and can be computed in O(1) function-call equivalents. Sparse AD is only useful when matrices are returned (Jacobians and Hessians).
I think that I would rather merge this and have the functionality now because I don't want to invest time into revamping the package API (which will be a breaking change anyway). So this LGTM and I would be happy to merge if @devmotion agrees.
Sorry, I merged the review suggestions without checking typos; fixed now. The tests pass locally.
@gdalle, thanks for the great work, following up on all suggestions, and your patience and persistence! LGTM.
@devmotion, @gdalle: would a minor version bump be OK for this? After all, we just add new features, even though the change is extensive.
Yes, I think a minor release is appropriate here.
This PR adds a teeny tiny extension for DifferentiationInterface (#26).
It can compute gradients for any `ADTypes.AbstractADType` that is not in the following list:

- `AutoEnzyme`
- `AutoForwardDiff`
- `AutoReverseDiff`
- `AutoTracker`
- `AutoZygote`
That way, your custom implementations remain the default, but for all other AD backends defined by ADTypes (and not symbols), DifferentiationInterface will kick in. This also allows you to gradually remove custom implementations in favor of DifferentiationInterface, if you so desire.
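For instance (a hedged sketch: `AutoFiniteDiff` is just one example of a backend with no custom extension, and the toy `Gaussian` problem is my own illustration of the LogDensityProblems.jl interface):

```julia
using LogDensityProblems, LogDensityProblemsAD, ADTypes
import DifferentiationInterface  # loads the new extension
import FiniteDiff                # the backend DI will drive

# A minimal toy problem implementing the LogDensityProblems.jl interface:
struct Gaussian end
LogDensityProblems.logdensity(::Gaussian, x) = -sum(abs2, x) / 2
LogDensityProblems.dimension(::Gaussian) = 3
LogDensityProblems.capabilities(::Type{Gaussian}) = LogDensityProblems.LogDensityOrder{0}()

# AutoFiniteDiff has no custom extension, so the DI fallback kicks in:
∇ℓ = ADgradient(AutoFiniteDiff(), Gaussian())
LogDensityProblems.logdensity_and_gradient(∇ℓ, zeros(3))  # ≈ (0.0, [0.0, 0.0, 0.0])
```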
Ping @willtebbutt @torfjelde @adrhill
Note: since DI imposes Enzyme v0.13 in the tests, it may require merging #38 first.