Support ADTypes and autodiff by default #178
Conversation
Optimization.jl doesn't like this
Seems to be optional, and Turing models don't support it
Codecov Report — Attention: Patch coverage is

```
@@            Coverage Diff             @@
##             main     #178      +/-   ##
==========================================
- Coverage   84.72%   80.61%   -4.12%
==========================================
  Files          13       13
  Lines         609      614       +5
==========================================
- Hits          516      495      -21
- Misses         93      119      +26
```
Increasing these lower bounds ultimately produces a conflict, since they require BangBang v0.4, but no Turing release is yet compatible with that version.
Hi @sethaxen, following up from #93. For our example, pathfinder fails here with a
It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example. Is that an error in a forward-over-reverse Hessian calculation?
At first glance, I don't think this is a Pathfinder issue; internally we only use Optimization.jl with Optim's LBFGS. Does something like the following run for you?

```julia
using Turing            # for MAP(), AutoReverseDiff, and optim_problem; `inference_mdl` is the model from the example
using Optimization, OptimizationOptimJL

prob = Turing.optim_problem(inference_mdl, MAP(); adtype=AutoReverseDiff(true), constrained=false)
sol = solve(prob, LBFGS())
```

If so, this is more or less what Pathfinder is doing internally, so the issue is probably somewhere in Pathfinder. If this fails, then the issue is somewhere else. Can Turing sample fine with HMC (as in, can it compute the gradient of the log-density without erroring)? Can you post the full stack trace?
Not certain. Neither L-BFGS nor Pathfinder computes the Hessian, so if your model doesn't either, I don't see why this would be forward-over-reverse. But ReverseDiff does use ForwardDiff internally in a few places.
For some reason that doesn't generate an
EDIT:
Here you go:
Yes, the workflow is that pathfinder is part of initialisation for an HMC/NUTS sampler, and NUTS runs OK (even better with good pre-heating!).
Thanks for looking into this!
@SamuelBrand1 this line is using deprecated syntax: https://github.com/CDCgov/Rt-without-renewal/blob/d6344cc6e451e3e6c4188e4984247f890ae60795/EpiAware/src/EpiLatentModels/utils.jl#L19, which could be the issue here.
That doesn't solve this problem. However, switching to
I think making the model more resilient is our problem, though! Thanks for helping so much.
Ah, okay. Tape compilation is only safe if during program execution the same branches of all control flow are always taken. Not certain what control flow you might be encountering here. If this is the issue, it's maybe a little strange that Turing and Pathfinder could hit different branches. One way this can happen is if the bulk of the probability mass hits one branch, while the mode hits a different branch, since modes can look very different from the bulk; then you expect any good optimizer to hit a different branch. Either way, if
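(To make the tape-compilation point concrete, here is a minimal sketch, not from this thread; the function `f` is purely illustrative. It shows how a compiled ReverseDiff tape silently replays whichever branch was taken when the tape was recorded.)

```julia
using ReverseDiff

# The gradient of `f` depends on which branch is taken.
f(x) = x[1] > 0 ? sum(abs2, x) : sum(x)

# Record and compile a tape at a point where the first branch is taken.
tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, [1.0, 2.0]))

g = zeros(2)
ReverseDiff.gradient!(g, tape, [1.0, 2.0])   # correct: [2.0, 4.0]
ReverseDiff.gradient!(g, tape, [-1.0, 2.0])  # wrong: replays the recorded branch, returns [-2.0, 4.0] instead of [1.0, 1.0]
```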
No problem!
Turing, Optimization, and LogDensityProblems now support specifying an AD backend using ADTypes. Turing and Optimization do this via an `adtype` keyword. This PR adds an `adtype` parameter to `pathfinder` and `multipathfinder`, which then hooks into Optimization.jl's AD machinery to automatically differentiate the log-density function.

Since we depend on ForwardDiff indirectly via Optim, we use `AutoForwardDiff` as the default `adtype`, so that log-density functions are automatically differentiated by default.

As a result, the PR also adds support for generic log-density functions (not just those that implement the LogDensityProblems interface).
Fixes #93
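(To illustrate what this enables, here is a hedged sketch of the intended usage; the `init` keyword and the exact call signature are assumptions for illustration, not taken from the diff.)

```julia
# Hedged sketch (assumed call signature): a plain Julia log-density function plus an
# ADTypes backend passed via the new `adtype` keyword.
using Pathfinder, ADTypes

logp(x) = -sum(abs2, x) / 2   # generic log-density; no LogDensityProblems interface required

# AutoForwardDiff() is the default; another backend can be requested explicitly.
result = pathfinder(logp; init=randn(5), adtype=ADTypes.AutoReverseDiff())
```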