
Support ADTypes and autodiff by default #178

Merged
merged 48 commits into main from adtypes on Jul 2, 2024

Conversation

@sethaxen (Member) commented Mar 16, 2024

Turing, Optimization, and LogDensityProblems now support specifying an AD backend using ADTypes. Turing and Optimization do this via an adtype keyword. This PR adds an adtype parameter to pathfinder and multipathfinder, which then hooks into Optimization.jl's AD machinery to automatically differentiate the log-density function.

Since we already depend on ForwardDiff indirectly via Optim, we use AutoForwardDiff as the default adtype, so that log-density functions are automatically differentiated by default.

As a result, the PR also adds support for generic log-density functions (not just those that implement the LogDensityProblems interface).
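As a rough usage sketch of the new keyword (the model below is just an illustrative placeholder, and the chosen AD backend package still needs to be loaded so that Optimization.jl's extension for it is available):

using Pathfinder, Turing, ADTypes
using ReverseDiff  # needed when selecting AutoReverseDiff as the backend

@model function demo_model(y)
    μ ~ Normal(0, 1)
    y .~ Normal(μ, 1)
end

# adtype is forwarded to Optimization.jl's AD machinery; AutoForwardDiff() is the default.
result = pathfinder(demo_model(randn(10)); adtype = AutoReverseDiff())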

Fixes #93


codecov bot commented Mar 16, 2024

Codecov Report

Attention: Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.61%. Comparing base (fae6c4b) to head (dccbc6d).

Files Patch % Lines
ext/PathfinderTuringExt.jl 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #178      +/-   ##
==========================================
- Coverage   84.72%   80.61%   -4.12%     
==========================================
  Files          13       13              
  Lines         609      614       +5     
==========================================
- Hits          516      495      -21     
- Misses         93      119      +26     

☔ View full report in Codecov by Sentry.

@SamuelBrand1

Hi @sethaxen,

Following up from #93. For our example, pathfinder fails here with a MethodError on a promote_rule:

MethodError: promote_rule(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationReverseDiffExt.OptimizationReverseDiffTag, Float64}, Float64, 12}}) is ambiguous.

It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example.

Is that an error from a forward-over-reverse Hessian calculation?

@sethaxen
Member Author

It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example.

At first glance, I don't think this is a Pathfinder issue. We only use log2π in one place, and it's outside of anything AD should be seeing. Does this work fine?

using Optimization, OptimizationOptimJL
using Turing, ReverseDiff  # Turing provides optim_problem, MAP, and AutoReverseDiff; ReverseDiff backs the adtype
prob = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)
sol = solve(prob, LBFGS())

If so, this is more or less what Pathfinder is doing internally, so the issue is probably somewhere in Pathfinder. If this fails, then the issue is somewhere else. Can Turing sample fine with HMC (as in, can it compute the gradient of the log-density without erroring)?

Can you post the full stack trace?

Is that an error on Forward-over-reverse hessian calculation?

Not certain. Neither L-BFGS nor Pathfinder computes the Hessian, so if your model doesn't either, I don't see why this would be forward-over-reverse. But ReverseDiff does use ForwardDiff internally in a few places.

@SamuelBrand1 commented Mar 19, 2024

It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example.
Does this work fine?

using Optimization, OptimizationOptimJL
prob = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)
sol = solve(prob, LBFGS())

For some reason that doesn't generate an init; I thought there was a fallback?

ERROR: MethodError: no method matching init(::@NamedTuple{…}, ::LBFGS{…})
Closest candidates are:
init(::OptimizationProblem, ::Any, Any...; kwargs...)
@ SciMLBase ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:164
init(::Roots.ZeroProblem, ::Any; kwargs...)
@ Roots ~/.julia/packages/Roots/neTBD/src/find_zero.jl:306
init(::Roots.AbstractUnivariateZeroMethod, ::Any, ::Roots.AbstractUnivariateZeroState)
@ Roots ~/.julia/packages/Roots/neTBD/src/find_zero.jl:311
...

Stacktrace:
[1] solve(::@NamedTuple{…}, ::Vararg{…}; kwargs::@kwargs{})
@ CommonSolve ~/.julia/packages/CommonSolve/JfpfI/src/CommonSolve.jl:23
[2] top-level scope
@ REPL[9]:1
Some type information was truncated. Use show(err) to see complete types.

EDIT: Turing.optim_problem does create a Turing.Optimisation.Init object as a field of prob, but it's obviously not dispatching for some reason.
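(If the named tuple returned by optim_problem holds the OptimizationProblem in a prob field alongside that Init object, then presumably something like the following is needed; the field name here is a guess:)

nt = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained = false)
sol = solve(nt.prob, LBFGS())  # pass the OptimizationProblem field rather than the named tuple itself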

@SamuelBrand1

Can you post the full stack trace?

Here you go:

ERROR: MethodError: promote_rule(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationReverseDiffExt.OptimizationReverseDiffTag, Float64}, Float64, 12}}) is ambiguous.

Candidates:
promote_rule(::Type{S}, ::Type{T}) where {S<:AbstractIrrational, T<:Number}
@ Base irrationals.jl:45
promote_rule(::Type{<:AbstractIrrational}, ::Type{T}) where T<:Real
@ Base irrationals.jl:44
promote_rule(::Type{R}, ::Type{ForwardDiff.Dual{T, V, N}}) where {R<:Real, T, V, N}
@ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/dual.jl:427

Possible fix, define
promote_rule(::Type{R}, ::Type{ForwardDiff.Dual{T, V, N}}) where {R<:AbstractIrrational, T, V, N}

Stacktrace:
[1] promote_type(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 12}})
@ Base ./promotion.jl:313
[2] promote_type(::Type, ::Type, ::Type)
@ Base ./promotion.jl:299
[3] ForwardOptimize
@ ~/.julia/packages/ReverseDiff/UJhiD/src/macros.jl:122 [inlined]
[4] +
@ ~/.julia/packages/ReverseDiff/UJhiD/src/derivatives/scalars.jl:17 [inlined]
[5] normlogpdf(z::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing})
@ StatsFuns ~/.julia/packages/StatsFuns/mrf0e/src/distrs/norm.jl:29
[6] normlogpdf(μ::Float64, σ::Float64, x::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing})
@ StatsFuns ~/.julia/packages/StatsFuns/mrf0e/src/distrs/norm.jl:41
[7] logpdf(d::Normal{Float64}, x::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing})
@ Distributions ~/.julia/packages/Distributions/UaWBm/src/univariates.jl:645
[8] logpdf(d::Truncated{…}, x::ReverseDiff.TrackedReal{…})
@ Distributions ~/.julia/packages/Distributions/UaWBm/src/truncate.jl:161
[9] tilde_assume(ctx::Turing.Optimisation.OptimizationContext{…}, dist::Truncated{…}, vn::VarName{…}, vi::DynamicPPL.ThreadSafeVarInfo{…})
@ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:60
[10] tilde_assume
@ ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:63 [inlined]
[11] tilde_assume
@ ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:57 [inlined]
[12] tilde_assume!!(context::DynamicPPL.FixedContext{…}, right::Truncated{…}, vn::VarName{…}, vi::DynamicPPL.ThreadSafeVarInfo{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:138
[13] generate_latent(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::DynamicPPL.FixedContext{…}, latent_model::RandomWalk{…}, n::Int64)
@ EpiAware.EpiLatentModels ~/GitHub/CFA/Rt-without-renewal/EpiAware/src/EpiLatentModels/randomwalk.jl:89
[14] _evaluate!!
@ ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:963 [inlined]
[15] macro expansion
@ ~/.julia/packages/DynamicPPL/pg70d/src/submodel_macro.jl:244 [inlined]
[16] make_epi_aware(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::DynamicPPL.FixedContext{…}, y_t::Vector{…}, time_steps::Int64; epi_model::DirectInfections{…}, latent_model::RandomWalk{…}, observation_model::DelayObservations{…})
@ EpiAware ~/GitHub/CFA/Rt-without-renewal/EpiAware/src/make_epi_aware.jl:8
[17] _evaluate!!(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:963
[18] evaluate_threadsafe!!(model::Model{…}, varinfo::TypedVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:952
[19] evaluate!!(model::Model{…}, varinfo::TypedVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:887
[20] (::LogDensityFunction{…})(z::ReverseDiff.TrackedArray{…})
@ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:111
[21] (::Turing.Optimisation.var"#l#4"{…})(x::ReverseDiff.TrackedArray{…}, ::SciMLBase.NullParameters)
@ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:294
[22] (::OptimizationReverseDiffExt.var"#51#75"{…})(::ReverseDiff.TrackedArray{…})
@ OptimizationReverseDiffExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationReverseDiffExt.jl:157
[23] ReverseDiff.GradientTape(f::OptimizationReverseDiffExt.var"#51#75"{…}, input::Vector{…}, cfg::ReverseDiff.GradientConfig{…})
@ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/api/tape.jl:199
[24] ReverseDiff.GradientTape(f::Function, input::Vector{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 12}})
@ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/api/tape.jl:198
[25] instantiate_function(f::OptimizationFunction{…}, cache::OptimizationBase.ReInitCache{…}, adtype::AutoReverseDiff, num_cons::Int64)
@ OptimizationReverseDiffExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationReverseDiffExt.jl:187
[26] OptimizationCache(prob::OptimizationProblem{…}, opt::LBFGS{…}, data::Base.Iterators.Cycle{…}; callback::Pathfinder.OptimizationCallback{…}, maxiters::Int64, maxtime::Nothing, abstol::Nothing, reltol::Nothing, progress::Bool, kwargs::@kwargs{})
@ OptimizationBase ~/.julia/packages/OptimizationBase/rRpJs/src/cache.jl:27
[27] OptimizationCache
@ ~/.julia/packages/OptimizationBase/rRpJs/src/cache.jl:17 [inlined]
[28] #__init#2
@ ~/.julia/packages/OptimizationOptimJL/fdBis/src/OptimizationOptimJL.jl:101 [inlined]
[29] __init (repeats 2 times)
@ ~/.julia/packages/OptimizationOptimJL/fdBis/src/OptimizationOptimJL.jl:68 [inlined]
[30] #init#599
@ ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:166 [inlined]
[31] init
@ ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:164 [inlined]
[32] solve(::OptimizationProblem{…}, ::LBFGS{…}; kwargs::@kwargs{…})
@ SciMLBase ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:96
[33] optimize_with_trace(prob::OptimizationProblem{…}, optimizer::LBFGS{…}; progress_name::String, progress_id::Base.UUID, maxiters::Int64, callback::Nothing, fail_on_nonfinite::Bool, kwargs::@kwargs{})
@ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/optimize.jl:53
[34] optimize_with_trace
@ ~/.julia/packages/Pathfinder/99eNx/src/optimize.jl:34 [inlined]
[35] _pathfinder(rng::Random._GLOBAL_RNG, prob::OptimizationProblem{…}, logp::Pathfinder.var"#logp#26"{…}; history_length::Int64, optimizer::LBFGS{…}, ndraws_elbo::Int64, executor::SequentialEx{…}, kwargs::@kwargs{…})
@ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:290
[36] _pathfinder
@ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:279 [inlined]
[37] _pathfinder_try_until_succeed(rng::Random._GLOBAL_RNG, prob::OptimizationProblem{…}, logp::Pathfinder.var"#logp#26"{…}; ntries::Int64, init_scale::Int64, init_sampler::Pathfinder.UniformSampler{…}, allow_mutating_init::Bool, kwargs::@kwargs{…})
@ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:265
[38] _pathfinder_try_until_succeed
@ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:253 [inlined]
[39] #25
@ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:182 [inlined]
[40] progress(f::Pathfinder.var"#25#27"{…}; name::String)
@ ProgressLogging ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:262
[41] progress
@ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:258 [inlined]
[42] pathfinder(prob::OptimizationProblem{…}; rng::Random._GLOBAL_RNG, history_length::Int64, optimizer::LBFGS{…}, ndraws_elbo::Int64, ndraws::Int64, input::Model{…}, kwargs::@kwargs{})
@ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:181
[43] pathfinder(model::Model{…}; rng::Random._GLOBAL_RNG, init_scale::Int64, init_sampler::Pathfinder.UniformSampler{…}, init::Nothing, adtype::AutoReverseDiff, kwargs::@kwargs{})
@ PathfinderTuringExt ~/.julia/packages/Pathfinder/99eNx/ext/PathfinderTuringExt.jl:118
[44] top-level scope
@ ~/GitHub/CFA/Rt-without-renewal/EpiAware/docs/src/examples/getting_started.jl:295
Some type information was truncated. Use show(err) to see complete types.

@SamuelBrand1

Can Turing sample fine with HMC (as in, compute the gradient of the log-density without erroring?)

Yes, the workflow is that pathfinder is part of the initialisation for an HMC/NUTS sampler, and NUTS runs OK (even better with good pre-heating!).
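(Roughly this, with the caveat that the field and keyword names below are from memory and the draws may still need mapping back into the model's parameter space:)

result = pathfinder(inference_mdl; ndraws = 10)
θ0 = collect(result.draws[:, 1])  # one Pathfinder draw as a candidate starting point
chain = sample(inference_mdl, NUTS(), 1_000; initial_params = θ0)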

@SamuelBrand1

Thanks for looking into this!

@sethaxen
Member Author

@SamuelBrand1 this line uses a deprecated syntax: https://github.com/CDCgov/Rt-without-renewal/blob/d6344cc6e451e3e6c4188e4984247f890ae60795/EpiAware/src/EpiLatentModels/utils.jl#L19, which could be the issue here.
Try this instead and let me know if it works: truncated(Normal(0, prior_mean * sqrt(pi) / sqrt(2)); lower=0)

@SamuelBrand1

That doesn't solve this problem.

However, switching to AutoReverseDiff(false) does. The model I'm sampling can fail because it can sample some big numbers in the initialisation phase, which causes problems with sampling from a negative binomial. This seems to cause problems for pathfinder with ReverseDiff, but is either not a problem, or a silent problem, for AdvancedHMC NUTS calls.

I think making the model more resilient is our problem though!

Thanks for helping so much.

@sethaxen
Member Author

However, switching to AutoReverseDiff(false) does

Ah, okay. Tape compilation is only safe if the same branches of all control flow are taken on every execution of the program. Not certain what control flow you might be encountering here. If this is the issue, it's maybe a little strange that Turing and Pathfinder could hit different branches. One way this can happen is if the bulk of the probability mass hits one branch while the mode hits a different one; since modes can look very different from the bulk, you'd then expect any good optimizer to end up on a different branch.
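(A toy illustration, not taken from this model, of why a compiled tape and value-dependent control flow don't mix; AutoReverseDiff(true) opts into tape compilation, while AutoReverseDiff(false) does not:)

using ReverseDiff

f(x) = x[1] > 0 ? sum(abs2, x) : sum(x)  # which branch runs depends on the input values

tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, [1.0, 2.0]))  # records only the x[1] > 0 branch
ReverseDiff.gradient!(zeros(2), tape, [-1.0, 2.0])  # replays the recorded branch: gives [-2.0, 4.0], wrong for this input
ReverseDiff.gradient(f, [-1.0, 2.0])  # re-records on every call: gives the correct [1.0, 1.0]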

Either way, if AutoReverseDiff(false) works with Pathfinder, that's a good indication that this PR is doing the right thing and that the issue is elsewhere.

Thanks for helping so much.

No problem!

sethaxen closed this Jul 2, 2024
sethaxen reopened this Jul 2, 2024
sethaxen marked this pull request as ready for review July 2, 2024 19:59
sethaxen merged commit c334ef7 into main Jul 2, 2024
16 of 17 checks passed
sethaxen deleted the adtypes branch July 2, 2024 20:30