Add new backends with DifferentiationInterface.jl #302

Open · amontoison wants to merge 2 commits into main
Conversation

amontoison (Member) commented Sep 11, 2024

Add the following backends:

  • Enzyme
  • Zygote
  • Mooncake
  • Diffractor
  • Tracker
  • Symbolics
  • ChainRules
  • FastDifferentiation
  • FiniteDiff
  • FiniteDifferences
  • PolyesterForwardDiff

github-actions bot (Contributor) commented Sep 11, 2024

Package name | latest | stable
CaNNOLeS.jl | |
DCISolver.jl | |
DerivativeFreeSolvers.jl | |
JSOSolvers.jl | |
NLPModelsIpopt.jl | |
OptimalControl.jl | |
OptimizationProblems.jl | |
Percival.jl | |
QuadraticModels.jl | |
SolverBenchmark.jl | |
SolverTools.jl | |

tmigot (Member) left a comment

Thanks @amontoison for the PR. I have mixed feelings about it. On the one hand, it is progress; on the other hand, we are losing the Hessian backend for Enzyme and Zygote.

How far are we from making it fully compatible?

Resolved review threads: Project.toml (outdated), docs/src/backend.md, src/di.jl (outdated, 2 threads), test/runtests.jl (outdated, 4 threads), test/script_OP.jl
amontoison (Member, Author) commented Sep 13, 2024

> It's only one, so basically with this change we would no longer be able to use Hessian for Enzyme and Zygote.

We can, but only for unconstrained problems.
I wanted to remove what was not working before.

The user will no longer be able to use an incorrect Hessian, which is better for everyone.

amontoison changed the title from "Add more backends for Zygote and Enzyme" to "Add new backends with DifferentiationInterface.jl" on Sep 25, 2024
JuliaSmoothOptimizers deleted a comment from tmigot on Sep 26, 2024
amontoison (Member, Author) commented:

@gdalle May I ask you to check what I did wrong in the file di.jl?
I have different errors with buildkite: https://buildkite.com/julialang/adnlpmodels-dot-jl/builds/243

gdalle (Collaborator) commented Sep 26, 2024

It looks like the problem comes from forgetting to import the function grad? Not a DI thing, presumably an NLPModels thing
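A hedged guess at the corresponding fix in di.jl (assuming, as the comment suggests, that grad is the NLPModels function being extended; the actual import list is not shown in this thread):

import NLPModels: grad  # bring the NLPModels function being extended into scope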

gdalle (Collaborator) commented Sep 26, 2024

@dpo could you perhaps give me access to the repo so that I may help with this and future PRs?

gdalle (Collaborator) left a comment

My gut feeling is that if you want to switch to DI, you have to switch to ADTypes as well and stop doing this cumbersome translation between symbols and backend objects. What do you think?
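To illustrate that suggestion (a hypothetical sketch, not ADNLPModels' actual API): with a Symbol, the package has to translate the name into an internal backend struct; with ADTypes, the user hands over a dispatchable object that already carries its options.

using ADTypes
import ForwardDiff

# Symbol style: must be translated internally before anything can be computed
choice = :ForwardDiff

# ADTypes style: a concrete object, configurable and dispatchable
backend = AutoForwardDiff(chunksize = 8)

# Internals can specialize by dispatch instead of comparing Symbols
describe(::AutoForwardDiff) = "ForwardDiff backend"
describe(::ADTypes.AbstractADType) = "some other backend"
describe(backend)  # returns "ForwardDiff backend"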

@@ -1,23 +1,25 @@
 # How to switch backend in ADNLPModels
 
 `ADNLPModels` allows the use of different backends to compute the derivatives required within NLPModel API.
-It uses `ForwardDiff.jl`, `ReverseDiff.jl`, and more via optional depencies.
+It uses `ForwardDiff.jl`, `ReverseDiff.jl`, and more via optional dependencies.
 
 The backend information is in a structure [`ADNLPModels.ADModelBackend`](@ref) in the attribute `adbackend` of a `ADNLPModel`, it can also be accessed with [`get_adbackend`](@ref).
 
 The functions used internally to define the NLPModel API and the possible backends are defined in the following table:
Review comment (Collaborator):

Why not just switch fully to the ADTypes specification? You're gonna run into trouble translating symbols into AbstractADType objects

Review comment (Collaborator):

And the symbols don't allow you to set parameters like the following (a sketch is shown after this list):

  • the number of chunks in ForwardDiff
  • the tape compilation in ReverseDiff
  • aspects of the mode in Enzyme
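For reference, a short sketch of how those options are expressed through ADTypes constructors (keyword names as in ADTypes.jl; defaults may differ between versions):

using ADTypes
import Enzyme

AutoForwardDiff(chunksize = 4)     # fix the chunk size used by ForwardDiff
AutoReverseDiff(compile = true)    # ask ReverseDiff to compile the tape
AutoEnzyme(mode = Enzyme.Reverse)  # pin Enzyme to reverse mode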

$\mathcal{L}(x)$ denotes the Lagrangian $f(x) + \lambda^T c(x)$.
Except for the backends based on `ForwardDiff.jl` and `ReverseDiff.jl`, all other backends require the associated AD package to be manually installed by the user to work.
Note that the Jacobians and Hessians computed by the backends above are dense.
The backends `SparseADJacobian`, `SparseADHessian`, and `SparseReverseADHessian` should be used instead if sparse Jacobians and Hessians are required.
Review comment (Collaborator):

Same remark for sparse AD: using AutoSparse seems more flexible?
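For context, roughly what an AutoSparse setup looks like (a sketch assuming TracerSparsityDetector from SparseConnectivityTracer and GreedyColoringAlgorithm from SparseMatrixColorings, as in the longer example at the end of this thread):

using ADTypes
using SparseConnectivityTracer, SparseMatrixColorings
import ForwardDiff

# Wrap a dense backend to get sparse Jacobians/Hessians with automatic
# sparsity detection and coloring
sparse_backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)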

(:ZygoteADGradient , :AutoZygote ),
# (:ForwardDiffADGradient , :AutoForwardDiff ),
# (:ReverseDiffADGradient , :AutoReverseDiff ),
(:MooncakeADGradient , :AutoMooncake ),
Review comment (Collaborator):

The AutoMooncake constructor requires a keyword, like so:

AutoMooncake(; config=nothing)

(:DiffractorADGradient , :AutoDiffractor ),
(:TrackerADGradient , :AutoTracker ),
(:SymbolicsADGradient , :AutoSymbolics ),
(:ChainRulesADGradient , :AutoChainRules ),
Review comment (Collaborator):

The AutoChainRules constructor requires a keyword, like so:

AutoChainRules(; ruleconfig=Zygote.ZygoteRuleConfig())

(:ChainRulesADGradient , :AutoChainRules ),
(:FastDifferentiationADGradient , :AutoFastDifferentiation ),
(:FiniteDiffADGradient , :AutoFiniteDiff ),
(:FiniteDifferencesADGradient , :AutoFiniteDifferences ),
Review comment (Collaborator):

The AutoFiniteDifferences constructor requires a keyword, like so:

AutoFiniteDifferences(; fdm=FiniteDifferences.central_fdm(3, 1))

x0::AbstractVector = rand(nvar),
kwargs...,
)
backend = $fbackend()
Review comment (Collaborator):

This will fail for the three backends mentioned above. And for all other backends, it prevents you from setting any options, which was the whole point of ADTypes.jl to begin with: see https://github.com/SciML/ADTypes.jl?tab=readme-ov-file#why-should-ad-users-adopt-this-standard
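One hedged way around both issues (illustrative only, not code from this PR): let the user pass an already-constructed AbstractADType instead of calling a zero-argument constructor on their behalf.

using ADTypes

# Hypothetical wrapper type that stores the user's configured backend object
struct DIADGradient{B <: ADTypes.AbstractADType}
    backend::B
end

DIADGradient(AutoMooncake(config = nothing))  # keyword-only constructors work
DIADGradient(AutoForwardDiff(chunksize = 8))  # and user options are preserved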

Comment on lines +175 to +187
for (ADHvprod, fbackend) in ((:EnzymeADHvprod , :AutoEnzyme ),
(:ZygoteADHvprod , :AutoZygote ),
# (:ForwardDiffADHvprod , :AutoForwardDiff ),
# (:ReverseDiffADHvprod , :AutoReverseDiff ),
(:MooncakeADHvprod , :AutoMooncake ),
(:DiffractorADHvprod , :AutoDiffractor ),
(:TrackerADHvprod , :AutoTracker ),
(:SymbolicsADHvprod , :AutoSymbolics ),
(:ChainRulesADHvprod , :AutoChainRules ),
(:FastDifferentiationADHvprod , :AutoFastDifferentiation ),
(:FiniteDiffADHvprod , :AutoFiniteDiff ),
(:FiniteDifferencesADHvprod , :AutoFiniteDifferences ),
(:PolyesterForwardDiffADHvprod, :AutoPolyesterForwardDiff))
Review comment (Collaborator):

Diffractor, Mooncake, Tracker and ChainRules probably don't work in second order.
FiniteDiff and FiniteDifferences might give you inaccurate results depending on their configuration (JuliaDiff/DifferentiationInterface.jl#78)
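For reference, second-order operators can still be assembled from first-order backends with DifferentiationInterface's SecondOrder combinator (a sketch, not part of this PR; whether a given pair of backends composes correctly still has to be tested case by case):

using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)
x, v = rand(3), rand(3)

# forward-over-reverse HVP: ForwardDiff as outer backend, Zygote as inner backend
backend = SecondOrder(AutoForwardDiff(), AutoZygote())
hvp(f, backend, x, (v,))  # 1-tuple containing the Hessian-vector product H(x) * v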

end
end

for (ADHessian, fbackend) in ((:EnzymeADHessian , :AutoEnzyme ),
Review comment (Collaborator):

Same remark as for HVP about backends incompatible with second order

Comment on lines +46 to +59
ForwardDiff_backend = Dict(
:gradient_backend => ForwardDiffADGradient,
:jprod_backend => ForwardDiffADJprod,
:jtprod_backend => ForwardDiffADJtprod,
:hprod_backend => ForwardDiffADHvprod,
:jacobian_backend => ForwardDiffADJacobian,
:hessian_backend => ForwardDiffADHessian,
:ghjvprod_backend => EmptyADbackend,
:jprod_residual_backend => ForwardDiffADJprod,
:jtprod_residual_backend => ForwardDiffADJtprod,
:hprod_residual_backend => ForwardDiffADHvprod,
:jacobian_residual_backend => ForwardDiffADJacobian,
:hessian_residual_backend => ForwardDiffADHessian
)
Review comment (Collaborator):

The goal of DifferentiationInterface is to save a lot of people a lot of code. However, this PR ends up adding more lines than it removes, precisely because of this kind of disjunction.
@VaibhavDixit2 how did you handle the choice of backend for each operator in OptimizationBase.jl?

@@ -0,0 +1,12 @@
using OptimizationProblems

Review comment (Collaborator):

Suggested change
using NLPModels

dpo (Member) commented Sep 26, 2024

@gdalle I invited you. Thank you for your work here!!!

gdalle (Collaborator) commented Oct 1, 2024

@amontoison what do you think about moving away from symbols here?

amontoison (Member, Author) commented Oct 1, 2024

> @amontoison what do you think about moving away from symbols here?

It depends on the alternatives. Right now, it's useful to be able to specify that we want optimized backends with :optimized, or only matrix-free backends with :matrix_free (no Jacobian or Hessian).
But if Enzyme.jl is stable enough, we could drop :optimized and use a boolean for matrix-free backends.

It will then be easier to provide an AutoBackend() with the appropriate options.

gdalle (Collaborator) commented Oct 1, 2024

If I'm not mistaken there are two levels here:

  • the interface you present to the user (:optimized, :matrix_free)
  • the way you represent the backends internally

Right now you base all of the internal representations on Symbols. But as explained here, the whole reason for ADTypes was to move beyond Symbols towards full-fledged types that are 1) more expressive and 2) dispatchable. That's why I was suggesting a similar move here. It doesn't stop you from offering :optimized autodiff options in the front end if you like.
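To make the two levels concrete, a hypothetical sketch (the symbols :optimized and :matrix_free come from the discussion above; the function name and the particular backend choices are purely illustrative):

using ADTypes
import ForwardDiff, ReverseDiff

# Front end: the user still picks a Symbol.
# Internals: the Symbol is mapped once to dispatchable ADTypes objects.
function backends_from_symbol(choice::Symbol)
    if choice === :optimized
        return (forward = AutoForwardDiff(), reverse = AutoReverseDiff(compile = true))
    elseif choice === :matrix_free
        # matrix-free: same backends, but only operator-vector products get set up
        return (forward = AutoForwardDiff(), reverse = AutoReverseDiff())
    else
        throw(ArgumentError("unknown backend choice: $choice"))
    end
end

backends = backends_from_symbol(:optimized)
backends.reverse isa ADTypes.AbstractADType  # true: internals can dispatch on types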

amontoison (Member, Author) commented:

Do you have an example of what you suggest?

gdalle (Collaborator) commented Oct 1, 2024

I could try to show you in an alternative PR

gdalle (Collaborator) commented Oct 1, 2024

Okay, it is a bit hard to submit a PR since there would be a lot of things to rewrite and I don't understand what each part does. But essentially I was imagining something like this:

using ADTypes
using DifferentiationInterface
using LinearAlgebra
using SparseMatrixColorings
using SparseConnectivityTracer
import ForwardDiff, ReverseDiff

function DefaultAutoSparse(backend::AbstractADType)
    return AutoSparse(
        backend;
        sparsity_detector=TracerSparsityDetector(),
        coloring_algorithm=GreedyColoringAlgorithm(),
    )
end

struct ADModelBackend
    gradient_backend
    hprod_backend
    jprod_backend
    jtprod_backend
    jacobian_backend
    hessian_backend
end

struct ADModelBackendPrep
    gradient_prep
    hprod_prep
    jprod_prep
    jtprod_prep
    jacobian_prep
    hessian_prep
end

function ADModelBackend(forward_backend::AbstractADType, reverse_backend::AbstractADType)
    @assert ADTypes.mode(forward_backend) isa
        Union{ADTypes.ForwardMode,ADTypes.ForwardOrReverseMode}
    @assert ADTypes.mode(reverse_backend) isa
        Union{ADTypes.ReverseMode,ADTypes.ForwardOrReverseMode}

    gradient_backend = reverse_backend
    hprod_backend = SecondOrder(forward_backend, reverse_backend)
    jprod_backend = forward_backend
    jtprod_backend = reverse_backend
    jacobian_backend = DefaultAutoSparse(forward_backend)  # or a size-dependent heuristic
    hessian_backend = DefaultAutoSparse(SecondOrder(forward_backend, reverse_backend))

    return ADModelBackend(
        gradient_backend,
        hprod_backend,
        jprod_backend,
        jtprod_backend,
        jacobian_backend,
        hessian_backend,
    )
end

function ADModelBackendPrep(
    admodel_backend::ADModelBackend,
    obj::Function,
    cons::Function,
    lag::Function,
    x::AbstractVector,
)
    (;
        gradient_backend,
        hprod_backend,
        jprod_backend,
        jtprod_backend,
        jacobian_backend,
        hessian_backend,
    ) = admodel_backend

    c = cons(x)
    λ = similar(c)

    dx = similar(x)
    dc = similar(c)

    gradient_prep = prepare_gradient(lag, gradient_backend, x, Constant(λ))
    hprod_prep = prepare_hvp(lag, hprod_backend, x, (dx,), Constant(λ))
    jprod_prep = prepare_pushforward(cons, jprod_backend, x, (dx,))
    jtprod_prep = prepare_pullback(cons, jtprod_backend, x, (dc,))
    jacobian_prep = prepare_jacobian(cons, jacobian_backend, x)
    hessian_prep = prepare_hessian(lag, hessian_backend, x, Constant(λ))

    return ADModelBackendPrep(
        gradient_prep, hprod_prep, jprod_prep, jtprod_prep, jacobian_prep, hessian_prep
    )
end

admodel_backend = ADModelBackend(AutoForwardDiff(), AutoReverseDiff())

obj(x) = sum(x)
cons(x) = abs.(x)
lag(x, λ) = obj(x) + dot(λ, cons(x))

admodel_backend_prep = ADModelBackendPrep(admodel_backend, obj, cons, lag, rand(3));
