Second order for vector to vector functions #206
xref JuliaDiff/ForwardDiff.jl#61 (and quite possibly others, but I haven't searched)
As far as naming goes, perhaps this could still be called Hessian. It's a little inconsistent that we give the gradient of a vector-valued function a special name, "Jacobian," but that's basically a historical artifact. The right way to think about it is that the Jacobian is not a matrix: it's a 2-tensor that is both contravariant (for the "vector-valued" portion) and covariant (for the gradient portion). There's really no confusion between these, except for the fact that in Julia we handle this by adding an extra dimension and we don't use any encoding to specify which "dimension" is covariant and which is contravariant.

The second order derivative of a vector-valued function is, of course, a rank-3 tensor with one contravariant and two covariant "dimensions." Again, there doesn't have to be any actual ambiguity about these things if we encoded the meaning of the dimensions. Heck, one could even define Hessian-(contravariant)vector products just fine with no ambiguity, because there are only two candidates for that (the two covariant "dimensions") and they are symmetric.
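To make the shapes concrete, here is a small illustrative sketch with plain ForwardDiff and a made-up f: ℝ² → ℝ³ (nothing here is DI API):

import ForwardDiff

f(x) = [x[1]^2, x[1] * x[2], sin(x[2])]
x = [1.0, 2.0]

J = ForwardDiff.jacobian(f, x)  # 3×2: one contravariant (output) index, one covariant (input) index
H = reshape(ForwardDiff.jacobian(z -> ForwardDiff.jacobian(f, z), x), 3, 2, 2)
# H[i, j, k] = ∂²fᵢ/∂xⱼ∂xₖ: one contravariant index, two covariant indices

# the two covariant indices are symmetric
all(H[i, j, k] ≈ H[i, k, j] for i in 1:3, j in 1:2, k in 1:2)  # true

# a Hessian-(contravariant)vector product can contract either covariant index, with the same result
v = [0.5, -1.0]
Hv = [sum(H[i, j, k] * v[k] for k in 1:2) for i in 1:3, j in 1:2]  # 3×2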
To add to what you are saying, Jacobians are the matrix representation of two linear maps: pushforwards and (transposed) pullbacks, which are at the heart of forward- and reverse-mode AD (and DI). I'm guessing the popularity of Jacobians comes from frameworks like JAX calling their pushforwards "Jacobian-vector products" and their pullbacks "vector-Jacobian products," even though both are implemented as matrix-free linear maps. To some degree, this is just a question of semantics.
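For anyone reading along, here is a matrix-free sketch of both maps using ForwardDiff and Zygote directly (not DI's own operators; f, v and w are made up):

import ForwardDiff
import Zygote

f(x) = [x[1]^2 + x[2], sin(x[2])]
x = [1.0, 2.0]
v = [0.1, -0.2]  # tangent (input-space) direction
w = [1.0, 3.0]   # cotangent (output-space) direction

# pushforward / JVP: a directional derivative, no Jacobian materialized
jvp = ForwardDiff.derivative(t -> f(x .+ t .* v), 0.0)

# pullback / VJP: reverse mode, also matrix-free
_, back = Zygote.pullback(f, x)
vjp = only(back(w))

# both agree with the explicit Jacobian
J = ForwardDiff.jacobian(f, x)
jvp ≈ J * v && vjp ≈ J' * w  # true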
@gdalle, are you now satisfied that this isn't "weird"? I could resubmit JuliaDiff/AbstractDifferentiation.jl#134 here.
I understand the concept, but I've been thinking more about your specific application to optimization with a Lagrangian. I'm still unsure which option is more wasteful: differentiating the whole Lagrangian at once (with the multipliers held fixed), or computing the Hessian of each constraint separately and combining them afterwards.
Perhaps @amontoison would have a clue; we've been talking about this too.
Either way, if you want the vector Hessian, you can get it very easily as follows:

using DifferentiationInterface
import ForwardDiff

function value_jacobian_and_vector_hessian(f, backend, x)
    y = f(x)
    # differentiate the Jacobian itself; the closure is named `jac` so it
    # doesn't clash with the Jacobian matrix `J` returned below
    jac(z) = jacobian(f, backend, z)
    J, H = value_and_jacobian(jac, backend, x)
    # H comes back as a (length(y) * length(x)) × length(x) matrix;
    # reshape it into a rank-3 array H[i, j, k] = ∂²fᵢ/∂xⱼ∂xₖ
    return y, J, reshape(H, length(y), length(x), length(x))
end

I'm reluctant to add it to DifferentiationInterface because:
Since you've already been bitten by the last item in #252, you can look at the other issues discussing it.
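For reference, here is a quick sanity check of the helper above, with a made-up f: ℝ² → ℝ² (continuing the same session):

f(x) = [x[1]^2, x[1] * x[2]]
x = [1.0, 2.0]
y, J, H = value_jacobian_and_vector_hessian(f, AutoForwardDiff(), x)
size(J)     # (2, 2)
size(H)     # (2, 2, 2)
H[2, 1, 2]  # ∂²f₂/∂x₁∂x₂ == 1.0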
This would be viable, but you probably wouldn't want to throw it away: Newton's method for constrained optimization consists of using a first order expansion of

That said, there are cases where you might still want the individual Hessians (your second proposal). For unconstrained optimization, trust-region methods are very popular, and they essentially ask "was the quadratic expansion fairly accurate?" You could imagine asking that about each individual constraint as well, in which case you might want to have them disentangled from

Anyway, it's unfortunate that 2nd order is so hard. Presumably Julia is not the only ecosystem grappling with this?
As of today's main branch, DI allows you to keep some of the arguments constant in a differentiation operator. You can thus prepare a Lagrangian Hessian with some multiplier λ, as in the example below. If this is satisfactory for constrained optimization, I don't think I'm gonna implement vector-Hessians in the near future, so I'll close this for now.

using DifferentiationInterface, LinearAlgebra
using Enzyme: Enzyme

obj(x) = sum(abs2, x)
cons(x) = diff(x) .^ 3
# Lagrangian, with the multiplier λ passed as a Constant context
lag(x, λ) = obj(x) + dot(λ, cons(x))

x, λ = float.(1:4), float.(1:3);
prep = prepare_hessian(lag, AutoEnzyme(), x, Constant(rand(size(λ)...)));
hessian(lag, prep, AutoEnzyme(), x, Constant(λ))
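If I'm reading the Constant mechanism right, the multiplier's value is allowed to change between preparation and execution (only its size and type have to match), so the same prep can be reused across iterations. A small sketch continuing the session above, with a plain-ForwardDiff cross-check added purely for illustration:

λ2 = float.(4:6)
H_lag = hessian(lag, prep, AutoEnzyme(), x, Constant(λ2))  # same preparation, new multiplier

# cross-check against ForwardDiff on the Lagrangian with λ2 frozen
import ForwardDiff
H_lag ≈ ForwardDiff.hessian(z -> lag(z, λ2), x)  # true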
Sounds very exciting, looking forward to trying it!
Very exciting, thanks for the great work @gdalle!
At the moment I don't know if there is wide user interest, but feel free to contribute here.
See initial discussion with @timholy in JuliaDiff/AbstractDifferentiation.jl#134