Introduce default_stepsize(M, O) #180

Closed
mateuszbaran opened this issue Dec 6, 2022 · 15 comments · Fixed by #174

@mateuszbaran
Member

The following code demonstrates how conjugate gradient descent fails on the Rosenbrock function using the default parameters:

using Manopt
using Manifolds

f_manopt(::Euclidean, x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

function g_manopt!(::Euclidean, storage, x)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
    return storage
end

M = Euclidean(2)
cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), return_options=true)

Could we have default parameters that make it work?

@kellertuer
Member

If you find default parameters that do work – sure. I am not 100% sure where the current default comes from; I think those are the ones from Manopt/Matlab.

In general, setting good defaults is a really complicated thing to do – and remember that Rosenbrock is a really mean example. But, hehe, this one really explodes fast:

julia> cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), debug=[:Iteration, :Iterate, :Cost, :Stop,"\n"])
Initial F(x): 1.000000
# 1     x: [2.0, 0.0]F(x): 1601.000000
# 2     x: [-3200.0, 800.0]F(x): 10484121674246400.000000
# 3     x: [2.6212863989790594e13, -3.2725775149997744e12]F(x): 47212597681438596372495799766368366330997806787448537088.000000
# 4     x: [-1.4408985664390341e43, 8.994538430325952e41]F(x): 4310559429836373893762014461052062964684649679539072155822222971503375833074684002488557111981322457934557086002722283200061060668117239830893990062402996910678581744400072704.000000
# 5     x: [2.393261832712779e132, -7.46974354390761e130]F(x): Inf
# 6     x: [NaN, NaN]F(x): NaN

@kellertuer
Member

Ah, I see – that is the same problem as with gradient descent. We have a constant stepsize as default for both.

For example, with the default Armijo:

julia> cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M),  debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100])
Initial F(x): 1.000000
# 100   x: [0.3789170183626525, 0.1455948665234621]F(x): 0.386151
# 200   x: [0.4604486280239601, 0.21308291361774606]F(x): 0.291230
# 300   x: [0.5157252480309722, 0.26664128363827033]F(x): 0.234567
# 400   x: [0.5595346370785345, 0.3136420428474512]F(x): 0.194041
# 500   x: [0.5940718838053803, 0.3531472767461153]F(x): 0.164783
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
 0.5940718838053803
 0.3531472767461153

It already looks better. The question here is: whom do we want to annoy with our default?

  • people on complicated manifolds, where we have no default retraction
  • people on R^n with the constant stepsize

I think we could switch to the first case, since we now have default_retraction_method(M). The reason to switch to that was that the old default was ExponentialRetraction within Armijo (and that did not have a manifold in its arguments).
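
For reference, a minimal sketch of what such an Armijo-based default could look like, assuming the retraction_method keyword of ArmijoLinesearch and ManifoldsBase's default_retraction_method (exact names may differ between versions):

```julia
using Manopt, Manifolds

# Minimal sketch (not the shipped default): build Armijo on top of the
# manifold's default retraction, so manifolds without `exp` still work.
M = Sphere(2)
step = ArmijoLinesearch(M; retraction_method=default_retraction_method(M))
```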

@kellertuer
Member

...and of course you can tweak Armijo as well:

x2 = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M; contraction_factor=0.99),  debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100])
Initial F(x): 1.000000
# 100   x: [0.3861691582528581, 0.15159518234304356]F(x): 0.377398
# 200   x: [0.47064890494573247, 0.2227435429442573]F(x): 0.280365
# 300   x: [0.5267486471817312, 0.2782161517108435]F(x): 0.224023
# 400   x: [0.568652360020186, 0.32391203359214754]F(x): 0.186091
# 500   x: [0.602145033676647, 0.3629237886067693]F(x): 0.158300
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
 0.602145033676647
 0.3629237886067693

x3 = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M; contraction_factor=0.999, sufficient_decrease=0.00001),  debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100])
Initial F(x): 1.000000
# 100   x: [0.3116617176333236, 0.16292536758228865]F(x): 0.906673
# 200   x: [0.36268757688393066, 0.19533509991927367]F(x): 0.813120
# 300   x: [0.40246487607807646, 0.22263562719679558]F(x): 0.724983
# 400   x: [0.43777261003280743, 0.2491305562675438]F(x): 0.646560
# 500   x: [0.47052893901998494, 0.27567440680272826]F(x): 0.574938
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
 0.47052893901998494
 0.27567440680272826

f_manopt(M, x2) # is 0.15830048687506507
f_manopt(M, x3) # is 0.1456151297663319

Or even switch to a different CG coefficient:

x4 = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M; contraction_factor=0.99, sufficient_decrease=0.00001),  debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100], coefficient=FletcherReevesCoefficient())
Initial F(x): 1.000000
# 100   x: [0.5104355347188466, 0.2533412185265233]F(x): 0.244862
# 200   x: [1.3625537059095383, 1.8580866725590546]F(x): 0.131681
# 300   x: [0.707132629054107, 0.49838856349984095]F(x): 0.086043
# 400   x: [1.1917317530345954, 1.401468104467437]F(x): 0.071942
# 500   x: [0.7745036515003424, 0.6057082658975672]F(x): 0.054274
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
 0.7745036515003424
 0.6057082658975672

yields

julia> f_manopt(M,x4)
0.0542736146044402

So it is again often the question: provide good defaults that work “okayish”, or optimise for a specific problem?

@mateuszbaran
Member Author

I think it's safe to assume that people who would like gradient-based optimization on a manifold do it on a manifold with a reasonable default_retraction_method(M). I wouldn't over-optimize for Rosenbrock, but it would be nice to have defaults that make it work sort of OK -- maybe even at the cost of special-casing defaults for R^n. Maybe such defaults would also make sense for other nonpositively-curved manifolds?

@kellertuer
Member

Sure, we can switch to Armijo in Manopt 0.4.

The safest way, of course, would be to have an “empty” default if none is defined (but that is a change to ManifoldsBase) and to choose a constant stepsize then. But this means that when defining exp one would have to set the default oneself.

@kellertuer
Member

Currently I have no way of checking the curvature before setting the default, so if I switch, it would be to Armijo for all manifolds.

@mateuszbaran
Member Author

Could we have a get_default_linesearch(::Manifold, ::Type{<:Options}) function in Manopt 0.4?

@kellertuer
Member

kellertuer commented Dec 6, 2022

That sounds like a good idea :)

Quite some things to keep in mind for 0.4, but sure, we can do a few PRs on main before releasing that one. Let me finish the large rework for costgrad first.

edit: Maybe default_linesearch fits better, since the other defaults also do not have a get up front – and it is a little difficult to define these defaults, since that would have to be done only if Manifolds.jl is loaded (and I do not have good defaults for specific manifolds for now).

@kellertuer kellertuer added this to the 0.4 milestone Dec 6, 2022
@kellertuer kellertuer changed the title Conjugate gradient descent fails on Rosenbrock Introduce default_linesearch(M, O) Dec 6, 2022
@kellertuer
Member

Ah, and I would maybe call that default_stepsize, since there are quite a few rules that do not search at all ;)

@mateuszbaran mateuszbaran changed the title Introduce default_linesearch(M, O) Introduce default_stepsize(M, O) Dec 6, 2022
@kellertuer
Member

kellertuer commented Jan 2, 2023

I noticed two problems with this idea.

  1. If so, we would only define default_stepsize(::AbstractManifold, ::MyConcreteSolverState) within Manopt.jl; so later on, for specific manifolds, one could only overwrite this for a specific manifold together with a specific solver (or at least a class of manifolds with a specific solver), which I am not sure is that useful.

  2. More problematic is how to use this. My idea was to replace the current stepsize=... with stepsize=default_stepsize(M, ...), but at that point the state does not yet exist (it is only generated later in the constructor), so we cannot specify the second argument at the point where I wanted to use this.

What should we do about these two points to make this usable and easy to implement?
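
To make the second point concrete, a hypothetical sketch (all names here are placeholders, not the Manopt API): the keyword default would have to refer to a state that is only constructed inside the function.

```julia
using Manopt, Manifolds

# Placeholder state type, only to illustrate the ordering problem.
struct MySolverState{P,S}
    p::P
    stepsize::S
end

function my_solver(M, f, grad_f, p;
    # stepsize=default_stepsize(M, state),  # <- `state` does not exist yet here
    stepsize=ConstantStepsize(1.0),
)
    state = MySolverState(p, stepsize)  # the state is only created at this point
    return state
end
```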

@mateuszbaran
Member Author

Hm, I think this shouldn't dispatch on the solver state but on the solver type. So we may have something like default_stepsize(::AbstractManifold, ::typeof(quasi_Newton)). Or we can specify the solver type using a new type hierarchy, which may be a bit nicer but more complicated to implement. Note that the Optimization.jl integration also needs a solver type thing: https://github.com/SciML/Optimization.jl/pull/437/files#diff-91c386d7b5fa9310b5c87114b777bc81e69479305f504a2f361228fd1b6fd8af but it needs to collect more information (e.g. stepsize), so it must be separate.
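
A minimal sketch of that variant (my_default_stepsize is a hypothetical name, not an existing Manopt function): the default is dispatched on the high-level solver function's type.

```julia
using Manopt, Manifolds

# Hypothetical sketch: pick the default stepsize based on the solver function.
my_default_stepsize(M::AbstractManifold, ::typeof(quasi_Newton)) = ArmijoLinesearch(M)
my_default_stepsize(M::Euclidean, ::typeof(gradient_descent)) = ConstantStepsize(1.0)

my_default_stepsize(Euclidean(2), gradient_descent)  # constant step on R^n
```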

@kellertuer
Member

Good that I have a hierarchy established for the SolverStates (formerly Options), which identifies the solver nearly uniquely.
By “nearly” I mean that the solver functions do depend on both Problem and State, but in practice the State determines the solver. So we could also use that as the type hierarchy for Optimization.jl.

I think the type idea you posted is not 100% correct, since quasi_Newton is the high-level interface function; I would prefer to use the AbstractSolverState type hierarchy as the second argument, but in reality that is also just a minor difference.

@mateuszbaran
Member Author

So, your idea is to use something like default_stepsize(::AbstractManifold, ::Type{T}) where T<:AbstractSolverState? That would also work, I think.
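
Roughly, that could look like the following (illustrative only; my_default_stepsize is a hypothetical name, and AbstractManoptSolverState / GradientDescentState are the Manopt 0.4 names for the state hierarchy):

```julia
using Manopt, Manifolds

# Hypothetical sketch: dispatch on the state *type*, so a default can be chosen
# before the state itself is constructed.
my_default_stepsize(M::AbstractManifold, ::Type{<:AbstractManoptSolverState}) =
    ArmijoLinesearch(M)
my_default_stepsize(::Euclidean, ::Type{<:GradientDescentState}) =
    ConstantStepsize(1.0)

my_default_stepsize(Sphere(2), GradientDescentState)     # Armijo fallback
my_default_stepsize(Euclidean(2), GradientDescentState)  # special case on R^n
```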

@kellertuer
Member

I think that also looks nicer than using the types of the high-level functions like quasi_Newton or DouglasRachford – and we can actually identify the solver based on its State type (as I said, that was originally not necessarily intended, since we could have had different solvers based on the problem as well). So I will go with that. Since it is getting late (or dark here, at least), I might just start with that today still (it is a little dull work to introduce that at every solver).

@mateuszbaran
Member Author

Cool, thanks 🙂.
