Introduce default_stepsize(M, O)
#180

The following code demonstrates how conjugate gradient descent fails on the Rosenbrock function using default parameters:
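(The snippet below is a reconstruction, not the original report: it assumes the standard Rosenbrock function, Manopt 0.3-style signatures, and the names M, f_manopt, g_manopt! and MutatingEvaluation used in the comments further down.)

using Manopt, Manifolds

M = Euclidean(2)

# Rosenbrock cost and its (in-place) Euclidean gradient; the names match the
# calls quoted in the comments below, the definitions themselves are assumed.
f_manopt(M, x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
function g_manopt!(M, X, x)
    X[1] = -2.0 * (1.0 - x[1]) - 400.0 * x[1] * (x[2] - x[1]^2)
    X[2] = 200.0 * (x[2] - x[1]^2)
    return X
end

conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0];
    evaluation=MutatingEvaluation())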
Could we have default parameters that make it work?

Comments
If you find default parameters that do work – sure; I am not 100% sure where the current default is from, I think those are the ones from Manopt/Matlab. In general, setting good defaults is a really complicated thing to do – and remember that Rosenbrock is a really mean example. But - hehe - this one really explodes fast:

julia> cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), debug=[:Iteration, :Iterate, :Cost, :Stop,"\n"])
Initial F(x): 1.000000
# 1 x: [2.0, 0.0]F(x): 1601.000000
# 2 x: [-3200.0, 800.0]F(x): 10484121674246400.000000
# 3 x: [2.6212863989790594e13, -3.2725775149997744e12]F(x): 47212597681438596372495799766368366330997806787448537088.000000
# 4 x: [-1.4408985664390341e43, 8.994538430325952e41]F(x): 4310559429836373893762014461052062964684649679539072155822222971503375833074684002488557111981322457934557086002722283200061060668117239830893990062402996910678581744400072704.000000
# 5 x: [2.393261832712779e132, -7.46974354390761e130]F(x): Inf
# 6 x: [NaN, NaN]F(x): NaN
Ah, I see, that is the same problem as with gradient descent. We have a constant stepsize as default for both. For example with the default Armijo:

julia> cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M), debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100])
Initial F(x): 1.000000
# 100 x: [0.3789170183626525, 0.1455948665234621]F(x): 0.386151
# 200 x: [0.4604486280239601, 0.21308291361774606]F(x): 0.291230
# 300 x: [0.5157252480309722, 0.26664128363827033]F(x): 0.234567
# 400 x: [0.5595346370785345, 0.3136420428474512]F(x): 0.194041
# 500 x: [0.5940718838053803, 0.3531472767461153]F(x): 0.164783
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
0.5940718838053803
0.3531472767461153

It already looks better. The question here is – whom do we want to annoy with our default?
I think we could switch to the first case, since we now have the
...and of course you can tweak Armijo as well:
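For instance (an illustrative sketch only; the keyword names follow the ArmijoLinesearch constructor of newer Manopt versions and the values are arbitrary, this is not the snippet from the original comment):

cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0];
    evaluation=MutatingEvaluation(),
    # assumed keyword names; tighter contraction and decrease than the defaults
    stepsize=ArmijoLinesearch(M; contraction_factor=0.9, sufficient_decrease=0.05),
    debug=[:Iteration, :Iterate, :Cost, :Stop, "\n", 100])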
or even switch to a different CG coefficient
yields
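The exact call and its output are not quoted above; as an illustrative sketch, assuming the coefficient keyword of conjugate_gradient_descent (FletcherReevesCoefficient is an arbitrary choice here, not necessarily the rule used in the comment), such a call could look like:

cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0];
    evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M),
    # switch the CG update rule away from the default coefficient
    coefficient=FletcherReevesCoefficient(),
    debug=[:Iteration, :Iterate, :Cost, :Stop, "\n", 100])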
So it is again often the question: provide good defaults that work “okayish” – or optimise for a specific problem?
I think it's safe to assume that people who would like gradient-based optimization on a manifold do it on a manifold with a reasonable
Sure, we can switch to Armijo in Manopt 0.4. The safest way, of course, could be to have an “empty” default if none is defined (but that is a change to ManifoldsBase) and choose a constant stepsize then. But this means that when defining exp (i.e. a new manifold) one also has to set the default oneself.
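A hypothetical sketch of that fallback idea; the name my_default_stepsize and the dispatch on the manifold type are purely illustrative, not an existing API of Manopt or ManifoldsBase:

using Manopt, Manifolds

# Purely illustrative: a generic fallback to a constant stepsize, plus a
# manifold-specific override where a safer linesearch default is known.
my_default_stepsize(M) = ConstantStepsize(1.0)
my_default_stepsize(M::Sphere) = ArmijoLinesearch(M)

my_default_stepsize(Euclidean(2))  # falls back to the constant stepsize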
Currently I have no way of checking the curvature before setting the default, so I would switch to Armijo for all manifolds.
Could we have a function that provides such a default stepsize?
That sounds like a good idea :) Quite some things to keep in mind for 0.4, but sure, we can do a few PRs on main before releasing that one. Let me finish the large rework for costgrad first.
edit: Maybe default_linesearch(M, O)
Ah, and I would maybe call that default_stepsize(M, O) rather than default_linesearch(M, O).
I noticed two problems with this idea.
What should we do about these two points to make it usable and easy to implement?
Hm, I think this shouldn't dispatch on solver state but on solver type. So we may have something like
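The snippet itself is not quoted above; as a purely illustrative sketch of dispatching on the solver type rather than on a state instance (my_default_stepsize is a hypothetical name, and the state types shown are those of Manopt 0.4, called ...Options in 0.3):

using Manopt, Manifolds

# Purely illustrative: methods keyed on the type of the solver state,
# so a default can be chosen before the state is constructed.
my_default_stepsize(M, ::Type{<:GradientDescentState}) = ConstantStepsize(1.0)
my_default_stepsize(M, ::Type{<:ConjugateGradientDescentState}) = ArmijoLinesearch(M)

my_default_stepsize(Euclidean(2), ConjugateGradientDescentState)  # returns an ArmijoLinesearch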
It is good that I have a hierarchy established for the SolverStates (formerly Options), which identify the solver nearly uniquely. I think the type idea you posted is not 100% correct, since
So, your idea is to use something like
I think that also looks nicer than using the high-level functions like
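For comparison, the alternative of keying the default on the high-level solver function could, purely as an illustration, look like this (again a hypothetical sketch, not an existing API):

using Manopt, Manifolds

# Purely illustrative: dispatch on the high-level solver function itself
# instead of the solver state type; this is the variant compared against above.
my_default_stepsize(M, ::typeof(conjugate_gradient_descent)) = ArmijoLinesearch(M)
my_default_stepsize(M, ::typeof(gradient_descent)) = ConstantStepsize(1.0)

my_default_stepsize(Euclidean(2), conjugate_gradient_descent)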
Cool, thanks 🙂.