-
So to continue your idea: you would like to minimize on the power manifold (the square just makes all of this smooth and nice). Here are some differences.

The last point is maybe a little complicated to see, so let's use another manifold as an example: the 2-sphere. Imagine you are at the North Pole N and you have found the gradient, that is, the direction X to walk into to minimize your function (and you know how long the step is) – this is a vector in the x,y-plane (that is, tangent at the North Pole). Note that N+X would leave the sphere; we cannot do that, so we follow the exponential map (or a retraction) instead to stay on the sphere. This is of course more costly, and as a summary that is maybe the main tradeoff. Note that on some manifolds, exp might be hard (and not even available in closed form); luckily on the simplex it is. And we have `change_representer` available – which means we can turn Euclidean gradients into Riemannian ones. So, very TLDR: it depends; both approaches have their advantages, and it depends heavily on the problem which one to use. Hope that helps, and let me know if you want to try that (and run into problems).
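A tiny illustration of that sphere picture, as a sketch with Manifolds.jl (the point `p` and tangent vector `X` below are made up for illustration):

```julia
using Manifolds, LinearAlgebra

M = Sphere(2)                 # the unit 2-sphere in ℝ³
p = [0.0, 0.0, 1.0]           # "North Pole"
X = [0.1, 0.2, 0.0]           # a tangent vector at p, i.e. it lies in the x,y-plane

norm(p + X)                   # ≠ 1, so p + X leaves the sphere
q = exp(M, p, X)              # walk along a great circle instead
is_point(M, q)                # true: q is again on the sphere
```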
-
Thank you @kellertuer, that is very helpful. Do you have a simple example that you could quickly write with Manopt.jl that I could copy and paste to compare with the JuMP.jl implementation I already have? My problem is a bit large, but I am curious to see if the results obtained with the manifold approach differ in an interesting way.
-
Hm, I am not sure how to help much more than pointing to https://manoptjl.org/stable/tutorials/Optimize!/ which explains how to get started with a gradient descent (and of course see quasi-Newton above; the signature of that one is very similar). For gradient descent, make sure to set the stepsize (the default one is not so optimal at the moment but will be changed with the next breaking release) – it is best to use ArmijoLinesearch, for example.
-
I saw the example, but it is always tricky to get things working properly as a non-maintainer. If you could kindly write an MWE, I could quickly adapt it and try it with the data I have :) I'd be happy to report any differences with the JuMP code.
-
Do you mean something like `gradient_descent(M, f, grad_f, x0; stepsize=ArmijoLinesearch(M))`? That is not an MWE, because I do not know your M, f, and grad_f, but I am not sure what further to provide.
-
This is a simple random data set that has the properties I mentioned:

```julia
Q = rand(20, 100)
Q = Q ./ sum(Q, dims=1)
C = rand(20, 10)
C = C ./ sum(C, dims=1)
```

Following your minimization formulation: each column of M should live in the 10-simplex.
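(Just to make that property explicit – a quick sanity check, not from the original post, that every column of Q and C indeed sums to one:)

```julia
# hypothetical check: each column of Q and C is a probability vector
all(x -> isapprox(x, 1.0), sum(Q, dims=1)) && all(x -> isapprox(x, 1.0), sum(C, dims=1))
```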
-
Well, then I can write down your manifold and your cost – but I would still need a gradient from you.
-
Oh, I thought that the gradient would be something that the Manifolds.jl framework would compute auto-magically, given that the manifold is known. That is why it is always a good idea to ask the experts first before committing to an implementation :)
-
We do a lot of things in code – magic is not part of that ;) If you have a Euclidean (classical) function and some AD tool for the Euclidean gradient, that would be fine as well. But at least the (Euclidean) gradient is what I would need. Then – indeed – Manifolds.jl can compute the Riemannian one from that.
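As a sketch of what such an AD tool could look like here (ForwardDiff.jl is just one choice – it also shows up later in this thread; the names `f_E` and `egrad_f` are made up for illustration):

```julia
using ForwardDiff, LinearAlgebra

# the plain Euclidean cost on k×p matrices, with Q and C as defined above
f_E(X) = norm(Q - C * X)^2
# its Euclidean gradient, computed by automatic differentiation
egrad_f(X) = ForwardDiff.gradient(f_E, X)
```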
-
When you say Euclidean gradient, you mean the usual gradient in R^n? I understood that the manifold information (i.e. the simplex) would be enough to take code that is "Euclidean", i.e. with vectors x in R^n, and make it "manifold" by projecting the Euclidean gradient onto the manifold, etc. I am probably missing something.
-
Yes, if you give me the R^n gradient (the usual one you know from math classes) I can (magically!) compute the Riemannian one. I cannot do that with just any vectors or code, and – no – it is not just projecting, and not onto the manifold: it is a projection onto the corresponding tangent space plus a change of the Riesz representer. But both of these actions/functions can luckily be computed in your case. Maybe one further issue is that it really requires some knowledge of the theory (and I sadly cannot give you a whole 2-lectures-a-week-for-15-weeks introduction here in this issue).
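A tiny illustration of that conversion on a single probability simplex – a sketch only; it assumes the `riemannian_gradient` helper that also appears in the code further down (it bundles the tangent-space projection and the change of the Riesz representer), and the point and Euclidean gradient are made up:

```julia
using Manopt, Manifolds

M = ProbabilitySimplex(2)                 # probability vectors in ℝ³
p = [0.2, 0.3, 0.5]                       # a point on the simplex
egrad = [1.0, -2.0, 0.5]                  # some Euclidean gradient at p

rgrad = riemannian_gradient(M, p, egrad)  # tangent-space projection + change of representer
is_vector(M, p, rgrad)                    # true: rgrad lies in the tangent space at p
```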
-
So this is how far I get with your current information:

```julia
using LinearAlgebra, Manopt, Manifolds

n = 20
p = 100
k = 10
Q = rand(n, p)
Q = Q ./ sum(Q, dims=1)
C = rand(n, k)
C = C ./ sum(C, dims=1)

# Since Q = C*M, we need M of size k×p, that is p columns, but then the simplex is of
# dimension k-1 (in R^k), namely:
manifold = MultinomialMatrices(k, p)

# The cost requires the signature (Manifold, point) -> value – note that the manifold is
# also usually called M in Manopt/Manifolds:
f(M, p) = norm(Q - C * p)^2

# Gradient? grad_f(M, p) = ...? It is also ok to give me the Euclidean gradient first...
# Initial matrix M? M0 = ...
# We could just take
M0 = 1/k * ones(k, p)

# And note that we can also check that this is a point on the manifold (the third
# parameter – true – just throws an error with more details if this fails):
is_point(manifold, M0, true) # returns true, that is, M0 is a point on the manifold you are on

# then we could call (once grad_f is defined)
gradient_descent(manifold, f, grad_f, M0; stepsize=ArmijoLinesearch(manifold))
```
-
Yes, I fully agree. I am still trying to find the time to dive into this literature – I haven't managed to since our last video call last year. 😢 Having a simple snippet like the one you shared above at least helps to get a sense of the performance in this specific application.
-
Well, sometimes – time is hard to find ;) As I said, deriving the Euclidean gradient of your cost, in whatever way works for you, is the main missing piece.
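For reference, a short derivation of that Euclidean gradient; it matches the `grad_f_E` in the code below (the factor 1/2 is the one introduced there to keep the formula clean):

$$
f(M) = \tfrac{1}{2}\,\lVert Q - CM \rVert_F^2,
\qquad
Df(M)[H] = \langle CM - Q,\; CH \rangle = \langle C^{\top}(CM - Q),\; H \rangle,
\qquad
\nabla f(M) = C^{\top}(CM - Q).
$$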
-
One cannot attach Pluto notebooks, so I narrowed it down to just code again. Here is the Euclidean gradient – including a check, the magic to convert it (we might have a shorter way to write that, but it is getting late) – and a gradient descent. Quasi-Newton still requires a parallel transport (or a vector transport), and I have not yet checked whether that is easy to implement here.

```julia
using LinearAlgebra, Manopt, Manifolds, Plots

n = 20
p = 100
k = 10
Q = rand(n, p)
Q = Q ./ sum(Q, dims=1)
C = rand(n, k)
C = C ./ sum(C, dims=1)

# Gradient? grad_f(M, p) = ...? It is also ok to give me the Euclidean gradient first...
# Initial matrix M? M0 = ...
# We could just take
M0 = 1/k * ones(k, p);

# Since Q = C*M, we need M of size k×p, that is p columns, but then the simplex is of
# dimension k-1 (in R^k), namely:
manifold = MultinomialMatrices(k, p)

# And note that we can also check that this is a point on the manifold (the third
# parameter – true – just throws an error with more details if this fails):
is_point(manifold, M0, true) # returns true, that is, M0 is a point on the manifold you are on

# The cost requires the signature (Manifold, point) -> value – note that the manifold is
# also usually called M in Manopt/Manifolds; the 1/2 is for keeping other formulae nice.
f(M, p) = 1/2 * norm(Q - C * p)^2
grad_f_E(M, x) = transpose(C) * (C * x - Q)

# Let's check this (Ronny says a lot – who knows!) – define the Euclidean manifold of matrices
E = ℝ^(k, p)
# maybe not the best (= numerically stable) choice, but it's ok
check_gradient(E, f, grad_f_E, M0, slope_tol=0.2, throw_error=true)

# ok, let's convert (magic!) – now even shorter
grad_f(M, p) = riemannian_gradient(M, p, grad_f_E(E, p))

# Here we are
gradient_descent(
    manifold, f, grad_f, M0;
    stepsize=ArmijoLinesearch(manifold),
    debug=[:Iteration, :Cost, :Stop, 25, "\n"],
)
```
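A small sanity check one could run afterwards – a sketch under the assumption that `gradient_descent` returns the final iterate (its default behaviour) and that we store it in a variable, here called `M_opt`:

```julia
M_opt = gradient_descent(
    manifold, f, grad_f, M0;
    stepsize=ArmijoLinesearch(manifold),
)
# every column of the result should still sum to one and stay (entrywise) positive
all(x -> isapprox(x, 1.0), sum(M_opt, dims=1))
all(M_opt .> 0)
# and the cost should not have increased compared to the starting point
f(manifold, M_opt) <= f(manifold, M0)
```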
-
Hm, interesting. It looks like a least-squares problem, so I've tried the new Riemannian Levenberg-Marquardt that we have now, but I probably have an error in the Jacobian:

```julia
using ForwardDiff

F_RLM(M, p) = vec(Q - C * p)

function jacF_RLM(
    M::AbstractManifold, p; basis_domain::AbstractBasis=DefaultOrthogonalBasis()
)
    X0 = zeros(manifold_dimension(M))
    J = ForwardDiff.jacobian(
        x -> F_RLM(M, exp(M, p, get_vector(M, p, x, basis_domain))), X0
    )
    return J
end

o = LevenbergMarquardt(
    manifold,
    F_RLM,
    jacF_RLM,
    M0,
    length(Q);
    return_options=true,
    jacB=ProjectedOrthonormalBasis(:svd),
)
```
-
Have you tried my code? It should work for your problem if I am not mistaken – let me know where you are stuck. Note that I also updated the code above slightly, because we introduced a nicer function for the conversion of Euclidean to Riemannian gradients.
-
I didn't have time yet, @kellertuer, but it is on my TODO list. There is a lot of work here, but as soon as I have time to pause I will give it a try.
-
Ah, then I misread your post on Zulip – it read to me as if this thread were stuck, you had tried everything, and needed even more input now.
-
I could get going with the JuMP solution. The manifold solution is something I would like to try for comparison now, so that future users can understand the tradeoffs.
-
I converted this to a discussion, since we are not discussing a bug – but of course feel free to ask further questions once you have had time to run the comparison :)
-
Suppose I have an n-by-p matrix `Q` such that each column lives in the n-simplex (i.e. `sum(Q[:,j]) == 1` and `Q[i,j] >= 0`). Additionally, suppose that I have an n-by-k matrix `C` such that each column lives in the n-simplex. I would like to find a matrix `M` such that `Q ≈ C*M` and such that each column of `M` lives in the k-simplex.

Do you have any good literature on this optimization problem over the n-dimensional simplex? It resembles the traditional least-squares problem, but on manifolds. What are the advantages of using Manopt.jl versus explicitly adding simplex constraints to a JuMP.jl model, for instance? Appreciate any advice.
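For reference, a minimal sketch of the constrained formulation mentioned here – an assumption about how such a JuMP.jl model could look (Ipopt as the solver is also just an assumption), not the implementation referred to in this thread:

```julia
using JuMP, Ipopt

n, p, k = 20, 100, 10
Q = rand(n, p); Q = Q ./ sum(Q, dims=1)
C = rand(n, k); C = C ./ sum(C, dims=1)

model = Model(Ipopt.Optimizer)
@variable(model, M[1:k, 1:p] >= 0)                 # nonnegative entries
@constraint(model, [j in 1:p], sum(M[:, j]) == 1)  # each column lies on the simplex
@objective(model, Min, sum((Q - C * M) .^ 2))      # least-squares fit of Q ≈ C*M
optimize!(model)
M_sol = value.(M)
```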