-
So to continue your idea: you would like to minimize on the power manifold (the square just makes all of this smooth and nice). Here are some differences.

The last point is maybe a little complicated to see, so let's use another manifold as an example: the 2-sphere. Imagine you are at the North Pole N and you have found the gradient, that is, the direction X to walk into to minimize your function (and you know how long the step is) – this is a vector in the x,y-plane (that is, tangent at the North Pole). Note that N+X would leave the sphere; we cannot do that, so we follow the exponential map (or a retraction) instead to stay on the sphere. This is of course more costly, and as a summary that is maybe the main tradeoff. Note that on some manifolds, exp might be hard (and not even available in closed form); luckily on the simplex it is. And we have `change_representer` available – which means we can turn Euclidean gradients into Riemannian ones. So, very TLDR: it depends; both approaches have their advantages, and it depends heavily on the problem which one to use. Hope that helps, and let me know if you want to try that (and run into problems).
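A tiny illustration of that sphere picture, as a sketch with Manifolds.jl (the point `p` and tangent vector `X` below are made up for illustration):

```julia
using Manifolds, LinearAlgebra

M = Sphere(2)                 # the unit 2-sphere in ℝ³
p = [0.0, 0.0, 1.0]           # "North Pole"
X = [0.1, 0.2, 0.0]           # a tangent vector at p, i.e. it lies in the x,y-plane

norm(p + X)                   # ≠ 1, so p + X leaves the sphere
q = exp(M, p, X)              # walk along a great circle instead
is_point(M, q)                # true: q is again on the sphere
```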
-
Thank you @kellertuer, that is very helpful. Do you have a simple example that you could quickly write with Manopt.jl that I could copy and paste to compare with the JuMP.jl implementation I already have? My problem is a bit large, but I am curious to see if the results obtained with the manifold approach differ in an interesting way.
-
Hm, I am not sure how to help much more than pointing to https://manoptjl.org/stable/tutorials/Optimize!/ which explains how to get started with a gradient descent (and of course see quasi-Newton above; the signature of that one is very similar). For gradient descent, make sure to set the stepsize (the default one is not so optimal at the moment but will be changed with the next breaking release) – it is best to use ArmijoLinesearch, for example.
-
I saw the example, but it is always tricky to get things working properly as a non-maintainer. If you could kindly write an MWE, I could quickly adapt it and try it with the data I have :) I'd be happy to report any differences with the JuMP code.
-
Do you mean something like `gradient_descent(M, f, grad_f, x0; stepsize=ArmijoLinesearch(M))`? That is not an MWE, because I do not know your M, f, and grad_f, but I am not sure what further to provide.
-
This is a simple random data set that has the properties I mentioned:

```julia
Q = rand(20, 100)
Q = Q ./ sum(Q, dims=1)
C = rand(20, 10)
C = C ./ sum(C, dims=1)
```

Following your minimization formulation: each column of M should live in the 10-simplex.
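(Just to make that property explicit – a quick sanity check, not from the original post, that every column of Q and C indeed sums to one:)

```julia
# hypothetical check: each column of Q and C is a probability vector
all(x -> isapprox(x, 1.0), sum(Q, dims=1)) && all(x -> isapprox(x, 1.0), sum(C, dims=1))
```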
-
Well, then I can write down your manifold and your cost – but I would still need a gradient from you.
-
Oh, I thought that the gradient would be something that the Manifolds.jl framework would compute auto-magically, given that the manifold is known. That is why it is always a good idea to ask the experts first before committing to an implementation :)
-
We do a lot of things in code – magic is not part of that ;) If you have a Euclidean (classical) function and some AD tool for the Euclidean gradient, that would be fine as well. But at least the (Euclidean) gradient is what I would need. Then – indeed – Manifolds.jl can compute the Riemannian one from that.
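As a sketch of what such an AD tool could look like here (ForwardDiff.jl is just one choice – it also shows up later in this thread; the names `f_E` and `egrad_f` are made up for illustration):

```julia
using ForwardDiff, LinearAlgebra

# the plain Euclidean cost on k×p matrices, with Q and C as defined above
f_E(X) = norm(Q - C * X)^2
# its Euclidean gradient, computed by automatic differentiation
egrad_f(X) = ForwardDiff.gradient(f_E, X)
```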
-
When you say Euclidean gradient, you mean the usual gradient in R^n? I understood that the manifold information (i.e. the simplex) would be enough to take code that is "Euclidean", i.e. with vectors x in R^n, and make it "manifold" by projecting the Euclidean gradient onto the manifold, etc. I am probably missing something.
-
Yes, if you give me the R^n gradient (the usual one you know from math classes) I can (magically!) compute the Riemannian one. I cannot do that with just any vectors or code, and – no – it is not just projecting, and not onto the manifold: it is a projection onto the corresponding tangent space plus a change of the Riesz representer. But both of these actions/functions can luckily be computed in your case. Maybe one further issue is that it really requires some knowledge of the theory (and I sadly cannot give you a whole 2-lectures-a-week-for-15-weeks introduction here in this issue).
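A tiny illustration of that conversion on a single probability simplex – a sketch only; it assumes the `riemannian_gradient` helper that also appears in the code further down (it bundles the tangent-space projection and the change of the Riesz representer), and the point and Euclidean gradient are made up:

```julia
using Manopt, Manifolds

M = ProbabilitySimplex(2)                 # probability vectors in ℝ³
p = [0.2, 0.3, 0.5]                       # a point on the simplex
egrad = [1.0, -2.0, 0.5]                  # some Euclidean gradient at p

rgrad = riemannian_gradient(M, p, egrad)  # tangent-space projection + change of representer
is_vector(M, p, rgrad)                    # true: rgrad lies in the tangent space at p
```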
-
So this is how far I get with your current information:

```julia
using LinearAlgebra, Manopt, Manifolds

n = 20
p = 100
k = 10
Q = rand(n, p)
Q = Q ./ sum(Q, dims=1)
C = rand(n, k)
C = C ./ sum(C, dims=1)

# Since Q = C*M, we need M of size k×p, that is p columns, but then the simplex is of
# dimension k-1 (in R^k), namely:
manifold = MultinomialMatrices(k, p)

# The cost requires the signature (Manifold, point) -> value – note that the manifold is
# also usually called M in Manopt/Manifolds:
f(M, p) = norm(Q - C * p)^2

# Gradient? grad_f(M, p) = ...? It is also ok to give me the Euclidean gradient first...
# Initial matrix M? M0 = ...
# We could just take
M0 = 1/k * ones(k, p)

# And note that we can also check that this is a point on the manifold (the third
# parameter – true – just throws an error with more details if this fails):
is_point(manifold, M0, true) # returns true, that is, M0 is a point on the manifold you are on

# then we could call (once grad_f is defined)
gradient_descent(manifold, f, grad_f, M0; stepsize=ArmijoLinesearch(manifold))
```
-
Yes, I fully agree. I am still trying to find the time to dive into this literature – I haven't managed to since our last video call last year. 😢 Having a simple snippet like the one you shared above at least helps to get a sense of the performance in this specific application.
-
Well, sometimes – time is hard to find ;) As I said, deriving the Euclidean gradient of your cost, in whatever way works for you, is the main missing piece.
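For reference, a short derivation of that Euclidean gradient; it matches the `grad_f_E` in the code below (the factor 1/2 is the one introduced there to keep the formula clean):

$$
f(M) = \tfrac{1}{2}\,\lVert Q - CM \rVert_F^2,
\qquad
Df(M)[H] = \langle CM - Q,\; CH \rangle = \langle C^{\top}(CM - Q),\; H \rangle,
\qquad
\nabla f(M) = C^{\top}(CM - Q).
$$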
-
One cannot attach Pluto notebooks, so I narrowed it down to just code again. Here is the Euclidean gradient – including a check, the magic to convert it (we might have a shorter way to write that, but it is getting late) – and a gradient descent. Quasi-Newton still requires a parallel transport (or a vector transport), and I have not yet checked whether that is easy to implement here.

```julia
using LinearAlgebra, Manopt, Manifolds, Plots

n = 20
p = 100
k = 10
Q = rand(n, p)
Q = Q ./ sum(Q, dims=1)
C = rand(n, k)
C = C ./ sum(C, dims=1)

# Gradient? grad_f(M, p) = ...? It is also ok to give me the Euclidean gradient first...
# Initial matrix M? M0 = ...
# We could just take
M0 = 1/k * ones(k, p);

# Since Q = C*M, we need M of size k×p, that is p columns, but then the simplex is of
# dimension k-1 (in R^k), namely:
manifold = MultinomialMatrices(k, p)

# And note that we can also check that this is a point on the manifold (the third
# parameter – true – just throws an error with more details if this fails):
is_point(manifold, M0, true) # returns true, that is, M0 is a point on the manifold you are on

# The cost requires the signature (Manifold, point) -> value – note that the manifold is
# also usually called M in Manopt/Manifolds; the 1/2 is for keeping other formulae nice.
f(M, p) = 1/2 * norm(Q - C * p)^2
grad_f_E(M, x) = transpose(C) * (C * x - Q)

# Let's check this (Ronny says a lot – who knows!) – define the Euclidean manifold of matrices
E = ℝ^(k, p)
# maybe not the best (= numerically stable) choice, but it's ok
check_gradient(E, f, grad_f_E, M0, slope_tol=0.2, throw_error=true)

# ok, let's convert (magic!) – now even shorter
grad_f(M, p) = riemannian_gradient(M, p, grad_f_E(E, p))

# Here we are
gradient_descent(
    manifold, f, grad_f, M0;
    stepsize=ArmijoLinesearch(manifold),
    debug=[:Iteration, :Cost, :Stop, 25, "\n"],
)
```
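A small sanity check one could run afterwards – a sketch under the assumption that `gradient_descent` returns the final iterate (its default behaviour) and that we store it in a variable, here called `M_opt`:

```julia
M_opt = gradient_descent(
    manifold, f, grad_f, M0;
    stepsize=ArmijoLinesearch(manifold),
)
# every column of the result should still sum to one and stay (entrywise) positive
all(x -> isapprox(x, 1.0), sum(M_opt, dims=1))
all(M_opt .> 0)
# and the cost should not have increased compared to the starting point
f(manifold, M_opt) <= f(manifold, M0)
```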
-
Hm, interesting. It looks like a least-squares problem, so I've tried the new Riemannian Levenberg-Marquardt that we have now, but I probably have an error in the Jacobian:

```julia
using ForwardDiff

F_RLM(M, p) = vec(Q - C * p)

function jacF_RLM(
    M::AbstractManifold, p; basis_domain::AbstractBasis=DefaultOrthogonalBasis()
)
    X0 = zeros(manifold_dimension(M))
    J = ForwardDiff.jacobian(
        x -> F_RLM(M, exp(M, p, get_vector(M, p, x, basis_domain))), X0
    )
    return J
end

o = LevenbergMarquardt(
    manifold,
    F_RLM,
    jacF_RLM,
    M0,
    length(Q);
    return_options=true,
    jacB=ProjectedOrthonormalBasis(:svd),
)
```
-
Have you tried my code? It should work for your problem if I am not mistaken – let me know where you are stuck. Note that I also updated the code above slightly, because we introduced a nicer function for the conversion of Euclidean to Riemannian gradients.
-
I didn't have time yet, @kellertuer, but it is on my TODO list. There is a lot of work here, but as soon as I have time to pause I will give it a try.
-
Ah, then I misread your post on Zulip – it read to me as if this thread were stuck, you had tried everything, and needed even more input now.
-
I could get going with the JuMP solution. The manifold solution is something I would like to try for comparison now, so that future users can understand the tradeoffs.
-
I converted this to a discussion, since we are not discussing a bug – but of course feel free to ask further questions once you have had time to run the comparison :)
-
Suppose I have an n-by-p matrix `Q` such that each column lives in the n-simplex (i.e. `sum(Q[:,j]) == 1` and `Q[i,j] >= 0`). Additionally, suppose that I have an n-by-k matrix `C` such that each column lives in the n-simplex. I would like to find a matrix `M` such that `Q ≈ C*M` and such that each column of `M` lives in the k-simplex.

Do you have any good literature on this optimization problem over the n-dimensional simplex? It resembles the traditional least-squares problem, but on manifolds. What are the advantages of using Manopt.jl versus explicitly adding simplex constraints to a JuMP.jl model, for instance? Appreciate any advice.
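For reference, a minimal sketch of the constrained formulation mentioned here – an assumption about how such a JuMP.jl model could look (Ipopt as the solver is also just an assumption), not the implementation referred to in this thread:

```julia
using JuMP, Ipopt

n, p, k = 20, 100, 10
Q = rand(n, p); Q = Q ./ sum(Q, dims=1)
C = rand(n, k); C = C ./ sum(C, dims=1)

model = Model(Ipopt.Optimizer)
@variable(model, M[1:k, 1:p] >= 0)                 # nonnegative entries
@constraint(model, [j in 1:p], sum(M[:, j]) == 1)  # each column lies on the simplex
@objective(model, Min, sum((Q - C * M) .^ 2))      # least-squares fit of Q ≈ C*M
optimize!(model)
M_sol = value.(M)
```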