Rosenbrock function for testing Nonlinear Conjugate Gradient Method #1879

Open
wants to merge 10 commits into
base: develop

varunagrawal
Collaborator

I implemented the Rosenbrock function as a 1D factor graph to test the correctness and efficiency of NonlinearConjugateGradientOptimizer.

The good news is that it optimizes to the correct value $(x, y) = (a, a^2)$. The bad news is that it takes many iterations when the initial estimate is not close enough.
For example, given a=12 and an initial estimate of x=3, y=5, it takes over 300 iterations to converge to the correct solution, which seems like a lot for Conjugate Gradient in the 1D case.
I'll test this against simple steepest descent to see how many iterations that takes, so we can compare the two.

I feel this can be significantly improved by implementing additional line search methods that come with theoretical guarantees, such as those satisfying the Wolfe conditions. The Golden Section Search that is currently performed is robust but slow. @mehregandor and I can analyze this further to improve the convergence rates.

@mehregandor mehregandor left a comment

I think there's an issue with the full Rosenbrock function formulation.

```cpp
using namespace rosenbrock;
double a = 1.0, b = 100.0;
Rosenbrock1Factor f1(X(0), a, noiseModel::Unit::Create(1));
Rosenbrock2Factor f2(X(0), Y(0), noiseModel::Isotropic::Sigma(1, b));
```

The way this graph is currently constructed, it seems to me that the full objective function becomes $f(x,y) = 2(x-a)^2 + 2\frac{1}{b^2}(x^2-y)^2$. Instead, the second term should be $b(x^2-y)^2$, so the noise model sigma should be $\frac{1}{\sqrt{b}}$, no? Also, why do we need the $\sqrt{2}$? Is it because the full cost is computed as $\frac{1}{2}f_1^\top \Sigma_1^{-1} f_1 + \frac{1}{2}f_2^\top \Sigma_2^{-1} f_2$? If so, consider taking the $\sqrt{2}$ out of the error and putting it into the covariance directly. Thanks for clarifying.

Collaborator Author

Yes, this one is wrong. The correct definition is in GetRosenbrockGraph. Putting the factor of 2 into the covariance/precision is a good suggestion.

Collaborator Author

Updated

Member

Please note that the GTSAM error is defined as 0.5*r^2, so please be consistent with that. There is no point in having two definitions of the objective function.

Collaborator Author

Yup, I updated the tests to always use the same definition.

```cpp
graph.emplace_shared<Rosenbrock1Factor>(
    X(0), a, noiseModel::Isotropic::Precision(1, 2));
graph.emplace_shared<Rosenbrock2Factor>(
    X(0), Y(0), noiseModel::Isotropic::Precision(1, 2 * b));
```

Maybe I am missing something, but why is b scaled here?

Collaborator Author

The error function for factors is $\frac{1}{2}e^2$, so by defining $e^2 = (x^2 - y)\Sigma^{-1} (x^2 - y) = (x^2 - y)2b (x^2 - y) = 2b(x^2 - y)^2$, we get the second term as $\frac{1}{2}e^2 = b(x^2 - y)^2$.

@varunagrawal
Collaborator Author

I re-ran the test with a=2, b=100, x=1.0, y=1.0, and it converges in 13 steps.
Similarly, for a=12, b=100 and x=10.0, y=135.0, it takes about 23 steps, so the optimization seems to depend heavily on how close the initial estimate is to the final solution.

Base automatically changed from cg-methods to develop October 20, 2024 20:44

@mehregandor mehregandor left a comment

Looks good to me.

4 participants