Vd sampler behaves strangely #9
nikohansen opened this issue on May 15, 2017:

In maybe 20% of the runs, we see plots like this:

[plot not preserved in this transcript]

@youheiakimoto
youheiakimoto replied:

It isn't a bug; it is more like a feature.

I suppose the function in question is a rotated cigar. I confirmed that the same behavior is observed with my vdcma code. The problem is observed when the cigar axis is nearly parallel to a coordinate axis, or nearly contained in a subspace spanned by a few basis vectors. If I run vdcma on a diagonally oriented cigar, the strange behavior is less likely to happen. In Figure 2 of reference [1], you find a relatively large standard deviation on cigrot and ellcig.

[1] Y. Akimoto, A. Auger, and N. Hansen. Comparison-Based Natural Gradient Optimization in High Dimension. In Proceedings of GECCO 2014, pp. 373–380, 2014.
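As context, a minimal sketch of the two problem classes discussed above; the function names, dimension, and condition number are illustrative assumptions, not taken from the repository:

```python
import numpy as np

def cigar(x, cond=1e6):
    """Cigar with its long (insensitive) axis along the first coordinate axis."""
    x = np.asarray(x)
    return x[0]**2 + cond * np.sum(x[1:]**2)

def rotated_cigar(x, R, cond=1e6):
    """The same cigar with its long axis rotated away from the coordinate basis."""
    return cigar(R @ np.asarray(x), cond)

n = 10
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((n, n)))  # fixed random rotation matrix
```

With `cigar`, the long axis coincides with a coordinate axis, which is the case where the strange behavior is reported; `rotated_cigar` with a random rotation gives the diagonally oriented case where it is reported to be less likely.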
Thanks, I see. Then I would consider this something of a defect in the algorithm, which might be intrinsic. I guess one way to look at the underlying reason is that unlearning V takes much longer than learning it?
Right. No active unlearning mechanism for V (and D) is implemented, while learning one long axis is very quick thanks to the cumulation.

One major defect of VD-CMA is that once it has learned a wrong long axis (V), it first has to make the vector short, and only then rotate it and make it long again to learn the right axis. This is problematic when the initial step-size is very small: the evolution path first tends to become long in the negative gradient direction, which is orthogonal to the long axis of the function, and so does V. Therefore VD-CMA needs to wait until V becomes sufficiently short.

The same happens for CMA (it learns a wrong axis at the beginning), but CMA doesn't need to wait until this axis becomes short: it learns the right long axis while making the wrong axis short. If we had two vectors V in the VD covariance model, I guess the situation would be better.
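To make the single-long-axis limitation concrete, here is a small numerical sketch assuming the restricted covariance model of VD-CMA from [1], C = D(I + vvᵀ)D; D is set to the identity for simplicity:

```python
import numpy as np

n = 10
D = np.eye(n)                  # diagonal scaling, identity here for simplicity
v = np.zeros(n)
v[0] = 3.0                     # the single learned long direction

C = D @ (np.eye(n) + np.outer(v, v)) @ D
w, Q = np.linalg.eigh(C)       # eigenvalues in ascending order
print(np.round(w, 2))          # one eigenvalue 1 + |v|^2 = 10, all others 1
print(np.round(Q[:, -1], 2))   # the long eigenvector is aligned with v
```

Because I + vvᵀ has exactly one eigenvalue above 1, the model can never hold the old and the new long axis at the same time, which is why v must first shrink before it can be regrown in the right direction.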
I am not so sure about that, because we can observe a very similar effect with a small initial step-size. The effect is prevented with the […]
Given that VD-CMA is largely succeeded by VkD-CMA, I am closing this issue.
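For comparison, the same construction with k direction vectors, roughly the model underlying VkD-CMA (the exact parametrization in VkD-CMA may differ; this is a sketch of the idea):

```python
import numpy as np

n, k = 10, 2
D = np.eye(n)
V = np.zeros((k, n))
V[0, 0], V[1, 1] = 3.0, 2.0    # two independently adaptable long directions

C = D @ (np.eye(n) + V.T @ V) @ D          # V.T @ V == sum_i outer(v_i, v_i)
print(np.round(np.linalg.eigvalsh(C), 2))  # two eigenvalues (10 and 5) stand out
```

With k ≥ 2, a second long axis can be learned while the wrong one is being unlearned, which addresses the waiting problem described above.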