Doesn't work for continuous_mountain_car #9
Comments
What do you mean by "Of course, if the car can explore the final solution by one try, it will work."? This example is quite hard. I managed to get good results for the discrete version (MountainCar-v0) but had no success with this one.
@PatrykChrabaszcz Hello, once a solution is found, the new weights will all be based on that solution.
I don't see how one good solution would drag the weights of the current policy enough to make it more probable to draw policies that reach the final state in the next generation (for this environment). The influence of the policies that do nothing will be much bigger under the current default update rule. Maybe you mean initializing the current policy (by accident) such that a big part of the first population reaches the goal state.
@PatrykChrabaszcz No, whenever it reaches the goal state, the influence on the following biased and random weights is big and immediate, because its reward is huge compared with the other candidates that do nothing.
The reward might be huge, but by default, if I understand correctly, the update uses a weighted average of the perturbations, and the weights in that average come from centered_rank and lie in <-0.5, 0.5>. So if there is only one good solution in the population, it is counted as 0.5, but the next one (assuming, for example, a population of size 100) is counted as 0.49. That's why I said you could change the way those weights are computed so that this good solution gets higher importance. Am I right?
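For reference, here is a minimal sketch of the centered-rank weighting being described. The function name and exact normalisation are my assumptions, modelled on the OpenAI ES paper rather than this repository's code:

```python
import numpy as np

def compute_centered_ranks(rewards):
    """Map raw episode rewards to evenly spaced weights in [-0.5, 0.5]."""
    rewards = np.asarray(rewards, dtype=np.float64)
    ranks = np.empty(len(rewards))
    # ranks[i] = position of rewards[i] in ascending order (0 .. n-1)
    ranks[np.argsort(rewards)] = np.arange(len(rewards))
    return ranks / (len(rewards) - 1) - 0.5

# Population of 100: one solved episode (+90) among 99 "do nothing" episodes.
weights = compute_centered_ranks([90.0] + [0.0] * 99)
print(weights[0])            # 0.5   -> the solved episode
print(np.sort(weights)[-2])  # ~0.49 -> the best do-nothing episode
```

Under this transform the solved episode gets weight 0.5 while the best do-nothing episode still gets roughly 0.49, which is the point being made about its limited extra importance.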
It's not an average. It's based only on (R_positive_rank - R_negative_rank) / number_of_rewards, so the huge reward counts as 1 and a tiny reward counts as 0.0000001. They are independent of each other.
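If I understand this reply correctly, the update being described looks roughly like the mirrored-sampling estimator sketched below. The function name, normalisation, and hyperparameters are illustrative assumptions, not this repository's exact implementation:

```python
import numpy as np

def es_update(theta, noise, rewards_pos, rewards_neg, sigma=0.05, lr=0.01):
    """One ES step with mirrored sampling.

    noise:        (n, dim) array of perturbations epsilon_i
    rewards_pos:  rewards of theta + sigma * epsilon_i
    rewards_neg:  rewards of theta - sigma * epsilon_i
    Each perturbation contributes (rank(R+_i) - rank(R-_i)) on its own,
    so a single solved episode pulls theta toward its noise vector
    no matter how many other episodes did nothing.
    """
    all_rewards = np.concatenate([rewards_pos, rewards_neg])
    # Rank-normalise rewards into [0, 1]: the huge reward maps to ~1,
    # the "do nothing" rewards map to values near 0.
    ranks = np.empty(len(all_rewards))
    ranks[np.argsort(all_rewards)] = np.linspace(0.0, 1.0, len(all_rewards))
    r_pos, r_neg = ranks[: len(rewards_pos)], ranks[len(rewards_pos):]
    grad = np.dot(r_pos - r_neg, noise) / len(all_rewards)
    return theta + lr * grad / sigma
```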
Hello, the algorithm doesn't work for continuous_mountain_car because its reward is -pow(action[0], 2) * 0.1, which means the car's initial state is a local reward maximum: any exploration decreases the reward, so the policy cannot evolve away from it.
Of course, if the car manages to explore its way to the final solution in a single try, it will work, but the probability of that is negligible.
How do you handle such a local-maximum initial state?
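For concreteness, a rough back-of-the-envelope comparison of returns in Gym's MountainCarContinuous-v0 (the episode length and the number of steps needed to reach the flag are assumptions here): doing nothing scores exactly 0, random exploration is strictly negative, and only a full solution earns the +100 bonus, which is why the zero-action policy is a local optimum.

```python
import numpy as np

# Per-step reward in Gym's MountainCarContinuous-v0, as described above:
#   reward  = -0.1 * action[0] ** 2     paid on every step
#   reward += 100                       only when the car reaches the flag

steps = 999  # assumed default episode length

# Doing nothing costs nothing, so the return is exactly 0 -> a local optimum.
do_nothing = sum(-0.1 * 0.0 ** 2 for _ in range(steps))

# Random exploration pays the action penalty every step and almost never
# reaches the flag, so it scores *worse* than doing nothing (about -33).
random_explore = sum(-0.1 * a ** 2 for a in np.random.uniform(-1, 1, steps))

# A policy that actually reaches the flag (assumed ~200 full-throttle steps)
# earns the +100 bonus and ends up clearly ahead (about +80).
solved = 100.0 + sum(-0.1 * 1.0 ** 2 for _ in range(200))

print(do_nothing, random_explore, solved)
```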