Doesn't work for continuous_mountain_car #9
Comments
What do you mean by "Of course, if the car can explore the final solution by one try, it will work."? This example is quite hard. I managed to get good results for the discrete version (MountainCar-v0) but had no success with this one.
@PatrykChrabaszcz Hello, once a solution is found, the new weights will all be based on that solution.
I don't see how one good solution would drag the weights of the current policy enough to make it more probable to draw policies that reach the final state in the next generation (for this environment). The influence of the policies that do nothing will be much bigger under the current default update rule. Maybe you mean initializing the current policy (by accident) such that a big part of the first population reaches the goal state.
@PatrykChrabaszcz No, whenever it reaches the goal state, the influence on the following biased and random weights is big and immediate, because its reward is huge compared with the other candidates that do nothing.
The reward might be huge, but by default, if I understand correctly, the update uses a weighted average of the perturbations, and the weights in that average come from centered_rank and lie in <-0.5, 0.5>. So if there is only one good solution in the population, it is counted as 0.5, but the next one (assuming, for example, a population of size 100) is counted as 0.49. That's why I said you could change the way those weights are computed so that this good solution gets higher importance. Am I right?
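For reference, here is a minimal sketch of the centered-rank weighting being described. The function name and exact normalisation are my assumptions, modelled on the OpenAI ES paper rather than this repository's code:

```python
import numpy as np

def compute_centered_ranks(rewards):
    """Map raw episode rewards to evenly spaced weights in [-0.5, 0.5]."""
    rewards = np.asarray(rewards, dtype=np.float64)
    ranks = np.empty(len(rewards))
    # ranks[i] = position of rewards[i] in ascending order (0 .. n-1)
    ranks[np.argsort(rewards)] = np.arange(len(rewards))
    return ranks / (len(rewards) - 1) - 0.5

# Population of 100: one solved episode (+90) among 99 "do nothing" episodes.
weights = compute_centered_ranks([90.0] + [0.0] * 99)
print(weights[0])            # 0.5   -> the solved episode
print(np.sort(weights)[-2])  # ~0.49 -> the best do-nothing episode
```

Under this transform the solved episode gets weight 0.5 while the best do-nothing episode still gets roughly 0.49, which is the point being made about its limited extra importance.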
It's not an average. It's based only on (R_positive_rank - R_negative_rank) / number_of_rewards, so the huge reward counts as 1 and a tiny reward counts as 0.0000001. They are independent of each other.
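If I understand this reply correctly, the update being described looks roughly like the mirrored-sampling estimator sketched below. The function name, normalisation, and hyperparameters are illustrative assumptions, not this repository's exact implementation:

```python
import numpy as np

def es_update(theta, noise, rewards_pos, rewards_neg, sigma=0.05, lr=0.01):
    """One ES step with mirrored sampling.

    noise:        (n, dim) array of perturbations epsilon_i
    rewards_pos:  rewards of theta + sigma * epsilon_i
    rewards_neg:  rewards of theta - sigma * epsilon_i
    Each perturbation contributes (rank(R+_i) - rank(R-_i)) on its own,
    so a single solved episode pulls theta toward its noise vector
    no matter how many other episodes did nothing.
    """
    all_rewards = np.concatenate([rewards_pos, rewards_neg])
    # Rank-normalise rewards into [0, 1]: the huge reward maps to ~1,
    # the "do nothing" rewards map to values near 0.
    ranks = np.empty(len(all_rewards))
    ranks[np.argsort(all_rewards)] = np.linspace(0.0, 1.0, len(all_rewards))
    r_pos, r_neg = ranks[: len(rewards_pos)], ranks[len(rewards_pos):]
    grad = np.dot(r_pos - r_neg, noise) / len(all_rewards)
    return theta + lr * grad / sigma
```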
Hello, the algorithm doesn't work for continuous_mountain_car because its reward is -pow(action[0], 2) * 0.1, which means the car's initial state is a local reward maximum: any exploration decreases the reward, so the policy cannot evolve away from it.
Of course, if the car manages to explore its way to the final solution in a single try, it will work, but the probability of that is negligible.
How do you handle such a local-maximum initial state?
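For concreteness, a rough back-of-the-envelope comparison of returns in Gym's MountainCarContinuous-v0 (the episode length and the number of steps needed to reach the flag are assumptions here): doing nothing scores exactly 0, random exploration is strictly negative, and only a full solution earns the +100 bonus, which is why the zero-action policy is a local optimum.

```python
import numpy as np

# Per-step reward in Gym's MountainCarContinuous-v0, as described above:
#   reward  = -0.1 * action[0] ** 2     paid on every step
#   reward += 100                       only when the car reaches the flag

steps = 999  # assumed default episode length

# Doing nothing costs nothing, so the return is exactly 0 -> a local optimum.
do_nothing = sum(-0.1 * 0.0 ** 2 for _ in range(steps))

# Random exploration pays the action penalty every step and almost never
# reaches the flag, so it scores *worse* than doing nothing (about -33).
random_explore = sum(-0.1 * a ** 2 for a in np.random.uniform(-1, 1, steps))

# A policy that actually reaches the flag (assumed ~200 full-throttle steps)
# earns the +100 bonus and ends up clearly ahead (about +80).
solved = 100.0 + sum(-0.1 * 1.0 ** 2 for _ in range(200))

print(do_nothing, random_explore, solved)
```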