Lunar Lander Example is Seriously Regressed #256
Just thought I should leave an update on this issue... Things I've learned:
Actions I'm taking in response to this:
Description
I just found out that the original version of the Lunar Lander example was able to land successfully sometimes. In the current code, it never even gets remotely close. It can't even get a positive score.
The original code says:
In the current code, I can run it for 500+ generations without it ever cresting above 0 reward, so something has seriously regressed. Reading the code, I now realize that the `compute_fitness` function makes no sense to me, so I believe rewards are being confused with network outputs somewhere. Also, the actual scores obtained when running the networks afterward are nowhere near the "fitness" being plotted, which also points to a complete disconnect between "fitness" and actual score.

I will be debugging this in the next couple of days, but wanted to report the issue ahead of time.
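For comparison, here is a minimal sketch of a fitness that is tied directly to the environment score: the network output is used only to pick an action, and fitness is the mean of the summed episode rewards. This is not the example's actual `compute_fitness`; it assumes an environment following the Gymnasium `reset`/`step` interface (5-tuple `step` return), and `episode_return`, `fitness_from_returns` and `CountdownEnv` are hypothetical names made up for illustration.

```python
from typing import Any, Callable, Optional, Protocol, Tuple


class Env(Protocol):
    """Assumed subset of the Gymnasium environment interface."""

    def reset(self, *, seed: Optional[int] = None) -> Tuple[Any, dict]: ...

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, dict]: ...


def episode_return(env: Env, policy: Callable[[Any], Any],
                   seed: Optional[int] = None, max_steps: int = 1000) -> float:
    """Run one episode and return the sum of the environment's rewards."""
    obs, _info = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        action = policy(obs)  # the network output only chooses the action
        obs, reward, terminated, truncated, _info = env.step(action)
        total += float(reward)  # fitness accumulates rewards, never raw outputs
        if terminated or truncated:
            break
    return total


def fitness_from_returns(env: Env, policy: Callable[[Any], Any],
                         episodes: int = 5) -> float:
    """Hypothetical fitness: mean episode return over a few seeded episodes."""
    if episodes < 1:
        raise ValueError("episodes must be >= 1")
    returns = [episode_return(env, policy, seed=i) for i in range(episodes)]
    return sum(returns) / len(returns)


class CountdownEnv:
    """Toy stand-in environment: +1 for action 1, -1 otherwise, 3 steps long."""

    def reset(self, *, seed: Optional[int] = None) -> Tuple[int, dict]:
        self.t = 0
        return 0, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        self.t += 1
        reward = 1.0 if action == 1 else -1.0
        return self.t, reward, self.t >= 3, False, {}


if __name__ == "__main__":
    # Always choosing action 1 earns +1 on each of the 3 steps.
    print(fitness_from_returns(CountdownEnv(), lambda obs: 1))
```

With neat-python, `policy` could be something like `lambda obs: argmax(net.activate(obs))` where `net = neat.nn.FeedForwardNetwork.create(genome, config)`. Because fitness is then literally the average score, the plotted fitness and the score seen when re-running a network should agree up to episode-to-episode noise.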
To Reproduce
Steps to reproduce the behavior:
1. `cd examples/openai-lander`
2. `python evolve.py`
3. Look at the resulting `fitness.svg` plot. We can't achieve a positive reward (solving the task would be a reward of +200).
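To check the disconnect described above, one option is to re-run the winning network for a few episodes and compare those returns with its reported fitness and with the +200 target. The sketch below assumes you have already collected those re-run returns (for example with an episode loop like the one in the example). `summarize_rerun` and its `tolerance` of 50 are hypothetical choices, and the numbers in the usage line are illustrative, not results from this issue.

```python
from statistics import mean
from typing import Dict, Sequence

SOLVED_THRESHOLD = 200.0  # per this issue: solving the task is a reward of +200


def summarize_rerun(reported_fitness: float, rerun_returns: Sequence[float],
                    tolerance: float = 50.0) -> Dict[str, object]:
    """Compare a genome's reported fitness against its re-run episode returns."""
    if not rerun_returns:
        raise ValueError("need at least one re-run episode")
    avg = mean(rerun_returns)
    return {
        "mean_return": avg,
        "best_return": max(rerun_returns),
        "any_positive": any(r > 0 for r in rerun_returns),
        "any_solved": any(r >= SOLVED_THRESHOLD for r in rerun_returns),
        # Large gaps suggest fitness is not measuring the environment score.
        "fitness_gap": reported_fitness - avg,
        "consistent": abs(reported_fitness - avg) <= tolerance,
    }


if __name__ == "__main__":
    # Illustrative values only.
    print(summarize_rerun(-20.0, [-150.0, -120.0, -90.0]))
```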