
ARS agents score not good on exploring #3

Open
ar8372 opened this issue Jun 24, 2022 · 1 comment

ar8372 commented Jun 24, 2022

Hey @colinskow, I have implemented ars.py for the bipedal problem. The score at 1500 iterations is around 330.
In each step of the training loop we explore once using the code below:

            # Play an episode with the new weights and print the score
            reward_evaluation = self.explore()

Now I have saved theta at 1500 iterations along with all the other parameters.
Next I initialized theta with this pretrained theta while creating an instance of the Policy() class and explored 10 times, but the score is around 6.23, nowhere close to 330.
Can you tell me why this is happening?
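
For concreteness, evaluating a saved theta for 10 episodes can be sketched as below; this assumes the ARS linear policy (action = theta.dot(state)), the classic Gym reset/step API, and a placeholder checkpoint file, so it is a sketch rather than the exact ars.py code.

    import gym
    import numpy as np

    # Sketch: evaluate a saved theta for 10 episodes on BipedalWalker.
    # The checkpoint file name and key are placeholders.
    env = gym.make("BipedalWalker-v3")
    theta = np.load("ars_checkpoint.npz")["theta"]   # shape (action_dim, state_dim)

    scores = []
    for _ in range(10):
        state, done, total = env.reset(), False, 0.0
        while not done:
            action = theta.dot(state)                # linear ARS policy on the raw state
            state, reward, done, _ = env.step(action)
            total += reward
        scores.append(total)

    print("mean score over 10 episodes:", np.mean(scores))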

Each time in explore() we call self.env.reset(), which just restarts the env, so why is the reward from explore() so different when it is called from inside the training loop versus when I call it manually?
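
If explore() follows the usual ARS pattern, resetting the env is only part of what it does: it also updates a running state normalizer and evaluates theta on the normalized observations. Here is a minimal sketch of that pattern, assuming a Normalizer that keeps n, mean, mean_diff and var; it is not necessarily the exact ars.py code.

    import numpy as np

    class Normalizer:
        # Running mean/variance of observations, updated online.
        def __init__(self, num_inputs):
            self.n = np.zeros(num_inputs)
            self.mean = np.zeros(num_inputs)
            self.mean_diff = np.zeros(num_inputs)
            self.var = np.zeros(num_inputs)

        def observe(self, x):
            self.n += 1.0
            last_mean = self.mean.copy()
            self.mean += (x - self.mean) / self.n
            self.mean_diff += (x - last_mean) * (x - self.mean)
            self.var = (self.mean_diff / self.n).clip(min=1e-2)

        def normalize(self, x):
            return (x - self.mean) / np.sqrt(self.var)

    def explore(env, normalizer, theta, episode_length=2000):
        state = env.reset()                      # restarts the env, as noted above
        done, num_plays, total_reward = False, 0, 0.0
        while not done and num_plays < episode_length:
            normalizer.observe(state)            # explore() also updates the running stats
            state = normalizer.normalize(state)  # ...and theta acts on normalized states
            action = theta.dot(state)
            state, reward, done, _ = env.step(action)
            total_reward += reward
            num_plays += 1
        return total_reward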

Let me know if my query is not clear, thanks.

@vfsousas

Hello @ar8372, I have the same problem...

When I try to use this in a "production env" it fails, and I did the same as you:
I saved all the params (n, mean, mean_diff, var) and theta, and loaded them into another instance of Policy, but I never get the reward achieved during training.
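
A sketch of that save/restore flow is below; the function names and checkpoint layout are placeholders, assuming the normalizer stores n, mean, mean_diff and var as numpy arrays, so this is not the repo's exact code.

    import numpy as np

    # Sketch: persist theta plus the normalizer state, then restore both later.
    def save_checkpoint(path, theta, normalizer):
        np.savez(path, theta=theta, n=normalizer.n, mean=normalizer.mean,
                 mean_diff=normalizer.mean_diff, var=normalizer.var)

    def load_checkpoint(path, normalizer):
        data = np.load(path)
        normalizer.n = data["n"]
        normalizer.mean = data["mean"]
        normalizer.mean_diff = data["mean_diff"]
        normalizer.var = data["var"]
        return data["theta"]                     # assign this to policy.theta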
