
Values used for normalized score calculation #1

Closed
Randl opened this issue Oct 31, 2021 · 5 comments

Randl commented Oct 31, 2021

I couldn't find the values used for the normalized score calculation in either the paper or the repo. It would be convenient to be able to compare new methods using the same metric (mean normalized return). Also, the values themselves do not appear anywhere in the paper, only in the figures, which is a bit confusing.

@MishaLaskin (Collaborator)

Good catch! Here are the expert scores; we'll update the repo + paper to include these numbers.

walker_stand 984
walker_walk 971
walker_run 796
walker_flip 799
quadruped_walk 866
quadruped_run 888
quadruped_stand 920
quadruped_jump 888
jaco_reach_top_left 191
jaco_reach_top_right 223
jaco_reach_bottom_left 193
jaco_reach_bottom_right 203
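
For reference, here is a minimal sketch (not taken from the repo) of how these expert scores could be used to compute a mean normalized return. The `raw_scores` argument and the function name are hypothetical placeholders for a method's per-task returns.

```python
# Hypothetical sketch: normalize per-task returns by the expert scores listed
# above and average across tasks. Not the repo's actual evaluation code.
EXPERT_SCORES = {
    "walker_stand": 984, "walker_walk": 971, "walker_run": 796, "walker_flip": 799,
    "quadruped_walk": 866, "quadruped_run": 888, "quadruped_stand": 920, "quadruped_jump": 888,
    "jaco_reach_top_left": 191, "jaco_reach_top_right": 223,
    "jaco_reach_bottom_left": 193, "jaco_reach_bottom_right": 203,
}

def mean_normalized_return(raw_scores: dict) -> float:
    """Average of (per-task return / expert return) over the tasks provided."""
    ratios = [raw_scores[task] / EXPERT_SCORES[task] for task in raw_scores]
    return sum(ratios) / len(ratios)
```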


Randl commented Nov 2, 2021

@MishaLaskin Also, how are the error bars for Fig. 3 calculated? I tried using the std over tasks for https://paperswithcode.com/task/unsupervised-reinforcement-learning, but it's definitely too large. Did you take the std of the expert scores into account? If so, then presumably under a normal approximation for the ratio distribution the usual std of the sum can be used. In that case, could you share the stds too?

@MishaLaskin (Collaborator)

The error bars are standard errors (so they take the number of seeds run into account to get a tighter estimate of the mean). Because the expert scores are only there for normalization purposes, we just divide by the expert score without considering its standard deviation.
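
A hedged sketch of the standard-error computation described here, assuming per-seed normalized returns are available as a list; the function name and inputs are illustrative, not from the repo.

```python
import math

def standard_error(per_seed_returns):
    """Standard error of the mean: sample std over seeds divided by sqrt(#seeds).
    Assumes at least two seeds."""
    n = len(per_seed_returns)
    mean = sum(per_seed_returns) / n
    var = sum((x - mean) ** 2 for x in per_seed_returns) / (n - 1)  # sample variance
    return math.sqrt(var / n)
```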


Randl commented Nov 3, 2021

I think the expert std should be taken into account: if the expert score has a very high std, the estimates are clearly less confident. https://en.wikipedia.org/wiki/Ratio_distribution#Uncorrelated_noncentral_normal_ratio says the distribution of the ratio can be approximated by a normal one that takes both stds into account (and it converges to the plainly normalized std when the expert std is very low).
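
For concreteness, a sketch of the first-order normal approximation for the ratio referenced above, assuming the method score x and the expert score y are uncorrelated; the function is illustrative, not code from the repo.

```python
import math

def ratio_std_approx(mu_x, sigma_x, mu_y, sigma_y):
    """Approximate std of z = x / y for uncorrelated x and y (first-order / delta method):
    std(z) ~= |mu_x / mu_y| * sqrt((sigma_x / mu_x)**2 + (sigma_y / mu_y)**2).
    As sigma_y -> 0 this reduces to sigma_x / |mu_y|, i.e. the plainly normalized std."""
    return abs(mu_x / mu_y) * math.sqrt((sigma_x / mu_x) ** 2 + (sigma_y / mu_y) ** 2)
```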


MishaLaskin commented Nov 3, 2021

That's a fair point, but we are using the expert scores only as a way to display results (it's just a scaling factor); we could instead just report the raw scores. Additionally, for all the envs considered, the expert scores, which are the result of running supervised RL for 2M steps, have low variance (see @denisyarats' pytorch_sac and DrQ / DrQv2 repos), so this shouldn't be an issue.
