Values used for normalized score calculation #1
I couldn't find the values used for the normalized score calculation either in the paper or in the repo. It would be convenient if we were able to compare new methods based on the same metric (mean normalized return). Also, the values themselves do not appear anywhere in the paper, only in the figures, which is a bit confusing.

Comments
Good catch! Here are the expert scores; we'll update the repo + paper to include these numbers.
@MishaLaskin Also, how are the error bars for Fig. 3 calculated? I tried using the std over tasks for https://paperswithcode.com/task/unsupervised-reinforcement-learning, but it's definitely too large. Did you take the std of the expert scores into account? If so, then under a normal approximation for the ratio distribution, the usual std of a sum could probably be used. In that case, can you share the stds too?
The error bars are standard errors (so they take the number of seeds run into account to get a tighter estimate of the mean). Because the expert scores are only there for normalization purposes, we just divide by the expert score without considering its standard deviation.
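As a concrete illustration of the procedure described in this comment, here is a minimal numpy sketch; the function name, the per-seed returns, and the expert score are hypothetical placeholders, not values from the paper. The per-seed returns are divided by the expert score, which is treated as an exact constant, and the error bar is the standard error over seeds.

```python
import numpy as np

def normalized_score_with_se(returns_per_seed, expert_score):
    """Mean normalized return and its standard error across seeds.

    `expert_score` is treated as an exact constant, so its own
    uncertainty is ignored (as in the procedure described above).
    """
    returns_per_seed = np.asarray(returns_per_seed, dtype=float)
    normalized = returns_per_seed / expert_score             # scale by expert score
    mean = normalized.mean()
    se = normalized.std(ddof=1) / np.sqrt(len(normalized))   # standard error over seeds
    return mean, se

# hypothetical numbers, for illustration only
mean, se = normalized_score_with_se([612.0, 655.0, 640.0], expert_score=900.0)
```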
I think the expert std should be taken into account (if it has a very high std, then it's pretty clear the estimates are less confident). https://en.wikipedia.org/wiki/Ratio_distribution#Uncorrelated_noncentral_normal_ratio says the distribution of the ratio can be approximated by a normal one, taking both stds into account (and converging to the plainly normalized std when the expert std is very low).
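For reference, a first-order (delta-method) version of the normal approximation for the ratio of two uncorrelated variables, in the spirit of the Wikipedia section linked above, could be sketched as follows; the function name and the numbers are hypothetical, not taken from the paper.

```python
import numpy as np

def ratio_mean_and_std(mu_x, sigma_x, mu_y, sigma_y):
    """First-order normal approximation for Z = X / Y with X, Y uncorrelated.

    mean(Z) ~ mu_x / mu_y
    Var(Z)  ~ (mu_x / mu_y)^2 * ((sigma_x / mu_x)^2 + (sigma_y / mu_y)^2)
    When sigma_y -> 0 this reduces to sigma_x / mu_y, i.e. the plainly
    normalized std obtained by treating the expert score as exact.
    """
    mean_z = mu_x / mu_y
    std_z = abs(mean_z) * np.sqrt((sigma_x / mu_x) ** 2 + (sigma_y / mu_y) ** 2)
    return mean_z, std_z

# hypothetical agent (X) and expert (Y) statistics, for illustration only
mean_z, std_z = ratio_mean_and_std(mu_x=640.0, sigma_x=25.0, mu_y=900.0, sigma_y=10.0)
```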
That's a fair point, but we are using the expert scores only as a way to display scores (it's just a scaling factor); we could instead just use the raw scores. Additionally, for all envs considered, the expert scores, which are the result of running supervised RL for 2M steps, have low variance (see @denisyarats' pytorch_sac and DrQ / DrQv2 repos), so this shouldn't be an issue.