Questions about the number in the paper #5

Open
MSRA-COLT opened this issue Nov 30, 2020 · 4 comments

Comments

@MSRA-COLT

Hi, I really appreciate your open-source code. My question is about how the performance numbers in the paper are reported.

For example, in Table 1, do you report the maximum evaluation return over the course of training, or the last evaluation return? The return of the policy varies a lot between iterations.

Thanks,
Yue

@weihongwei0586

I have the same problem. When I run the demo on a non-mixed dataset, the results have large variance.

@HYDesmondLiu

Also, which version of D4RL did you use (and which was used in COMBO)?
I ask because the buffer quality differs considerably across v0 to v2 (see the TD3+BC paper for details).
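
To check the version gap directly, here is a minimal sketch that compares the mean per-step reward of the same task across D4RL versions. It assumes the `gym` and `d4rl` packages are installed; the helper names `dataset_id` and `mean_reward` are made up for illustration, and the task shown is just an example, not necessarily one used in the paper.

```python
def dataset_id(task, quality, version):
    """Build a D4RL-style environment id, e.g. 'halfcheetah-medium-v0'."""
    return f"{task}-{quality}-v{version}"


def mean_reward(env_id):
    """Load a D4RL dataset and return its mean per-step reward.

    Assumes gym and d4rl are installed. They are imported lazily so that
    dataset_id() above can be used without them.
    """
    import gym
    import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)

    dataset = gym.make(env_id).get_dataset()
    return float(dataset["rewards"].mean())


# Example task only; swap in whichever task and quality you are comparing.
ids = [dataset_id("halfcheetah", "medium", v) for v in (0, 2)]
print(ids)

# Uncomment to compare the datasets (this downloads them on first use):
# for env_id in ids:
#     print(env_id, mean_reward(env_id))
```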

@typoverflow

@HYDesmondLiu The config file in this repo says they used the '-v0' datasets for MOPO. But I'm still curious about the dataset version used in COMBO. Has COMBO's source code even been released?
I am also having trouble stabilizing MOPO's performance. The variance in performance across epochs is very large.
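
To make the max-versus-last question from the original post concrete, here is a small sketch of three common ways to turn one run's evaluation curve into a single number. This is not how the paper computed Table 1 (that is exactly what is being asked); the function name `summarize_returns`, the `last_k` parameter, and the returns below are all made up for illustration.

```python
from statistics import mean


def summarize_returns(eval_returns, last_k=5):
    """Summarize one run's evaluation returns under three reporting conventions."""
    if not eval_returns:
        raise ValueError("eval_returns must be non-empty")
    if last_k < 1:
        raise ValueError("last_k must be at least 1")
    tail = eval_returns[-last_k:]  # the final last_k evaluations (or all, if fewer)
    return {
        "max": max(eval_returns),    # best evaluation seen during training
        "last": eval_returns[-1],    # final evaluation only
        "last_k_mean": mean(tail),   # average over the final evaluations
    }


# Hypothetical returns, for illustration only (not from the paper or this repo).
returns = [10.0, 50.0, 20.0, 40.0, 30.0]
summary = summarize_returns(returns, last_k=2)
print(summary)
```

With a noisy curve like this one, the three conventions give clearly different numbers, so which one a paper uses matters when comparing against your own runs.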

@HYDesmondLiu

@typoverflow
AFAIK, the COMBO source code has not been shared. As I recall, they used the D4RL v2 buffers, and performance differs considerably between v0 and v2. You can easily spot the difference.
"Some" DRL methods are notorious for being unreproducible.
You could refer to this paper and other related research for more information.
