Reproduce MOPO results #101
Comments
MOPO as implemented in d3rlpy does not terminate its model rollouts at terminal states. I found that adding this rollout termination improves d3rlpy MOPO's performance and brings it closer to the original implementation. [1] https://github.com/tianheyu927/mopo/blob/master/mopo/algorithms/mopo.py
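For context, the fix described above can be sketched as a batched rollout loop that stops recording a trajectory once the termination function fires. This is an illustrative sketch, not d3rlpy's actual API: `dynamics`, `policy`, and `termination_fn` are hypothetical callables.

```python
import numpy as np

def rollout(dynamics, policy, start_obs, horizon, termination_fn):
    """Roll out a learned dynamics model, masking out trajectories
    that reach a terminal state instead of letting them continue.

    dynamics(obs, act) -> (next_obs, reward); termination_fn(next_obs)
    -> boolean terminal flags. All arguments are hypothetical stand-ins.
    """
    obs = start_obs
    alive = np.ones(len(obs), dtype=bool)  # trajectories still running
    transitions = []
    for _ in range(horizon):
        act = policy(obs)
        next_obs, rew = dynamics(obs, act)
        done = termination_fn(next_obs)
        # record the step only for trajectories that were still alive,
        # including their final (terminal) transition
        transitions.append(
            (obs[alive], act[alive], rew[alive], next_obs[alive], done[alive])
        )
        alive = alive & ~done
        if not alive.any():
            break  # every rollout has terminated
        obs = next_obs
    return transitions
```

Without the `alive` mask (as in the current d3rlpy behavior described above), transitions sampled after a terminal state would still be added to the synthetic replay buffer.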
@TakuyaHiraoka Thanks for the info! I hadn't realized they used such a tricky hack. It doesn't seem very practical in general; I would rather add a classifier trained to estimate terminal flags. This probably explains the COMBO issue too.
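The learned alternative suggested here would mean fitting a classifier that predicts the terminal flag from the predicted next observation. A minimal pure-NumPy logistic-regression sketch, where the helper name and hyperparameters are illustrative and not part of d3rlpy:

```python
import numpy as np

def train_terminal_classifier(next_obs, terminals, lr=0.5, epochs=500):
    """Fit p(terminal | next_obs) with logistic regression by gradient
    descent; a stand-in for the learned termination function idea."""
    X = np.hstack([next_obs, np.ones((len(next_obs), 1))])  # append bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted terminal probability
        w -= lr * X.T @ (p - terminals) / len(X)  # cross-entropy gradient step

    def predict(obs):
        Xq = np.hstack([obs, np.ones((len(obs), 1))])
        return 1.0 / (1.0 + np.exp(-(Xq @ w))) > 0.5

    return predict
```

In practice one would train this on the offline dataset's `(next_obs, terminal)` pairs and plug the resulting predictor in as the rollout termination function.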
Actually, the official implementation of MOReL uses the same trick. Maybe using a known termination function is common in model-based offline RL?
I agree with @IantheChan. To my knowledge, several model-based RL works use this trick, and it seems to be critical for model-based RL. It makes sense for robot learning, since it is usually not too difficult to check whether a robot's sensor readings correspond to a feasible state.
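For reference, a known termination function for Hopper looks roughly like this, following the shape of the static functions in the official MOPO repository. The thresholds below come from the Gym Hopper "healthy" check and should be verified against the environment version you actually use:

```python
import numpy as np

def hopper_termination_fn(next_obs):
    """Hand-coded termination check for Hopper: an episode ends when
    the torso height or angle leaves the healthy range, or the state
    becomes non-finite / blows up. Thresholds assume Gym's Hopper."""
    height = next_obs[:, 0]
    angle = next_obs[:, 1]
    healthy = (
        np.isfinite(next_obs).all(axis=-1)
        & (np.abs(next_obs[:, 1:]) < 100).all(axis=-1)
        & (height > 0.7)
        & (np.abs(angle) < 0.2)
    )
    return ~healthy
```

This illustrates the point made above: for locomotion tasks the feasibility check is simple enough to hard-code, which is why the trick is so common.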
Hi @takuseno,
I have been trying to reproduce the MOPO results using your library, without success so far. I have been following your MOPO script in the reproduce directory and experimenting with training the dynamics ensemble for much longer. However, even after training for more than 500 epochs, I fail to get an evaluation score higher than 7 on Hopper-medium (D4RL). Could you provide any pointers?
One suspicious set of metrics is the SAC losses: the critic loss blows up. Thanks!