Discrepancy in SAC on entropy coefficient update #177

marioyc · 2022-10-25T01:53:15Z

I noticed that here the log_prob variable is computed before the update of the actor, while in SAC's repo it is recomputed after the actor update (the paper also mentions in Section 6 that both the Q-function and the policy are updated before the entropy coefficient). Have you by any chance compared whether this detail makes a difference?
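To make the two orderings concrete, here is a minimal sketch in plain Python. It is not taken from either codebase. It assumes a 1-D Gaussian policy, a temperature loss of the form `-log_alpha * mean(log_prob + target_entropy)`, and a hypothetical `toy_actor_update` that stands in for a real actor gradient step by shrinking the policy's standard deviation. All function and variable names are made up for illustration.

```python
import math
import random


def sample_log_probs(mean, log_std, noise):
    """Log-density of reparameterized actions a = mean + std * eps under N(mean, std^2)."""
    std = math.exp(log_std)
    log_probs = []
    for eps in noise:
        action = mean + std * eps
        z = (action - mean) / std
        log_probs.append(-0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi))
    return log_probs


def temperature_grad(log_probs, target_entropy):
    """Gradient w.r.t. log_alpha of: -log_alpha * mean(log_prob + target_entropy)."""
    return -sum(lp + target_entropy for lp in log_probs) / len(log_probs)


def toy_actor_update(mean, log_std):
    """Hypothetical stand-in for one actor gradient step (assumption, not a real update)."""
    return mean, log_std - 0.1


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]  # shared noise for a fair comparison
target_entropy = -1.0  # assumed value; a common heuristic is -dim(action)
mean, log_std = 0.0, 0.0

# Ordering A: log_prob is computed before the actor update and reused afterwards.
log_probs_before = sample_log_probs(mean, log_std, noise)
mean, log_std = toy_actor_update(mean, log_std)
grad_stale = temperature_grad(log_probs_before, target_entropy)

# Ordering B: log_prob is recomputed from the updated policy.
log_probs_after = sample_log_probs(mean, log_std, noise)
grad_fresh = temperature_grad(log_probs_after, target_entropy)

print("temperature gradient with stale log_prob:", grad_stale)
print("temperature gradient with fresh log_prob:", grad_fresh)
```

In this toy setup the two gradients differ by exactly the change in log_std caused by the actor step, so the size of the discrepancy depends on how much a single actor update moves the policy. Whether that matters for final performance in practice is the open question here.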


muupan · 2022-10-25T03:18:41Z

You are right, this seems to be a discrepancy from the official implementation. I do not remember whether I compared the two; probably not.

marioyc · 2022-10-26T02:07:25Z

I see, no problem. Thanks for replying anyway.
