Noticed that here the `log_prob` variable is computed before the update of the actor, whereas in SAC's repo it is recomputed after the actor update (the paper also mentions in Section 6 that both the Q-function and the policy are updated before the entropy coefficient is updated). By any chance, have you compared whether this detail makes a difference?
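To make the ordering question concrete, here is a toy sketch of the two variants. It is not the code of either repository. It assumes a hypothetical 1-D, state-independent Gaussian policy with parameters `mu` and `log_std`, a made-up critic `Q(a) = -a^2`, the common `log_alpha` parameterisation of the entropy coefficient, and `target_entropy = -1` (the usual `-dim(A)` heuristic for a 1-D action space). The critic update is omitted. The only difference between the variants is whether the temperature loss uses the stale `log_prob` sampled before the actor step or a fresh one sampled from the updated policy.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def sample(mu, log_std, n, rng):
    """Reparameterised samples a = mu + std * eps from a toy 1-D Gaussian policy."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    std = math.exp(log_std)
    actions = [mu + std * e for e in eps]
    # log N(a; mu, std) = -0.5 * eps^2 - log_std - 0.5 * log(2*pi)
    log_probs = [-0.5 * e * e - log_std - 0.5 * LOG_2PI for e in eps]
    return eps, actions, log_probs


def actor_step(mu, log_std, alpha, eps, lr):
    """One SGD step on the toy actor loss mean(alpha * log_prob - Q(a)), with Q(a) = -a^2."""
    std = math.exp(log_std)
    n = len(eps)
    # -Q(a) = a^2, so d/dmu is 2a; the log_prob term does not depend on mu.
    g_mu = sum(2.0 * (mu + std * e) for e in eps) / n
    # d/dlog_std: -alpha from the log_prob term, plus 2a * std * eps from a^2.
    g_ls = sum(-alpha + 2.0 * (mu + std * e) * std * e for e in eps) / n
    return mu - lr * g_mu, log_std - lr * g_ls


def alpha_step(log_alpha, log_probs, target_entropy, lr):
    """SGD step on J(log_alpha) = -log_alpha * mean(log_prob + target_entropy)."""
    grad = -(sum(log_probs) / len(log_probs) + target_entropy)
    return log_alpha - lr * grad


def sac_update(recompute, seed=0, n=256, lr=0.05, target_entropy=-1.0):
    """One (critic-free) toy SAC step; returns the updated log_alpha."""
    rng = random.Random(seed)
    mu, log_std, log_alpha = 0.5, 0.0, 0.0
    alpha = math.exp(log_alpha)
    # (1) Sample from the current policy; these log-probs feed the actor loss.
    eps, _, log_probs = sample(mu, log_std, n, rng)
    # (2) Actor update.
    mu, log_std = actor_step(mu, log_std, alpha, eps, lr)
    if recompute:
        # (3a) Recompute log-probs under the updated policy with fresh samples.
        _, _, log_probs = sample(mu, log_std, n, rng)
    # (3b) Otherwise the stale log-probs from step (1) are reused.
    return alpha_step(log_alpha, log_probs, target_entropy, lr)


stale = sac_update(recompute=False)
fresh = sac_update(recompute=True)
print(stale, fresh)
```

In this toy the two variants give slightly different `log_alpha` values after one step; whether that difference matters over a full training run is exactly the open question above, and this sketch does not answer it.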