
Commit

Merge pull request #1 from kventinel/patch-8
Update practice.pytorch
kventinel authored Apr 16, 2018
2 parents 3bbf4ef + 083193d commit e487a3b
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion week7_pomdp/practice_pytorch.ipynb
@@ -412,7 +412,7 @@
"\n",
"__One more thing:__ since we train on T-step rollouts, we can use N-step formula for advantage for free:\n",
" * At the last step, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot V(s_{t+1}) - V(s) $\n",
" * One step earlier, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot r(s_{t+1}, a_{t+1}) + \\gamma ^ 2 \\cdot V(s_{t+1}) - V(s) $\n",
" * One step earlier, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot r(s_{t+1}, a_{t+1}) + \\gamma ^ 2 \\cdot V(s_{t+2}) - V(s) $\n",
" * Et cetera, et cetera. This way agent starts training much faster since it's estimate of A(s,a) depends less on his (imperfect) value function and more on actual rewards. There's also a [nice generalization](https://arxiv.org/abs/1506.02438) of this.\n",
"\n",
"\n",
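For context (not part of the commit or the notebook): the corrected line bootstraps the N-step return from the value of the state reached two steps later, so over a T-step rollout the advantage at step t sums discounted rewards up to the end of the rollout and then adds $\gamma^{T-t} \cdot V(s_T)$. Below is a minimal sketch of that computation, assuming per-step rewards, per-step value estimates, and the value of the state after the last collected step; the function name `n_step_advantages`, its arguments, and the NumPy dependency are illustrative choices, not code from the repository.

```python
import numpy as np

def n_step_advantages(rewards, values, next_value, gamma=0.99):
    """Illustrative sketch (not from the notebook) of bootstrapped
    T-step advantages for a single rollout.

    rewards:    [r(s_0,a_0), ..., r(s_{T-1},a_{T-1})]
    values:     [V(s_0), ..., V(s_{T-1})]
    next_value: V(s_T), the bootstrap value at the end of the rollout
    """
    T = len(rewards)
    advantages = np.zeros(T)
    ret = next_value  # running discounted return, seeded with V(s_T)
    for t in reversed(range(T)):
        # ret becomes r_t + gamma*r_{t+1} + ... + gamma^(T-t) * V(s_T),
        # i.e. the N-step target from the formulas in the diff above.
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages

# Example with a 3-step rollout (numbers are arbitrary):
adv = n_step_advantages(rewards=[1.0, 0.0, 2.0],
                        values=[0.5, 0.4, 0.3],
                        next_value=0.2)
```

At the last step this reduces to $r(s_{T-1},a_{T-1}) + \gamma \cdot V(s_T) - V(s_{T-1})$, and one step earlier to $r + \gamma \cdot r' + \gamma^2 \cdot V(s_T) - V(s_{T-2})$, matching the corrected formula.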
