
Commit

Merge pull request #1 from kventinel/patch-8
Update practice.pytorch
kventinel authored Apr 16, 2018
2 parents 3bbf4ef + 083193d commit e487a3b
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion week7_pomdp/practice_pytorch.ipynb
@@ -412,7 +412,7 @@
"\n",
"__One more thing:__ since we train on T-step rollouts, we can use N-step formula for advantage for free:\n",
" * At the last step, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot V(s_{t+1}) - V(s) $\n",
" * One step earlier, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot r(s_{t+1}, a_{t+1}) + \\gamma ^ 2 \\cdot V(s_{t+1}) - V(s) $\n",
" * One step earlier, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot r(s_{t+1}, a_{t+1}) + \\gamma ^ 2 \\cdot V(s_{t+2}) - V(s) $\n",
" * Et cetera, et cetera. This way agent starts training much faster since it's estimate of A(s,a) depends less on his (imperfect) value function and more on actual rewards. There's also a [nice generalization](https://arxiv.org/abs/1506.02438) of this.\n",
"\n",
"\n",
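For context (not part of the commit or the notebook): the corrected line bootstraps the N-step return from the value of the state reached two steps later, so over a T-step rollout the advantage at step t sums discounted rewards up to the end of the rollout and then adds $\gamma^{T-t} \cdot V(s_T)$. Below is a minimal sketch of that computation, assuming per-step rewards, per-step value estimates, and the value of the state after the last collected step; the function name `n_step_advantages`, its arguments, and the NumPy dependency are illustrative choices, not code from the repository.

```python
import numpy as np

def n_step_advantages(rewards, values, next_value, gamma=0.99):
    """Illustrative sketch (not from the notebook) of bootstrapped
    T-step advantages for a single rollout.

    rewards:    [r(s_0,a_0), ..., r(s_{T-1},a_{T-1})]
    values:     [V(s_0), ..., V(s_{T-1})]
    next_value: V(s_T), the bootstrap value at the end of the rollout
    """
    T = len(rewards)
    advantages = np.zeros(T)
    ret = next_value  # running discounted return, seeded with V(s_T)
    for t in reversed(range(T)):
        # ret becomes r_t + gamma*r_{t+1} + ... + gamma^(T-t) * V(s_T),
        # i.e. the N-step target from the formulas in the diff above.
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages

# Example with a 3-step rollout (numbers are arbitrary):
adv = n_step_advantages(rewards=[1.0, 0.0, 2.0],
                        values=[0.5, 0.4, 0.3],
                        next_value=0.2)
```

At the last step this reduces to $r(s_{T-1},a_{T-1}) + \gamma \cdot V(s_T) - V(s_{T-1})$, and one step earlier to $r + \gamma \cdot r' + \gamma^2 \cdot V(s_T) - V(s_{T-2})$, matching the corrected formula.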
