From 083193dc6374fd58698ac219c86ad1e9ea4aaa49 Mon Sep 17 00:00:00 2001
From: Rak Alexey
Date: Mon, 16 Apr 2018 21:36:19 +0300
Subject: [PATCH] Update practice.pytorch

Fix error in latex
---
 week7_pomdp/practice_pytorch.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/week7_pomdp/practice_pytorch.ipynb b/week7_pomdp/practice_pytorch.ipynb
index 35fa6bdd4..14a6cb500 100644
--- a/week7_pomdp/practice_pytorch.ipynb
+++ b/week7_pomdp/practice_pytorch.ipynb
@@ -412,7 +412,7 @@
     "\n",
     "__One more thing:__ since we train on T-step rollouts, we can use N-step formula for advantage for free:\n",
     " * At the last step, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot V(s_{t+1}) - V(s) $\n",
-    " * One step earlier, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot r(s_{t+1}, a_{t+1}) + \\gamma ^ 2 \\cdot V(s_{t+1}) - V(s) $\n",
+    " * One step earlier, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot r(s_{t+1}, a_{t+1}) + \\gamma ^ 2 \\cdot V(s_{t+2}) - V(s) $\n",
     " * Et cetera, et cetera. This way agent starts training much faster since it's estimate of A(s,a) depends less on his (imperfect) value function and more on actual rewards. There's also a [nice generalization](https://arxiv.org/abs/1506.02438) of this.\n",
     "\n",
     "\n",
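
For readers who want to check the corrected recursion, below is a minimal NumPy sketch of computing N-step advantages over a T-step rollout by sweeping backwards from the bootstrap value V(s_T). The helper name n_step_advantages and the toy numbers are illustrative assumptions and do not appear in the notebook.

import numpy as np

def n_step_advantages(rewards, values, next_value, gamma=0.99):
    # Hypothetical helper (not part of the notebook): for a T-step rollout it
    # computes A(s_t, a_t) = r_t + gamma*r_{t+1} + ... + gamma^(T-t)*V(s_T) - V(s_t),
    # i.e. the same recursion as the corrected bullet in the patch above.
    T = len(rewards)
    advantages = np.zeros(T)
    ret = next_value                      # V(s_T): critic's estimate after the last rollout step
    for t in reversed(range(T)):
        ret = rewards[t] + gamma * ret    # the n-step return accumulates as we sweep backwards
        advantages[t] = ret - values[t]   # subtract the baseline V(s_t)
    return advantages

# Toy usage with made-up numbers: a 3-step rollout.
print(n_step_advantages(rewards=[1.0, 0.0, 2.0],
                        values=[0.5, 0.4, 0.3],
                        next_value=0.2,
                        gamma=0.9))

At the last step this reduces to r(s_t, a_t) + gamma * V(s_{t+1}) - V(s_t); one step earlier it adds gamma * r(s_{t+1}, a_{t+1}) and bootstraps from gamma^2 * V(s_{t+2}), which is exactly the correction the patch makes.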