From 083193dc6374fd58698ac219c86ad1e9ea4aaa49 Mon Sep 17 00:00:00 2001
From: Rak Alexey
Date: Mon, 16 Apr 2018 21:36:19 +0300
Subject: [PATCH] Update practice.pytorch

Fix error in latex
---
 week7_pomdp/practice_pytorch.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/week7_pomdp/practice_pytorch.ipynb b/week7_pomdp/practice_pytorch.ipynb
index 35fa6bdd4..14a6cb500 100644
--- a/week7_pomdp/practice_pytorch.ipynb
+++ b/week7_pomdp/practice_pytorch.ipynb
@@ -412,7 +412,7 @@
     "\n",
     "__One more thing:__ since we train on T-step rollouts, we can use N-step formula for advantage for free:\n",
     " * At the last step, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot V(s_{t+1}) - V(s) $\n",
-    " * One step earlier, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot r(s_{t+1}, a_{t+1}) + \\gamma ^ 2 \\cdot V(s_{t+1}) - V(s) $\n",
+    " * One step earlier, $A(s_t,a_t) = r(s_t, a_t) + \\gamma \\cdot r(s_{t+1}, a_{t+1}) + \\gamma ^ 2 \\cdot V(s_{t+2}) - V(s) $\n",
     " * Et cetera, et cetera. This way agent starts training much faster since it's estimate of A(s,a) depends less on his (imperfect) value function and more on actual rewards. There's also a [nice generalization](https://arxiv.org/abs/1506.02438) of this.\n",
     "\n",
     "\n",
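
For readers who want to check the corrected recursion, below is a minimal NumPy sketch of computing N-step advantages over a T-step rollout by sweeping backwards from the bootstrap value V(s_T). The helper name n_step_advantages and the toy numbers are illustrative assumptions and do not appear in the notebook.

import numpy as np

def n_step_advantages(rewards, values, next_value, gamma=0.99):
    # Hypothetical helper (not part of the notebook): for a T-step rollout it
    # computes A(s_t, a_t) = r_t + gamma*r_{t+1} + ... + gamma^(T-t)*V(s_T) - V(s_t),
    # i.e. the same recursion as the corrected bullet in the patch above.
    T = len(rewards)
    advantages = np.zeros(T)
    ret = next_value                      # V(s_T): critic's estimate after the last rollout step
    for t in reversed(range(T)):
        ret = rewards[t] + gamma * ret    # the n-step return accumulates as we sweep backwards
        advantages[t] = ret - values[t]   # subtract the baseline V(s_t)
    return advantages

# Toy usage with made-up numbers: a 3-step rollout.
print(n_step_advantages(rewards=[1.0, 0.0, 2.0],
                        values=[0.5, 0.4, 0.3],
                        next_value=0.2,
                        gamma=0.9))

At the last step this reduces to r(s_t, a_t) + gamma * V(s_{t+1}) - V(s_t); one step earlier it adds gamma * r(s_{t+1}, a_{t+1}) and bootstraps from gamma^2 * V(s_{t+2}), which is exactly the correction the patch makes.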