From 5e62649d04a740b1b14a18748f6824b7f4a64ea0 Mon Sep 17 00:00:00 2001
From: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Date: Tue, 29 Nov 2022 12:00:22 -0500
Subject: [PATCH] move a sentence

---
 docs/src/training/training.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/src/training/training.md b/docs/src/training/training.md
index d278ff87b2..ba3bba071d 100644
--- a/docs/src/training/training.md
+++ b/docs/src/training/training.md
@@ -29,7 +29,6 @@ for data in train_set
 end
 ```
 
-It is important that every `update!` step receives a newly gradient computed gradient.
 This loop can also be written using the function [`train!`](@ref Flux.Train.train!),
 but it's helpful to undersand the pieces first:
 
@@ -43,8 +42,8 @@ end
 
 Fist recall from the section on [taking gradients](@ref man-taking-gradients)
 that `Flux.gradient(f, a, b)` always calls `f(a, b)`, and returns a tuple `(∂f_∂a, ∂f_∂b)`.
-In the code above, the function `f` is an anonymous function with one argument,
-created by the `do` block, hence `grads` is a tuple with one element.
+In the code above, the function `f` passed to `gradient` is an anonymous function with
+one argument, created by the `do` block, hence `grads` is a tuple with one element.
 Instead of a `do` block, we could have written:
 
 ```julia
@@ -58,6 +57,9 @@ structures are what Zygote calls "explicit" gradients.
 It is important that the execution of the model takes place inside the call to `gradient`,
 in order for the influence of the model's parameters to be observed by Zygote.
 
+It is also important that every `update!` step receives a newly computed gradient,
+as this will change whenever the model's parameters are changed, and for each new data point.
+
 !!! compat "Explicit vs implicit gradients"
     Flux ≤ 0.13 used Zygote's "implicit" mode, in which `gradient` takes a zero-argument function.
     It looks like this:
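The explicit-gradient training loop that this patch documents can be sketched as follows. This is a minimal runnable illustration, assuming the explicit `Flux.setup`/`Flux.update!` API (Flux ≥ 0.14); the `Dense` model, `Descent` optimiser, and synthetic `train_set` are invented for the example:

```julia
using Flux

# A tiny model and synthetic data; these names are illustrative only.
model = Dense(2 => 1)
opt_state = Flux.setup(Descent(0.1), model)
train_set = [(randn(Float32, 2), randn(Float32, 1)) for _ in 1:10]

w0 = copy(model.weight)  # saved only to show the parameters change

for (input, label) in train_set
    # The model must run *inside* the call to `gradient`, and `grads`
    # is recomputed on every iteration, as the patch's moved sentence stresses.
    grads = Flux.gradient(model) do m
        Flux.mse(m(input), label)
    end
    Flux.update!(opt_state, model, grads[1])
end
```

Because `gradient` here is given one argument (`model`), `grads` is a one-element tuple, and `grads[1]` is the nested "explicit" gradient structure that `update!` expects.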