
Commit

minor
tqchen committed Sep 28, 2015
1 parent aecef55 commit fc2f408
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions doc/program_model.md
@@ -181,7 +181,7 @@ They correspond to the red nodes in the following figure.
![Comp Graph Folded](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/prog_model/comp_graph_backward.png)

What the imperative program did was actually the same as the symbolic way. It implicitly saves a backward
-computation graph in the grad closure. When we invoked the ```d.grad```, we start from ```g[D]```,
+computation graph in the grad closure. When we invoked the ```d.grad```, we start from ```d(D)```,
backtrace the graph to compute the gradient and collect the results back.
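
The ```grad``` closure mentioned above comes from the imperative toy example earlier in the file (not visible in this hunk). As a rough sketch of the idea, assuming a scalar-valued ```array``` class and illustrative variable names ```a```, ```b```, ```c```, ```d``` (not the file's exact code), the backward graph can be hidden inside nested closures like this:

```python
class array(object):
    """Toy value that records a grad closure as it is computed."""
    def __init__(self, value, name=None):
        self.value = value
        if name:
            # Leaf node: its gradient is just the incoming gradient g.
            self.grad = lambda g: {name: g}

    def __add__(self, other):
        assert isinstance(other, int)
        ret = array(self.value + other)
        # Adding a constant passes the gradient through unchanged.
        ret.grad = lambda g: self.grad(g)
        return ret

    def __mul__(self, other):
        assert isinstance(other, array)
        ret = array(self.value * other.value)
        def grad(g):
            # Product rule: each input receives g scaled by the other input.
            out = self.grad(g * other.value)
            out.update(other.grad(g * self.value))
            return out
        ret.grad = grad
        return ret

a = array(1, 'a')
b = array(2, 'b')
c = b * a
d = c + 1
print(d.value)    # 3
print(d.grad(1))  # gradient w.r.t. a is 2, w.r.t. b is 1
```

Calling ```d.grad(1)``` walks back through the chain of closures, which is exactly the implicit backward graph described above.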

So we can find that in fact the gradient calculation in both symbolic and imperative programming follows the same
@@ -197,7 +197,7 @@ free the memory of previous results, and share the memory between inputs and outputs.

Imagine now we are not running this toy example, but doing instead a deep neural net with ```n``` layers.
If we are only running forward pass, but not backward(gradient) pass, we will only need to allocate 2 copies of
-temperal space to store values of intermediate layers, instead of ```n``` copies of them.
+temporal space to store values of intermediate layers, instead of ```n``` copies of them.
However because the imperative programs need to be prepared for the possible futures of getting gradient,
the intermediate values have to be stored, which requires ```n``` copies of temporal space.
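
To make the 2-copies-versus-```n```-copies point concrete, here is a hedged sketch of my own (the ```weights``` list and the two helper functions are illustrative, not code from this document): a forward-only pass can ping-pong between two buffers, while staying ready for a backward pass forces every intermediate layer value to be kept alive.

```python
import numpy as np

weights = [1.0, 2.0, 3.0, 4.0]             # toy "layers": elementwise scaling

def forward_two_buffers(x, weights):
    """Forward-only: two temporaries suffice, whatever the depth n."""
    buf_a, buf_b = x.copy(), np.empty_like(x)
    for w in weights:
        np.multiply(buf_a, w, out=buf_b)   # write this layer's output in place
        buf_a, buf_b = buf_b, buf_a        # swap: the output becomes the next input
    return buf_a

def forward_keep_all(x, weights):
    """Gradient-ready: every intermediate is kept, so memory grows with n."""
    outs = [x]
    for w in weights:
        outs.append(outs[-1] * w)          # n extra arrays stay alive for backward
    return outs

x = np.ones(3)
print(forward_two_buffers(x, weights))     # [24. 24. 24.], using only 2 buffers
print(forward_keep_all(x, weights)[-1])    # same values, but n intermediates stored
```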

