Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

update symbol in python_guide #79

Merged
merged 1 commit into from
Sep 15, 2015
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
206 changes: 93 additions & 113 deletions doc/python/python_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ applications. There are mainly three concepts:
offers matrix and tensor computations on both CPU and GPU, with automatic
parallelization

* [Symbol](#symbolic-and-automatic-differentiation) makes defining a neural
* [Symbol](#symbol-and-automatic-differentiation) makes defining a neural
network extremely easy, and provides automatic differentiation.

* [KVStore](#distributed-key-value-store) easy the data synchronization between
Expand Down Expand Up @@ -218,145 +218,124 @@ want to write highly parallelized codes, we only need to postpone when we need
the results.


## Symbolic and Automatic Differentiation
## Symbol and Automatic Differentiation

Now you have seen the power of NArray of MXNet. It seems to be interesting and
we are ready to build some real deep learning. Hmm, this seems to be really
exciting, but wait, do we need to build things from scratch? It seems that we
need to re-implement all the layers in deep learning toolkits such as
[CXXNet](https://github.com/dmlc/cxxnet) in NArray? Well, you do not have
to. There is a Symbolic API in MXNet that readily helps you to do all these.
NDArray is the basic computation unit in MXNet. Besides, MXNet provides a
symbolic interface, named Symbol, to simplify constructing neural networks. The
symbol combines both flexibility and efficiency. On one hand, it is similar to
the network configuration in [Caffe](http://caffe.berkeleyvision.org/) and
[CXXNet](https://github.com/dmlc/cxxnet), on the other hand, the symbol defines
the computation graph in [Theano](http://deeplearning.net/software/theano/).

More importantly, the Symbolic API is designed to bring in the advantage of C++
static layers(operators) to ***maximumly optimizes the performance and memory***
that is even better than CXXNet. Sounds exciting? Let us get started on this.
### Basic Composition of Symbols

### Creating Symbols

A common way to create a neural network is to create it via some way of
configuration file or API. The following code creates a configuration two layer
perceptrons.
The following codes create a two layer perceptrons network:

```python
import mxnet.symbol as sym
data = sym.Variable('data')
net = sym.FullyConnected(data=data, name='fc1', num_hidden=128)
net = sym.Activation(data=net, name='relu1', act_type="relu")
net = sym.FullyConnected(data=net, name='fc2', num_hidden=10)
net = sym.Softmax(data=net, name = 'sm')
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
>>> net = mx.symbol.Activation(data=net, name='relu1', act_type="relu")
>>> net = mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64)
>>> net = mx.symbol.Softmax(data=net, name='out')
>>> type(net)
<class 'mxnet.symbol.Symbol'>
```

If you are familiar with tools such as cxxnet or caffe, the ```Symbol``` object
is like configuration files that configures the network structure. If you are
more familiar with tools like theano, the ```Symbol``` object something that
defines the computation graph. Basically, it creates a computation graph that
defines the forward pass of neural network.
Each symbol takes a (unique) string name. *Variable* often defines the inputs,
or free variables. Other symbols take a symbol as the input (*data*),
and may accept other hyper-parameters such as the number of hidden neurons (*num_hidden*)
or the activation type (*act_type*).

The Configuration API allows you to define the computation graph via
compositions. If you have not used symbolic configuration tools like theano
before, one thing to note is that the ```net``` can also be viewed as function
that have input arguments.

You can get the list of arguments by calling ```Symbol.list_arguments```.
The symbol can be simply viewed as a function taking several arguments, whose
names are automatically generated and can be get by

```python
>>> net.list_arguments()
['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias']
['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'out_label']
```

In our example, you can find that the arguments contains the parameters in each
layer, as well as input data. One thing that worth noticing is that the
argument names like ```fc1_weight``` are automatically generated because it was
not specified in creation of fc1. You can also specify it explicitly, like the
following code.
As can be seen, these arguments are the parameters need by each symbol:

- *data* : input data needed by the variable *data*
- *fc1_weight* and *fc1_bias* : the weight and bias for the first fully connected layer *fc1*
- *fc2_weight* and *fc2_bias* : the weight and bias for the second fully connected layer *fc2*
- *out_label* : the label needed by the loss

We can also specify the automatic generated names explicitly:

```python
>>> import mxnet.symbol as sym
>>> data = sym.Variable('data')
>>> w = sym.Variable('myweight')
>>> net = sym.FullyConnected(data=data, weight=w,
name='fc1', num_hidden=128)
>>> net = mx.symbol.Variable('data')
>>> w = mx.symbol.Variable('myweight')
>>> net = sym.FullyConnected(data=data, weight=w, name='fc1', num_hidden=128)
>>> net.list_arguments()
['data', 'myweight', 'fc1_bias']
```

Besides the coarse grained neuralnet operators such as FullyConnected,
Convolution. MXNet also provides fine graned operations such as elementwise
add, multiplications. The following example first performs an elementwise add
between two symbols, then feed them to the FullyConnected operator.
### More Complicated Composition

MXNet provides well-optimized symbols (see
[src/operator](https://github.com/dmlc/mxnet/tree/master/src/operator)) for
commonly used layers in deep learning. We can also easily define new operators
in python. The following example first performs an elementwise add between two
symbols, then feed them to the fully connected operator.

```python
>>> import mxnet.symbol as sym
>>> lhs = sym.Variable('data1')
>>> rhs = sym.Variable('data2')
>>> net = sym.FullyConnected(data=lhs + rhs,
name='fc1', num_hidden=128)
>>> lhs = mx.symbol.Variable('data1')
>>> rhs = mx.symbol.Variable('data2')
>>> net = mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128)
>>> net.list_arguments()
['data1', 'data2', 'fc1_weight', 'fc1_bias']
```

### More Complicated Composition

In the previous example, Symbols are constructed in a forward compositional way.
Besides doing things in a forward compistion way. You can also treat composed
symbols as functions, and apply them to existing symbols.
We can also construct symbol in a more flexible way rather than the single
forward composition we addressed before.

```python
>>> import mxnet.symbol as sym
>>> data = sym.Variable('data')
>>> net = sym.FullyConnected(data=data,
name='fc1', num_hidden=128)
>>> net.list_arguments()
['data', 'fc1_weight', 'fc1_bias']
>>> data2 = sym.Variable('data2')
>>> in_net = sym.FullyConnected(data=data,
name='in', num_hidden=128)
>>> composed_net = net(data=in_net, name='compose')
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
>>> net2 = mx.symbol.Variable('data2')
>>> net2 = mx.symbol.FullyConnected(data=net2, name='net2', num_hidden=128)
>>> composed_net = net(data=net2, name='compose')
>>> composed_net.list_arguments()
['data2', 'in_weight', 'in_bias', 'compose_fc1_weight', 'compose_fc1_bias']
['data2', 'net2_weight', 'net2_bias', 'compose_fc1_weight', 'compose_fc1_bias']
```

In the above example, net is used a function to apply to an existing symbol
```in_net```, the resulting composed_net will replace the original ```data``` by
the the in_net instead. This is useful when you want to change the input of some
neural-net to be other structure.

### Shape Inference
In the above example, *net* is used a function to apply to an existing symbol
*net*, the resulting *composed_net* will replace the original argument *data* by
*net2* instead.

Now we have defined the computation graph. A common problem in the computation
graph, is to figure out shapes of each parameters. Usually, we want to know the
shape of all the weights, bias and outputs.
### Argument Shapes Inference

You can use ```Symbol.infer_shape``` to do that. THe shape inference function
allows you to pass in shapes of arguments that you know,
and it will try to infer the shapes of all arguments and outputs.
Now we have known how to define the symbol. Next we can inference the shapes of
all the arguments it needed by given the input data shape.

```python
>>> import mxnet.symbol as sym
>>> data = sym.Variable('data')
>>> net = sym.FullyConnected(data=data, name='fc1',
num_hidden=10)
>>> arg_shape, out_shape = net.infer_shape(data=(100, 100))
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=ent, name='fc1', num_hidden=10)
>>> arg_shape, out_shape, aux_shape = net.infer_shape(data=(100, 100))
>>> dict(zip(net.list_arguments(), arg_shape))
{'data': (100, 100), 'fc1_weight': (10, 100), 'fc1_bias': (10,)}
>>> out_shape
[(100, 10)]
```

In common practice, you only need to provide the shape of input data, and it
will automatically infers the shape of all the parameters. You can always also
provide more shape information, such as shape of weights. The ```infer_shape```
will detect if there is inconsitency in the shapes, and raise an Error if some
of them are inconsistent.
The shape inference can be used as an earlier debugging mechanism to detect
shape inconsistency.

### Bind the Symbols
### Bind the Symbols and Run

Symbols are configuration objects that represents a computation graph (a
configuration of neuralnet). So far we have introduced how to build up the
computation graph (i.e. a configuration). The remaining question is, how we can
do computation using the defined graph.
Now we can bind the free variables of the symbol and perform forward and
backward.

TODO.
```python
>>> in_shape = (128, 3, 100, 100) # minibatch_size, #channel, image_width, image_height
>>> executor = net.simple_bind(mx.gpu(), data = mx.nd.empty(in_shape, mx.gpu())
>>> # feed data and label..
>>> executor.forward()
>>> executor.backward()
>>> print executor.outputs[0].asnumpy()
```

### How Efficient is Symbolic API

Expand All @@ -371,12 +350,12 @@ utilization.
The coarse grained operators are equivalent to cxxnet layers, which are
extremely efficient. We also provide fine grained operators for more flexible
composition. Because we are also doing more inplace memory allocation, mxnet can
be ***more memory efficient*** than cxxnet, and gets to same runtime, with
be ***more memory efficient*** than cxxnet/caffe, and gets to same runtime, with
greater flexiblity.

## Distributed Key-value Store

`KVStore` is a place for data sharing. We can think it as a single object shared
KVStore is a place for data sharing. We can think it as a single object shared
across different devices (GPUs and machines), where each device can push data in
and pull data out.

Expand Down Expand Up @@ -409,7 +388,7 @@ For any key has been initialized, we can push a new value with the same shape to
```

The data for pushing can be on any device. Furthermore, we can push multiple
values into the same key, where `kvstore` will first sum all these
values into the same key, where KVStore will first sum all these
values and then push the aggregated value.

```python
Expand All @@ -422,7 +401,7 @@ values and then push the aggregated value.
[ 4. 4. 4.]]
```

For each push, `kvstore` applies the pushed value into the value stored by a
For each push, KVStore applies the pushed value into the value stored by a
`updater`. The default updater is `ASSGIN`, we can replace the default one to
control how data is merged.

Expand Down Expand Up @@ -458,7 +437,7 @@ pull the value into several devices by a single call.

### Handle a list of key-value pairs

All operations introduced so far are about a single key. `KVStore` also provides
All operations introduced so far are about a single key. KVStore also provides
the interface for a list of key-value pairs. For single device:

```python
Expand Down Expand Up @@ -493,16 +472,17 @@ update on key: 9

Base on parameter server. The `updater` will runs on the server nodes. MORE...

## How to Choose between APIs

You can mix them all as much as you like. Here are some guidelines
* Use Symbolic API and coarse grained operator to create established structure.
* Use fine-grained operator to extend parts of of more flexible symbolic graph.
* Do some dynamic NArray tricks, which are even more flexible, between the calls of forward and backward of executors.
<!-- ## How to Choose between APIs -->

<!-- You can mix them all as much as you like. Here are some guidelines -->
<!-- * Use Symbolic API and coarse grained operator to create established structure. -->
<!-- * Use fine-grained operator to extend parts of of more flexible symbolic graph. -->
<!-- * Do some dynamic NArray tricks, which are even more flexible, between the calls of forward and backward of executors. -->

We believe that different ways offers you different levels of flexibilty and
efficiency. Normally you do not need to be flexible in all parts of the
networks, so we allow you to use the fast optimized parts, and compose it
flexibly with fine-grained operator or dynamic NArray. We believe such kind of
mixture allows you to build the deep learning architecture both efficiently and
flexibly as your choice. To mix is to maximize the peformance and flexiblity.
<!-- We believe that different ways offers you different levels of flexibilty and -->
<!-- efficiency. Normally you do not need to be flexible in all parts of the -->
<!-- networks, so we allow you to use the fast optimized parts, and compose it -->
<!-- flexibly with fine-grained operator or dynamic NArray. We believe such kind of -->
<!-- mixture allows you to build the deep learning architecture both efficiently and -->
<!-- flexibly as your choice. To mix is to maximize the peformance and flexiblity. -->
24 changes: 12 additions & 12 deletions python/mxnet/kvstore.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ def init(key, value):

Examples
--------
# init a single key-value pair
>>> # init a single key-value pair
>>> shape = (2,3)
>>> mx.kv.init(3, mx.nd.ones(shape)*2)
>>> a = mx.nd.zeros(shape)
Expand All @@ -70,7 +70,7 @@ def init(key, value):
[[ 2. 2. 2.]
[ 2. 2. 2.]]

# init a list of key-value pairs
>>> # init a list of key-value pairs
>>> keys = [5, 7, 9]
>>> mx.kv.init(keys, [mx.nd.ones(shape)]*len(keys))
"""
Expand All @@ -89,14 +89,14 @@ def push(key, value):

Examples
--------
# push a single key-value pair
>>> # push a single key-value pair
>>> mx.kv.push(3, mx.nd.ones(shape)*8)
>>> mx.kv.pull(3, out = a) # pull out the value
>>> print a.asnumpy()
[[ 8. 8. 8.]
[ 8. 8. 8.]]

# aggregate the value and the push
>>> # aggregate the value and the push
>>> gpus = [mx.gpu(i) for i in range(4)]
>>> b = [mx.nd.ones(shape, gpu) for gpu in gpus]
>>> mx.kv.push(3, b)
Expand All @@ -105,16 +105,16 @@ def push(key, value):
[[ 4. 4. 4.]
[ 4. 4. 4.]]

# push a list of keys.
# single device
>>> # push a list of keys.
>>> # single device
>>> mx.kv.push(keys, [mx.nd.ones(shape)]*len(keys))
>>> b = [mx.nd.zeros(shape)]*len(keys)
>>> mx.kv.pull(keys, out = b)
>>> print b[1].asnumpy()
[[ 1. 1. 1.]
[ 1. 1. 1.]]
# multiple devices:

>>> # multiple devices:
>>> b = [[mx.nd.ones(shape, gpu) for gpu in gpus]] * len(keys)
>>> mx.kv.push(keys, b)
>>> mx.kv.pull(keys, out = b)
Expand All @@ -137,29 +137,29 @@ def pull(key, out=None):

Examples
--------
# pull a single key-value pair
>>> # pull a single key-value pair
>>> a = mx.nd.zeros(shape)
>>> mx.kv.pull(3, out = a)
>>> print a.asnumpy()
[[ 2. 2. 2.]
[ 2. 2. 2.]]

# pull into multiple devices
>>> # pull into multiple devices
>>> b = [mx.nd.ones(shape, gpu) for gpu in gpus]
>>> mx.kv.pull(3, out = b)
>>> print b[1].asnumpy()
[[ 2. 2. 2.]
[ 2. 2. 2.]]

# pull a list of key-value pairs.
# On single device
>>> # pull a list of key-value pairs.
>>> # On single device
>>> keys = [5, 7, 9]
>>> b = [mx.nd.zeros(shape)]*len(keys)
>>> mx.kv.pull(keys, out = b)
>>> print b[1].asnumpy()
[[ 2. 2. 2.]
[ 2. 2. 2.]]
# On multiple devices
>>> # On multiple devices
>>> b = [[mx.nd.ones(shape, gpu) for gpu in gpus]] * len(keys)
>>> mx.kv.pull(keys, out = b)
>>> print b[1][1].asnumpy()
Expand Down