cleanup docs a bit more #127

Merged: 1 commit, Aug 31, 2016
15 changes: 12 additions & 3 deletions docs/src/api/optimizer.md
@@ -1,12 +1,21 @@
# Optimizers

```@autodocs
-Modules = [MXNet.mx]
+Modules = [MXNet.mx, MXNet.mx.LearningRate, MXNet.mx.Momentum]
Pages = ["optimizer.jl"]
```

## Built-in optimizers

-```@contents
-Pages = ["optimizers/adam.md", "optimizers/sgd.md"]
+### Stochastic Gradient Descent
+```@autodocs
+Modules = [MXNet.mx]
+Pages = ["optimizers/sgd.jl"]
```
+
+### ADAM
+```@autodocs
+Modules = [MXNet.mx]
+Pages = ["optimizers/adam.jl"]
+```

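For readers skimming the new optimizer pages, a minimal usage sketch (not part of this diff; the keyword names follow MXNet.jl's documented defaults, and `model`/`train_provider` are placeholders):

```julia
using MXNet

# stochastic gradient descent with the usual learning-rate/momentum knobs
optimizer = mx.SGD(lr=0.1, momentum=0.9)

# or ADAM, which adapts the step size per parameter
# optimizer = mx.ADAM(lr=0.001)

# either optimizer is then handed to fit() together with a model and data:
# mx.fit(model, optimizer, train_provider, n_epoch=10)
```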
6 changes: 0 additions & 6 deletions docs/src/api/optimizers/adam.md

This file was deleted.

6 changes: 0 additions & 6 deletions docs/src/api/optimizers/sgd.md

This file was deleted.

36 changes: 18 additions & 18 deletions src/callback.jl
@@ -32,21 +32,21 @@ A convenient function to construct a callback that runs every `n` mini-batches.

# Arguments
* `call_on_0::Bool`: keyword argument, default false. Unless set, the callback
-  will **not** be run on batch 0.
+  will *not* be run on batch 0.

-For example, the :func:`speedometer` callback is defined as
+For example, the [`speedometer`](@ref) callback is defined as

-.. code-block:: julia
-
-   every_n_iter(frequency, call_on_0=true) do state :: OptimizationState
-     if state.curr_batch == 0
-       # reset timer
-     else
-       # compute and print speed
-     end
-   end
+```julia
+every_n_iter(frequency, call_on_0=true) do state :: OptimizationState
+  if state.curr_batch == 0
+    # reset timer
+  else
+    # compute and print speed
+  end
+end
+```

-:seealso: :func:`every_n_epoch`, :func:`speedometer`.
+See also [`every_n_epoch`](@ref) and [`speedometer`](@ref).
"""
function every_n_batch(callback :: Function, n :: Int; call_on_0 :: Bool = false)
BatchCallback(n, call_on_0, callback)
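As a usage sketch (not part of the diff; `state.curr_batch` is taken from the docstring example above, everything else is a placeholder):

```julia
using MXNet

# print a marker every 10 mini-batches, skipping batch 0
cb = mx.every_n_batch(10, call_on_0=false) do state
    println("processed batch ", state.curr_batch)
end

# the callback is then passed to fit() via the callbacks keyword:
# mx.fit(model, optimizer, data, callbacks=[cb])
```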
@@ -68,7 +68,7 @@ Create an `AbstractBatchCallback` that measures the training speed
(number of samples processed per second) every k mini-batches.

# Arguments
-* Int frequency: keyword argument, default 50. The frequency (number of
+* `frequency::Int`: keyword argument, default 50. The frequency (number of
  mini-batches) to measure and report the speed.
"""
function speedometer(;frequency::Int=50)
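A corresponding sketch for the speedometer (the `callbacks` keyword is documented under `fit` below; the training objects are placeholders):

```julia
using MXNet

# report samples/second every 100 mini-batches instead of the default 50
speed_cb = mx.speedometer(frequency=100)

# mx.fit(model, optimizer, train_provider, callbacks=[speed_cb])
```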
@@ -97,12 +97,12 @@ end

A convenient function to construct a callback that runs every `n` full data-passes.

-* Int call_on_0: keyword argument, default false. Unless set, the callback
-  will **not** be run on epoch 0. Epoch 0 means no training has been performed
+* `call_on_0::Bool`: keyword argument, default false. Unless set, the callback
+  will *not* be run on epoch 0. Epoch 0 means no training has been performed
  yet. This is useful if you want to inspect the randomly initialized model
  that has not seen any data yet.

-:seealso: :func:`every_n_iter`.
+See also [`every_n_iter`](@ref).
"""
function every_n_epoch(callback :: Function, n :: Int; call_on_0 :: Bool = false)
EpochCallback(n, call_on_0, callback)
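By analogy with `every_n_batch`, an epoch-level sketch; note the diff does not pin down the argument list handed to epoch callbacks, so the varargs form below is an assumption:

```julia
using MXNet

# call_on_0=true also fires at epoch 0, before any training has happened,
# which is useful for inspecting the randomly initialized model
cb = mx.every_n_epoch(1, call_on_0=true) do args...
    println("epoch callback fired")
end
```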
@@ -127,9 +127,9 @@ The checkpoints can be loaded back later on.
* `prefix::AbstractString`: the prefix of the filenames to save the model. The model
  architecture will be saved to prefix-symbol.json, while the weights will be saved
  to prefix-0012.params, for example, for the 12-th epoch.
-* Int frequency: keyword argument, default 1. The frequency (measured in epochs) to
+* `frequency::Int`: keyword argument, default 1. The frequency (measured in epochs) to
  save checkpoints.
-* Bool save_epoch_0: keyword argument, default false. Whether we should save a
+* `save_epoch_0::Bool`: keyword argument, default false. Whether we should save a
  checkpoint for epoch 0 (model initialized but not seen any data yet).
"""
function do_checkpoint(prefix::AbstractString; frequency::Int=1, save_epoch_0=false)
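And a checkpointing sketch tying this to `fit` (the prefix and training objects are invented for illustration):

```julia
using MXNet

# writes mymodel-symbol.json once, plus mymodel-NNNN.params every 2 epochs
ckpt = mx.do_checkpoint("mymodel", frequency=2, save_epoch_0=false)

# mx.fit(model, optimizer, train_provider, n_epoch=10, callbacks=[ckpt])
```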
6 changes: 3 additions & 3 deletions src/executor.jl
@@ -75,8 +75,8 @@ Create an `Executor` by binding a `SymbolicNode` to concrete `NDArray`.
* `ctx::Context`: the context on which the computation should run.
* `args`: either a list of `NDArray` or a dictionary of name-array pairs. Concrete
  arrays for all the inputs in the network architecture. The inputs typically include
-  network parameters (weights, bias, filters, etc.), data and labels. See :func:`list_arguments`
-  and :func:`infer_shape`.
+  network parameters (weights, bias, filters, etc.), data and labels. See [`list_arguments`](@ref)
+  and [`infer_shape`](@ref).
* `args_grad`:
* `aux_states`:
* `grad_req`:
@@ -211,7 +211,7 @@ Can be used to get an estimate of the memory cost.
dProvider = ... # DataProvider
exec = mx.simple_bind(net, mx.cpu(), data=size(dProvider.data_batch[1]))
dbg_str = mx.debug_str(exec)
-println(split(ref, ['\n'])[end-2])
+println(split(ref, ['\\n'])[end-2])
```
"""
function debug_str(self :: Executor)
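A self-contained version of the `debug_str` snippet above may be useful, since the original prints `split(ref, ...)` where `ref` looks like a leftover name (presumably `dbg_str` was intended). The small network here is invented for illustration:

```julia
using MXNet

net = mx.Variable(:data)
net = mx.FullyConnected(data=net, name=:fc1, num_hidden=128)
net = mx.SoftmaxOutput(data=net, name=:softmax)

# simple_bind infers the remaining shapes from the data shape and
# allocates all argument arrays on the given device
exec = mx.simple_bind(net, mx.cpu(), data=(28, 28, 1, 100))

# one of the last lines of the debug string carries the memory estimate
dbg_str = mx.debug_str(exec)
println(split(dbg_str, '\n')[end-2])
```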
131 changes: 65 additions & 66 deletions src/model.jl
@@ -11,8 +11,8 @@ abstract AbstractModel
The feedforward model provides a convenient interface to train and predict on
feedforward architectures like multi-layer MLP, ConvNets, etc. There is no
explicit handling of *time index*, but it is relatively easy to implement
-unrolled RNN / LSTM under this framework (**TODO**: add example). For models
-that handles sequential data explicitly, please use **TODO**...
+unrolled RNN / LSTM under this framework (*TODO*: add example). For models
+that handle sequential data explicitly, please use *TODO*...
"""
type FeedForward <: AbstractModel
arch :: SymbolicNode
@@ -47,10 +47,11 @@ end
"""
FeedForward(arch :: SymbolicNode, ctx)

-* arch: the architecture of the network constructed using the symbolic API.
-* ctx: the devices on which this model should do computation. It could be a single `Context`
-  or a list of `Context` objects. In the latter case, data parallelization will be used
-  for training. If no context is provided, the default context `cpu()` will be used.
+# Arguments:
+* `arch`: the architecture of the network constructed using the symbolic API.
+* `ctx`: the devices on which this model should do computation. It could be a single `Context`
+  or a list of `Context` objects. In the latter case, data parallelization will be used
+  for training. If no context is provided, the default context `cpu()` will be used.
"""
function FeedForward(arch :: SymbolicNode; context :: Union{Context, Vector{Context}, Void} = nothing)
if isa(context, Void)
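As a construction sketch (the two-layer network is invented for illustration):

```julia
using MXNet

net = mx.Variable(:data)
net = mx.FullyConnected(data=net, name=:fc1, num_hidden=10)
net = mx.SoftmaxOutput(data=net, name=:softmax)

# single-device model on the CPU; pass a Vector{Context},
# e.g. [mx.gpu(0), mx.gpu(1)], to train with data parallelization
model = mx.FeedForward(net, context=mx.cpu())
```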
@@ -64,17 +64,18 @@ end
"""
init_model(self, initializer; overwrite=false, input_shapes...)

Initialize the weights in the model.

This method will be called automatically when training a model. So there is usually no
need to call this method unless one needs to inspect a model with only randomly initialized
weights.

-* FeedForward self: the model to be initialized.
-* AbstractInitializer initializer: an initializer describing how the weights should be initialized.
-* Bool overwrite: keyword argument, force initialization even when weights already exists.
-* input_shapes: the shape of all data and label inputs to this model, given as keyword arguments.
+# Arguments:
+* `self::FeedForward`: the model to be initialized.
+* `initializer::AbstractInitializer`: an initializer describing how the weights should be initialized.
+* `overwrite::Bool`: keyword argument, force initialization even when weights already exist.
+* `input_shapes`: the shape of all data and label inputs to this model, given as keyword arguments.
  For example, `data=(28,28,1,100), label=(100,)`.
"""
function init_model(self :: FeedForward, initializer :: AbstractInitializer; overwrite::Bool=false, input_shapes...)
# all arg names, including data, label, and parameters
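A sketch of the call, reusing the `model` from the previous example and the shapes from the docstring (`UniformInitializer` is the default named under `fit` below):

```julia
using MXNet

# randomly initialize the weights so the untrained model can be inspected
mx.init_model(model, mx.UniformInitializer(0.01),
              data=(28, 28, 1, 100), label=(100,))
```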
@@ -162,46 +164,44 @@ function _setup_predictor(self :: FeedForward, overwrite :: Bool=false; data_sha
end

"""
-.. function::
-   predict(self, data; overwrite=false, callback=nothing)
+    predict(self, data; overwrite=false, callback=nothing)

Predict using an existing model. The model should be already initialized, or trained or loaded from
a checkpoint. There is an overloaded function that allows passing the callback as the first argument,
so it is possible to do

-.. code-block:: julia
-
-   predict(model, data) do batch_output
-     # consume or write batch_output to file
-   end
+```julia
+predict(model, data) do batch_output
+  # consume or write batch_output to file
+end
+```

-* FeedForward self: the model.
-* AbstractDataProvider data: the data to perform prediction on.
-* Bool overwrite: an `Executor` is initialized the first time predict is called. The memory
+# Arguments:
+* `self::FeedForward`: the model.
+* `data::AbstractDataProvider`: the data to perform prediction on.
+* `overwrite::Bool`: an `Executor` is initialized the first time predict is called. The memory
  allocation of the `Executor` depends on the mini-batch size of the test
  data provider. If you call predict twice with data provider of the same batch-size,
  then the executor can potentially be re-used. So, if `overwrite` is false,
  we will try to re-use, and raise an error if batch-size changed. If `overwrite`
  is true (the default), a new `Executor` will be created to replace the old one.

-.. note::
+!!! note
    Prediction is computationally much less costly than training, so the bottleneck sometimes becomes the IO
    for copying mini-batches of data. Since there is no concern about convergence in prediction, it is better
    to set the mini-batch size as large as possible (limited by your device memory) if prediction speed is a
    concern.

    For the same reason, currently prediction will only use the first device even if multiple devices are
    provided to construct the model.

-.. note::
+!!! note
    If you perform further training after prediction, the weights are not automatically synchronized if `overwrite`
    is set to false and the old predictor is re-used. In this case
    setting `overwrite` to true (the default) will re-initialize the predictor the next time you call
    predict and synchronize the weights again.

-:seealso: :func:`train`, :func:`fit`, :func:`init_model`, :func:`load_checkpoint`
+See also [`train`](@ref), [`fit`](@ref), [`init_model`](@ref), and [`load_checkpoint`](@ref).
"""
function predict(callback :: Function, self :: FeedForward, data :: AbstractDataProvider; overwrite :: Bool = true)
predict(self, data; overwrite = overwrite, callback=callback)
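Both call forms side by side, as a sketch; `model` and `eval_provider` stand in for an initialized model and a data provider:

```julia
# plain form: returns the outputs for the whole provider at once
probs = mx.predict(model, eval_provider)

# callback form: stream batch outputs as they are produced
mx.predict(model, eval_provider) do batch_output
    # consume or write batch_output to file
end
```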
@@ -310,7 +310,7 @@ end
"""
train(model :: FeedForward, ...)

-Alias to :func:`fit`.
+Alias to [`fit`](@ref).
"""
function train(self :: FeedForward, optimizer :: AbstractOptimizer, data :: AbstractDataProvider; kwargs...)
fit(self, optimizer, data; kwargs...)
@@ -321,26 +321,25 @@ end

Train the `model` on `data` with the `optimizer`.

-* FeedForward model: the model to be trained.
-* AbstractOptimizer optimizer: the optimization algorithm to use.
-* AbstractDataProvider data: the training data provider.
-* Int n_epoch: default 10, the number of full data-passes to run.
-* AbstractDataProvider eval_data: keyword argument, default `nothing`. The data provider for
+* `model::FeedForward`: the model to be trained.
+* `optimizer::AbstractOptimizer`: the optimization algorithm to use.
+* `data::AbstractDataProvider`: the training data provider.
+* `n_epoch::Int`: default 10, the number of full data-passes to run.
+* `eval_data::AbstractDataProvider`: keyword argument, default `nothing`. The data provider for
  the validation set.
-* AbstractEvalMetric eval_metric: keyword argument, default `Accuracy()`. The metric used
+* `eval_metric::AbstractEvalMetric`: keyword argument, default [`Accuracy()`](@ref). The metric used
  to evaluate the training performance. If `eval_data` is provided, the same metric is also
  calculated on the validation set.
-* kvstore: keyword argument, default `:local`. The key-value store used to synchronize gradients
+* `kvstore`: keyword argument, default `:local`. The key-value store used to synchronize gradients
  and parameters when multiple devices are used for training.
-  :type kvstore: `KVStore` or `Base.Symbol`
-* AbstractInitializer initializer: keyword argument, default `UniformInitializer(0.01)`.
-* Bool force_init: keyword argument, default false. By default, the random initialization using the
+* `initializer::AbstractInitializer`: keyword argument, default `UniformInitializer(0.01)`.
+* `force_init::Bool`: keyword argument, default false. By default, the random initialization using the
  provided `initializer` will be skipped if the model weights already exist, maybe from a previous
-  call to :func:`train` or an explicit call to :func:`init_model` or :func:`load_checkpoint`. When
+  call to [`train`](@ref) or an explicit call to [`init_model`](@ref) or [`load_checkpoint`](@ref). When
  this option is set, it will always do random initialization at the beginning of training.
-* callbacks: keyword argument, default `[]`. Callbacks to be invoked at each epoch or mini-batch,
+* `callbacks::Vector{AbstractCallback}`: keyword argument, default `[]`. Callbacks to be invoked at each epoch or mini-batch,
  see `AbstractCallback`.
-  :type callbacks: `Vector{AbstractCallback}`
"""
function fit(self :: FeedForward, optimizer :: AbstractOptimizer, data :: AbstractDataProvider; kwargs...)
opts = TrainingOptions(; kwargs...)
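Finally, a sketch pulling the documented keywords together on a synthetic two-class problem (all names and shapes here are invented for illustration):

```julia
using MXNet

# tiny synthetic dataset: 20 features, 1000 samples, binary labels
x = rand(Float32, 20, 1000)
y = Float32[rand(0:1) for _ in 1:1000]
train_provider = mx.ArrayDataProvider(:data => x, :softmax_label => y, batch_size=100)

net = mx.Variable(:data)
net = mx.FullyConnected(data=net, name=:fc1, num_hidden=16)
net = mx.Activation(data=net, name=:relu1, act_type=:relu)
net = mx.FullyConnected(data=net, name=:fc2, num_hidden=2)
net = mx.SoftmaxOutput(data=net, name=:softmax)

model     = mx.FeedForward(net, context=mx.cpu())
optimizer = mx.SGD(lr=0.1, momentum=0.9)

mx.fit(model, optimizer, train_provider,
       n_epoch     = 10,
       eval_metric = mx.Accuracy(),
       initializer = mx.UniformInitializer(0.01),
       callbacks   = [mx.speedometer(), mx.do_checkpoint("mymodel")])
```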