.. _deep:

Deep Learning
=============

The breakthrough to effective training strategies for deep architectures came
in 2006 with the algorithms for training deep belief networks
(DBN) [Hinton07]_ and stacked auto-encoders [Ranzato07]_, [Bengio07]_.
All these methods are based on a similar approach: **greedy layer-wise
unsupervised pre-training** followed by **supervised fine-tuning**.

The pretraining strategy consists of using unsupervised learning to guide the
training of intermediate levels of representation. Each layer is pre-trained
with an unsupervised learning algorithm that learns a nonlinear transformation
of its input, capturing its main variations. Higher levels of abstraction are
created by feeding the output of one layer into the input of the subsequent
layer.
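
To make the layer-wise idea concrete, here is a minimal NumPy sketch (not the
Theano code used in the rest of these tutorials) in which each layer is
pre-trained as a small tied-weight autoencoder on the representation produced
by the layer below. The function ``pretrain_autoencoder_layer``, the layer
sizes and the random input are purely illustrative.

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def pretrain_autoencoder_layer(inputs, n_hidden, n_epochs=10, lr=0.1):
        """Pre-train one layer as a tied-weight sigmoid autoencoder and
        return its parameters plus the hidden code it produces."""
        n_visible = inputs.shape[1]
        W = rng.uniform(-0.1, 0.1, size=(n_visible, n_hidden))
        b_hid = np.zeros(n_hidden)
        b_vis = np.zeros(n_visible)
        for _ in range(n_epochs):
            h = sigmoid(inputs.dot(W) + b_hid)        # encode
            recon = sigmoid(h.dot(W.T) + b_vis)       # decode with tied weights
            err = recon - inputs                      # reconstruction error
            # gradients of the squared reconstruction error (full-batch sketch)
            d_recon = err * recon * (1.0 - recon)
            d_h = d_recon.dot(W) * h * (1.0 - h)
            grad_W = inputs.T.dot(d_h) + d_recon.T.dot(h)
            W -= lr * grad_W / len(inputs)
            b_hid -= lr * d_h.mean(axis=0)
            b_vis -= lr * d_recon.mean(axis=0)
        return W, b_hid, sigmoid(inputs.dot(W) + b_hid)

    # greedy layer-wise stacking: the hidden code produced by one layer
    # becomes the input of the next layer's unsupervised learning problem
    X = rng.rand(100, 784)          # random stand-in for MNIST-sized inputs
    layer_input, params = X, []
    for n_hidden in [500, 200]:     # two hidden layers, sizes chosen arbitrarily
        W, b, layer_input = pretrain_autoencoder_layer(layer_input, n_hidden)
        params.append((W, b))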

The resulting architecture can then be seen in two lights:

* the pre-trained deep network can be used to initialize the weights of all but
  the last layer of a deep neural network. The weights are then further adapted
  to a supervised task (such as classification) through traditional gradient
  descent (see :ref:`Multilayer perceptron <mlp>`). This is referred to as the
  fine-tuning step.

* the pre-trained deep network can also serve solely as a feature extractor. The
  output of the last layer is fed to a classifier, such as logistic regression,
  which is trained independently. Better results can be obtained by
  concatenating the output of the last layer with the hidden representations of
  all intermediate layers [Lee09]_ (see the sketch after this list).
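
As a rough illustration of this second view only (the tutorials themselves do
not use scikit-learn), one could train an off-the-shelf logistic regression on
features produced by a pre-trained network; the arrays below are random
stand-ins for real hidden representations.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)

    # random stand-ins for the hidden codes a pre-trained network would
    # produce on a labelled training set (1000 examples, 10 classes)
    h_last = rng.rand(1000, 200)          # code of the last hidden layer
    h_middle = rng.rand(1000, 500)        # code of an intermediate layer
    y = rng.randint(0, 10, size=1000)     # class labels

    # feature-extractor view: a separate classifier on the last layer's code
    clf = LogisticRegression(max_iter=500).fit(h_last, y)

    # [Lee09]-style variant: concatenate the codes of all layers first
    features = np.hstack([h_middle, h_last])
    clf_concat = LogisticRegression(max_iter=500).fit(features, y)
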
For the purposes of this tutorial, we will focus on the first interpretation,
as that is what was first proposed in [Hinton06]_.

Deep Coding
+++++++++++

Since Deep Belief Networks (DBN) and Stacked Denoising-AutoEncoders (SDA) share
much of the same architecture and have very similar training algorithms (in
terms of pretraining and fine-tuning stages), it makes sense to implement them
in a similar fashion, as part of a "Deep Learning" framework.
We thus define a generic interface that both of these architectures will
share:

.. code-block:: python

    class DeepLayerwiseModel(object):

        def layerwise_pretrain(self, layer_fns, pretrain_amounts):
            """Greedy layer-wise unsupervised pre-training of the stack."""

        def finetune(self, datasets, lr, batch_size):
            """Supervised fine-tuning of the whole network."""

    class DBN(DeepLayerwiseModel):
        """Deep Belief Network."""

    class StackedDAA(DeepLayerwiseModel):
        """Stacked denoising auto-encoder."""

The pretraining and fine-tuning stages can then be driven by a single main
function, sketched below for the MNIST dataset.

.. code-block:: python

    def deep_main(finetune_lr=0.1,
                  pretraining_epochs=20,
                  pretrain_lr=0.1,
                  training_epochs=1000,
                  batch_size=20,
                  mnist_file='mnist.pkl.gz'):

        n_train_examples, train_valid_test = load_mnist(mnist_file)

        # instantiate the model (a DBN or a StackedDAA)
        deep_model = ...

        ####
        #### Phase 1: Pre-training
        ####

        # create an array of functions, which will be used for the greedy
        # layer-wise unsupervised training procedure
        pretrain_functions = deep_model.pretrain_functions(
                batch_size=batch_size,
                train_set_x=train_set_x,
                learning_rate=pretrain_lr,
                ...
                )

        # loop over all the layers in our network
        for layer_idx, pretrain_fn in enumerate(pretrain_functions):

            # iterate over a certain number of epochs
            for i in xrange(pretraining_epochs * n_train_examples // batch_size):

                # take one step on the gradient of the unsupervised cost
                # function of the given layer
                pretrain_fn(i)

Once all layers have been pre-trained, the same function moves on to supervised
fine-tuning of the whole network.

.. code-block:: python

        ####
        #### Phase 2: Fine Tuning
        ####

        # create theano functions for fine-tuning, as well as for
        # validating and testing our model
        train_fn, valid_scores, test_scores = \
            deep_model.finetune_functions(
                train_valid_test[0][0],      # training dataset
                learning_rate=finetune_lr,   # the learning rate
                batch_size=batch_size)       # number of examples to use at once

        # use these functions as part of the generic early-stopping procedure
        for i in xrange(patience_max):

            if i >= patience:
                break

            cost_i = train_fn(i)
            ...
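
The elided part of the loop is the generic early-stopping logic referred to in
the comment above. Purely as a self-contained sketch, with random stand-ins
for the compiled Theano functions and hypothetical hyper-parameter names such
as ``validation_frequency``, ``patience_increase`` and
``improvement_threshold``, it might look as follows:

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)

    # random stand-ins for the Theano functions compiled above
    def train_fn(minibatch_index):    # returns the cost of one minibatch step
        return rng.rand()

    def valid_scores():               # returns the current validation error
        return rng.rand()

    def test_scores():                # returns the current test error
        return rng.rand()

    # hypothetical early-stopping hyper-parameters
    patience = 1000                   # look at this many minibatches at minimum
    patience_increase = 2.0           # wait this much longer after an improvement
    improvement_threshold = 0.995     # a relative improvement this large counts
    validation_frequency = 100        # validate every this many minibatches
    patience_max = 100000             # hard upper bound on training iterations

    best_validation_error = np.inf
    test_error_at_best = None

    for i in range(patience_max):
        if i >= patience:
            break

        cost_i = train_fn(i)

        if (i + 1) % validation_frequency == 0:
            this_validation_error = valid_scores()
            if this_validation_error < best_validation_error * improvement_threshold:
                # significant improvement: be willing to train for longer
                patience = max(patience, i * patience_increase)
                best_validation_error = this_validation_error
                test_error_at_best = test_scores()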