
Commit

Update 2016-7-29-A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2.html
adeshpande3 authored Jun 25, 2017
1 parent 7908949 commit 0e84f85
Showing 1 changed file with 1 addition and 1 deletion.
@@ -36,7 +36,7 @@ <h2><p><strong>Pooling Layers</strong></p></h2>
<img src="/assets/MaxPool.png">
<p>Other options for pooling layers are average pooling and L2-norm pooling. The intuitive reasoning behind this layer is that once we know that a specific feature is present in the original input volume (indicated by a high activation value), its exact location is not as important as its location relative to the other features. As you can imagine, this layer drastically reduces the spatial dimensions (the length and the width change, but not the depth) of the input volume. This serves two main purposes. The first is that, with a 2 x 2 filter and a stride of 2, the spatial size of the representation is reduced by 75%, which lessens the computation cost and the number of weights needed in later layers. The second is that it helps control <strong>overfitting</strong>. This term refers to a model that is so tuned to the training examples that it is not able to generalize well to the validation and test sets. A symptom of overfitting is a model that gets 99% or 100% accuracy on the training set but only 50% on the test data.</p>
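<p>To make the mechanics concrete, here is a minimal NumPy sketch of max pooling with a 2 x 2 filter and a stride of 2. The function and shapes are illustrative rather than taken from any particular library.</p>
<pre><code>import numpy as np

def max_pool_2x2(volume):
    """Max pooling with a 2 x 2 window and a stride of 2.

    volume: input activations of shape (height, width, depth).
    Returns a volume of shape (height // 2, width // 2, depth) --
    the spatial size shrinks by 75% while the depth is untouched.
    """
    h, w, d = volume.shape
    pooled = np.zeros((h // 2, w // 2, d))
    for i in range(h // 2):
        for j in range(w // 2):
            window = volume[2 * i:2 * i + 2, 2 * j:2 * j + 2, :]
            pooled[i, j, :] = window.max(axis=(0, 1))  # keep the largest activation per window
    return pooled

activations = np.random.rand(32, 32, 16)   # e.g. the output of a conv layer
print(max_pool_2x2(activations).shape)      # (16, 16, 16)
</code></pre>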
<h2><p><strong>Dropout Layers</strong></p></h2>
- <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Now, <strong>dropout layers</strong> have a very specific function in neural networks. In the last section, we discussed the problem of overfitting, where after training, the weights of the network are so tuned to the training examples that the network doesn&rsquo;t perform well when given new examples. The idea of dropout is straightforward: this layer &ldquo;drops out&rdquo; a random set of activations by setting them to zero in the forward pass. Simple as that. Now, what are the benefits of such a simple, seemingly unnecessary, and counterintuitive process? Well, in a way, it forces the network to be redundant. By that I mean the network should be able to provide the right classification or output for a specific example even if some of the activations are dropped out. It makes sure that the network isn&rsquo;t getting too &ldquo;fitted&rdquo; to the training data and thus helps alleviate the overfitting problem. An important note is that this layer is only used during training, not at test time.</p>
+ <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Now, <strong>dropout layers</strong> have a very specific function in neural networks. In the last section, we discussed the problem of overfitting, where after training, the weights of the network are so tuned to the training examples that the network doesn&rsquo;t perform well when given new examples. The idea of dropout is straightforward: this layer &ldquo;drops out&rdquo; a random set of activations by setting them to zero. Simple as that. Now, what are the benefits of such a simple, seemingly unnecessary, and counterintuitive process? Well, in a way, it forces the network to be redundant. By that I mean the network should be able to provide the right classification or output for a specific example even if some of the activations are dropped out. It makes sure that the network isn&rsquo;t getting too &ldquo;fitted&rdquo; to the training data and thus helps alleviate the overfitting problem. An important note is that this layer is only used during training, not at test time.</p>
<p><a href="https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf">Paper</a> by Srivastava, Hinton, et al.</p>
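<p>As a rough sketch of what a dropout layer does in code, the snippet below zeroes a random set of activations during training and leaves the input untouched at test time. It uses the common &ldquo;inverted dropout&rdquo; scaling, which is one standard way to implement the idea rather than the exact formulation in the paper; the names and shapes are illustrative.</p>
<pre><code>import numpy as np

def dropout_forward(activations, drop_prob=0.5, training=True):
    """'Drops out' a random set of activations by setting them to zero.

    Uses inverted dropout: surviving activations are scaled by
    1 / (1 - drop_prob) during training so that no extra scaling is
    needed at test time, where the layer is a simple pass-through.
    """
    if not training:
        return activations                      # dropout is only applied during training
    mask = (np.random.rand(*activations.shape) >= drop_prob) / (1.0 - drop_prob)
    return activations * mask

x = np.random.rand(4, 8)                        # a small batch of activations
print(dropout_forward(x, drop_prob=0.5))        # roughly half the entries are zeroed
print(dropout_forward(x, training=False))       # unchanged at test time
</code></pre>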
<h2><p><strong>Network in Network Layers</strong></p></h2>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; A <strong>network in network</strong> layer refers to a conv layer where a 1 x 1 filter is used. At first glance, you might wonder why this type of layer would even be helpful, since receptive fields are normally larger than the space they map to. However, we must remember that these 1 x 1 convolutions span a certain depth, so we can think of each filter as a 1 x 1 x D convolution, where D is the depth of the input volume into the layer. Effectively, at every spatial location the layer performs a D-dimensional dot product (an element-wise multiplication followed by a sum) across the depth of the input, producing one output value per filter.</p>
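<p>Because each 1 x 1 filter spans the full depth of the input, the whole layer reduces to a matrix multiplication over the depth dimension. The NumPy sketch below illustrates this; the shapes and names are made up for the example.</p>
<pre><code>import numpy as np

def conv_1x1(volume, filters):
    """A 1 x 1 convolution: at every spatial location, each filter takes
    a dot product across the full depth of the input volume.

    volume:  input of shape (height, width, depth_in)
    filters: weights of shape (depth_in, num_filters)
    Returns an output of shape (height, width, num_filters).
    """
    h, w, d_in = volume.shape
    # Flatten the spatial grid, multiply by the (depth_in x num_filters)
    # weight matrix, then restore the spatial layout.
    out = volume.reshape(h * w, d_in) @ filters
    return out.reshape(h, w, filters.shape[1])

volume = np.random.rand(28, 28, 64)     # input volume with depth 64
filters = np.random.rand(64, 32)        # 32 filters, each of size 1 x 1 x 64
print(conv_1x1(volume, filters).shape)  # (28, 28, 32)
</code></pre>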