From 0e84f855a7852d543e55c00b0725f21d6225a911 Mon Sep 17 00:00:00 2001
From: Adit Deshpande
Date: Sun, 25 Jun 2017 14:36:44 -0700
Subject: [PATCH] Update 2016-7-29-A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2.html

---
 ...e-To-Understanding-Convolutional-Neural-Networks-Part-2.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2016-7-29-A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2.html b/_posts/2016-7-29-A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2.html
index 46c20f16b0cb3..d7e265086b2f9 100644
--- a/_posts/2016-7-29-A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2.html
+++ b/_posts/2016-7-29-A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2.html
@@ -36,7 +36,7 @@

Pooling Layers

Other options for pooling layers are average pooling and L2-norm pooling. The intuitive reasoning behind this layer is that once we know that a specific feature is in the original input volume (there will be a high activation value), its exact location is not as important as its relative location to the other features. As you can imagine, this layer drastically reduces the spatial dimensions (the length and the width change, but not the depth) of the input volume. This serves two main purposes. The first is that the number of parameters or weights is reduced by 75%, thus lessening the computation cost. The second is that it helps control overfitting. Overfitting refers to a model that is so tuned to the training examples that it is not able to generalize well to the validation and test sets. A symptom of overfitting is a model that gets 100% or 99% on the training set but only 50% on the test data.
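To make the dimension change concrete, here is a minimal sketch (not from the original post) using PyTorch; the input shape, the 2 x 2 filter, and the stride of 2 are illustrative assumptions.

import torch
import torch.nn as nn

# Hypothetical input volume: 1 example, depth 64, 32 x 32 spatial dimensions.
x = torch.randn(1, 64, 32, 32)

# Max pooling with a 2 x 2 filter and a stride of 2.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
out = pool(x)

print(out.shape)  # torch.Size([1, 64, 16, 16]): depth unchanged, length and width halved
# 16 * 16 = 256 activations per depth slice vs. 32 * 32 = 1024, i.e. a 75% reduction.

# Average pooling is a drop-in alternative: nn.AvgPool2d(kernel_size=2, stride=2)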

Dropout Layers

-               Now, dropout layers have a very specific function in neural networks. In the last section, we discussed the problem of overfitting, where after training, the weights of the network are so tuned to the training examples they are given that the network doesn’t perform well when given new examples. The idea of dropout is simplistic in nature. This layer “drops out” a random set of activations in that layer by setting them to zero in the forward pass. Simple as that. Now, what are the benefits of such a simple and seemingly unnecessary and counterintuitive process? Well, in a way, it forces the network to be redundant. By that I mean the network should be able to provide the right classification or output for a specific example even if some of the activations are dropped out. It makes sure that the network isn’t getting too “fitted” to the training data and thus helps alleviate the overfitting problem. An important note is that this layer is only used during training, and not during test time.

+               Now, dropout layers have a very specific function in neural networks. In the last section, we discussed the problem of overfitting, where after training, the weights of the network are so tuned to the training examples they are given that the network doesn’t perform well when given new examples. The idea of dropout is simplistic in nature. This layer “drops out” a random set of activations in that layer by setting them to zero. Simple as that. Now, what are the benefits of such a simple and seemingly unnecessary and counterintuitive process? Well, in a way, it forces the network to be redundant. By that I mean the network should be able to provide the right classification or output for a specific example even if some of the activations are dropped out. It makes sure that the network isn’t getting too “fitted” to the training data and thus helps alleviate the overfitting problem. An important note is that this layer is only used during training, and not during test time.

Paper by Geoffrey Hinton.
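As a rough illustration (not part of the original post), here is a minimal sketch in PyTorch showing dropout zeroing a random set of activations during training and doing nothing at test time; the tensor size and the drop probability of 0.5 are assumptions for the example.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)  # each activation is zeroed with probability 0.5
x = torch.ones(1, 8)      # pretend these are activations from a previous layer

drop.train()              # training mode: dropout is active
print(drop(x))            # roughly half the entries are 0
                          # (PyTorch also rescales the survivors by 1 / (1 - p))

drop.eval()               # test/evaluation mode: dropout is a no-op
print(drop(x))            # the activations pass through unchanged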

Network in Network Layers

                A network in network layer refers to a conv layer where a 1 x 1 size filter is used. Now, at first look, you might wonder why this type of layer would even be helpful since receptive fields are normally larger than the space they map to. However, we must remember that these 1 x 1 convolutions span a certain depth, so we can think of each filter as a 1 x 1 x N convolution where N is the depth of the input volume into the layer. Effectively, at every spatial location the layer performs an N-dimensional element-wise multiplication (followed by a sum) against the depth column of the input, and the number of filters used determines the depth of the output volume.
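To see that depth-wise mixing in code, here is a minimal sketch using PyTorch (not from the original post); the input depth of 256, the 28 x 28 spatial size, and the choice of 64 filters are illustrative assumptions.

import torch
import torch.nn as nn

# Hypothetical input volume: depth 256, 28 x 28 spatial dimensions.
x = torch.randn(1, 256, 28, 28)

# Network in network layer: 64 filters, each of size 1 x 1 x 256.
# At every spatial location, each filter does a 256-dimensional element-wise
# multiplication with the depth column and sums the result.
conv1x1 = nn.Conv2d(in_channels=256, out_channels=64, kernel_size=1)
out = conv1x1(x)

print(out.shape)  # torch.Size([1, 64, 28, 28]): spatial size preserved, depth is now 64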