We use the same tooling as for coding an ANN: (Theano OR TensorFlow) AND Keras.
Data pre-processing of images typically involves these steps:
- Load the images and labels
- Split the data into the train and test sets
- Do Image Scaling
- Do Feature Scaling (Pixel Scaling)
- Do Image Augmentation
When using Keras, some of these steps are combined into single library calls, but it is important to understand the motivation for each step separately.
See the annotated code for more details.
Loading the images, loading the labels and splitting the data into train and test sets is very easy to do in Keras if we organise our data following a prescribed folder structure. Keras calls this the `flow_from_directory` method.
The structure looks like this:
- Split all the data in 2 folders: `training_set/` and `test_set/`.
- Separate images by category in subfolders, e.g. `test_set/cats/` and `test_set/dogs/`.
- Name each file with the label, e.g. `dog.1.jpg` ... `dog.5000.jpg`.
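A minimal sketch of how this folder convention plugs into Keras. The class names (`cats`, `dogs`) and the temporary directory are hypothetical stand-ins; here we generate tiny random images on disk just so the example is self-contained:

```python
import os
import tempfile

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator, save_img

# Build a tiny dummy dataset on disk following the prescribed structure.
# The split and class names are illustrative, not prescribed by Keras.
root = tempfile.mkdtemp()
for split in ('training_set', 'test_set'):
    for label in ('cats', 'dogs'):
        folder = os.path.join(root, split, label)
        os.makedirs(folder)
        for i in range(2):
            # A random 64x64 RGB image standing in for a real photo.
            save_img(os.path.join(folder, f'{label}.{i}.jpg'),
                     np.random.randint(0, 256, (64, 64, 3)).astype('uint8'))

datagen = ImageDataGenerator(rescale=1./255)
training_set = datagen.flow_from_directory(
    os.path.join(root, 'training_set'),
    target_size=(64, 64),   # every image is resized to the same input size
    batch_size=2,
    class_mode='binary')    # labels are inferred from the subfolder names
```

Note how `flow_from_directory` derives the labels from the subfolder names, so no separate label file is needed.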
The images in our training and test sets most likely have different sizes and aspect ratios. However, our CNN takes images of a fixed size as input, so we need to scale the images to the same input size.
There is an inherent trade-off with the image input size:
- Bigger images give the CNN more information to work with, which can positively impact accuracy.
- However, bigger images also mean a higher computational expense to train the network.
Both the training and test sets must be scaled equally.
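In Keras this resizing happens automatically via the `target_size` argument of `flow_from_directory`, but the idea can be sketched directly. The input images and the 64x64 target size below are illustrative choices:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import array_to_img, img_to_array

# Two hypothetical "photos" with different sizes and aspect ratios.
small = np.random.randint(0, 256, (48, 80, 3)).astype('uint8')
large = np.random.randint(0, 256, (200, 150, 3)).astype('uint8')

# Scale both to the fixed input size the CNN expects, e.g. 64x64.
target = (64, 64)
resized = [img_to_array(array_to_img(img).resize(target))
           for img in (small, large)]
print([r.shape for r in resized])  # both are now (64, 64, 3)
```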
In addition to image scaling, we also re-scale the value of each pixel to take a value between 0 and 1. For example, if we receive an image whose pixel values range from 0 to 255 (e.g. an RGB image), we divide each pixel by 255.
Both the training and test sets must be scaled equally.
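The divide-by-255 step is a one-liner. The tiny 2x2 image below is a made-up example; in Keras the same effect is achieved by passing `rescale=1./255` to `ImageDataGenerator`:

```python
import numpy as np

# A hypothetical 2x2 RGB image with pixel values in [0, 255].
image = np.array([[[0, 128, 255], [0, 128, 255]],
                  [[0, 128, 255], [0, 128, 255]]], dtype=np.float32)

# Feature scaling: divide every pixel by 255 so values fall in [0, 1].
scaled = image / 255.0
print(scaled.min(), scaled.max())  # 0.0 1.0
```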
Image Augmentation is a technique that allows us to generate additional training images that are derived from the original ones by randomly applying some simple transformations like rotations, zooming, shifting, flipping, changes in brightness, etc.
Example of a horizontal shift image augmentation.
Image augmentation helps us in two ways:
- Deep learning networks typically perform better when they are exposed to more data.
- It exposes the network to plausible variations of the images that the model could encounter during prediction. For example, it is reasonable to think that someone may take a picture with a parrot in the middle of the frame and some other person with the parrot on the left. Exposing the network to these variants during training makes it more robust to different scenarios.
Image augmentation is only performed on the training set.
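In Keras these random transformations are configured on `ImageDataGenerator`. The exact ranges below are illustrative starting points, not prescribed values, and the random input image is a stand-in for a real training photo:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings mirroring the transformations mentioned above.
datagen = ImageDataGenerator(
    rotation_range=20,        # random rotations of up to 20 degrees
    zoom_range=0.2,           # random zoom in/out by up to 20%
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True)     # random left-right flips

# A batch with one random 64x64 RGB "image".
batch = np.random.randint(0, 256, (1, 64, 64, 3)).astype('float32')

# Each call yields a randomly transformed variant of the same image.
augmented = next(datagen.flow(batch, batch_size=1))
print(augmented.shape)  # (1, 64, 64, 3): same shape, transformed content
```

Because the transformations are sampled on the fly, the network sees a slightly different variant of each image on every epoch.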
Read this great article by Jason Brownlee if you want to know more.
See the annotated code for most of the information on how to actually do it. The notes below just complement the annotated code.
The number of filters per convolutional layer is a hyperparameter; selecting the value is an art, refined through experimentation.
A rule of thumb that is commonly used is doubling the filters on each convolutional layer. For example: 32 in the first convolutional layer, 64 in the second, 128 in the third... and so on.
The number of convolutional layers is likewise a hyperparameter that is found experimentally. Take into account that as the number of conv-layers increases, so does the computational intensity of the task, so start with 1 to 3 and ramp up.
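The doubling rule of thumb can be sketched as below. The kernel sizes, input shape and final dense layer are illustrative choices for a small binary classifier, not part of the rule itself:

```python
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

# Three conv-layers with the filter count doubling: 32 -> 64 -> 128.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(1, activation='sigmoid')])

print([layer.filters for layer in model.layers
       if isinstance(layer, Conv2D)])  # [32, 64, 128]
```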
The motivation for using dropout to control overfitting in CNNs is largely the same as for ANNs.
Implementation-wise, there are some nuances that only apply to CNNs:
- In CNNs, dropout is typically used after the pooling layers, but this is a rough heuristic. It could also be used after the convolution layers.
- In CNNs, an alternative form of dropout is to drop entire feature maps (as opposed to regions of each feature map). This is called `SpatialDropout` and it is also supported in Keras.
- Dropout can also be applied to the fully connected hidden layers after flattening. This can be done in addition to dropout in the convolution or max-pooling layers.
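A sketch showing the three placements in one model. The 0.2 and 0.5 rates are common starting points, not prescribed values, and the rest of the architecture is illustrative:

```python
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten,
                                     MaxPooling2D, SpatialDropout2D)
from tensorflow.keras.models import Sequential

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    SpatialDropout2D(0.2),   # drops entire feature maps
    MaxPooling2D((2, 2)),
    Dropout(0.2),            # regular dropout after the pooling layer
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),            # dropout in the fully connected part
    Dense(1, activation='sigmoid')])
```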
Learn more about dropout regularisation here.
Keras supports a very convenient way of saving a model's architecture together with the trained weights to disk. This allows us to re-load the model later and use it for prediction. See the code for more details on how to do this.
Note that Keras also supports saving the architecture and weights in separate files.
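A minimal sketch of the save/re-load round trip. The tiny dense model stands in for a trained CNN, and the file path is a hypothetical choice:

```python
import os
import tempfile

import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential, load_model

# A minimal model standing in for a trained CNN.
model = Sequential([Dense(1, activation='sigmoid', input_shape=(4,))])

# Save architecture + weights to a single file, then re-load it.
path = os.path.join(tempfile.mkdtemp(), 'model.keras')
model.save(path)
restored = load_model(path)

# The restored model produces the same predictions as the original.
x = np.ones((1, 4), dtype='float32')
print(np.allclose(model.predict(x), restored.predict(x)))  # True
```

For the separate-files variant, `model.to_json()` serialises the architecture alone and `model.save_weights(...)` stores just the weights.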