This is the assignment for the 4th session in Phase 1 of EVA-8 from TSAI.
The objective of this assignment is to build a CNN-based network that trains on the MNIST Handwritten Digit Dataset and achieves the targets mentioned below.
- 99.4% test accuracy.
- Total parameter count under 10,000.
- Test accuracy should be consistent and should be reached by the 15th epoch of training.
While achieving the above targets, the code needs to be written under the conditions mentioned below:
- The target should be achieved in more than 3 steps, in a gradual fashion. That is, there should be more than 3 versions of the code, each one progressive in nature.
- Each code version should have a target, result, and analysis block stating the target of that version, the results obtained, and your analysis.
This repository contains 4 step folders and one bonus folder. Each folder contains a Colab copy of the training notebook and a model architecture image. Information about the folders:
- step_1/ - contains the Colab copy of the notebook for the first step (first code setup): step_1/EVA_assignment_4-step1.ipynb
- step_2/ - contains the Colab copy of the notebook for the second step (second code setup): step_2/EVA_assignment_4-step2.ipynb
- step_3/ - contains the Colab copy of the notebook for the third step (third code setup): step_3/EVA_assignment_4-step3.ipynb
- step_4/ - contains the Colab copy of the notebook for the fourth step (fourth code setup): step_4/EVA_assignment_4-step4.ipynb
- Bonus_step/ - the assignment has a bonus part in which the same target should be achieved with fewer than 8,000 parameters for bonus points; this folder contains the Colab copy of the notebook for that extra bonus step: Bonus_step/EVA_assignment_4-Bonusstep.ipynb
NOTE: This means the required target is achieved in four steps, plus an extra step for the bonus target.
As mentioned earlier, in this assignment we use the MNIST Handwritten Digit Dataset to train our CNN model to achieve the mentioned targets. Below is an image representing a subset of the image data used in this assignment.
In this section we will look into the first code setup, which is present in step_1/EVA_assignment_4-step1.ipynb. We will look at the targets set for the first code setup, the model architecture, the results, the analysis, and the output logs.
Below are the targets for step 1 (a minimal setup sketch follows this list).
- Get the setup correct and working, since basic working code for training and evaluation is required.
- Set up basic data transformations such as normalization.
- Set up data loaders for training and testing.
- Set up the training and evaluation loop.
- Build a CNN architecture/skeleton with <10,000 parameters. The assignment strongly requires fewer than 10k parameters, so building a larger CNN model is pointless and would cause problems in the follow-up code setups.
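The sketch below illustrates what such a basic setup can look like: normalization transforms, train/test data loaders, and a minimal train/test loop. It is only a sketch, not the notebook's exact code; the batch size, helper names (`train`, `test`) and the commonly quoted MNIST normalization constants (0.1307, 0.3081) are assumptions.

```python
# Minimal sketch of a step-1 style setup (illustrative; the notebook's actual
# hyperparameters and structure may differ).
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

# Basic data transformation: convert to tensor and normalize.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # assumed MNIST mean/std
])

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, transform=transform),
    batch_size=64, shuffle=False)

def train(model, device, loader, optimizer):
    """One training epoch over the loader."""
    model.train()
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = F.nll_loss(model(data), target)  # assumes model outputs log-probabilities
        loss.backward()
        optimizer.step()

def test(model, device, loader):
    """Evaluate and return test accuracy in percent."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            pred = model(data).argmax(dim=1)
            correct += (pred == target).sum().item()
    return 100.0 * correct / len(loader.dataset)
```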
As mentioned in the targets, we aim for a model architecture with fewer than 10k parameters. Below is an image of the model architecture that achieves this target. The image also contains the inputs and outputs of each layer as well as the Receptive Field and its calculation. The following terms are used in the Receptive Field calculation.
- jumpIN - pixel jump calculated for the previous layer.
- jumpOUT - pixel jump calculated for the current layer. The formula is jumpOUT = jumpIN x stride.
- stride - by how many pixels the filter slides over the feature map.
- rIN - receptive field of the previous layer.
- rOUT - receptive field of the current layer. The formula is rOUT = rIN + (kernel_size - 1) x jumpIN.
A small worked example of these formulas is shown below.
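The helper below (hypothetical, not from the notebook) applies the two formulas layer by layer; the layer list used in the example is illustrative and not the exact assignment architecture.

```python
# Worked example of the receptive-field formulas above.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, in forward order."""
    r_in, jump_in = 1, 1                             # an input pixel sees only itself
    for kernel_size, stride in layers:
        r_out = r_in + (kernel_size - 1) * jump_in   # rOUT = rIN + (k - 1) * jumpIN
        jump_out = jump_in * stride                  # jumpOUT = jumpIN * stride
        r_in, jump_in = r_out, jump_out
    return r_in

# Two 3x3 convolutions, a 2x2 max-pool (stride 2), then another 3x3 convolution:
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # -> 10
```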
Below are the results achieved in the first code setup:
- Total number of parameters - 9,734 (<10k)
- Training accuracy at 15th epoch - 98.81%
- Testing accuracy at 15th epoch - 98.62%
- Training accuracy at 20th epoch - 99.0%
- Testing accuracy at 20th epoch - 98.73%
Also below is the graph generated after training:
Following is the analysis of this first code setup:
- We built a CNN that trains with under 10k parameters.
- The highest train and test accuracy (20th epoch) is 99.0% and 98.73% respectively, which is quite low. Accuracy can be improved further.
- Based on accuracy, the model seems to be overfitting, as training accuracy is higher than testing accuracy.
In this section we will look into the second code setup, which is present in step_2/EVA_assignment_4-step2.ipynb. This is an interesting step where batch normalization is introduced and the results look good.
Following are the targets for the second code setup (a minimal block sketch follows this list).
- Improve overall train and test accuracy.
- Reduce model overfitting, i.e. reduce the difference between train and test accuracy.
- Introduce the very necessary component known as "batch normalization" into the CNN architecture.
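As an illustration, this is how batch normalization typically slots into each convolution block; the channel counts here are placeholders, not the assignment's actual values.

```python
# Illustrative conv -> batch norm -> ReLU block (placeholder channel counts).
import torch.nn as nn

conv_block = nn.Sequential(
    nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),  # normalizes each of the 16 channels across the batch
    nn.ReLU(),
)
```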
Below is an image of the model architecture in the second code setup.
- Since only batch normalization is introduced, the Receptive Field calculation and input-output shapes are unchanged and hence not repeated in the image.
- As shown, the model architecture has four major convolution blocks and one transition block. (This representation applies to all models mentioned here.)
Below are the results of the second code setup.
- Total number of parameters - 9,930 (<10k); the small increase comes from batch norm's learnable scale and shift parameters.
- Training accuracy at 15th epoch - 99.55%
- Testing accuracy at 15th epoch - 99.36%
- Training accuracy at 20th epoch - 99.62%
- Testing accuracy at 20th epoch - 99.43%
Below is a graph produced from the training-testing loss and accuracy:
Following is the analysis of this second code setup:
- Overall accuracy on the train and test datasets improved a lot compared to the first setup (train: 98.81 to 99.55, test: 98.62 to 99.36) after adding "Batch Normalization" to the CNN architecture.
- "Batch Normalization" normalizes the feature maps across each batch in every layer, stabilizing the distribution of the data.
- The overfitting problem still exists, as training accuracy is still higher than testing accuracy, although the gap is now smaller compared to the first setup.
- We did not reach the target of 99.4% test accuracy in the second code setup.
- The parameter count increased by roughly 200 (9,734 to 9,930) after adding batch norm, because each BatchNorm2d layer introduces two learnable parameters per channel: a scale (gamma) and a shift (beta).
In this section we will look into the third code setup, which is present in step_3/EVA_assignment_4-step3.ipynb. This is also an interesting step, because here we see the effect of underfitting while training the same CNN.
Following are the targets for the third code setup (a short sketch of these additions follows this list).
- Reduce the model overfitting.
- Introduce "Drop Out" in the CNN model. Dropout helps with overfitting by randomly zeroing out activations in a layer during training, forcing the network not to rely on any single neuron.
- Use image transformations to augment the training images. Image augmentation helps reduce overfitting by forcing the model to fit a wider variety of images, so that biases are ruled out.
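A hedged sketch of both additions: dropout appended to a conv block and a rotation-based augmentation on the training set. The dropout probability, rotation range, and placement are illustrative; the notebook's actual choices may differ.

```python
# Illustrative step-3 additions (placeholder values).
import torch.nn as nn
from torchvision import transforms

conv_block = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Dropout(0.1),  # randomly zeroes activations during training only
)

train_transform = transforms.Compose([
    transforms.RandomRotation((-7.0, 7.0), fill=0),  # small random rotations
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),      # assumed MNIST mean/std
])
```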
Below is an image of the model architecture in the third code setup.
- Since only Drop Out is introduced, the Receptive Field calculation and input-output shapes are unchanged and hence not repeated in the image.
Below are the results of the third code setup.
- Total number of parameters - 9,930 (<10k)
- Training accuracy at 15th epoch - 98.77%
- Testing accuracy at 15th epoch - 99.16%
- Training accuracy at 20th epoch - 98.84%
- Testing accuracy at 20th epoch - 99.26%
Below is a graph produced from the training-testing loss and accuracy:
Following is the analysis of this third code setup:
- The model is now underfitting, i.e. testing accuracy is higher than training accuracy.
- Due to this underfitting, accuracy dropped for both the training and testing datasets, which is not good for model performance.
- With "Drop Out" the model may be suffering from excessive regularization, which might have impacted overall performance as well as caused the underfitting.
- Accuracy also fluctuates a lot due to fluctuation in the loss in later epochs. Decaying the learning rate in steps might help smooth out the descent.
In this section we will look into the fourth and final code setup, which is present in step_4/EVA_assignment_4-step4.ipynb.
Following are the targets for the fourth code setup (a minimal scheduler sketch follows this list).
- Improve the underfitting of the model by setting the Drop Out probability to 0, i.e. we will not perform Drop Out while training in this code setup.
- Reduce the accuracy/loss fluctuation by introducing step-wise learning rate decay using StepLR() from torch.optim.lr_scheduler. This is a PyTorch scheduler that takes the optimizer, a step size, and a decay rate, and reduces the learning rate during training.
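A minimal sketch of how StepLR can be wired into the training loop, assuming the model, device, data loaders, and the `train`/`test` helpers from the earlier step-1 sketch; the learning rate, step_size, and gamma shown here are placeholders, not the notebook's actual values.

```python
# Illustrative step-wise learning-rate decay with StepLR (placeholder values).
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=6, gamma=0.1)  # multiply lr by 0.1 every 6 epochs

for epoch in range(1, 21):
    train(model, device, train_loader, optimizer)
    accuracy = test(model, device, test_loader)
    scheduler.step()  # decay the learning rate at the end of each epoch
```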
Below is an image of the model architecture in the fourth code setup.
- In this architecture "Drop Out" has been removed, so there is no drop out in any layer in the image.
- The last 1x1 convolution has no activation, batch norm, etc., because the final output layer should not be followed by such operations (an illustrative sketch of such an output stage is shown below).
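A hedged sketch of how such an output stage might look; the pooling choice and channel count are placeholders, not the notebook's exact layers. The point it illustrates is that the final 1x1 convolution maps features to the 10 class channels and is followed only by reshaping and log-softmax, with no ReLU or batch norm.

```python
# Illustrative output stage: 1x1 conv to 10 classes, nothing after it except log-softmax.
import torch.nn as nn
import torch.nn.functional as F

class OutputStage(nn.Module):
    def __init__(self, in_channels=16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                       # global average pooling
        self.classifier = nn.Conv2d(in_channels, 10, kernel_size=1, bias=False)

    def forward(self, x):
        x = self.gap(x)
        x = self.classifier(x)        # no ReLU / BatchNorm on the final layer
        x = x.view(x.size(0), -1)     # shape (N, 10)
        return F.log_softmax(x, dim=1)
```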
Below are the results of the fourth code setup.
- Total number of parameters - 9,930 (<10k)
- Training accuracy at 15th epoch - 99.48%
- Testing accuracy at 15th epoch - 99.49%
- Training accuracy at 20th epoch - 99.45%
- Testing accuracy at 20th epoch - 98.49%
Below is a graph produced from the training-testing loss and accuracy:
Following is the analysis of this fourth code setup:
- Setting the Drop Out probability to 0 (no Drop Out) improves model accuracy for both the train and test datasets. This means Drop Out was applying excessive regularization in the third code setup.
- Data augmentation alone was enough to solve the overfitting problem we had in code setup 2.
- With learning rate decay, the model gives consistent accuracy on the training and testing sets.
- Test accuracy at 15th epoch - 99.49%
- Consistent? - YES (consistently hitting the target from the 6th epoch onward till the 20th)
- Parameters - 9,930 (under 10k)
EPOCH: 12
Loss=0.008368389680981636 Batch_id=937 Accuracy=99.47: 100%|██████████| 938/938 [00:49<00:00, 18.90it/s]
Test set: Average loss: 0.0173, Accuracy: 9947/10000 (99.47%)
EPOCH: 13
Loss=0.004195100627839565 Batch_id=937 Accuracy=99.50: 100%|██████████| 938/938 [00:50<00:00, 18.60it/s]
Test set: Average loss: 0.0170, Accuracy: 9948/10000 (99.48%)
EPOCH: 14
Loss=0.007462228648364544 Batch_id=937 Accuracy=99.47: 100%|██████████| 938/938 [00:49<00:00, 18.80it/s]
Test set: Average loss: 0.0169, Accuracy: 9952/10000 (99.52%)
EPOCH: 15
Loss=0.004266421776264906 Batch_id=937 Accuracy=99.48: 100%|██████████| 938/938 [00:49<00:00, 18.80it/s]
Test set: Average loss: 0.0169, Accuracy: 9949/10000 (99.49%)
In this section we will look into the bonus code setup, which tries to achieve the target for bonus points.
This is an extra step where we try to achieve a consistent 99.4% test accuracy with under 8k parameters.
- Since our model already achieves the desired targets with 9,930 parameters, we can reuse the same architecture and reduce the parameter count by playing around with the number of channels.
- We want to reduce the parameters from 9,930 to something less than 8,000 to get the additional points mentioned in the assignment.
- Let's also add another augmentation known as ColorJitter, which varies the brightness, contrast, etc. This should help the model learn richer features from the images (a sketch of the transform pipeline follows this list).
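A hedged sketch of the bonus-step training transforms with ColorJitter added on top of the earlier augmentation; the jitter strengths and rotation range are illustrative, not the notebook's exact values.

```python
# Illustrative bonus-step training transforms (placeholder parameter values).
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # vary brightness/contrast
    transforms.RandomRotation((-7.0, 7.0), fill=0),        # small random rotations
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),             # assumed MNIST mean/std
])
```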
Below is an image of the model architecture in the bonus code setup.
- Here we have changed the number of channels in almost all layers (refer to the code in Bonus_step/EVA_assignment_4-BonusStep.ipynb for more information).
Below are the results of the bonus code setup.
- Total number of parameters - 7,926 (<8k)
- Training accuracy at 15th epoch - 99.26%
- Testing accuracy at 15th epoch - 99.46%
- Training accuracy at 20th epoch - 99.21%
- Testing accuracy at 20th epoch - 98.46%
Below is a graph produced from the training-testing loss and accuracy:
Following is the analysis of this bonus code setup:
- By reducing the number of channels in the fourth code setup's CNN, we were able to hit a parameter count of 7,926, which is less than 8,000.
- Reducing the parameter count also reduces model complexity, which led to slight underfitting, as can be seen from the test and train accuracy.
- Test accuracy at 15th epoch - 99.49%
- Consistent? - YES (consistently hitting the target from the 10th epoch onward till the 20th)
- Parameters - 7,926 (under 8k)
EPOCH: 12
Loss=0.07796341925859451 Batch_id=937 Accuracy=99.26: 100%|██████████| 938/938 [01:17<00:00, 12.09it/s]
Test set: Average loss: 0.0197, Accuracy: 9946/10000 (99.46%)
EPOCH: 13
Loss=0.016041133552789688 Batch_id=937 Accuracy=99.23: 100%|██████████| 938/938 [01:17<00:00, 12.11it/s]
Test set: Average loss: 0.0197, Accuracy: 9949/10000 (99.49%)
EPOCH: 14
Loss=0.0026311047840863466 Batch_id=937 Accuracy=99.24: 100%|██████████| 938/938 [01:17<00:00, 12.06it/s]
Test set: Average loss: 0.0195, Accuracy: 9946/10000 (99.46%)
EPOCH: 15
Loss=0.007067004218697548 Batch_id=937 Accuracy=99.26: 100%|██████████| 938/938 [01:16<00:00, 12.31it/s]
Test set: Average loss: 0.0201, Accuracy: 9944/10000 (99.44%)