This is the assignment for the 4th session in Phase 1 of EVA-8 from TSAI.
The objective of this assignment is to build a CNN-based network that trains on the MNIST Handwritten Digit Dataset and achieves the targets mentioned below.
- 99.4% test accuracy.
- Total parameter count under 10,000.
- Test accuracy should be consistent and should be reached by the 15th epoch of training.
While achieving the above targets, the code needs to be written under the conditions mentioned below:
- The target should be achieved in more than 3 steps, in a gradual fashion. That is, there should be more than 3 versions of the code, each one progressive in nature.
- Each code version should have a target, result, and analysis block stating the target of that version, the results obtained, and your analysis.
This repository contains 4 step folders and one bonus folder. Each folder contains a Colab copy of the training notebook and a model architecture image. Information about the folders:
- step_1/ - contains the Colab copy of the notebook for the first step (first code setup): step_1/EVA_assignment_4-step1.ipynb
- step_2/ - contains the Colab copy of the notebook for the second step (second code setup): step_2/EVA_assignment_4-step2.ipynb
- step_3/ - contains the Colab copy of the notebook for the third step (third code setup): step_3/EVA_assignment_4-step3.ipynb
- step_4/ - contains the Colab copy of the notebook for the fourth step (fourth code setup): step_4/EVA_assignment_4-step4.ipynb
- Bonus_step/ - the assignment has a bonus part in which the same target should be achieved with fewer than 8,000 parameters for bonus points; this folder contains the Colab copy of the notebook for that extra bonus step: Bonus_step/EVA_assignment_4-Bonusstep.ipynb
NOTE: This means the required target is achieved in four steps, plus an extra step for the bonus target.
As mentioned earlier, in this assignment we use the MNIST Handwritten Digit Dataset to train our CNN model to achieve the mentioned targets. Below is an image representing a subset of the image data used in this assignment.
In this section we will look into the first code setup, which is present in step_1/EVA_assignment_4-step1.ipynb. We will look at the targets set for the first code setup, the model architecture, the results, the analysis, and the output logs.
Below are the targets for step 1 (a minimal setup sketch follows this list).
- Get the setup correct and working, since basic working code for training and evaluation is required.
- Set up basic data transformations such as normalization.
- Set up data loaders for training and testing.
- Set up the training and evaluation loop.
- Build a CNN architecture/skeleton with <10,000 parameters. The assignment strongly requires fewer than 10k parameters, so building a larger CNN model is pointless and would cause problems in the follow-up code setups.
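The sketch below illustrates what such a basic setup can look like: normalization transforms, train/test data loaders, and a minimal train/test loop. It is only a sketch, not the notebook's exact code; the batch size, helper names (`train`, `test`) and the commonly quoted MNIST normalization constants (0.1307, 0.3081) are assumptions.

```python
# Minimal sketch of a step-1 style setup (illustrative; the notebook's actual
# hyperparameters and structure may differ).
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

# Basic data transformation: convert to tensor and normalize.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # assumed MNIST mean/std
])

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, transform=transform),
    batch_size=64, shuffle=False)

def train(model, device, loader, optimizer):
    """One training epoch over the loader."""
    model.train()
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = F.nll_loss(model(data), target)  # assumes model outputs log-probabilities
        loss.backward()
        optimizer.step()

def test(model, device, loader):
    """Evaluate and return test accuracy in percent."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            pred = model(data).argmax(dim=1)
            correct += (pred == target).sum().item()
    return 100.0 * correct / len(loader.dataset)
```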
As mentioned in the targets, we aim for a model architecture with fewer than 10k parameters. Below is an image of the model architecture that achieves this target. The image also contains the inputs and outputs of each layer as well as the Receptive Field and its calculation. The following terms are used in the Receptive Field calculation.
- jumpIN - pixel jump calculated for the previous layer.
- jumpOUT - pixel jump calculated for the current layer. The formula is jumpOUT = jumpIN x stride.
- stride - by how many pixels the filter slides over the feature map.
- rIN - receptive field of the previous layer.
- rOUT - receptive field of the current layer. The formula is rOUT = rIN + (kernel_size - 1) x jumpIN.
A small worked example of these formulas is shown below.
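The helper below (hypothetical, not from the notebook) applies the two formulas layer by layer; the layer list used in the example is illustrative and not the exact assignment architecture.

```python
# Worked example of the receptive-field formulas above.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, in forward order."""
    r_in, jump_in = 1, 1                             # an input pixel sees only itself
    for kernel_size, stride in layers:
        r_out = r_in + (kernel_size - 1) * jump_in   # rOUT = rIN + (k - 1) * jumpIN
        jump_out = jump_in * stride                  # jumpOUT = jumpIN * stride
        r_in, jump_in = r_out, jump_out
    return r_in

# Two 3x3 convolutions, a 2x2 max-pool (stride 2), then another 3x3 convolution:
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # -> 10
```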
Below are the results achieved in the first code setup:
- Total number of parameters - 9,734 (<10k)
- Training accuracy at 15th epoch - 98.81%
- Testing accuracy at 15th epoch - 98.62%
- Training accuracy at 20th epoch - 99.0%
- Testing accuracy at 20th epoch - 98.73%
Also below is the graph generated after training:
Following is the analysis of this first code setup:
- We built a CNN that trains with under 10k parameters.
- The highest train and test accuracy (20th epoch) is 99.0% and 98.73% respectively, which is quite low. Accuracy can be improved further.
- Based on accuracy, the model seems to be overfitting, as training accuracy is higher than testing accuracy.
In this section we will look into the second code setup, which is present in step_2/EVA_assignment_4-step2.ipynb. This is an interesting step where batch normalization is introduced and the results look good.
Following are the targets for the second code setup (a minimal block sketch follows this list).
- Improve overall train and test accuracy.
- Reduce model overfitting, i.e. reduce the difference between train and test accuracy.
- Introduce the very necessary component known as "batch normalization" into the CNN architecture.
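As an illustration, this is how batch normalization typically slots into each convolution block; the channel counts here are placeholders, not the assignment's actual values.

```python
# Illustrative conv -> batch norm -> ReLU block (placeholder channel counts).
import torch.nn as nn

conv_block = nn.Sequential(
    nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),  # normalizes each of the 16 channels across the batch
    nn.ReLU(),
)
```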
Below is an image of the model architecture in the second code setup.
- Since only batch normalization is introduced, the Receptive Field calculation and input-output shapes are unchanged and hence not repeated in the image.
- As shown, the model architecture has four major convolution blocks and one transition block. (This representation applies to all models mentioned here.)
Below are the results of the second code setup.
- Total number of parameters - 9,930 (<10k); the small increase comes from batch norm's learnable scale and shift parameters.
- Training accuracy at 15th epoch - 99.55%
- Testing accuracy at 15th epoch - 99.36%
- Training accuracy at 20th epoch - 99.62%
- Testing accuracy at 20th epoch - 99.43%
Below is a graph produced from the training-testing loss and accuracy:
Following is the analysis of this second code setup:
- Overall accuracy on the train and test datasets improved a lot compared to the first setup (train: 98.81 to 99.55, test: 98.62 to 99.36) after adding "Batch Normalization" to the CNN architecture.
- "Batch Normalization" normalizes the feature maps across each batch in every layer, stabilizing the distribution of the data.
- The overfitting problem still exists, as training accuracy is still higher than testing accuracy, although the gap is now smaller compared to the first setup.
- We did not reach the target of 99.4% test accuracy in the second code setup.
- The parameter count increased by roughly 200 (9,734 to 9,930) after adding batch norm, because each BatchNorm2d layer introduces two learnable parameters per channel: a scale (gamma) and a shift (beta).
In this section we will look into the third code setup, which is present in step_3/EVA_assignment_4-step3.ipynb. This is also an interesting step, because here we see the effect of underfitting while training the same CNN.
Following are the targets for the third code setup (a short sketch of these additions follows this list).
- Reduce the model overfitting.
- Introduce "Drop Out" in the CNN model. Dropout helps with overfitting by randomly zeroing out activations in a layer during training, forcing the network not to rely on any single neuron.
- Use image transformations to augment the training images. Image augmentation helps reduce overfitting by forcing the model to fit a wider variety of images, so that biases are ruled out.
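A hedged sketch of both additions: dropout appended to a conv block and a rotation-based augmentation on the training set. The dropout probability, rotation range, and placement are illustrative; the notebook's actual choices may differ.

```python
# Illustrative step-3 additions (placeholder values).
import torch.nn as nn
from torchvision import transforms

conv_block = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Dropout(0.1),  # randomly zeroes activations during training only
)

train_transform = transforms.Compose([
    transforms.RandomRotation((-7.0, 7.0), fill=0),  # small random rotations
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),      # assumed MNIST mean/std
])
```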
Below is an image of the model architecture in the third code setup.
- Since only Drop Out is introduced, the Receptive Field calculation and input-output shapes are unchanged and hence not repeated in the image.
Below are the results of the third code setup.
- Total number of parameters - 9,930 (<10k)
- Training accuracy at 15th epoch - 98.77%
- Testing accuracy at 15th epoch - 99.16%
- Training accuracy at 20th epoch - 98.84%
- Testing accuracy at 20th epoch - 99.26%
Below is a graph produced from the training-testing loss and accuracy:
Following is the analysis of this third code setup:
- The model is now underfitting, i.e. testing accuracy is higher than training accuracy.
- Due to this underfitting, accuracy dropped for both the training and testing datasets, which is not good for model performance.
- With "Drop Out" the model may be suffering from excessive regularization, which might have impacted overall performance as well as caused the underfitting.
- Accuracy also fluctuates a lot due to fluctuation in the loss in later epochs. Decaying the learning rate in steps might help smooth out the descent.
In this section we will look into the fourth and final code setup, which is present in step_4/EVA_assignment_4-step4.ipynb.
Following are the targets for the fourth code setup (a minimal scheduler sketch follows this list).
- Improve the underfitting of the model by setting the Drop Out probability to 0, i.e. we will not perform Drop Out while training in this code setup.
- Reduce the accuracy/loss fluctuation by introducing step-wise learning rate decay using StepLR() from torch.optim.lr_scheduler. This is a PyTorch scheduler that takes the optimizer, a step size, and a decay rate, and reduces the learning rate during training.
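A minimal sketch of how StepLR can be wired into the training loop, assuming the model, device, data loaders, and the `train`/`test` helpers from the earlier step-1 sketch; the learning rate, step_size, and gamma shown here are placeholders, not the notebook's actual values.

```python
# Illustrative step-wise learning-rate decay with StepLR (placeholder values).
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=6, gamma=0.1)  # multiply lr by 0.1 every 6 epochs

for epoch in range(1, 21):
    train(model, device, train_loader, optimizer)
    accuracy = test(model, device, test_loader)
    scheduler.step()  # decay the learning rate at the end of each epoch
```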
Below is an image of the model architecture in the fourth code setup.
- In this architecture "Drop Out" has been removed, so there is no drop out in any layer in the image.
- The last 1x1 convolution has no activation, batch norm, etc., because the final output layer should not be followed by such operations (an illustrative sketch of such an output stage is shown below).
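A hedged sketch of how such an output stage might look; the pooling choice and channel count are placeholders, not the notebook's exact layers. The point it illustrates is that the final 1x1 convolution maps features to the 10 class channels and is followed only by reshaping and log-softmax, with no ReLU or batch norm.

```python
# Illustrative output stage: 1x1 conv to 10 classes, nothing after it except log-softmax.
import torch.nn as nn
import torch.nn.functional as F

class OutputStage(nn.Module):
    def __init__(self, in_channels=16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                       # global average pooling
        self.classifier = nn.Conv2d(in_channels, 10, kernel_size=1, bias=False)

    def forward(self, x):
        x = self.gap(x)
        x = self.classifier(x)        # no ReLU / BatchNorm on the final layer
        x = x.view(x.size(0), -1)     # shape (N, 10)
        return F.log_softmax(x, dim=1)
```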
Below are the results of the fourth code setup.
- Total number of parameters - 9,930 (<10k)
- Training accuracy at 15th epoch - 99.48%
- Testing accuracy at 15th epoch - 99.49%
- Training accuracy at 20th epoch - 99.45%
- Testing accuracy at 20th epoch - 98.49%
Below is a graph produced from the training-testing loss and accuracy:
Following is the analysis of this fourth code setup:
- Setting the Drop Out probability to 0 (no Drop Out) improves model accuracy for both the train and test datasets. This means Drop Out was applying excessive regularization in the third code setup.
- Data augmentation alone was enough to solve the overfitting problem we had in code setup 2.
- With learning rate decay, the model gives consistent accuracy on the training and testing sets.
- Test accuracy at 15th epoch - 99.49%
- Consistent? - YES (consistently hitting the target from the 6th epoch onward till the 20th)
- Parameters - 9,930 (under 10k)
EPOCH: 12
Loss=0.008368389680981636 Batch_id=937 Accuracy=99.47: 100%|██████████| 938/938 [00:49<00:00, 18.90it/s]
Test set: Average loss: 0.0173, Accuracy: 9947/10000 (99.47%)
EPOCH: 13
Loss=0.004195100627839565 Batch_id=937 Accuracy=99.50: 100%|██████████| 938/938 [00:50<00:00, 18.60it/s]
Test set: Average loss: 0.0170, Accuracy: 9948/10000 (99.48%)
EPOCH: 14
Loss=0.007462228648364544 Batch_id=937 Accuracy=99.47: 100%|██████████| 938/938 [00:49<00:00, 18.80it/s]
Test set: Average loss: 0.0169, Accuracy: 9952/10000 (99.52%)
EPOCH: 15
Loss=0.004266421776264906 Batch_id=937 Accuracy=99.48: 100%|██████████| 938/938 [00:49<00:00, 18.80it/s]
Test set: Average loss: 0.0169, Accuracy: 9949/10000 (99.49%)
In this section we will look into the bonus code setup, which tries to achieve the target for bonus points.
This is an extra step where we try to achieve a consistent 99.4% test accuracy with under 8k parameters.
- Since our model already achieves the desired targets with 9,930 parameters, we can reuse the same architecture and reduce the parameter count by playing around with the number of channels.
- We want to reduce the parameters from 9,930 to something less than 8,000 to get the additional points mentioned in the assignment.
- Let's also add another augmentation known as ColorJitter, which varies the brightness, contrast, etc. This should help the model learn richer features from the images (a sketch of the transform pipeline follows this list).
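A hedged sketch of the bonus-step training transforms with ColorJitter added on top of the earlier augmentation; the jitter strengths and rotation range are illustrative, not the notebook's exact values.

```python
# Illustrative bonus-step training transforms (placeholder parameter values).
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # vary brightness/contrast
    transforms.RandomRotation((-7.0, 7.0), fill=0),        # small random rotations
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),             # assumed MNIST mean/std
])
```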
Below is an image of the model architecture in the bonus code setup.
- Here we have changed the number of channels in almost all layers (refer to the code in Bonus_step/EVA_assignment_4-BonusStep.ipynb for more information).
Below are the results of the bonus code setup.
- Total number of parameters - 7,926 (<8k)
- Training accuracy at 15th epoch - 99.26%
- Testing accuracy at 15th epoch - 99.46%
- Training accuracy at 20th epoch - 99.21%
- Testing accuracy at 20th epoch - 98.46%
Below is a graph produced from the training-testing loss and accuracy:
Following is the analysis of this bonus code setup:
- By reducing the number of channels in the fourth code setup's CNN, we were able to hit a parameter count of 7,926, which is less than 8,000.
- Reducing the parameter count also reduces model complexity, which led to slight underfitting, as can be seen from the test and train accuracy.
- Test accuracy at 15th epoch - 99.49%
- Consistent? - YES (consistently hitting the target from the 10th epoch onward till the 20th)
- Parameters - 7,926 (under 8k)
EPOCH: 12
Loss=0.07796341925859451 Batch_id=937 Accuracy=99.26: 100%|██████████| 938/938 [01:17<00:00, 12.09it/s]
Test set: Average loss: 0.0197, Accuracy: 9946/10000 (99.46%)
EPOCH: 13
Loss=0.016041133552789688 Batch_id=937 Accuracy=99.23: 100%|██████████| 938/938 [01:17<00:00, 12.11it/s]
Test set: Average loss: 0.0197, Accuracy: 9949/10000 (99.49%)
EPOCH: 14
Loss=0.0026311047840863466 Batch_id=937 Accuracy=99.24: 100%|██████████| 938/938 [01:17<00:00, 12.06it/s]
Test set: Average loss: 0.0195, Accuracy: 9946/10000 (99.46%)
EPOCH: 15
Loss=0.007067004218697548 Batch_id=937 Accuracy=99.26: 100%|██████████| 938/938 [01:16<00:00, 12.31it/s]
Test set: Average loss: 0.0201, Accuracy: 9944/10000 (99.44%)