
GoogLeNet / InceptionV1, AlexNet, VGG (VGG-11, VGG-13, VGG-16, VGG-19), LeNet, and ResNet


abhamedewar/classification_networks


Implementation of State-of-the-Art Convolutional Neural Networks from Scratch

GoogLeNet, ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, AlexNet, LeNet, VGG11, VGG13, VGG16, VGG19

A custom dataset can be used to train the networks in this repository. Provide the dataset in the format described below and run the model of your choice.

Dataset format:

Root directory

  • The root directory should contain all the image files related to the dataset.

CSV File

  • The dataset is accompanied by a CSV file that contains two columns: "Image Name" and "Class". This CSV file serves as a reference to associate each image with its respective class or label.
  • The "Image Name" column contains the names of the image files present in the root directory. Each entry in this column should uniquely identify an image file.
  • The "Class" column gives the corresponding class or label for each image, as a numerical value.

Class Mapping JSON:

  • A JSON file containing the class mapping. See class_mapping.json for the expected format.
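The CSV format above can be read with Python's standard `csv` module. The sketch below is a hypothetical helper (`load_annotations` is not a function from this repository); it assumes the header row is exactly `Image Name,Class` and that the class values are integers, and it enforces the uniqueness requirement and checks that every listed image exists in the root directory.

```python
import csv
import os
from typing import List, Tuple


def load_annotations(csv_path: str, root_dir: str) -> List[Tuple[str, int]]:
    """Read the "Image Name"/"Class" CSV and return (image_path, label) pairs.

    Hypothetical helper for illustration; assumes integer class values.
    """
    samples: List[Tuple[str, int]] = []
    seen = set()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            name = row["Image Name"].strip()
            # Each image name must identify exactly one file.
            if name in seen:
                raise ValueError(f"duplicate image name: {name}")
            seen.add(name)
            path = os.path.join(root_dir, name)
            if not os.path.isfile(path):
                raise FileNotFoundError(path)
            samples.append((path, int(row["Class"])))
    return samples
```

If the training code is PyTorch-based, these pairs could back a `torch.utils.data.Dataset` whose `__getitem__` opens the image and returns it together with its label.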

Running the code:

Currently the code supports the following CNN architectures: ['googlenet', 'resnet', 'alexnet', 'lenet', 'vgg11', 'vgg13', 'vgg16', 'vgg19'].

Once the dataset is in the format above, install the dependencies and run the code with the following commands:

Install the dependencies:

pip install -r requirements.txt

Run the code:

python main.py --data_path <folder with all images> --csv_path <path to csv file> --class_mapping <json file with class mapping> --network_type <cnn type>

Details of various CNN architectures

GoogLeNet/Inception v1

The paper can be found at the following link: https://arxiv.org/pdf/1409.4842.pdf

  • The main contribution of the GoogLeNet architecture is the Inception module.
  • A critical design decision in a CNN is what to apply to the output of the previous layer: pooling or convolution, and, for convolution, how many filters and of what size.
  • The Inception module avoids committing to a single choice by running several operations in parallel (1x1, 3x3 and 5x5 convolutions plus 3x3 max pooling) and concatenating their outputs along the channel dimension. This lets one module capture features at several scales, while 1x1 convolutions placed before the 3x3 and 5x5 convolutions reduce the number of channels and keep the computational cost manageable.
  • Input image size: 224 x 224 x 3
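In the Inception module every branch keeps the input's spatial size (stride 1 with "same" padding) and the branch outputs are concatenated along the channel dimension, so the module's output depth is the sum of the branch depths. The sketch below checks this for inception (3a), whose branch widths in the GoogLeNet paper are 64 (1x1), 128 (3x3), 32 (5x5) and 32 (pool proj) on a 28 x 28 input. `inception_output_shape` is a hypothetical helper that works on (height, width, channels) tuples rather than real tensors.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def inception_output_shape(branches: Dict[str, Shape]) -> Shape:
    """Shape after concatenating parallel branch outputs along the channel axis.

    All branches must share the same spatial size, otherwise they cannot be stacked.
    """
    spatial = {(h, w) for h, w, _ in branches.values()}
    if len(spatial) != 1:
        raise ValueError(f"branches disagree on spatial size: {spatial}")
    h, w = next(iter(spatial))
    return h, w, sum(c for _, _, c in branches.values())


# Branch outputs of inception (3a) from the GoogLeNet paper (input 28 x 28 x 192).
inception_3a = {
    "1x1": (28, 28, 64),
    "3x3": (28, 28, 128),
    "5x5": (28, 28, 32),
    "pool_proj": (28, 28, 32),
}
print(inception_output_shape(inception_3a))  # (28, 28, 256)
```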

Architecture Details:

[Figure: GoogLeNet architecture table]

  • #3×3 reduce and #5×5 reduce stand for the number of 1×1 filters in the reduction layer used before the 3×3 and 5×5 convolutions.
  • The pool proj column gives the number of 1×1 filters in the projection layer after the built-in max pooling.
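The reduce columns matter because of cost. Counting multiply-accumulate operations (output positions × output channels × kernel area × input channels) for the 5x5 branch of inception (3a), which in the GoogLeNet paper takes a 28 x 28 x 192 input, produces 32 channels and uses a 16-filter 5x5 reduce layer, shows the 1x1 reduction cutting the cost by almost 10x. `conv_macs` is a hypothetical helper; biases and activations are ignored.

```python
def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates of a stride-1, 'same'-padded k x k convolution."""
    return h * w * c_out * k * k * c_in


# 5x5 branch of inception (3a) applied directly to the 192-channel input.
direct = conv_macs(28, 28, 192, 32, 5)
# Same branch with a 16-filter 1x1 "5x5 reduce" layer in front of it.
reduced = conv_macs(28, 28, 192, 16, 1) + conv_macs(28, 28, 16, 32, 5)

print(direct, reduced, round(direct / reduced, 1))  # 120422400 12443648 9.7
```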

Inception block

[Figure: Inception block]

ResNet

The paper can be found at the following link: https://arxiv.org/abs/1512.03385

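  • Very deep networks are hard to train. Vanishing/exploding gradients hamper convergence, and the paper also observes a degradation problem: adding more layers to a sufficiently deep plain network increases training error instead of reducing it.
  • The main idea of the paper is to add skip (shortcut) connections: each residual block learns a residual function F(x) and outputs F(x) + x (see the residual block below), which makes very deep networks much easier to optimize.
  • A ResNet is built by stacking many residual blocks to form a deep neural network.
  • Input image size: 224 x 224 x 3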
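A toy scalar model makes the gradient argument concrete. Assume, for illustration only, that every block is the linear map f(x) = w * x with a small weight w. Without a skip connection the chain rule multiplies the local derivative w across all blocks, so the gradient shrinks like w^depth; with a skip connection each block computes f(x) + x, its local derivative is w + 1, and the product stays near 1. `chain_gradient` is a hypothetical helper, and real residual blocks contain convolutions, batch normalization and ReLU, so this only sketches the intuition.

```python
def chain_gradient(weight: float, depth: int, residual: bool) -> float:
    """d(output)/d(input) for `depth` stacked scalar blocks f(x) = weight * x.

    Plain block:    y = f(x)      -> local derivative = weight
    Residual block: y = f(x) + x  -> local derivative = weight + 1
    The chain rule multiplies the local derivatives of all blocks.
    """
    local = weight + 1.0 if residual else weight
    grad = 1.0
    for _ in range(depth):
        grad *= local
    return grad


for depth in (5, 20, 50):
    plain = chain_gradient(0.01, depth, residual=False)
    skip = chain_gradient(0.01, depth, residual=True)
    print(f"depth={depth:2d}  plain={plain:.3e}  residual={skip:.3f}")
```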

Residual Block

[Figure: Residual block]

Architecture Details:

[Figure: ResNet architecture table]
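The depth in each ResNet name can be recovered from the number of residual blocks per stage. The block counts below are those listed in the ResNet paper's ImageNet architecture table; the dictionary and function names are hypothetical and may not match how this repository encodes the variants. Only weighted layers are counted (the 7x7 stem convolution, the convolutions inside the blocks and the final fully connected layer); 1x1 projection shortcuts are not.

```python
# Residual blocks per stage (conv2_x .. conv5_x) and conv layers per block.
RESNET_CONFIGS = {
    "resnet18": ([2, 2, 2, 2], 2),    # basic block: two 3x3 convs
    "resnet34": ([3, 4, 6, 3], 2),
    "resnet50": ([3, 4, 6, 3], 3),    # bottleneck block: 1x1, 3x3, 1x1 convs
    "resnet101": ([3, 4, 23, 3], 3),
    "resnet152": ([3, 8, 36, 3], 3),
}


def weighted_depth(blocks_per_stage, layers_per_block):
    """Stem conv + all convs inside the residual blocks + final FC layer."""
    return 1 + sum(blocks_per_stage) * layers_per_block + 1


for name, (blocks, per_block) in RESNET_CONFIGS.items():
    print(name, weighted_depth(blocks, per_block))
```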

AlexNet

  • Input image size: 32 x 32 x 1
  • 5 convolutional layers, 3 fully connected layers

Architecture Details:

[Figure: AlexNet architecture]

Source: https://www.mdpi.com/2072-4292/9/8/848

VGG

The paper can be found at the following link: https://arxiv.org/pdf/1409.1556v6.pdf

  • This repository contains implementations of the VGG-11, VGG-13, VGG-16 and VGG-19 architectures.
  • Input image size: 224 x 224 x 3
  • The mean RGB value, computed on the training set, is subtracted from each pixel.
  • Batch size: 256
  • L2 weight decay: 5 × 10^-4
  • Dropout ratio: 0.5 (applied to the first two fully connected layers)
  • Momentum: 0.9
  • The learning rate was initially set to 10^-2 and then decreased by a factor of 10 whenever the validation set accuracy stopped improving.
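The learning-rate rule above is a reduce-on-plateau schedule. A minimal pure-Python sketch follows; the class name `PlateauLR`, the `patience` parameter and its default of 1 epoch are assumptions, since the VGG paper does not say how long to wait before decaying. In PyTorch the built-in equivalent is `torch.optim.lr_scheduler.ReduceLROnPlateau` with `mode="max"` and `factor=0.1`, stepped with the validation accuracy.

```python
class PlateauLR:
    """Divide the learning rate by `factor` when validation accuracy plateaus.

    Hypothetical helper mirroring the VGG schedule: start at 1e-2 and divide by 10
    after `patience` consecutive epochs without a new best accuracy.
    """

    def __init__(self, lr=1e-2, factor=10.0, patience=1, min_delta=0.0):
        self.lr = lr
        self.factor = factor        # the rate is divided by this (10 -> one tenth)
        self.patience = patience    # epochs without improvement before decaying
        self.min_delta = min_delta  # minimum gain that counts as an improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_accuracy):
        """Record one epoch's validation accuracy and return the learning rate to use."""
        if val_accuracy > self.best + self.min_delta:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= self.factor
                self.bad_epochs = 0
        return self.lr
```

Usage: create the scheduler once, then after each epoch call `lr = scheduler.step(val_acc)` and write `lr` into the optimizer's parameter groups.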

Architecture Details:

The different configurations of VGG are:

[Figure: VGG configurations]

  • Kernel size for convolutional layers: 3 x 3, stride = 1, padding = 1 (the spatial resolution is preserved)
  • Kernel size for max pooling: 2 x 2, stride = 2
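The four VGG configurations differ only in how many 3x3 convolutions sit between the max-pooling layers. The sketch below encodes configurations A, B, D and E of the VGG paper (VGG-11, -13, -16, -19) as lists, a common convention assumed here and not necessarily the repository's, and traces the spatial size of a 224 x 224 input, assuming padding 1 for every 3x3 convolution as in the paper. Five poolings reduce 224 to 7, so the first fully connected layer receives 7 x 7 x 512 = 25088 features. `VGG_CONFIGS` and `trace` are hypothetical names.

```python
# Numbers are 3x3 conv output channels; "M" is a 2x2 max-pool with stride 2.
VGG_CONFIGS = {
    "vgg11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "vgg19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}


def trace(config, size=224):
    """Return (weight_layers, final_size, final_channels) for a VGG config.

    3x3 conv, stride 1, padding 1: (s + 2*1 - 3) // 1 + 1 = s  (size kept)
    2x2 max-pool, stride 2:        (s - 2) // 2 + 1 = s // 2   (size halved)
    """
    convs, channels = 0, 3
    for layer in config:
        if layer == "M":
            size = (size - 2) // 2 + 1
        else:
            size = (size + 2 * 1 - 3) // 1 + 1
            convs += 1
            channels = layer
    return convs + 3, size, channels  # + the three fully connected layers


for name, cfg in VGG_CONFIGS.items():
    print(name, trace(cfg))
```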

LeNet

The link to the paper: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

  • LeNet-5 is a classic convolutional neural network (CNN) designed by Yann LeCun et al., best known for handwritten digit recognition. This section gives an overview of the LeNet architecture and its components.
  • The LeNet architecture consists of three convolutional layers (self.conv1, self.conv2, self.conv3) followed by two fully connected layers (self.fc1, self.fc2).
  • LeNet uses tanh and sigmoid activation functions.
  • Input image size: 32 x 32 x 1

Architecture Details:

[Figure: LeNet architecture]

  • Convolutional Layer 1: Input Channels: 1, Output Channels: 6, Stride: (1, 1), Kernel Size: (5, 5)
  • Convolutional Layer 2: Input Channels: 6, Output Channels: 16, Stride: (1, 1), Kernel Size: (5, 5)
  • Convolutional Layer 3: Input Channels: 16, Output Channels: 120, Stride: (1, 1), Kernel Size: (5, 5)
  • Fully Connected Layer 1: Input Features: 120, Output Features: 84
  • Fully Connected Layer 2 (Output Layer): Input Features: 84, Output Features: Number of classes in the classification task (variable num_classes)
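The layer sizes above fit together only for a 32 x 32 input: conv3 must shrink the feature map to 1 x 1 so that it flattens to exactly the 120 features fc1 expects. The sketch traces the spatial size with the usual output-size formula, floor((size + 2*padding - kernel) / stride) + 1, assuming 2x2 subsampling (pooling) with stride 2 after conv1 and conv2 as in the original LeNet-5. Those pooling layers are not listed above, so check them against the repository's model definition. `conv_out` is a hypothetical helper.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                           # 32 x 32 x 1 input
size = conv_out(size, 5)            # conv1 (5x5)        -> 28 x 28 x 6
size = conv_out(size, 2, stride=2)  # assumed 2x2 pool   -> 14 x 14 x 6
size = conv_out(size, 5)            # conv2 (5x5)        -> 10 x 10 x 16
size = conv_out(size, 2, stride=2)  # assumed 2x2 pool   -> 5 x 5 x 16
size = conv_out(size, 5)            # conv3 (5x5)        -> 1 x 1 x 120
features = size * size * 120        # flattened input to fc1
print(features)  # 120
```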
