This project predicts the digit that a given sign language image represents. The aim is to build a convolutional neural network in Keras that classifies each sign language digit image as a number from 0 to 9.

francislata/Sign-Language-Digits-CNN

Sign Language Digits - CNN

Description

This project uses the Sign Language Digits Dataset to classify images of sign language digits. The task is similar to MNIST, the dataset that has been used for years to classify grayscale images of handwritten digits from 0 to 9.

Goal

The idea behind this project is to create a convolutional neural network (CNN) model that classifies a sign language digit image as a digit from 0 to 9.

Furthermore, this project demonstrates how I approach the design and development of a learning model. I outline the designs I considered and evaluate their performance on this dataset.

About the dataset

This dataset was provided by Turkey Ankara Ayrancı Anadolu High School, and I found it through Kaggle. The images have been converted to 64 x 64 grayscale images.

Technologies used

This project uses Python as the programming language and the Keras framework to create the layers of the CNN model.

How to run

Type in the following command:

python run.py

Ensure that all requirements (found here) have been met in order to run the project.

Architecture

The model uses the following architecture:

  • Convolution 1D: 32 filters and 3 x 1 kernel size
  • Maximum Pooling 1D: 2 x 1 kernel size
  • Convolution 1D: 64 filters and 3 x 1 kernel size
  • Maximum Pooling 1D: 2 x 1 kernel size
  • Convolution 1D: 128 filters and 3 x 1 kernel size
  • Maximum Pooling 1D: 2 x 1 kernel size
  • Convolution 1D: 256 filters and 3 x 1 kernel size
  • Maximum Pooling 1D: 2 x 1 kernel size
  • Flatten
  • Dense: 1024 hidden units
  • Dropout: 0.5 hidden unit drop probability
  • Dense: 512 hidden units
  • Dropout: 0.5 hidden unit drop probability
  • Dense: 256 hidden units
  • Dense: 10 output units corresponding to digits 0 to 9
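
To make the layer list above concrete, the following sketch traces the output shape and trainable-parameter count layer by layer in plain Python. It is not taken from the repository's run.py. It assumes that each 64 x 64 image is fed to the 1D convolutions as 64 steps with 64 channels, that the convolutions use stride 1 with no padding, and that pooling uses a window and stride of 2. None of these settings is stated in this README, and the helper names (`conv1d`, `max_pool1d`, `dense`, `trace`) are hypothetical.

```python
# Sketch only: shape and parameter arithmetic for the listed layers.
# Assumptions (not confirmed by the source): input is 64 steps x 64 channels,
# convolutions use stride 1 and no padding, pooling uses window 2 and stride 2.

def conv1d(length, channels, filters, kernel=3):
    """Return (new_length, new_channels, params) for an unpadded 1D convolution."""
    params = kernel * channels * filters + filters  # weights + biases
    return length - kernel + 1, filters, params

def max_pool1d(length, channels, pool=2):
    """Pooling has no trainable parameters; it only shortens the sequence."""
    return length // pool, channels, 0

def dense(units_in, units_out):
    """Return (units_out, params) for a fully connected layer."""
    return units_out, units_in * units_out + units_out

def trace(length=64, channels=64):
    """Walk the architecture and return (rows, total_params)."""
    rows, total = [], 0
    for filters in (32, 64, 128, 256):
        length, channels, p = conv1d(length, channels, filters)
        rows.append((f"Conv1D({filters})", (length, channels), p))
        total += p
        length, channels, p = max_pool1d(length, channels)
        rows.append(("MaxPooling1D", (length, channels), p))
    units = length * channels  # Flatten
    rows.append(("Flatten", (units,), 0))
    for out in (1024, 512, 256, 10):  # Dropout adds no parameters
        units, p = dense(units, out)
        rows.append((f"Dense({out})", (units,), p))
        total += p
    return rows, total

if __name__ == "__main__":
    rows, total = trace()
    for name, shape, params in rows:
        print(f"{name:<14} {str(shape):<10} {params:>8}")
    print("total parameters:", total)
```

Under these assumptions, most of the parameters sit in the first two dense layers, which are also the layers followed by dropout in the list above. A different input layout or padding choice would change the numbers.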

This architecture is inspired by the VGG16 network, described in the paper found here. Configuration A from that paper was used as the starting point.

Process of arriving at the final architecture

Because the dataset is small, I kept the number of parameters low so that the model does not overfit the training set. As a result, I limited the number of convolution operations applied to each pixel.

Also, the number of hidden units decreases as the layers get closer to the output layer. Applying dropout after the first two dense layers helps to reduce overfitting.
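
As a rough illustration of what a drop probability of 0.5 does during training, here is a minimal pure-Python sketch of inverted dropout, in which surviving activations are scaled by 1 / (1 - rate), the same scaling the Keras Dropout layer applies. It is a stand-alone toy and not code from this project: the function name `dropout` and the sample activations are made up for illustration.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` and scale
    the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through untouched
    scale = 1.0 / (1.0 - rate)
    return [a * scale if rng.random() >= rate else 0.0 for a in activations]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    hidden = [0.2, 1.5, 0.7, 0.9, 0.1, 1.1]  # made-up activations
    print("training :", dropout(hidden, rate=0.5, rng=rng))
    print("inference:", dropout(hidden, training=False))
```

Because a different random subset of units is silenced on every training step, the dense layers cannot rely on any single hidden unit, which is the usual explanation for why dropout reduces overfitting.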

Result

The highest test set accuracy achieved after 50 epochs is 93.46%.

The training set accuracy is 99.39% and the validation set accuracy is 88.48%.

The loss curves for the training and validation sets are shown here:

Credits

The dataset and its original source can be found through Kaggle's website here.

The arXiv paper that refers to the VGG16 network can be found here.
