Project to recognize hand gestures using state-of-the-art neural networks.
Aman Srivastava [https://www.linkedin.com/in/amansrivastava1/]
Nurul Q Khan [https://www.linkedin.com/in/nurulquamar/]
Prakash Srinivasan [https://www.linkedin.com/in/prakash-srinivasan-6641812/]
Tim Kumar [https://www.linkedin.com/in/tim-kumar-b1519252/]
Imagine you are working as a data scientist at a home electronics company which manufactures state-of-the-art smart televisions. You want to develop a cool feature in the smart TV that can recognize five different gestures performed by the user, which will help users control the TV without using a remote.
The gestures are continuously monitored by the webcam mounted on the TV. Each gesture corresponds to a specific command:
- Thumbs up: Increase the volume
- Thumbs down: Decrease the volume
- Left swipe: 'Jump' backwards 10 seconds
- Right swipe: 'Jump' forward 10 seconds
- Stop: Pause the movie
The training data consists of a few hundred videos categorised into one of the five classes. Each video (typically 2-3 seconds long) is a sequence of 30 frames (images). These videos have been recorded by various people performing one of the five gestures in front of a webcam, similar to the one the smart TV will use.
The data is provided as a zip file containing a 'train' and a 'val' folder, along with one CSV file for each folder.
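Below is a minimal sketch of a batch generator for feeding these videos to a model. It assumes each CSV row has the form `folder_name;gesture_name;label` and that every video folder holds 30 frame images; the paths, the `;` delimiter, and the 120x120 resize are illustrative assumptions, not details fixed by the dataset description above.

```python
# Minimal video batch generator (sketch). Assumptions: CSV rows look like
# "folder_name;gesture_name;label", each folder holds 30 frames, frames are
# resized to 120x120. Remainder samples after full batches are skipped for brevity.
import os
import numpy as np
from skimage.io import imread
from skimage.transform import resize

def generator(source_path, folder_list, batch_size=32,
              img_size=(120, 120), num_frames=30, num_classes=5):
    while True:
        lines = np.random.permutation(folder_list)           # shuffle every epoch
        for b in range(len(lines) // batch_size):
            batch_data = np.zeros((batch_size, num_frames, *img_size, 3))
            batch_labels = np.zeros((batch_size, num_classes))
            for i in range(batch_size):
                folder, _, label = lines[b * batch_size + i].strip().split(';')
                frames = sorted(os.listdir(os.path.join(source_path, folder)))
                for f, frame in enumerate(frames[:num_frames]):
                    img = imread(os.path.join(source_path, folder, frame))
                    # resize() also rescales pixel values to [0, 1]
                    batch_data[i, f] = resize(img, img_size)
                batch_labels[i, int(label)] = 1               # one-hot gesture label
            yield batch_data, batch_labels
```

Such a generator (one for 'train', one for 'val') can then be passed to `model.fit` with an appropriate `steps_per_epoch` and `validation_steps`.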
Model Name | Model Type | Number of Parameters | Data Augmentation | Model Size (MB) | Highest Validation Accuracy | Corresponding Training Accuracy | Observations |
---|---|---|---|---|---|---|---|
conv_3d1_model | Conv3D | 1,117,061 | No | NA | 78% | 99% | Model is overfitting. Next, augment the data using cropping |
conv_3d2_model | Conv3D | 3,638,981 | Yes | 43.8 | 85% | 91% | Model is not overfitting. Next we will try to reduce the parameter count. Since we also see minor oscillations in the loss, let's try lowering the learning rate to 0.0002 |
conv_3d3_model | Conv3D | 1,762,613 | Yes | 21.2 | 85% | 83% | Model has stable results, and we were able to cut the parameter count roughly in half. Let's try adding more layers at the same level of abstraction |
conv_3d4_model | Conv3D | 2,556,533 | Yes | 30.8 | 76% | 89% | With more layers added, the model is overfitting. Let's try adding dropout at the convolutional layers |
conv_3d5_model | Conv3D | 2,556,533 | Yes | 30.8 | 70% | 89% | Adding dropout further reduced validation accuracy: the model is not able to learn generalizable features and is overfitting even more |
conv_3d6_model | Conv3D | 696,645 | Yes | 8.46 | 77% | 92% | Reduced the number of network parameters by lowering the image resolution, filter sizes, and dense-layer neurons. Comparably good validation accuracy |
conv_3d7_model | Conv3D | 504,709 | Yes | 6.15 | 77% | 85% | |
conv_3d8_model | Conv3D | 230,949 | Yes | 2.87 | 78% | 86% | |
rnn_cnn1_model | CNN-LSTM | 1,657,445 | Yes | 20 | 75% | 92% | Model is overfitting. Let's try reducing the number of layers in the next iteration |
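For reference, here is a minimal Conv3D sketch in Keras along the lines of the experiments above. The filter counts, kernel sizes, and dense-layer width are illustrative assumptions and do not reproduce the parameter counts of any particular row; the 0.0002 learning rate is the value suggested in the conv_3d2_model row.

```python
# Minimal Conv3D classifier (sketch); layer sizes are illustrative, not the tuned values.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv3D, MaxPooling3D, BatchNormalization,
                                     Flatten, Dense, Dropout)
from tensorflow.keras.optimizers import Adam

model = Sequential([
    # input: 30 frames of 120x120 RGB images
    Conv3D(16, (3, 3, 3), activation='relu', padding='same',
           input_shape=(30, 120, 120, 3)),
    BatchNormalization(),
    MaxPooling3D((2, 2, 2)),

    Conv3D(32, (3, 3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling3D((2, 2, 2)),

    Conv3D(64, (3, 3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling3D((2, 2, 2)),

    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(5, activation='softmax'),   # five gesture classes
])
model.compile(optimizer=Adam(learning_rate=0.0002),
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
```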
Model Name | Model Type | Number of Parameters | Data Augmentation | Model Size (MB) | Highest Validation Accuracy | Corresponding Training Accuracy |
---|---|---|---|---|---|---|
conv_3d10_model | Conv3D | 3,638,981 | Yes | 43.8 | 86% | 86% |
conv_3d11_model | Conv3D | 1,762,613 | Yes | 21.2 | 78% | 79% |
conv_3d12_model | Conv3D | 2,556,533 | Yes | 30.8 | 81% | 84% |
conv_3d13_model | Conv3D | 2,556,533 | Yes | 30.8 | 31% | 78% |
conv_3d14_model | Conv3D | 696,645 | Yes | 8.46 | 77% | 87% |
conv_3d15_model | Conv3D | 504,709 | Yes | 6.15 | 75% | 82% |
conv_3d16_model | Conv3D | 230,949 | Yes | 2.87 | 76% | 77% |
rnn_cnn2_model | CNN-LSTM | 1,346,021 | Yes | 31 | 78% | 96% |
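The CNN-LSTM rows (rnn_cnn1_model, rnn_cnn2_model) follow a different idea: a 2D CNN is applied to each frame via `TimeDistributed` to extract spatial features, and an LSTM aggregates those features across the 30-frame sequence. A minimal sketch, with layer sizes that are illustrative assumptions rather than the exact configurations reported above:

```python
# Minimal CNN-LSTM classifier (sketch): per-frame 2D CNN + LSTM over the sequence.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Flatten, LSTM, Dense, Dropout)

model = Sequential([
    TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same'),
                    input_shape=(30, 120, 120, 3)),
    TimeDistributed(MaxPooling2D((2, 2))),
    TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')),
    TimeDistributed(MaxPooling2D((2, 2))),
    TimeDistributed(Flatten()),       # one feature vector per frame
    LSTM(64),                         # aggregate over the 30 frames
    Dropout(0.25),
    Dense(5, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
```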
Model Name | Number of Parameters | Data Augmentation | Model Size (MB) | Highest Validation Accuracy | Corresponding Training Accuracy | Observations |
---|---|---|---|---|---|---|
rnn_cnn_tl_model | 3,840,453 | Yes | 20.4 | 56% | 85% | For this experiment, the MobileNet layers' weights are frozen (not trained). Validation accuracy is very poor, so let's train the MobileNet weights as well |
rnn_cnn_tl2_model | 3,692,869 | Yes | 42.3 | 97% | 99% | We get better accuracy when the MobileNet layers' weights are trained as well. |
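Both transfer-learning experiments wrap a pretrained MobileNet backbone in `TimeDistributed` and feed the per-frame features to a recurrent layer. A minimal sketch, assuming a GRU head and 120x120 input frames (both assumptions); toggling `base.trainable` switches between the frozen-backbone (rnn_cnn_tl_model) and fine-tuned (rnn_cnn_tl2_model) setups:

```python
# Minimal MobileNet + RNN transfer-learning sketch; head type and sizes are assumptions.
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, GlobalAveragePooling2D, GRU, Dense

base = MobileNet(weights='imagenet', include_top=False, input_shape=(120, 120, 3))
base.trainable = True   # False reproduces the frozen-backbone experiment

model = Sequential([
    TimeDistributed(base, input_shape=(30, 120, 120, 3)),      # per-frame features
    TimeDistributed(GlobalAveragePooling2D()),                  # one vector per frame
    GRU(64),                                                    # aggregate over time
    Dense(5, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
```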