EmoPy is a Python toolkit with deep neural net classes that accurately predict emotions given images of people's faces.
Figure from @Chen2014FacialER
The aim of this project is to make accurate Facial Expression Recognition (FER) models free, open, easy to use, and easy to integrate into different projects. We also aim to expand our development community, and we are open to suggestions and contributions. Please contact us to discuss.
EmoPy includes four primary modules that are plugged together to build a trained FER prediction model:

- `fermodel.py`
- `neuralnets.py`
- `imageprocessor.py`
- `featureextractor.py`
The `fermodel.py` module operates the other three modules, making it the easiest entry point for getting a trained model up and running quickly.
The `imageprocessor.py` and `featureextractor.py` modules are designed to allow you to experiment with raw images, processed images, and feature extraction.
Each of the modules contains one class, except for `neuralnets.py`, which has one interface and three subclasses. Each subclass implements a different neural net model using the Keras framework with a TensorFlow backend, allowing you to experiment and see which one performs best for your needs.
The EmoPy documentation contains detailed information on the classes and their interactions. An overview of the different neural nets included in this project is also given below.
At the moment, to use this repository you must provide your own labeled facial expression image dataset. We aim to provide pre-trained prediction models in the near future, but for now you can try out the system using your own dataset or the small dataset we provide in the `image_data` subdirectory.
Predictions ideally perform well across a diversity of datasets, illumination conditions, and subsets of the seven standard emotion labels (happiness, anger, fear, surprise, disgust, sadness, calm/neutral) seen in FER research. Good public datasets to start with are the Extended Cohn-Kanade and FER+ datasets.
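For reference, here is the row layout the sample CSV is assumed to follow, based on the `csv_label_col=0`, `csv_image_col=1`, and `raw_dimensions=(48,48)` settings used in the example further below: the first column holds an integer emotion label and the second holds the 48x48 image as space-separated grayscale pixel values. Treat this as an illustration rather than a fixed schema, since the column indices are configurable.

```
emotion,pixels
0,70 80 82 72 58 ...
5,151 150 147 155 148 ...
```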
To get started, clone the repository and open it in your terminal:

```
git clone https://github.com/thoughtworksarts/EmoPy.git
cd EmoPy
```
You will need to install Python 3.6.3 using Homebrew. If you do not have Homebrew installed, run this command to install it:

```
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
```

Now you can install Python 3.6.3 using Homebrew:

```
brew install python3
```
The next step is to set up a virtual environment using virtualenv. Install virtualenv with sudo:

```
sudo pip install virtualenv
```
To create and activate the virtual environment, make sure you are in the `EmoPy` directory and run:

```
virtualenv -p $(which python3) venv
source venv/bin/activate
```
Your terminal command line should now be prefixed with `(venv)`.

The last step is to install the remaining dependencies using pip:

```
pip install -r requirements.txt
```
Now you're ready to go!
To deactivate the virtual environment, run `deactivate` in the command line. You'll know it has been deactivated when the prefix `(venv)` disappears.
You can find example code to run each of the current neural net classes in the `examples` subdirectory. The best place to start is the FERModel example. Here is a listing of that code:
```python
import sys
sys.path.append('../')
from fermodel import FERModel

# The seven standard FER emotion labels
target_emotions = ['anger', 'fear', 'neutral', 'sad', 'happy', 'surprise', 'disgust']
csv_file_path = "image_data/sample.csv"

# Column 0 of each CSV row holds the emotion label, column 1 the raw 48x48 image
model = FERModel(target_emotions, csv_data_path=csv_file_path, raw_dimensions=(48,48), csv_image_col=1, csv_label_col=0, verbose=True)
model.train()
```
The code above initializes and trains an FER deep neural net model for the listed target emotions using the sample images from a small CSV dataset. As you can see, all you have to supply in this example is a set of target emotions and a data path.
Once you have completed the installation, you can run this example by moving into the examples folder and running the example script:

```
cd examples
python fermodel_example.py
```
The first thing the example does is initialize the model. A summary of the model architecture will be printed out. This includes a list of all the neural net layers and the shape of their output. Our models are built using the Keras framework, which offers this visualization function.
You will see the training and validation accuracies of the model being updated as it is trained on each sample image. The validation accuracy will be very low because we are only using three images for training and validation.
The Time-Delayed 3D-Convolutional Neural Network model is inspired by the work described in this paper by Dr. Hongying Meng of Brunel University, London. It uses temporal information as part of its training samples: instead of still images, it uses past images from a series for additional context. One training sample contains n images from a series, and its emotion label is that of the most recent image. The idea is to capture the progression of a facial expression leading up to a peak emotion.
Facial expression image sequence in Cohn-Kanade database from @Jia2014
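The sampling scheme can be sketched as follows. This is an illustrative helper, not EmoPy's actual API; the function name and the default window size are assumptions.

```python
import numpy as np

def make_time_delay_samples(image_series, labels, window_size=4):
    """Build time-delayed training samples from an ordered image series.

    Each sample stacks `window_size` consecutive frames, and its label is
    taken from the most recent frame, so the sample captures the
    expression's progression up to that point.
    """
    samples, sample_labels = [], []
    for end in range(window_size, len(image_series) + 1):
        samples.append(np.stack(image_series[end - window_size:end]))
        sample_labels.append(labels[end - 1])
    return np.array(samples), np.array(sample_labels)
```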
The Convolutional Long Short Term Memory neural net is a hybrid of convolutional and recurrent neural networks. Convolutional NNs (CNNs) use kernels, or filters, to find patterns in smaller parts of an image. Recurrent NNs (RNNs) take previous training examples into account for context, similar to the Time-Delay Neural Network. This model is able to both extract local data from images and use temporal context.
The Time-Delay model and this model differ in how they use temporal context. The former only takes context from within video clips of a single face, as shown in the figure above. The ConvolutionalLstmNN is given still images that have no relation to each other; it looks for pattern differences between past image samples and the current sample, as well as between their labels. It doesn't need a progression of the same face, simply different faces to compare.
Figure from @vanGent2016
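Below is a minimal Keras sketch of this hybrid idea. The layer sizes and hyperparameters are illustrative assumptions, not the ConvolutionalLstmNN's exact configuration.

```python
from keras.models import Sequential
from keras.layers import ConvLSTM2D, Flatten, Dense

# Input: sequences of 4 grayscale 48x48 frames, classified into 3 emotions
time_steps, rows, cols, channels = 4, 48, 48, 1
num_emotions = 3

model = Sequential()
# ConvLSTM2D applies convolutional filters within each frame while
# carrying LSTM state across the time dimension
model.add(ConvLSTM2D(filters=10, kernel_size=(4, 4), activation='relu',
                     input_shape=(time_steps, rows, cols, channels)))
model.add(Flatten())
model.add(Dense(num_emotions, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```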
This model uses a technique known as Transfer Learning, in which pre-trained deep neural net models are used as starting points. The pre-trained models it uses were trained on images to classify objects; the model then retrains them on facial expression images with emotion classifications rather than object classifications. It adds a couple of top layers to the original model to match the number of target emotions we want to classify, then reruns the training algorithm with a set of facial expression images. It uses only still images, with no temporal context.
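The retraining pattern can be sketched in Keras as follows. The choice of InceptionV3 as the base network and the sizes of the new top layers are illustrative assumptions, not necessarily the TransferLearningNN's exact setup.

```python
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D

num_emotions = 7

# Start from a network pre-trained on object classification (ImageNet),
# dropping its original classification head
base_model = InceptionV3(weights='imagenet', include_top=False)

# Add a couple of new top layers sized for the target emotion classes
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_emotions, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Freeze the pre-trained layers and retrain only the new head on
# facial expression images
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```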
Currently the ConvolutionalLstmNN model performs best, with a validation accuracy of 62.7% when trained to classify three emotions. The table below shows the accuracies of this model and the TransferLearningNN model when trained on all seven standard emotions and on a subset of three emotions (fear, happiness, neutral). Both were trained on 5,000 images from the FER+ dataset.
| Neural Net Model | Training Accuracy (7 emotions) | Validation Accuracy (7 emotions) | Training Accuracy (3 emotions) | Validation Accuracy (3 emotions) |
|---|---|---|---|---|
| ConvolutionalLstmNN | 0.6187 | 0.4751 | 0.9148 | 0.6267 |
| TransferLearningNN | 0.5358 | 0.2933 | 0.7393 | 0.4840 |
Both models are overfitting, meaning that their training accuracies are much higher than their validation accuracies. The models are doing a very good job of recognizing and classifying patterns in the training images, but they fail to generalize, making them less accurate when predicting emotions for new images.
If you would like to experiment with different parameters using our neural net classes, we recommend you use FloydHub, a platform for training and deploying deep learning models in the cloud. Let us know how your models are doing! The goal is to optimize the performance and generalizability of all the EmoPy models.
These are the principles we use to guide development and contributions to the project:
- **FER for Good.** FER applications have the potential to be used for malicious purposes. We want to build EmoPy with a community that champions integrity, transparency, and awareness, and we hope to instill these values throughout development while maintaining an accessible, quality toolkit.
- **User Friendliness.** EmoPy prioritizes user experience and is designed to make it as easy as possible to get an FER prediction model up and running, minimizing the total user requirements for basic use cases.
- **Experimentation to Maximize Performance.** Optimal performance in FER prediction is a primary goal. The deep neural net classes are designed to make it easy to modify training parameters, image pre-processing options, and feature extraction methods, in the hope that experimentation in the open-source community will lead to high-performing FER prediction.
- **Modularity.** EmoPy contains four base modules (`fermodel`, `neuralnets`, `imageprocessor`, and `featureextractor`) that can be easily used together with minimal restrictions.
- Fork it!
- Create your feature branch: `git checkout -b my-new-feature`
- Commit your changes: `git commit -am 'Add some feature'`
- Push to the branch: `git push origin my-new-feature`
- Submit a pull request :D
This is a new library that has a lot of room for growth. Check out the list of open issues that we need help addressing!