This project is a personal challenge to learn and implement face detection from scratch. To do this I used the following Python libraries:
I used the original YOLO paper as a reference for implementing the algorithms involved.
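The core idea from the paper is that the image is divided into an S × S grid, and each ground-truth box is assigned to the grid cell that contains its center. Below is a minimal sketch of that target encoding. It assumes boxes are given as normalized (x_center, y_center, width, height) and uses the paper's S = 7. The function name `encode_box` is hypothetical and is not taken from this repository.

```python
from typing import Tuple

# Assumption: a box is (x_center, y_center, width, height), normalized to [0, 1]
# relative to the whole image.
Box = Tuple[float, float, float, float]


def encode_box(box: Box, S: int = 7) -> Tuple[int, int, float, float, float, float]:
    """Return (row, col, x_cell, y_cell, w, h) for the grid cell responsible for `box`."""
    x, y, w, h = box
    # The cell containing the box center is responsible for predicting it.
    # Clamp so a center exactly on the right/bottom edge (1.0) stays inside the grid.
    col = min(int(x * S), S - 1)
    row = min(int(y * S), S - 1)
    # The center is expressed relative to that cell's top-left corner.
    x_cell = x * S - col
    y_cell = y * S - row
    # Width and height stay relative to the whole image.
    return row, col, x_cell, y_cell, w, h


print(encode_box((0.5, 0.5, 0.2, 0.3)))  # (3, 3, 0.5, 0.5, 0.2, 0.3)
```

With this encoding, two small faces centered at (0.50, 0.50) and (0.52, 0.51) both fall into cell (3, 3). A cell can only predict a limited number of boxes, so one of the faces has to be dropped. This is one plausible reason why crowded, small faces are hard for this kind of model.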
I experimented with different backbones and loss-function hyperparameters (such as the penalty weights for incorrect predictions). From my testing I drew several conclusions:
- The YOLO loss function performs poorly on very small objects that are close together (e.g. a crowd of faces).
- The YOLO loss function performs well on larger objects (e.g. a single face).
- The YOLO loss function is very sensitive to its hyperparameters: setting the weights either too high or too low noticeably hurts results.
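For reference, here is a simplified sketch of the per-cell YOLO loss. It uses the paper's default weights λ_coord = 5 and λ_noobj = 0.5, which are the kind of penalty weights mentioned above.

To keep the sketch short, it makes several simplifying assumptions:
- It assumes one predicted box per cell (the paper uses two).
- It drops the class term, since face detection has a single class.
- It uses 1 rather than the predicted box's IoU as the confidence target for occupied cells.
- It assumes predicted widths and heights are non-negative (e.g. produced by a sigmoid).

The function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def yolo_cell_loss(pred: Sequence[float], target: Optional[Sequence[float]],
                   lambda_coord: float = 5.0, lambda_noobj: float = 0.5) -> float:
    """pred = (x, y, w, h, conf); target = (x, y, w, h), or None if the cell has no face.

    Assumes w, h in pred and target are >= 0 so the square roots are defined.
    """
    x, y, w, h, conf = pred
    if target is None:
        # Empty cells are only penalized for their confidence, down-weighted by
        # lambda_noobj so the many empty cells do not dominate the loss.
        return lambda_noobj * conf ** 2
    tx, ty, tw, th = target
    # Center error, weighted up by lambda_coord.
    loss = lambda_coord * ((x - tx) ** 2 + (y - ty) ** 2)
    # Square roots make the same absolute size error cost more on small boxes.
    loss += lambda_coord * ((math.sqrt(w) - math.sqrt(tw)) ** 2
                            + (math.sqrt(h) - math.sqrt(th)) ** 2)
    # Confidence target simplified to 1 for occupied cells.
    loss += (conf - 1.0) ** 2
    return loss


def yolo_loss(preds: Sequence[Sequence[float]],
              targets: Sequence[Optional[Sequence[float]]],
              lambda_coord: float = 5.0, lambda_noobj: float = 0.5) -> float:
    """Sum the per-cell loss over all grid cells."""
    return sum(yolo_cell_loss(p, t, lambda_coord, lambda_noobj)
               for p, t in zip(preds, targets))
```

Both weights scale terms that compete over the same cells. Shifting either one changes the balance between box localization and confidence, which is consistent with the sensitivity described above.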
The most interesting part of the project for me was trying to make the model as small as possible while maintaining the same accuracy. For this I tried three different architectures:
- A standard ResNet backbone, as a baseline.
- A Pool ResNet backbone, to reduce computation time without changing the number of parameters.
- A pretrained MobileNetV3 backbone, to test whether pretraining made any difference.
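The text does not say where the pooling layers sit in the Pool ResNet, so the following is only a back-of-the-envelope sketch. It assumes a 2 × 2 pooling layer placed in front of a stride-1, same-padded 3 × 3 convolution.

Pooling layers have no learnable weights, and a convolution's parameter count does not depend on input resolution. The parameter count therefore stays the same, while the multiply-accumulate count drops by the factor the spatial area shrinks. The layer sizes and the helper `conv2d_cost` are hypothetical.

```python
from typing import Tuple


def conv2d_cost(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> Tuple[int, int]:
    """(parameters, multiply-accumulates) for a stride-1, same-padded k x k conv with bias."""
    params = c_in * c_out * k * k + c_out  # weights + one bias per output channel
    macs = h * w * c_out * c_in * k * k    # one k*k*c_in dot product per output value
    return params, macs


# Hypothetical layer: 64 -> 64 channels on a 56 x 56 feature map.
params_full, macs_full = conv2d_cost(56, 56, 64, 64)
# The same layer after a 2 x 2 pool halves each spatial dimension (the pool has no weights).
params_pooled, macs_pooled = conv2d_cost(28, 28, 64, 64)

print(params_full, macs_full)      # 36928 115605504
print(params_pooled, macs_pooled)  # 36928 28901376
```

In this example, the pooled layer does a quarter of the work with exactly the same weights. That would be consistent with a backbone that gets faster without getting larger.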
I found that:
- The standard ResNet backbone performs well, but can be too slow when the number of bounding boxes exceeds 100.
- The Pool ResNet backbone performs equally well and is much faster.
- The MobileNetV3 backbone performs the same as the Pool ResNet, which suggests that the pretraining is not helping.
To run the medium PoolResnet model:
- Install the requirements
pip install -r requirements.txt
- Run the model
python demo_model.py
To train the medium PoolResnet model (note: training requires PyTorch installed with CUDA support):
- Install the requirements
pip install -r requirements.txt
- Train the model
python train_model.py
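Because training needs a CUDA-enabled PyTorch build, a quick pre-flight check can save a failed run. Here is a minimal sketch that assumes nothing about `train_model.py` itself. `cuda_ready` is a hypothetical helper; `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch calls.

```python
import importlib.util


def cuda_ready() -> bool:
    """Return True only if PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed; install a CUDA build from pytorch.org.")
        return False
    import torch  # imported lazily so the check itself works without PyTorch

    if not torch.cuda.is_available():
        print("PyTorch is installed, but CUDA is not available (CPU-only build or no GPU).")
        return False
    print(f"CUDA device found: {torch.cuda.get_device_name(0)}")
    return True


if __name__ == "__main__":
    cuda_ready()
```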