This project shows how to localize a single object in an image using just a convolutional neural network. There are more sophisticated methods such as YOLO, R-CNN, SSD or RetinaNet (focal loss), but sometimes all you need is the coordinates of a single object and its class.
First, let's look at YOLOv2's approach:
- Pretrain Darknet-19 on ImageNet (feature extractor)
- Remove the last convolutional layer
- Add three 3 x 3 convolutional layers with 1024 filters
- Add a 1 x 1 convolutional layer with the number of outputs needed for detection
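To make this concrete, here is a minimal Keras sketch of such a YOLOv2-style detection head on a dummy feature map. The 13x13x1024 input shape, the plain ReLU activation and the VOC-style output size are illustrative assumptions only; YOLOv2 itself is built on Darknet with batch normalization and leaky ReLU.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D

# Dummy feature map standing in for the output of the pretrained extractor.
features = Input(shape=(13, 13, 1024))

# Three 3x3 convolutional layers with 1024 filters (activation simplified here).
x = features
for _ in range(3):
    x = Conv2D(1024, kernel_size=3, padding="same", activation="relu")(x)

# Final 1x1 convolution with the number of outputs needed for detection,
# e.g. 5 anchors * (5 box values + 20 classes) = 125 for a VOC-style setup.
out = Conv2D(5 * (5 + 20), kernel_size=1)(x)

head = Model(features, out)
head.summary()
```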
We proceed in the same way to build the object detector:
- Choose a model from Keras Applications, i.e. the feature extractor
- Remove the dense layer
- Freeze some/all/no layers
- Add one/multiple/no convolution blocks (or `_inverted_res_block` for MobileNetv2)
- Add a convolution layer for the coordinates
The code in this repository uses MobileNetv2 [1], because it is faster than other models and its performance can be adapted. For example, if alpha = 0.35 with a 96x96 input is not good enough, one can simply increase both values (see [2] for a comparison). If you use another architecture, change `preprocess_input`.
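Putting the steps above together, a minimal sketch of the detector could look like the following. The image size, alpha and the single 3x3 coordinate head are assumptions for illustration; the exact layers in train_model.py may differ.

```python
from tensorflow.keras import Model
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input  # swap this if you change the backbone
from tensorflow.keras.layers import Conv2D, Reshape

IMAGE_SIZE = 96   # illustrative values
ALPHA = 1.0

# Feature extractor from Keras Applications, without the dense classification head.
base = MobileNetV2(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False,
                   alpha=ALPHA, weights="imagenet")

# Optionally freeze some/all layers here, e.g.:
# for layer in base.layers: layer.trainable = False

# Convolution layer predicting the 4 box coordinates from the final feature map.
x = Conv2D(4, kernel_size=3)(base.output)
x = Reshape((4,))(x)

model = Model(base.input, x)
```

Images are preprocessed with the `preprocess_input` of the chosen backbone before being fed to the model.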
- pip3 install imgaug (needed for data augmentations)
- Download The Oxford-IIIT Pet Dataset
- Download The Oxford-IIIT Pet Dataset Annotations
- tar xf images.tar.gz
- tar xf annotations.tar.gz
- mv annotations/xmls/* images/
- python3 generate_dataset.py
- python3 example_1/train_model.py
- Adjust `WEIGHTS_FILE` in evaluate_performance.py to the weights file written by the previous script
- python3 example_1/evaluate_performance.py
I trained the neural network for 75 epochs. The results are 90% average IoU on the training set and 72% on the validation set.
Configuration: no augmentations, full fine-tuning (no frozen layers), image size 96, alpha 1.0, batch size 32, initial learning rate 0.001 (decreased during training).
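For reference, "average IoU" is the intersection over union between the predicted and ground-truth boxes, averaged over the dataset. A minimal sketch, assuming boxes are given as (x_min, y_min, x_max, y_max):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)
```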
In the following images, red is the predicted box and green is the ground truth:
We use the same dataset as before, but this time we run the scripts example_2/train_model.py and example_2/evaluate_performance.py.
In order to distinguish between classes, we have to modify the loss function. I'm using `w_1*log((y_hat - y)^2 + 1) + w_2*FL(p_hat, p)`, where `w_1 = w_2 = 1` are two weights and `FL(p_hat, p) = -(0.9*(1 - p_hat)^2*p*log(p_hat) + 0.1*p_hat^2*(1 - p)*log(1 - p_hat))` is the focal loss.
Instead of using all 37 classes, the code only outputs class 0 (the image contains class 0) or class 1 (the image contains one of classes 1 to 36). However, it is easy to extend this to more classes (use categorical cross-entropy instead of focal loss and try out different weights).
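A minimal Keras sketch of this loss for the two-class case is shown below. The output layout (4 box coordinates followed by one class probability) and the function name are assumptions, not necessarily the exact code in example_2/train_model.py.

```python
from tensorflow.keras import backend as K

def detection_loss(y_true, y_pred, w_1=1.0, w_2=1.0):
    # Assumed layout: first 4 values are box coordinates, last value is the class probability.
    box_true, p_true = y_true[..., :4], y_true[..., 4]
    box_pred, p_pred = y_pred[..., :4], y_pred[..., 4]
    p_pred = K.clip(p_pred, K.epsilon(), 1.0 - K.epsilon())

    # w_1 * log((y_hat - y)^2 + 1), summed over the 4 coordinates
    box_loss = K.sum(K.log(K.square(box_pred - box_true) + 1.0), axis=-1)

    # FL(p_hat, p) = -(0.9*(1 - p_hat)^2*p*log(p_hat) + 0.1*p_hat^2*(1 - p)*log(1 - p_hat))
    focal = -(0.9 * K.square(1.0 - p_pred) * p_true * K.log(p_pred)
              + 0.1 * K.square(p_pred) * (1.0 - p_true) * K.log(1.0 - p_pred))

    return w_1 * box_loss + w_2 * focal
```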
I trained the neural network for 74 epochs. The results are 85% average IoU and 97% classification accuracy on both the training and validation sets.
- enable augmentations: set `AUGMENTATION=True` in generate_dataset.py and install imgaug
- better augmentations: increase `AUGMENTATION_PER_IMAGE` and try out different transformations
- for MobileNetv1/2: increase `ALPHA` and `IMAGE_SIZE` in train_model.py
- other architectures: increase `IMAGE_SIZE`
- add more layers: e.g. YOLOv2 adds 3 conv layers
- try out other loss functions (MAE, smooth L1 loss etc.); see the sketch after this list
- other optimizer: SGD with momentum 0.9, adjust learning rate
- read keras-team/keras#9965
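As a concrete example of an alternative box loss from the list above, here is a minimal sketch of the smooth L1 (Huber) loss in its generic form; it is not code taken from this repository.

```python
from tensorflow.keras import backend as K

def smooth_l1(y_true, y_pred):
    # 0.5 * x^2 if |x| < 1, |x| - 0.5 otherwise, summed over the box coordinates.
    diff = K.abs(y_true - y_pred)
    loss = K.switch(diff < 1.0, 0.5 * K.square(diff), diff - 0.5)
    return K.sum(loss, axis=-1)
```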
- set `inmemory=True` in train_model.py for small datasets
- increase `BATCH_SIZE`
- fewer layers, smaller `IMAGE_SIZE` and `ALPHA`
- If the new dataset is small and similar to ImageNet, freeze all layers.
- If the new dataset is small and not similar to ImageNet, freeze some layers.
- If the new dataset is large, freeze no layers.
- read http://cs231n.github.io/transfer-learning/
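A minimal Keras sketch of these three options; the cut-off of 20 layers is only an illustrative value, not a recommendation from this repository.

```python
from tensorflow.keras.applications import MobileNetV2

base = MobileNetV2(input_shape=(96, 96, 3), include_top=False, weights="imagenet")

# Small dataset, similar to ImageNet: freeze all layers.
for layer in base.layers:
    layer.trainable = False

# Small dataset, not similar to ImageNet: unfreeze only the last few layers
# (20 is an arbitrary example).
for layer in base.layers[-20:]:
    layer.trainable = True

# Large dataset: freeze nothing, i.e. leave all layers trainable (the default).
```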
[1] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen. MobileNetV2: Inverted Residuals and Linear Bottlenecks.