TensorFlow implementation of the RCNN object detection system proposed in the paper "Rich feature hierarchies for accurate object detection and semantic segmentation".
The RCNN system was proposed by Ross Girshick, Jeff Donahue, Trevor Darrell and Jitendra Malik from UC Berkeley in their paper "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation". RCNN stands for Regions with CNN features, which summarizes how the system works: it generates region proposals and classifies each one using features computed by a CNN. RCNN consists of three stages:
- Given an input image, around 2000 bottom-up region proposals are extracted.
- Features are computed for each proposal using a large convolutional neural network (such as a pre-trained VGG or ResNet).
- Each region is classified using class-specific linear SVMs (or MLPs).
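The three stages can be sketched end to end as below. This is a minimal, dependency-free illustration, not this repository's code: `rcnn_detect`, `crop`, `warp` and `svm_scores` are hypothetical names, the proposal generator and the feature extractor are passed in as callables (in the real system the latter would be the TensorFlow CNN), and the nearest-neighbour warp to a fixed square size is a simplifying assumption. Each region is assigned the class whose linear SVM gives the highest score; the per-class non-maximum suppression of the full detector is left out.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates
Image = List[List[float]]        # grayscale image as a list of rows


def crop(image: Image, box: Box) -> Image:
    """Cut the pixels covered by a proposal box out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


def warp(patch: Image, size: int) -> Image:
    """Nearest-neighbour resize of a patch to size x size (fixed CNN input)."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def svm_scores(feature: Sequence[float],
               weights: Sequence[Sequence[float]],
               biases: Sequence[float]) -> List[float]:
    """One linear SVM per class: score_c = w_c . f + b_c."""
    return [sum(wi * fi for wi, fi in zip(w, feature)) + b
            for w, b in zip(weights, biases)]


def rcnn_detect(image: Image,
                propose: Callable[[Image], List[Box]],
                extract_features: Callable[[Image], List[float]],
                weights: Sequence[Sequence[float]],
                biases: Sequence[float],
                class_names: Sequence[str],
                size: int = 8) -> List[Tuple[Box, str, float]]:
    detections = []
    for box in propose(image):                          # stage 1: region proposals
        patch = warp(crop(image, box), size)            # fixed-size CNN input
        feature = extract_features(patch)               # stage 2: features (stubbed)
        scores = svm_scores(feature, weights, biases)   # stage 3: class-specific SVMs
        best = max(range(len(scores)), key=scores.__getitem__)
        detections.append((box, class_names[best], scores[best]))
    return detections
```

For a quick check, a stand-in extractor that returns the mean brightness of the warped patch, together with two one-dimensional "SVMs" for hypothetical `bright` and `dark` classes, is enough to label the two halves of a half-white, half-black image differently.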
To generate the region proposals, we draw on the following two papers:
- Efficient Graph-Based Image Segmentation, proposed by Pedro F. Felzenszwalb and Daniel P. Huttenlocher.
- Selective Search for Object Recognition, proposed by J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers and A. W. M. Smeulders.
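The core of the Felzenszwalb and Huttenlocher segmentation is a Kruskal-style pass over graph edges sorted by weight: two components are merged when the edge joining them is no heavier than `min(Int(C1) + k/|C1|, Int(C2) + k/|C2|)`, where `Int(C)` is the largest edge in the component's minimum spanning tree and `k` controls the preferred segment size. The sketch below is a minimal illustration under stated assumptions: it works on a small grayscale grid, uses 4-connected pixel edges weighted by absolute intensity difference, and omits the paper's Gaussian pre-smoothing and minimum-component-size post-processing. `felzenszwalb_segment` is a hypothetical name.

```python
from typing import List


def felzenszwalb_segment(image: List[List[float]], k: float) -> List[List[int]]:
    """Graph-based segmentation of a grayscale image given as a list of rows."""
    h, w = len(image), len(image[0])
    # Edges (weight, pixel_a, pixel_b) between 4-connected neighbours (assumption).
    edges = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((abs(image[y][x] - image[y][x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((abs(image[y][x] - image[y + 1][x]), i, i + w))
    edges.sort()  # process edges in non-decreasing weight order

    parent = list(range(h * w))   # union-find forest over pixels
    size = [1] * (h * w)          # |C| for each root
    internal = [0.0] * (h * w)    # Int(C): largest MST edge inside C

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge only if the edge does not exceed the minimum internal difference.
        if weight <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = weight  # edges are sorted, so this is the new MST maximum

    # Relabel components as 0, 1, 2, ... in raster order.
    ids = {}
    return [[ids.setdefault(find(y * w + x), len(ids)) for x in range(w)]
            for y in range(h)]
```

With a small `k`, a strong intensity step stays a boundary between segments; with a very large `k`, the size term dominates and the whole grid collapses into one segment.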
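Selective Search starts from an initial over-segmentation (Uijlings et al. use the Felzenszwalb and Huttenlocher method for this), then repeatedly merges the most similar pair of neighbouring regions until one region covers the image; the bounding box of every region seen along the way becomes a proposal. The sketch below is a simplified illustration, not the paper's full method: its similarity uses only the size and fill terms, omitting the colour and texture terms, and it assumes integer region labels in a 2D grid. `selective_search_boxes` is a hypothetical name.

```python
from typing import List, Tuple


def selective_search_boxes(labels: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Hierarchical grouping over an initial segmentation (2D grid of int labels)."""
    h, w = len(labels), len(labels[0])
    image_size = h * w
    # Per region: [x0, y0, x1, y1, pixel_count]
    regions = {}
    for y in range(h):
        for x in range(w):
            r = regions.setdefault(labels[y][x], [x, y, x, y, 0])
            r[0], r[1] = min(r[0], x), min(r[1], y)
            r[2], r[3] = max(r[2], x), max(r[3], y)
            r[4] += 1
    # Pairs of regions that touch horizontally or vertically.
    neighbours = set()
    for y in range(h):
        for x in range(w):
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                neighbours.add(frozenset((labels[y][x], labels[y][x + 1])))
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                neighbours.add(frozenset((labels[y][x], labels[y + 1][x])))

    def merged(a: int, b: int) -> list:
        ra, rb = regions[a], regions[b]
        return [min(ra[0], rb[0]), min(ra[1], rb[1]),
                max(ra[2], rb[2]), max(ra[3], rb[3]), ra[4] + rb[4]]

    def similarity(pair: frozenset) -> float:
        a, b = tuple(pair)
        m = merged(a, b)
        box_area = (m[2] - m[0] + 1) * (m[3] - m[1] + 1)
        s_size = 1.0 - m[4] / image_size                # prefer merging small regions
        s_fill = 1.0 - (box_area - m[4]) / image_size   # prefer regions that fit together
        return s_size + s_fill

    boxes = [tuple(r[:4]) for r in regions.values()]  # initial regions are proposals too
    next_id = max(regions) + 1
    while neighbours:
        best = max(neighbours, key=similarity)
        a, b = tuple(best)
        new = next_id
        next_id += 1
        regions[new] = merged(a, b)
        boxes.append(tuple(regions[new][:4]))
        # Redirect the merged regions' neighbours to the new region.
        updated = set()
        for pair in neighbours:
            if pair == best:
                continue
            if a in pair or b in pair:
                (other,) = pair - {a, b}
                updated.add(frozenset((new, other)))
            else:
                updated.add(pair)
        neighbours = updated
    return boxes
```

Because each merge removes one region, `n` initial regions yield `2n - 1` boxes, the last of which spans the whole image.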
@misc{1311.2524,
Author = {Ross Girshick and Jeff Donahue and Trevor Darrell and Jitendra Malik},
Title = {Rich feature hierarchies for accurate object detection and semantic segmentation},
Year = {2013},
Eprint = {arXiv:1311.2524},
}