Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks.
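The "300 proposals per image" figure above can be illustrated with a minimal sketch of keeping the highest-scoring candidates. It assumes each candidate is an objectness score paired with a box; the `Proposal` type and `top_proposals` helper are hypothetical names used only for illustration, and the full pipeline described in the paper also applies non-maximum suppression, which is omitted here.

```python
import heapq
from typing import List, NamedTuple, Tuple


class Proposal(NamedTuple):
    """Hypothetical container for one RPN output (illustration only)."""
    score: float                            # objectness score
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def top_proposals(proposals: List[Proposal], k: int = 300) -> List[Proposal]:
    """Return the k highest-scoring proposals, best first."""
    return heapq.nlargest(k, proposals, key=lambda p: p.score)


candidates = [
    Proposal(0.10, (0, 0, 50, 50)),
    Proposal(0.95, (30, 40, 200, 220)),
    Proposal(0.60, (10, 10, 120, 90)),
]
kept = top_proposals(candidates, k=2)  # keeps the 0.95 and 0.60 proposals
```

Only the kept proposals would be passed on to the detection head, which is why a small proposal budget keeps the second stage cheap.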
Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download |
---|---|---|---|---|---|---|---|
R-50-DC5 | caffe | 1x | - | - | 37.2 | config | model \| log |
R-50-FPN | caffe | 1x | 3.8 | | 37.8 | config | model \| log |
R-50-FPN | pytorch | 1x | 4.0 | 21.4 | 37.4 | config | model \| log |
R-50-FPN (FP16) | pytorch | 1x | 3.4 | 28.8 | 37.5 | config | model \| log |
R-50-FPN | pytorch | 2x | - | - | 38.4 | config | model \| log |
R-101-FPN | caffe | 1x | 5.7 | | 39.8 | config | model \| log |
R-101-FPN | pytorch | 1x | 6.0 | 15.6 | 39.4 | config | model \| log |
R-101-FPN | pytorch | 2x | - | - | 39.8 | config | model \| log |
X-101-32x4d-FPN | pytorch | 1x | 7.2 | 13.8 | 41.2 | config | model \| log |
X-101-32x4d-FPN | pytorch | 2x | - | - | 41.2 | config | model \| log |
X-101-64x4d-FPN | pytorch | 1x | 10.3 | 9.4 | 42.1 | config | model \| log |
X-101-64x4d-FPN | pytorch | 2x | - | - | 41.6 | config | model \| log |
To compare bounding-box regression losses, we trained the R-50-FPN pytorch-style backbone on the 1x schedule.
Backbone | Loss type | Mem (GB) | Inf time (fps) | box AP | Config | Download |
---|---|---|---|---|---|---|
R-50-FPN | L1Loss | 4.0 | 21.4 | 37.4 | config | model \| log |
R-50-FPN | IoULoss | | | 37.9 | config | model \| log |
R-50-FPN | GIoULoss | | | 37.6 | config | model \| log |
R-50-FPN | BoundedIoULoss | | | 37.4 | config | model \| log |
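To make the difference between the IoU-based losses in the table concrete, here is a minimal sketch of IoU and generalized IoU (GIoU) for two axis-aligned boxes. It assumes boxes in `(x1, y1, x2, y2)` form and the common linear definitions `1 - IoU` and `1 - GIoU`; the function names are illustrative, and the library's `IoULoss` and `GIoULoss` classes may use different variants, reductions, or weights.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou_and_giou(pred, target, eps=1e-9):
    """Return (IoU, GIoU) for two axis-aligned boxes."""
    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    union = box_area(pred) + box_area(target) - inter
    iou = inter / max(union, eps)

    # Smallest box enclosing both inputs; GIoU penalizes its unused area.
    ew = max(pred[2], target[2]) - min(pred[0], target[0])
    eh = max(pred[3], target[3]) - min(pred[1], target[1])
    enclose = max(ew * eh, eps)
    giou = iou - (enclose - union) / enclose
    return iou, giou


def iou_loss(pred, target):
    return 1.0 - iou_and_giou(pred, target)[0]


def giou_loss(pred, target):
    return 1.0 - iou_and_giou(pred, target)[1]


overlapping = iou_and_giou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = iou_and_giou((0, 0, 1, 1), (2, 2, 3, 3))
```

For the disjoint pair the IoU is 0 regardless of how far apart the boxes are, while GIoU is negative and grows in magnitude with the gap, so the GIoU loss still varies for non-overlapping boxes.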
We also train some models with longer schedules and multi-scale training. Users can fine-tune them for downstream tasks.
Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download |
---|---|---|---|---|---|---|---|
R-50-DC5 | caffe | 1x | - | | 37.4 | config | model \| log |
R-50-DC5 | caffe | 3x | - | | 38.7 | config | model \| log |
R-50-FPN | caffe | 2x | 3.7 | | 39.7 | config | model \| log |
R-50-FPN | caffe | 3x | 3.7 | | 39.9 | config | model \| log |
R-50-FPN | pytorch | 3x | 3.9 | | 40.3 | config | model \| log |
R-101-FPN | caffe | 3x | 5.6 | | 42.0 | config | model \| log |
R-101-FPN | pytorch | 3x | 5.8 | | 41.8 | config | model \| log |
X-101-32x4d-FPN | pytorch | 3x | 7.0 | | 42.5 | config | model \| log |
X-101-32x8d-FPN | pytorch | 3x | 10.1 | | 42.4 | config | model \| log |
X-101-64x4d-FPN | pytorch | 3x | 10.0 | | 43.1 | config | model \| log |
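As a rough illustration of what multi-scale training means, the sketch below samples a target short-side length for each image and resizes while keeping the aspect ratio. The short-side range `(640, 800)` and the long-side cap of `1333` are assumptions chosen for the example, not values stated in this document, and the helper names are hypothetical.

```python
import random


def rescale_size(width, height, short_side, max_long_side=1333):
    """Scale (width, height) so the short side reaches short_side,
    unless that would push the long side past max_long_side."""
    scale = min(short_side / min(width, height),
                max_long_side / max(width, height))
    return round(width * scale), round(height * scale)


def sample_training_size(width, height, short_range=(640, 800), rng=random):
    """Pick a random short side in short_range (inclusive) and rescale."""
    short_side = rng.randint(*short_range)
    return rescale_size(width, height, short_side)


rng = random.Random(0)  # seeded only to make the example repeatable
new_w, new_h = sample_training_size(640, 480, rng=rng)
```

At test time a single fixed scale would typically be used, for example `rescale_size(640, 480, 800)`.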
We further fine-tune some pre-trained models on COCO subsets, which contain only a few of the 80 categories.
Backbone | Style | Class name | Pre-trained model | Mem (GB) | box AP | Config | Download |
---|---|---|---|---|---|---|---|
R-50-FPN | caffe | person | R-50-FPN-Caffe-3x | 3.7 | 55.8 | config | model \| log |
R-50-FPN | caffe | person-bicycle-car | R-50-FPN-Caffe-3x | 3.7 | 44.1 | config | model \| log |
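One way such a category subset could be built is by filtering a COCO-style annotation dictionary, as in the minimal sketch below. It assumes the standard COCO layout with `images`, `annotations`, and `categories` lists; the `filter_coco_categories` helper is hypothetical, and whether images without any selected category are dropped in the released subsets is an assumption made here.

```python
def filter_coco_categories(coco, keep_names):
    """Return a COCO-style dict restricted to the named categories."""
    keep_ids = {c["id"] for c in coco["categories"] if c["name"] in keep_names}
    annotations = [a for a in coco["annotations"]
                   if a["category_id"] in keep_ids]
    # Assumption: images left without annotations are removed.
    image_ids = {a["image_id"] for a in annotations}
    return {
        "images": [im for im in coco["images"] if im["id"] in image_ids],
        "annotations": annotations,
        "categories": [c for c in coco["categories"] if c["id"] in keep_ids],
    }


# Tiny hand-written example in COCO layout (not real dataset content).
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 20]},
        {"id": 11, "image_id": 1, "category_id": 18, "bbox": [5, 5, 8, 8]},
        {"id": 12, "image_id": 2, "category_id": 18, "bbox": [1, 1, 4, 4]},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "bicycle"},
        {"id": 3, "name": "car"},
        {"id": 18, "name": "dog"},
    ],
}
subset = filter_coco_categories(toy, {"person", "bicycle", "car"})
```

The detector's classification head would then be configured for the reduced number of classes before fine-tuning from the pre-trained checkpoint.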
Torchvision has released high-precision ResNet models; the training details can be found on the PyTorch website. Here, we ran grid searches over learning rate and weight decay and found the best-performing hyper-parameters for the detection task.
Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download |
---|---|---|---|---|---|---|---|
R-50-TNR | pytorch | 1x | - | | 40.2 | config | model \| log |
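The grid search over learning rate and weight decay mentioned above can be sketched as an exhaustive loop over candidate pairs. The candidate values and the `toy_evaluate` function are placeholders invented for this example; in practice the evaluation step would train a detector and return its box AP, and the grid actually searched is not given in this document.

```python
from itertools import product


def grid_search(evaluate, learning_rates, weight_decays):
    """Evaluate every (lr, wd) pair and return the best pair and all scores."""
    results = {
        (lr, wd): evaluate(lr, wd)
        for lr, wd in product(learning_rates, weight_decays)
    }
    best = max(results, key=results.get)
    return best, results


def toy_evaluate(lr, wd):
    # Placeholder score peaking at lr=0.02, wd=1e-4; a real run would
    # train the model with these settings and return validation box AP.
    return -abs(lr - 0.02) - abs(wd - 1e-4)


best, results = grid_search(toy_evaluate, [0.01, 0.02, 0.04], [5e-5, 1e-4])
```

Because every pair requires a full training run, the grid is usually kept small.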
@article{Ren_2017,
title={Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
year={2017},
month={Jun},
}