
# VarifocalNet: An IoU-aware Dense Object Detector

## Introduction

Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. In this work, we propose to learn IoU-aware classification scores (IACS) that simultaneously represent the object presence confidence and localization accuracy, to produce a more accurate ranking of detections in dense object detectors. In particular, we design a new loss function, named Varifocal Loss (VFL), for training a dense object detector to predict the IACS, and a new, efficient star-shaped bounding box feature representation (the features at nine yellow sampling points) for estimating the IACS and refining coarse bounding boxes. Combining these two new components and a bounding box refinement branch, we build a new IoU-aware dense object detector based on the FCOS+ATSS architecture, which we call VarifocalNet, or VFNet for short. Extensive experiments on the MS COCO benchmark show that our VFNet consistently surpasses the strong baseline by ~2.0 AP with different backbones. Our best model, VFNet-X-1200 with a Res2Net-101-DCN backbone, reaches a single-model, single-scale AP of 55.1 on COCO test-dev, achieving state-of-the-art performance among object detectors.

*Figure: Learning to Predict the IoU-aware Classification Score.*
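For reference, the Varifocal Loss weights the binary cross-entropy of each prediction by the target IoU q for positive samples and by an `alpha * p^gamma` focal term for negatives, so well-localized positives dominate training. The following is a minimal PyTorch sketch written from the formulation in the paper, not the repository's own implementation; the function name and the defaults `alpha=0.75`, `gamma=2.0` follow the paper but should be treated as illustrative.

```python
import torch
import torch.nn.functional as F

def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """Sketch of the Varifocal Loss (per-element BCE, summed).

    pred_logits:  raw classification logits, shape (N, num_classes).
    target_score: IACS targets q -- the gt IoU at the target class of
                  positive samples and 0 everywhere else.
    """
    p = pred_logits.sigmoid()
    # Positives (q > 0) are weighted by q itself, so high-IoU examples
    # contribute more; negatives are down-weighted by alpha * p^gamma.
    focal_weight = torch.where(target_score > 0,
                               target_score,
                               alpha * p.pow(gamma))
    bce = F.binary_cross_entropy_with_logits(
        pred_logits, target_score, reduction='none')
    return (focal_weight * bce).sum()
```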

## Citing VarifocalNet

```bibtex
@inproceedings{zhang2020varifocalnet,
  title={VarifocalNet: An IoU-aware Dense Object Detector},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  booktitle={CVPR},
  year={2021}
}
```

## Results and Models

| Backbone | Style | DCN | MS train | Lr schd | Inf time (fps) | box AP (val) | box AP (test-dev) | Download |
|:--------:|:-----:|:---:|:--------:|:-------:|:--------------:|:------------:|:-----------------:|:--------:|
| R-50 | pytorch | N | N | 1x | 19.4 | 41.6 | 41.6 | model \| log |
| R-50 | pytorch | N | Y | 2x | 19.3 | 44.5 | 44.8 | model \| log |
| R-50 | pytorch | Y | Y | 2x | 16.3 | 47.8 | 48.0 | model \| log |
| R-101 | pytorch | N | N | 1x | 15.5 | 43.0 | 43.6 | model \| log |
| R-101 | pytorch | N | N | 2x | 15.6 | 43.5 | 43.9 | model \| log |
| R-101 | pytorch | N | Y | 2x | 15.6 | 46.2 | 46.7 | model \| log |
| R-101 | pytorch | Y | Y | 2x | 12.6 | 49.0 | 49.2 | model \| log |
| X-101-32x4d | pytorch | N | Y | 2x | 13.1 | 47.4 | 47.6 | model \| log |
| X-101-32x4d | pytorch | Y | Y | 2x | 10.1 | 49.7 | 50.0 | model \| log |
| X-101-64x4d | pytorch | N | Y | 2x | 9.2 | 48.2 | 48.5 | model \| log |
| X-101-64x4d | pytorch | Y | Y | 2x | 6.7 | 50.4 | 50.8 | model \| log |
| R2-101 | pytorch | N | Y | 2x | 13.0 | 49.2 | 49.3 | model \| log |
| R2-101 | pytorch | Y | Y | 2x | 10.3 | 51.1 | 51.3 | model \| log |

Notes:

- The MS-train scale range is 1333x[480:960] (range mode), and the inference scale is kept at 1333x800 (see the sketch after this list).
- The R2-101 backbone is Res2Net-101.
- DCN means using DCNv2 in both the backbone and the head.
- The inference speed is tested with an Nvidia V100 GPU on HPC (log file).
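For concreteness, range-mode multi-scale training of this kind is typically expressed in an MMDetection-style data pipeline as below. This is a hedged sketch following MMDetection conventions, not a copy of this repo's config files; check `configs/vfnet` for the actual settings.

```python
# Sketch of range-mode multi-scale training in an MMDetection-style
# config; illustrative only -- see configs/vfnet for the real values.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize',
         img_scale=[(1333, 480), (1333, 960)],  # 1333x[480:960]
         multiscale_mode='range',               # sample one scale per image
         keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
]
# Inference keeps a fixed 1333x800 scale.
test_img_scale = (1333, 800)
```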

We also provide RetinaNet, FoveaBox, RepPoints and ATSS models trained with the Focal Loss (FL) and with our Varifocal Loss (VFL).

| Method | Backbone | MS train | Lr schd | box AP (val) | Download |
|:---------------:|:--------:|:--------:|:-------:|:------------:|:--------:|
| RetinaNet + FL | R-50 | N | 1x | 36.5 | model \| log |
| RetinaNet + VFL | R-50 | N | 1x | 37.4 | model \| log |
| FoveaBox + FL | R-50 | N | 1x | 36.3 | model \| log |
| FoveaBox + VFL | R-50 | N | 1x | 37.2 | model \| log |
| RepPoints + FL | R-50 | N | 1x | 38.3 | model \| log |
| RepPoints + VFL | R-50 | N | 1x | 39.7 | model \| log |
| ATSS + FL | R-50 | N | 1x | 39.3 | model \| log |
| ATSS + VFL | R-50 | N | 1x | 40.2 | model \| log |

Notes:

- We train these models on 4 P100 GPUs with a mini-batch size of 16 images (4 images per GPU), except ATSS, which uses the standard 8x2 setting, as we found 4x4 training yields slightly better results than 8x2 training.
- You can find the corresponding config files in `configs/vfnet`.
- The `use_vfl` flag in those config files controls whether the Varifocal Loss is used in training (see the sketch below).
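As an illustration, switching the classification loss via `use_vfl` might look like the following config fragment. The field names mirror MMDetection loss configs and the paper's defaults, but this is a sketch under those assumptions, not the exact contents of the repo's files.

```python
# Illustrative config fragment: `use_vfl` selects the Varifocal Loss
# as the classification loss; exact fields may differ from the repo.
use_vfl = True
model = dict(
    bbox_head=dict(
        type='VFNetHead',
        use_vfl=use_vfl,
        loss_cls=dict(
            type='VarifocalLoss',
            use_sigmoid=True,
            alpha=0.75,          # down-weights negative examples
            gamma=2.0,           # focusing parameter for negatives
            iou_weighted=True,   # weight positives by target IoU
            loss_weight=1.0)))
```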