FoveaBox is an accurate, flexible, and completely anchor-free object detection framework, as presented in our paper (https://arxiv.org/abs/1904.03797). Unlike previous anchor-based methods, FoveaBox directly learns the likelihood that an object exists and the bounding box coordinates, without any anchor reference. This is achieved by (a) predicting category-sensitive semantic maps for the likelihood of object existence, and (b) producing a category-agnostic bounding box for each position that potentially contains an object.
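The two outputs described above can be combined into detections roughly as follows. This is a minimal NumPy sketch for a single feature level, not the implementation shipped with this project: the function name `decode_foveabox_level`, the `stride`, `base_len`, and `score_thr` parameters, and the log-space distance parameterization of the box map are all assumptions made for illustration.

```python
import numpy as np


def decode_foveabox_level(cls_scores, box_preds, stride, base_len, score_thr=0.5):
    """Turn one level's maps into boxes (illustrative sketch only).

    cls_scores: (C, H, W) category-sensitive scores, assumed already in [0, 1].
    box_preds:  (4, H, W) category-agnostic box map, assumed to hold log-scaled
                distances (left, top, right, bottom) from each cell center.
    stride:     assumed downsampling factor of this feature level.
    base_len:   assumed reference length used to scale the distances.
    """
    # Every (class, y, x) whose score clears the threshold is a candidate.
    labels, ys, xs = np.nonzero(cls_scores > score_thr)
    scores = cls_scores[labels, ys, xs]

    # Map feature-map cells back to input-image coordinates (cell centers).
    cx = (xs + 0.5) * stride
    cy = (ys + 0.5) * stride

    # One box per position, shared by all classes at that position.
    dist = base_len * np.exp(box_preds[:, ys, xs])  # shape (4, N)
    boxes = np.stack(
        [cx - dist[0], cy - dist[1], cx + dist[2], cy + dist[3]], axis=1
    )
    return boxes, scores, labels


# Toy usage: one confident position for class 1 on a 4x4 map.
cls = np.zeros((2, 4, 4))
cls[1, 2, 3] = 0.9
boxes, scores, labels = decode_foveabox_level(
    cls, np.zeros((4, 4, 4)), stride=8, base_len=16
)
print(boxes, scores, labels)
```

No anchor boxes appear anywhere in the sketch: each box is derived only from the position of the cell and the four values predicted there. A full detector would repeat this per pyramid level and apply non-maximum suppression afterwards.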
Backbone | Style | align | ms-train | Lr schd | Mem (GB) | Inf speed (fps) | box AP | Config | Download |
---|---|---|---|---|---|---|---|---|---|
R-50 | pytorch | N | N | 1x | 5.6 | 24.1 | 36.5 | config | model / log |
R-50 | pytorch | N | N | 2x | 5.6 | - | 37.2 | config | model / log |
R-50 | pytorch | Y | N | 2x | 8.1 | 19.4 | 37.9 | config | model / log |
R-50 | pytorch | Y | Y | 2x | 8.1 | 18.3 | 40.4 | config | model / log |
R-101 | pytorch | N | N | 1x | 9.2 | 17.4 | 38.6 | config | model / log |
R-101 | pytorch | N | N | 2x | 11.7 | - | 40.0 | config | model / log |
R-101 | pytorch | Y | N | 2x | 11.7 | 14.7 | 40.0 | config | model / log |
R-101 | pytorch | Y | Y | 2x | 11.7 | 14.7 | 42.0 | config | model / log |

[1] 1x and 2x mean the model is trained for 12 and 24 epochs, respectively.

[2] Align means using deformable convolution to align the classification (cls) branch.

[3] All results are obtained with a single model and without any test-time data augmentation.

[4] We use 4 GPUs for training.

Any pull requests or issues are welcome.
Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows:
@article{kong2019foveabox,
title={FoveaBox: Beyond Anchor-based Object Detector},
author={Kong, Tao and Sun, Fuchun and Liu, Huaping and Jiang, Yuning and Shi, Jianbo},
journal={arXiv preprint arXiv:1904.03797},
year={2019}
}