Skip to content

Commit

Permalink
CodeCamp open-mmlab#38 RTMDet-R (open-mmlab#662)
Browse files Browse the repository at this point in the history
* Add DistributionAngleCoder

* fix keepdim

* rtm init

* fix distribution angle coder

* clean

* clean

* add probiou

* add hrsc cfg

* fix GDLoss with empty input

* add probiou cfg

* add mixup mosaic cfg

* Add DistributionAngleCoder

* fix keepdim

* rtm init

* fix distribution angle coder

* clean

* clean

* add probiou

* add hrsc cfg

* fix GDLoss with empty input

* add probiou cfg

* add mixup mosaic cfg

* fix mmdeploy vis error

* add function_rewriter for deploy

* fix hrsc config

* remove deploy

* move distance2obb to transforms, remove useless code

* refactor configs

* fix readme

* fix readme

* clean

* add docstring

* add unittest

* update metafile

* remove useless code

* add img height

* typo

* update results

* fix table

* optimize tables

* add

* fix config

* upload

* update pth link

* add readme pth

* update readme

* update readme

* remove useless cfg

* fix ut on cpu

* fix comment

* add comment for change

* add cite
  • Loading branch information
liuyanyi authored and triple-Mu committed Jan 31, 2023
1 parent 7a3485e commit 96ea787
Show file tree
Hide file tree
Showing 30 changed files with 2,580 additions and 13 deletions.
76 changes: 76 additions & 0 deletions configs/rotated_rtmdet/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# RTMDet-R

> [RTMDet: An Empirical Study of Designing Real-Time Object Detectors](https://arxiv.org/abs/2212.07784)
<!-- [ALGORITHM] -->

## Abstract

In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection. To obtain a more efficient model architecture, we explore an architecture that has compatible capacities in the backbone and neck, constructed by a basic building block that consists of large-kernel depth-wise convolutions. We further introduce soft labels when calculating matching costs in the dynamic label assignment to improve accuracy. Together with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, outperforming the current mainstream industrial detectors. RTMDet achieves the best parameter-accuracy trade-off with tiny/small/medium/large/extra-large model sizes for various application scenarios, and obtains new state-of-the-art performance on real-time instance segmentation and rotated object detection. We hope the experimental results can provide new insights into designing versatile real-time object detectors for many object recognition tasks.

<div align=center>
<img src="https://user-images.githubusercontent.com/11705038/204995787-ef739910-e196-42c7-a9db-c9c8e28a494d.jpg" height="360"/>
</div>

## Results and Models

### DOTA-v1.0

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/object-detection-in-aerial-images-on-dota-1)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-dota-1?p=rtmdet-an-empirical-study-of-designing-real)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/one-stage-anchor-free-oriented-object-1)](https://paperswithcode.com/sota/one-stage-anchor-free-oriented-object-1?p=rtmdet-an-empirical-study-of-designing-real)

| Backbone | pretrain | Aug | mmAP | mAP50 | mAP75 | Params(M) | FLOPS(G) | TRT-FP16-Latency(ms) | Config | Download |
| :---------: | :------: | :---: | :---: | :---: | :---: | :-------: | :------: | :------------------: | :------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| RTMDet-tiny | IN | RR | 47.37 | 75.36 | 50.64 | 4.88 | 20.45 | 4.40 | [config](./rotated_rtmdet_tiny-3x-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota/rotated_rtmdet_tiny-3x-dota-9d821076.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota/rotated_rtmdet_tiny-3x-dota_20221201_120814.json) |
| RTMDet-tiny | IN | MS+RR | 53.59 | 79.82 | 58.87 | 4.88 | 20.45 | 4.40 | [config](./rotated_rtmdet_tiny-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms/rotated_rtmdet_tiny-3x-dota_ms-f12286ff.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms/rotated_rtmdet_tiny-3x-dota_ms_20221113_201235.log) |
| RTMDet-s | IN | RR | 48.16 | 76.93 | 50.59 | 8.86 | 37.62 | 4.86 | [config](./rotated_rtmdet_s-3x-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota/rotated_rtmdet_s-3x-dota-11f6ccf5.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota/rotated_rtmdet_s-3x-dota_20221124_081442.json) |
| RTMDet-s | IN | MS+RR | 54.43 | 79.98 | 60.07 | 8.86 | 37.62 | 4.86 | [config](./rotated_rtmdet_s-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms/rotated_rtmdet_s-3x-dota_ms-20ead048.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms/rotated_rtmdet_s-3x-dota_ms_20221113_201055.json) |
| RTMDet-m | IN | RR | 50.56 | 78.24 | 54.47 | 24.67 | 99.76 | 7.82 | [config](./rotated_rtmdet_m-3x-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota/rotated_rtmdet_m-3x-dota-beeadda6.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota/rotated_rtmdet_m-3x-dota_20221122_011234.json) |
| RTMDet-m | IN | MS+RR | 55.00 | 80.26 | 61.26 | 24.67 | 99.76 | 7.82 | [config](./rotated_rtmdet_m-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms/rotated_rtmdet_m-3x-dota_ms-c71eb375.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms/rotated_rtmdet_m-3x-dota_ms_20221122_011234.json) |
| RTMDet-l | IN | RR | 51.01 | 78.85 | 55.21 | 52.27 | 204.21 | 10.82 | [config](./rotated_rtmdet_l-3x-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota/rotated_rtmdet_l-3x-dota-23992372.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota/rotated_rtmdet_l-3x-dota_20221122_011241.json) |
| RTMDet-l | IN | MS+RR | 55.52 | 80.54 | 61.47 | 52.27 | 204.21 | 10.82 | [config](./rotated_rtmdet_l-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms/rotated_rtmdet_l-3x-dota_ms-2738da34.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms/rotated_rtmdet_l-3x-dota_ms_20221122_011241.json) |
| RTMDet-l | COCO | MS+RR | 56.74 | 81.33 | 63.45 | 52.27 | 204.21 | 10.82 | [config](./rotated_rtmdet_l-coco_pretrain-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms/rotated_rtmdet_l-coco_pretrain-3x-dota_ms-06d248a2.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms/rotated_rtmdet_l-coco_pretrain-3x-dota_ms_20221113_202010.json) |

- By default, DOTA-v1.0 dataset trained with 3x schedule and image size 1024\*1024.

### HRSC

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/object-detection-in-aerial-images-on-hrsc2016)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-hrsc2016?p=rtmdet-an-empirical-study-of-designing-real)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/one-stage-anchor-free-oriented-object-3)](https://paperswithcode.com/sota/one-stage-anchor-free-oriented-object-3?p=rtmdet-an-empirical-study-of-designing-real)

| Backbone | pretrain | Aug | mAP 07 | mAP 12 | Params(M) | FLOPS(G) | Config | Download |
| :---------: | :------: | :-: | :----: | :----: | :-------: | :------: | :----------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| RTMDet-tiny | IN | RR | 90.6 | 97.1 | 4.88 | 12.54 | [config](./rotated_rtmdet_tiny-9x-hrsc.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc/rotated_rtmdet_tiny-9x-hrsc-9f2e3ca6.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc/rotated_rtmdet_tiny-9x-hrsc_20221125_145920.json) |

- By default, HRSC dataset trained with 9x schedule and image size 800\*800.

### Stronger augmentation

We also provide configs with Mixup, Mosaic and RandomRotate with longer schedule. Training time is less than MS.

DOTA:

| Backbone | pretrain | schedule | Aug | mmAP | mAP50 | mAP75 | Config | Download |
| :------: | :------: | :------: | :-------------: | :---: | :---: | :---: | :-------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| RTMDet-l | IN | 100e | Mixup+Mosaic+RR | 54.59 | 80.16 | 61.16 | [config](./rotated_rtmdet_l-100e-aug-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota/rotated_rtmdet_l-100e-aug-dota-bc59fd88.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota/rotated_rtmdet_l-100e-aug-dota_20221124_224135.json) |

**Note**:

1. We follow the latest metrics from the DOTA evaluation server, original voc format mAP is now mAP50.
2. `IN` means ImageNet pretrain, `COCO` means COCO pretrain.
3. Different from the report, the inference speed here is measured on an NVIDIA 2080Ti GPU with TensorRT 8.4.3, cuDNN 8.2.0, FP16, batch size=1, and with NMS.

## Citation

```
@misc{lyu2022rtmdet,
title={RTMDet: An Empirical Study of Designing Real-Time Object Detectors},
author={Chengqi Lyu and Wenwei Zhang and Haian Huang and Yue Zhou and Yudong Wang and Yanyi Liu and Shilong Zhang and Kai Chen},
year={2022},
eprint={2212.07784},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
34 changes: 34 additions & 0 deletions configs/rotated_rtmdet/_base_/default_runtime.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
default_scope = 'mmrotate'

default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=50),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=12, max_keep_ckpts=3),
sampler_seed=dict(type='DistSamplerSeedHook'),
visualization=dict(type='mmdet.DetVisualizationHook'))

env_cfg = dict(
cudnn_benchmark=False,
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
dist_cfg=dict(backend='nccl'),
)

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
type='RotLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)

log_level = 'INFO'
load_from = None
resume = False

custom_hooks = [
dict(type='mmdet.NumClassCheckHook'),
dict(
type='EMAHook',
ema_type='mmdet.ExpMomentumEMA',
momentum=0.0002,
update_buffers=True,
priority=49)
]
104 changes: 104 additions & 0 deletions configs/rotated_rtmdet/_base_/dota_rr.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# dataset settings
dataset_type = 'DOTADataset'
data_root = 'data/split_ss_dota/'

file_client_args = dict(backend='disk')

train_pipeline = [
dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
dict(
type='mmdet.RandomFlip',
prob=0.75,
direction=['horizontal', 'vertical', 'diagonal']),
dict(
type='RandomRotate',
prob=0.5,
angle_range=180,
rect_obj_labels=[9, 11]),
dict(
type='mmdet.Pad', size=(1024, 1024),
pad_val=dict(img=(114, 114, 114))),
dict(type='mmdet.PackDetInputs')
]
val_pipeline = [
dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
# avoid bboxes being resized
dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
dict(
type='mmdet.Pad', size=(1024, 1024),
pad_val=dict(img=(114, 114, 114))),
dict(
type='mmdet.PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
test_pipeline = [
dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
dict(
type='mmdet.Pad', size=(1024, 1024),
pad_val=dict(img=(114, 114, 114))),
dict(
type='mmdet.PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
train_dataloader = dict(
batch_size=8,
num_workers=8,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
batch_sampler=None,
pin_memory=False,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='trainval/annfiles/',
data_prefix=dict(img_path='trainval/images/'),
img_shape=(1024, 1024),
filter_cfg=dict(filter_empty_gt=True),
pipeline=train_pipeline))
val_dataloader = dict(
batch_size=1,
num_workers=2,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='trainval/annfiles/',
data_prefix=dict(img_path='trainval/images/'),
img_shape=(1024, 1024),
test_mode=True,
pipeline=val_pipeline))
test_dataloader = val_dataloader

val_evaluator = dict(type='DOTAMetric', metric='mAP')
test_evaluator = val_evaluator

# inference on test dataset and format the output results
# for submission. Note: the test set has no annotation.
# test_dataloader = dict(
# batch_size=8,
# num_workers=8,
# persistent_workers=False,
# drop_last=False,
# sampler=dict(type='DefaultSampler', shuffle=False),
# dataset=dict(
# type=dataset_type,
# data_root=data_root,
# data_prefix=dict(img_path='test/images/'),
# img_shape=(1024, 1024),
# test_mode=True,
# pipeline=test_pipeline))
# test_evaluator = dict(
# type='DOTAMetric',
# format_only=True,
# merge_patches=True,
# outfile_prefix='./work_dirs/rtmdet_r/Task1')
103 changes: 103 additions & 0 deletions configs/rotated_rtmdet/_base_/dota_rr_ms.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# dataset settings
dataset_type = 'DOTADataset'
data_root = 'data/split_ms_dota/'
file_client_args = dict(backend='disk')

train_pipeline = [
dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
dict(
type='mmdet.RandomFlip',
prob=0.75,
direction=['horizontal', 'vertical', 'diagonal']),
dict(
type='RandomRotate',
prob=0.5,
angle_range=180,
rect_obj_labels=[9, 11]),
dict(
type='mmdet.Pad', size=(1024, 1024),
pad_val=dict(img=(114, 114, 114))),
dict(type='mmdet.PackDetInputs')
]
val_pipeline = [
dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
# avoid bboxes being resized
dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
dict(
type='mmdet.Pad', size=(1024, 1024),
pad_val=dict(img=(114, 114, 114))),
dict(
type='mmdet.PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
test_pipeline = [
dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
dict(
type='mmdet.Pad', size=(1024, 1024),
pad_val=dict(img=(114, 114, 114))),
dict(
type='mmdet.PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
train_dataloader = dict(
batch_size=8,
num_workers=8,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
batch_sampler=None,
pin_memory=False,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='trainval/annfiles/',
data_prefix=dict(img_path='trainval/images/'),
img_shape=(1024, 1024),
filter_cfg=dict(filter_empty_gt=True),
pipeline=train_pipeline))
val_dataloader = dict(
batch_size=1,
num_workers=2,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='trainval/annfiles/',
data_prefix=dict(img_path='trainval/images/'),
img_shape=(1024, 1024),
test_mode=True,
pipeline=val_pipeline))
test_dataloader = val_dataloader

val_evaluator = dict(type='DOTAMetric', metric='mAP')
test_evaluator = val_evaluator

# inference on test dataset and format the output results
# for submission. Note: the test set has no annotation.
# test_dataloader = dict(
# batch_size=8,
# num_workers=8,
# persistent_workers=False,
# drop_last=False,
# sampler=dict(type='DefaultSampler', shuffle=False),
# dataset=dict(
# type=dataset_type,
# data_root=data_root,
# data_prefix=dict(img_path='test/images/'),
# img_shape=(1024, 1024),
# test_mode=True,
# pipeline=test_pipeline))
# test_evaluator = dict(
# type='DOTAMetric',
# format_only=True,
# merge_patches=True,
# outfile_prefix='./work_dirs/rtmdet_r/Task1')
Loading

0 comments on commit 96ea787

Please sign in to comment.