diff --git a/configs/rotated_rtmdet/README.md b/configs/rotated_rtmdet/README.md
new file mode 100644
index 000000000..42a4d5bcb
--- /dev/null
+++ b/configs/rotated_rtmdet/README.md
@@ -0,0 +1,76 @@
+# RTMDet-R
+
+> [RTMDet: An Empirical Study of Designing Real-Time Object Detectors](https://arxiv.org/abs/2212.07784)
+
+
+
+## Abstract
+
+In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection. To obtain a more efficient model architecture, we explore an architecture that has compatible capacities in the backbone and neck, constructed by a basic building block that consists of large-kernel depth-wise convolutions. We further introduce soft labels when calculating matching costs in the dynamic label assignment to improve accuracy. Together with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, outperforming the current mainstream industrial detectors. RTMDet achieves the best parameter-accuracy trade-off with tiny/small/medium/large/extra-large model sizes for various application scenarios, and obtains new state-of-the-art performance on real-time instance segmentation and rotated object detection. We hope the experimental results can provide new insights into designing versatile real-time object detectors for many object recognition tasks.
+
+
+
+
+
+## Results and Models
+
+### DOTA-v1.0
+
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/object-detection-in-aerial-images-on-dota-1)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-dota-1?p=rtmdet-an-empirical-study-of-designing-real)
+
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/one-stage-anchor-free-oriented-object-1)](https://paperswithcode.com/sota/one-stage-anchor-free-oriented-object-1?p=rtmdet-an-empirical-study-of-designing-real)
+
+| Backbone | pretrain | Aug | mmAP | mAP50 | mAP75 | Params(M) | FLOPs(G) | TRT-FP16-Latency(ms) | Config | Download |
+| :---------: | :------: | :---: | :---: | :---: | :---: | :-------: | :------: | :------------------: | :------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| RTMDet-tiny | IN | RR | 47.37 | 75.36 | 50.64 | 4.88 | 20.45 | 4.40 | [config](./rotated_rtmdet_tiny-3x-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota/rotated_rtmdet_tiny-3x-dota-9d821076.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota/rotated_rtmdet_tiny-3x-dota_20221201_120814.json) |
+| RTMDet-tiny | IN | MS+RR | 53.59 | 79.82 | 58.87 | 4.88 | 20.45 | 4.40 | [config](./rotated_rtmdet_tiny-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms/rotated_rtmdet_tiny-3x-dota_ms-f12286ff.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms/rotated_rtmdet_tiny-3x-dota_ms_20221113_201235.log) |
+| RTMDet-s | IN | RR | 48.16 | 76.93 | 50.59 | 8.86 | 37.62 | 4.86 | [config](./rotated_rtmdet_s-3x-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota/rotated_rtmdet_s-3x-dota-11f6ccf5.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota/rotated_rtmdet_s-3x-dota_20221124_081442.json) |
+| RTMDet-s | IN | MS+RR | 54.43 | 79.98 | 60.07 | 8.86 | 37.62 | 4.86 | [config](./rotated_rtmdet_s-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms/rotated_rtmdet_s-3x-dota_ms-20ead048.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms/rotated_rtmdet_s-3x-dota_ms_20221113_201055.json) |
+| RTMDet-m | IN | RR | 50.56 | 78.24 | 54.47 | 24.67 | 99.76 | 7.82 | [config](./rotated_rtmdet_m-3x-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota/rotated_rtmdet_m-3x-dota-beeadda6.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota/rotated_rtmdet_m-3x-dota_20221122_011234.json) |
+| RTMDet-m | IN | MS+RR | 55.00 | 80.26 | 61.26 | 24.67 | 99.76 | 7.82 | [config](./rotated_rtmdet_m-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms/rotated_rtmdet_m-3x-dota_ms-c71eb375.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms/rotated_rtmdet_m-3x-dota_ms_20221122_011234.json) |
+| RTMDet-l | IN | RR | 51.01 | 78.85 | 55.21 | 52.27 | 204.21 | 10.82 | [config](./rotated_rtmdet_l-3x-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota/rotated_rtmdet_l-3x-dota-23992372.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota/rotated_rtmdet_l-3x-dota_20221122_011241.json) |
+| RTMDet-l | IN | MS+RR | 55.52 | 80.54 | 61.47 | 52.27 | 204.21 | 10.82 | [config](./rotated_rtmdet_l-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms/rotated_rtmdet_l-3x-dota_ms-2738da34.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms/rotated_rtmdet_l-3x-dota_ms_20221122_011241.json) |
+| RTMDet-l | COCO | MS+RR | 56.74 | 81.33 | 63.45 | 52.27 | 204.21 | 10.82 | [config](./rotated_rtmdet_l-coco_pretrain-3x-dota_ms.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms/rotated_rtmdet_l-coco_pretrain-3x-dota_ms-06d248a2.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms/rotated_rtmdet_l-coco_pretrain-3x-dota_ms_20221113_202010.json) |
+
+- By default, models are trained on the DOTA-v1.0 dataset with a 3x schedule and an image size of 1024\*1024.
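
The 3x schedule used above (see `_base_/schedule_3x.py`) warms the learning rate up linearly over the first 1000 iterations, holds it flat, then cosine-anneals it over the second half of training. A pure-Python sketch of the resulting curve; the `iters_per_epoch` value is an arbitrary illustration, not taken from any config:

```python
import math

def lr_at(step, iters_per_epoch=1000, base_lr=0.004 / 16,
          warmup_iters=1000, max_epochs=36):
    """Learning rate of the 3x schedule at a given training iteration."""
    eta_min = base_lr * 0.05
    cosine_begin = (max_epochs // 2) * iters_per_epoch
    total_iters = max_epochs * iters_per_epoch
    if step < warmup_iters:
        # LinearLR: factor grows from 1e-5 to 1 over the first 1000 iters
        factor = 1e-5 + (1.0 - 1e-5) * step / warmup_iters
        return base_lr * factor
    if step < cosine_begin:
        return base_lr  # flat until the halfway epoch (epoch 18 of 36)
    # CosineAnnealingLR over the second half, converted to iter-based
    t = (step - cosine_begin) / (total_iters - cosine_begin)
    return eta_min + (base_lr - eta_min) * 0.5 * (1.0 + math.cos(math.pi * t))

print(lr_at(0))       # ≈ 2.5e-09 (warmup start: base_lr * 1e-5)
print(lr_at(18_000))  # ≈ 0.00025 (cosine start = base_lr)
print(lr_at(36_000))  # ≈ 1.25e-05 (eta_min = base_lr * 0.05)
```

Note that `base_lr = 0.004 / 16` in the config is the per-sample-normalized rate; if you change the total batch size, scale it accordingly.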
+
+### HRSC
+
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/object-detection-in-aerial-images-on-hrsc2016)](https://paperswithcode.com/sota/object-detection-in-aerial-images-on-hrsc2016?p=rtmdet-an-empirical-study-of-designing-real)
+
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rtmdet-an-empirical-study-of-designing-real/one-stage-anchor-free-oriented-object-3)](https://paperswithcode.com/sota/one-stage-anchor-free-oriented-object-3?p=rtmdet-an-empirical-study-of-designing-real)
+
+| Backbone | pretrain | Aug | mAP 07 | mAP 12 | Params(M) | FLOPs(G) | Config | Download |
+| :---------: | :------: | :-: | :----: | :----: | :-------: | :------: | :----------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| RTMDet-tiny | IN | RR | 90.6 | 97.1 | 4.88 | 12.54 | [config](./rotated_rtmdet_tiny-9x-hrsc.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc/rotated_rtmdet_tiny-9x-hrsc-9f2e3ca6.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc/rotated_rtmdet_tiny-9x-hrsc_20221125_145920.json) |
+
+- By default, models are trained on the HRSC dataset with a 9x schedule and an image size of 800\*800.
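
The `mAP 07` and `mAP 12` columns follow the two VOC conventions configured in `_base_/hrsc_rr.py`: the 11-point interpolation (`eval_mode='11points'`) and the area under the interpolated precision-recall curve (`eval_mode='area'`). A minimal, self-contained sketch of the difference on a toy PR curve (the curve values are made up for illustration):

```python
def ap_11points(recalls, precisions):
    """VOC2007-style AP ("mAP 07"): mean over 11 evenly spaced recall
    thresholds of the best precision achieved at or beyond each threshold."""
    ap = 0.0
    for t in (i / 10 for i in range(11)):
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += (max(candidates) if candidates else 0.0) / 11
    return ap

def ap_area(recalls, precisions):
    """VOC2012/area-style AP ("mAP 12"): area under the interpolated
    precision-recall curve."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

# toy curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0
rec, prec = [0.5, 1.0], [1.0, 0.5]
print(ap_11points(rec, prec))  # ≈ 0.7727
print(ap_area(rec, prec))      # ≈ 0.75
```

The 11-point variant is coarser and usually reads a little differently from the area variant on the same detections, which is why both numbers are reported.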
+
+### Stronger augmentation
+
+We also provide configs that combine Mixup, Mosaic, and RandomRotate with a longer schedule. Training takes less time than the multi-scale (MS) setting.
+
+DOTA:
+
+| Backbone | pretrain | schedule | Aug | mmAP | mAP50 | mAP75 | Config | Download |
+| :------: | :------: | :------: | :-------------: | :---: | :---: | :---: | :-------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| RTMDet-l | IN | 100e | Mixup+Mosaic+RR | 54.59 | 80.16 | 61.16 | [config](./rotated_rtmdet_l-100e-aug-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota/rotated_rtmdet_l-100e-aug-dota-bc59fd88.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota/rotated_rtmdet_l-100e-aug-dota_20221124_224135.json) |
+
+**Note**:
+
+1. We follow the latest metrics from the DOTA evaluation server; the original VOC-format mAP is now reported as mAP50.
+2. `IN` means ImageNet pretrain, `COCO` means COCO pretrain.
+3. Unlike the paper, the inference speed here is measured on an NVIDIA 2080Ti GPU with TensorRT 8.4.3, cuDNN 8.2.0, FP16, batch size=1, and with NMS included.
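
To fine-tune one of these models on your own data, the usual OpenMMLab `_base_` inheritance applies: create a config that inherits a provided one and override only the fields that differ. A minimal sketch; the file name, class count, and data root below are hypothetical placeholders, not part of this repo:

```python
# my_rotated_rtmdet.py -- hypothetical user config built on the provided ones
_base_ = './rotated_rtmdet_l-3x-dota.py'

# override only what differs from the base config
model = dict(bbox_head=dict(num_classes=3))  # e.g. a 3-class custom dataset
data_root = 'data/my_split_dota/'            # placeholder path
train_dataloader = dict(dataset=dict(data_root=data_root))
val_dataloader = dict(dataset=dict(data_root=data_root))
test_dataloader = val_dataloader
```

The resulting file can then be passed to `tools/train.py` like any other config in this directory.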
+
+## Citation
+
+```bibtex
+@misc{lyu2022rtmdet,
+ title={RTMDet: An Empirical Study of Designing Real-Time Object Detectors},
+ author={Chengqi Lyu and Wenwei Zhang and Haian Huang and Yue Zhou and Yudong Wang and Yanyi Liu and Shilong Zhang and Kai Chen},
+ year={2022},
+ eprint={2212.07784},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+}
+```
diff --git a/configs/rotated_rtmdet/_base_/default_runtime.py b/configs/rotated_rtmdet/_base_/default_runtime.py
new file mode 100644
index 000000000..6a53c9901
--- /dev/null
+++ b/configs/rotated_rtmdet/_base_/default_runtime.py
@@ -0,0 +1,34 @@
+default_scope = 'mmrotate'
+
+default_hooks = dict(
+ timer=dict(type='IterTimerHook'),
+ logger=dict(type='LoggerHook', interval=50),
+ param_scheduler=dict(type='ParamSchedulerHook'),
+ checkpoint=dict(type='CheckpointHook', interval=12, max_keep_ckpts=3),
+ sampler_seed=dict(type='DistSamplerSeedHook'),
+ visualization=dict(type='mmdet.DetVisualizationHook'))
+
+env_cfg = dict(
+ cudnn_benchmark=False,
+ mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+ dist_cfg=dict(backend='nccl'),
+)
+
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+ type='RotLocalVisualizer', vis_backends=vis_backends, name='visualizer')
+log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)
+
+log_level = 'INFO'
+load_from = None
+resume = False
+
+custom_hooks = [
+ dict(type='mmdet.NumClassCheckHook'),
+ dict(
+ type='EMAHook',
+ ema_type='mmdet.ExpMomentumEMA',
+ momentum=0.0002,
+ update_buffers=True,
+ priority=49)
+]
diff --git a/configs/rotated_rtmdet/_base_/dota_rr.py b/configs/rotated_rtmdet/_base_/dota_rr.py
new file mode 100644
index 000000000..dbc854e3b
--- /dev/null
+++ b/configs/rotated_rtmdet/_base_/dota_rr.py
@@ -0,0 +1,104 @@
+# dataset settings
+dataset_type = 'DOTADataset'
+data_root = 'data/split_ss_dota/'
+
+file_client_args = dict(backend='disk')
+
+train_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(
+ type='RandomRotate',
+ prob=0.5,
+ angle_range=180,
+ rect_obj_labels=[9, 11]),
+ dict(
+ type='mmdet.Pad', size=(1024, 1024),
+ pad_val=dict(img=(114, 114, 114))),
+ dict(type='mmdet.PackDetInputs')
+]
+val_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
+ # avoid bboxes being resized
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(
+ type='mmdet.Pad', size=(1024, 1024),
+ pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.PackDetInputs',
+ meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+ 'scale_factor'))
+]
+test_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
+ dict(
+ type='mmdet.Pad', size=(1024, 1024),
+ pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.PackDetInputs',
+ meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+ 'scale_factor'))
+]
+train_dataloader = dict(
+ batch_size=8,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ batch_sampler=None,
+ pin_memory=False,
+ dataset=dict(
+ type=dataset_type,
+ data_root=data_root,
+ ann_file='trainval/annfiles/',
+ data_prefix=dict(img_path='trainval/images/'),
+ img_shape=(1024, 1024),
+ filter_cfg=dict(filter_empty_gt=True),
+ pipeline=train_pipeline))
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=2,
+ persistent_workers=True,
+ drop_last=False,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=dict(
+ type=dataset_type,
+ data_root=data_root,
+ ann_file='trainval/annfiles/',
+ data_prefix=dict(img_path='trainval/images/'),
+ img_shape=(1024, 1024),
+ test_mode=True,
+ pipeline=val_pipeline))
+test_dataloader = val_dataloader
+
+val_evaluator = dict(type='DOTAMetric', metric='mAP')
+test_evaluator = val_evaluator
+
+# inference on test dataset and format the output results
+# for submission. Note: the test set has no annotation.
+# test_dataloader = dict(
+# batch_size=8,
+# num_workers=8,
+# persistent_workers=False,
+# drop_last=False,
+# sampler=dict(type='DefaultSampler', shuffle=False),
+# dataset=dict(
+# type=dataset_type,
+# data_root=data_root,
+# data_prefix=dict(img_path='test/images/'),
+# img_shape=(1024, 1024),
+# test_mode=True,
+# pipeline=test_pipeline))
+# test_evaluator = dict(
+# type='DOTAMetric',
+# format_only=True,
+# merge_patches=True,
+# outfile_prefix='./work_dirs/rtmdet_r/Task1')
diff --git a/configs/rotated_rtmdet/_base_/dota_rr_ms.py b/configs/rotated_rtmdet/_base_/dota_rr_ms.py
new file mode 100644
index 000000000..c75bb2c8f
--- /dev/null
+++ b/configs/rotated_rtmdet/_base_/dota_rr_ms.py
@@ -0,0 +1,103 @@
+# dataset settings
+dataset_type = 'DOTADataset'
+data_root = 'data/split_ms_dota/'
+file_client_args = dict(backend='disk')
+
+train_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(
+ type='RandomRotate',
+ prob=0.5,
+ angle_range=180,
+ rect_obj_labels=[9, 11]),
+ dict(
+ type='mmdet.Pad', size=(1024, 1024),
+ pad_val=dict(img=(114, 114, 114))),
+ dict(type='mmdet.PackDetInputs')
+]
+val_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
+ # avoid bboxes being resized
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(
+ type='mmdet.Pad', size=(1024, 1024),
+ pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.PackDetInputs',
+ meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+ 'scale_factor'))
+]
+test_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
+ dict(
+ type='mmdet.Pad', size=(1024, 1024),
+ pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.PackDetInputs',
+ meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+ 'scale_factor'))
+]
+train_dataloader = dict(
+ batch_size=8,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ batch_sampler=None,
+ pin_memory=False,
+ dataset=dict(
+ type=dataset_type,
+ data_root=data_root,
+ ann_file='trainval/annfiles/',
+ data_prefix=dict(img_path='trainval/images/'),
+ img_shape=(1024, 1024),
+ filter_cfg=dict(filter_empty_gt=True),
+ pipeline=train_pipeline))
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=2,
+ persistent_workers=True,
+ drop_last=False,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=dict(
+ type=dataset_type,
+ data_root=data_root,
+ ann_file='trainval/annfiles/',
+ data_prefix=dict(img_path='trainval/images/'),
+ img_shape=(1024, 1024),
+ test_mode=True,
+ pipeline=val_pipeline))
+test_dataloader = val_dataloader
+
+val_evaluator = dict(type='DOTAMetric', metric='mAP')
+test_evaluator = val_evaluator
+
+# inference on test dataset and format the output results
+# for submission. Note: the test set has no annotation.
+# test_dataloader = dict(
+# batch_size=8,
+# num_workers=8,
+# persistent_workers=False,
+# drop_last=False,
+# sampler=dict(type='DefaultSampler', shuffle=False),
+# dataset=dict(
+# type=dataset_type,
+# data_root=data_root,
+# data_prefix=dict(img_path='test/images/'),
+# img_shape=(1024, 1024),
+# test_mode=True,
+# pipeline=test_pipeline))
+# test_evaluator = dict(
+# type='DOTAMetric',
+# format_only=True,
+# merge_patches=True,
+# outfile_prefix='./work_dirs/rtmdet_r/Task1')
diff --git a/configs/rotated_rtmdet/_base_/hrsc_rr.py b/configs/rotated_rtmdet/_base_/hrsc_rr.py
new file mode 100644
index 000000000..d2518ea39
--- /dev/null
+++ b/configs/rotated_rtmdet/_base_/hrsc_rr.py
@@ -0,0 +1,81 @@
+# dataset settings
+dataset_type = 'HRSCDataset'
+data_root = 'data/hrsc/'
+file_client_args = dict(backend='disk')
+
+train_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(type='RandomRotate', prob=0.5, angle_range=180),
+ dict(type='mmdet.Pad', size=(800, 800), pad_val=dict(img=(114, 114, 114))),
+ dict(type='mmdet.PackDetInputs')
+]
+val_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
+ # avoid bboxes being resized
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(type='mmdet.Pad', size=(800, 800), pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.PackDetInputs',
+ meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+ 'scale_factor'))
+]
+test_pipeline = [
+ dict(type='mmdet.LoadImageFromFile', file_client_args=file_client_args),
+ dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
+ dict(type='mmdet.Pad', size=(800, 800), pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.PackDetInputs',
+ meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+ 'scale_factor'))
+]
+train_dataloader = dict(
+ batch_size=8,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ batch_sampler=None,
+ pin_memory=True,
+ dataset=dict(
+ type='RepeatDataset',
+ times=3,
+ dataset=dict(
+ type=dataset_type,
+ data_root=data_root,
+ ann_file='ImageSets/trainval.txt',
+ data_prefix=dict(sub_data_root='FullDataSet/'),
+ filter_cfg=dict(filter_empty_gt=True),
+ pipeline=train_pipeline)))
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=2,
+ persistent_workers=True,
+ drop_last=False,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=dict(
+ type=dataset_type,
+ data_root=data_root,
+ ann_file='ImageSets/test.txt',
+ data_prefix=dict(sub_data_root='FullDataSet/'),
+ test_mode=True,
+ pipeline=val_pipeline))
+test_dataloader = val_dataloader
+
+val_evaluator = [
+ dict(
+ type='DOTAMetric',
+ eval_mode='11points',
+ prefix='dota_ap07',
+ metric='mAP'),
+ dict(
+ type='DOTAMetric', eval_mode='area', prefix='dota_ap12', metric='mAP'),
+]
+test_evaluator = val_evaluator
diff --git a/configs/rotated_rtmdet/_base_/schedule_3x.py b/configs/rotated_rtmdet/_base_/schedule_3x.py
new file mode 100644
index 000000000..110a8acc5
--- /dev/null
+++ b/configs/rotated_rtmdet/_base_/schedule_3x.py
@@ -0,0 +1,33 @@
+max_epochs = 3 * 12
+base_lr = 0.004 / 16
+interval = 12
+
+train_cfg = dict(
+ type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=interval)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+
+# learning rate
+param_scheduler = [
+ dict(
+ type='LinearLR',
+ start_factor=1.0e-5,
+ by_epoch=False,
+ begin=0,
+ end=1000),
+ dict(
+ type='CosineAnnealingLR',
+ eta_min=base_lr * 0.05,
+ begin=max_epochs // 2,
+ end=max_epochs,
+ T_max=max_epochs // 2,
+ by_epoch=True,
+ convert_to_iter_based=True),
+]
+
+# optimizer
+optim_wrapper = dict(
+ type='OptimWrapper',
+ optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
+ paramwise_cfg=dict(
+ norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
diff --git a/configs/rotated_rtmdet/metafile.yml b/configs/rotated_rtmdet/metafile.yml
new file mode 100644
index 000000000..013efde78
--- /dev/null
+++ b/configs/rotated_rtmdet/metafile.yml
@@ -0,0 +1,147 @@
+Collections:
+ - Name: rotated_rtmdet
+ Metadata:
+ Training Data:
+ - DOTAv1.0
+ - HRSC
+ Training Techniques:
+ - AdamW
+ - Flat Cosine Annealing
+ Training Resources: 1x RTX 3090 GPU
+ Architecture:
+ - CSPNeXt
+ - CSPNeXtPAFPN
+ README: configs/rotated_rtmdet/README.md
+
+Models:
+ - Name: rotated_rtmdet_tiny-3x-dota
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 75.36
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota/rotated_rtmdet_tiny-3x-dota-9d821076.pth
+
+ - Name: rotated_rtmdet_tiny-3x-dota_ms
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 79.82
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms/rotated_rtmdet_tiny-3x-dota_ms-f12286ff.pth
+
+ - Name: rotated_rtmdet_s-3x-dota
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_s-3x-dota.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 76.93
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota/rotated_rtmdet_s-3x-dota-11f6ccf5.pth
+
+ - Name: rotated_rtmdet_s-3x-dota_ms
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 79.98
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms/rotated_rtmdet_s-3x-dota_ms-20ead048.pth
+
+ - Name: rotated_rtmdet_m-3x-dota
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_m-3x-dota.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 78.24
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota/rotated_rtmdet_m-3x-dota-beeadda6.pth
+
+ - Name: rotated_rtmdet_m-3x-dota_ms
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 80.26
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms/rotated_rtmdet_m-3x-dota_ms-c71eb375.pth
+
+ - Name: rotated_rtmdet_l-3x-dota
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_l-3x-dota.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 78.85
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota/rotated_rtmdet_l-3x-dota-23992372.pth
+
+ - Name: rotated_rtmdet_l-3x-dota_ms
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 80.54
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms/rotated_rtmdet_l-3x-dota_ms-2738da34.pth
+
+ - Name: rotated_rtmdet_l-coco_pretrain-3x-dota_ms
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 81.33
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms/rotated_rtmdet_l-coco_pretrain-3x-dota_ms-06d248a2.pth
+
+ - Name: rotated_rtmdet_tiny-9x-hrsc
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc.py
+ Metadata:
+ Training Data: HRSC
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: HRSC
+ Metrics:
+ mAP: 90.6
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc/rotated_rtmdet_tiny-9x-hrsc-9f2e3ca6.pth
+
+ - Name: rotated_rtmdet_l-100e-aug-dota
+ In Collection: rotated_rtmdet
+ Config: configs/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota.py
+ Metadata:
+ Training Data: DOTAv1.0
+ Results:
+ - Task: Oriented Object Detection
+ Dataset: DOTAv1.0
+ Metrics:
+ mAP: 80.16
+ Weights: https://download.openmmlab.com/mmrotate/v1.0/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota/rotated_rtmdet_l-100e-aug-dota-bc59fd88.pth
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota.py b/configs/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota.py
new file mode 100644
index 000000000..be85e12e8
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_l-100e-aug-dota.py
@@ -0,0 +1,185 @@
+_base_ = [
+ './_base_/default_runtime.py', './_base_/schedule_3x.py',
+ './_base_/dota_rr.py'
+]
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth' # noqa
+
+angle_version = 'le90'
+model = dict(
+ type='mmdet.RTMDet',
+ data_preprocessor=dict(
+ type='mmdet.DetDataPreprocessor',
+ mean=[103.53, 116.28, 123.675],
+ std=[57.375, 57.12, 58.395],
+ bgr_to_rgb=False,
+ boxtype2tensor=False,
+ batch_augments=None),
+ backbone=dict(
+ type='mmdet.CSPNeXt',
+ arch='P5',
+ expand_ratio=0.5,
+ deepen_factor=1,
+ widen_factor=1,
+ channel_attention=True,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU'),
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(
+ type='mmdet.CSPNeXtPAFPN',
+ in_channels=[256, 512, 1024],
+ out_channels=256,
+ num_csp_blocks=3,
+ expand_ratio=0.5,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ bbox_head=dict(
+ type='RotatedRTMDetSepBNHead',
+ num_classes=15,
+ in_channels=256,
+ stacked_convs=2,
+ feat_channels=256,
+ angle_version=angle_version,
+ anchor_generator=dict(
+ type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
+ bbox_coder=dict(
+ type='DistanceAnglePointCoder', angle_version=angle_version),
+ loss_cls=dict(
+ type='mmdet.QualityFocalLoss',
+ use_sigmoid=True,
+ beta=2.0,
+ loss_weight=1.0),
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ with_objectness=False,
+ exp_on_reg=True,
+ share_conv=True,
+ pred_kernel_size=1,
+ use_hbbox_loss=False,
+ scale_angle=False,
+ loss_angle=None,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ train_cfg=dict(
+ assigner=dict(
+ type='mmdet.DynamicSoftLabelAssigner',
+ iou_calculator=dict(type='RBboxOverlaps2D'),
+ topk=13),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False),
+ test_cfg=dict(
+ nms_pre=2000,
+ min_bbox_size=0,
+ score_thr=0.05,
+ nms=dict(type='nms_rotated', iou_threshold=0.1),
+ max_per_img=2000),
+)
+
+train_pipeline = [
+ dict(
+ type='mmdet.LoadImageFromFile',
+ file_client_args={{_base_.file_client_args}}),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(type='mmdet.CachedMosaic', img_scale=(1024, 1024), pad_val=114.0),
+ dict(
+ type='mmdet.RandomResize',
+ resize_type='mmdet.Resize',
+ scale=(2048, 2048),
+ ratio_range=(0.1, 2.0),
+ keep_ratio=True),
+ dict(
+ type='RandomRotate',
+ prob=0.5,
+ angle_range=180,
+ rect_obj_labels=[9, 11]),
+ dict(type='mmdet.RandomCrop', crop_size=(1024, 1024)),
+ dict(type='mmdet.YOLOXHSVRandomAug'),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(
+ type='mmdet.Pad', size=(1024, 1024),
+ pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.CachedMixUp',
+ img_scale=(1024, 1024),
+ ratio_range=(1.0, 1.0),
+ max_cached_images=20,
+ pad_val=(114, 114, 114)),
+ dict(type='mmdet.PackDetInputs')
+]
+
+train_pipeline_stage2 = [
+ dict(
+ type='mmdet.LoadImageFromFile',
+ file_client_args={{_base_.file_client_args}}),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(
+ type='mmdet.RandomResize',
+ resize_type='mmdet.Resize',
+ scale=(1024, 1024),
+ ratio_range=(0.1, 2.0),
+ keep_ratio=True),
+ dict(
+ type='RandomRotate',
+ prob=0.5,
+ angle_range=180,
+ rect_obj_labels=[9, 11]),
+ dict(type='mmdet.RandomCrop', crop_size=(1024, 1024)),
+ dict(type='mmdet.YOLOXHSVRandomAug'),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(
+ type='mmdet.Pad', size=(1024, 1024),
+ pad_val=dict(img=(114, 114, 114))),
+ dict(type='mmdet.PackDetInputs')
+]
+
+# batch_size = (2 GPUs) x (4 samples per GPU) = 8
+train_dataloader = dict(
+ batch_size=4, num_workers=4, dataset=dict(pipeline=train_pipeline))
+
+max_epochs = 100
+stage2_num_epochs = 10
+base_lr = 0.004 / 16
+interval = 20
+
+train_cfg = dict(max_epochs=max_epochs, val_interval=interval)
+
+# learning rate
+param_scheduler = [
+ dict(
+ type='LinearLR',
+ start_factor=1.0e-5,
+ by_epoch=False,
+ begin=0,
+ end=1000),
+ dict(
+ # use cosine lr from 50 to 100 epoch
+ type='CosineAnnealingLR',
+ eta_min=base_lr * 0.05,
+ begin=max_epochs // 2,
+ end=max_epochs,
+ T_max=max_epochs // 2,
+ by_epoch=True,
+ convert_to_iter_based=True),
+]
+
+custom_hooks = [
+ dict(type='mmdet.NumClassCheckHook'),
+ dict(
+ type='EMAHook',
+ ema_type='mmdet.ExpMomentumEMA',
+ momentum=0.0002,
+ update_buffers=True,
+ priority=49),
+ dict(
+ type='mmdet.PipelineSwitchHook',
+ switch_epoch=max_epochs - stage2_num_epochs,
+ switch_pipeline=train_pipeline_stage2)
+]
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_l-300e-aug-hrsc.py b/configs/rotated_rtmdet/rotated_rtmdet_l-300e-aug-hrsc.py
new file mode 100644
index 000000000..f7c44d06d
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_l-300e-aug-hrsc.py
@@ -0,0 +1,183 @@
+_base_ = [
+ './_base_/default_runtime.py', './_base_/schedule_3x.py',
+ './_base_/hrsc_rr.py'
+]
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth' # noqa
+
+angle_version = 'le90'
+model = dict(
+ type='mmdet.RTMDet',
+ data_preprocessor=dict(
+ type='mmdet.DetDataPreprocessor',
+ mean=[103.53, 116.28, 123.675],
+ std=[57.375, 57.12, 58.395],
+ bgr_to_rgb=False,
+ boxtype2tensor=False,
+ batch_augments=None),
+ backbone=dict(
+ type='mmdet.CSPNeXt',
+ arch='P5',
+ expand_ratio=0.5,
+ deepen_factor=1,
+ widen_factor=1,
+ channel_attention=True,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU'),
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(
+ type='mmdet.CSPNeXtPAFPN',
+ in_channels=[256, 512, 1024],
+ out_channels=256,
+ num_csp_blocks=3,
+ expand_ratio=0.5,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ bbox_head=dict(
+ type='RotatedRTMDetSepBNHead',
+ num_classes=1,
+ in_channels=256,
+ stacked_convs=2,
+ feat_channels=256,
+ angle_version=angle_version,
+ anchor_generator=dict(
+ type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
+ bbox_coder=dict(
+ type='DistanceAnglePointCoder', angle_version=angle_version),
+ loss_cls=dict(
+ type='mmdet.QualityFocalLoss',
+ use_sigmoid=True,
+ beta=2.0,
+ loss_weight=1.0),
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ with_objectness=False,
+ exp_on_reg=True,
+ share_conv=True,
+ pred_kernel_size=1,
+ use_hbbox_loss=False,
+ scale_angle=False,
+ loss_angle=None,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ train_cfg=dict(
+ assigner=dict(
+ type='mmdet.DynamicSoftLabelAssigner',
+ iou_calculator=dict(type='RBboxOverlaps2D'),
+ topk=13),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False),
+ test_cfg=dict(
+ nms_pre=2000,
+ min_bbox_size=0,
+ score_thr=0.05,
+ nms=dict(type='nms_rotated', iou_threshold=0.1),
+ max_per_img=2000),
+)
+
+train_pipeline = [
+ dict(
+ type='mmdet.LoadImageFromFile',
+ file_client_args={{_base_.file_client_args}}),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(type='mmdet.CachedMosaic', img_scale=(800, 800), pad_val=114.0),
+ dict(
+ type='mmdet.RandomResize',
+ resize_type='mmdet.Resize',
+ scale=(1600, 1600),
+ ratio_range=(0.1, 2.0),
+ keep_ratio=True),
+ dict(
+ type='RandomRotate',
+ prob=0.5,
+ angle_range=180,
+ rect_obj_labels=[9, 11]),
+ dict(type='mmdet.RandomCrop', crop_size=(1024, 1024)),
+ dict(type='mmdet.YOLOXHSVRandomAug'),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(type='mmdet.Pad', size=(800, 800), pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.CachedMixUp',
+ img_scale=(800, 800),
+ ratio_range=(1.0, 1.0),
+ max_cached_images=20,
+ pad_val=(114, 114, 114)),
+ dict(type='mmdet.PackDetInputs')
+]
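The `qbox` → `rbox` conversion that `ConvertBoxType` performs on `gt_bboxes` can be illustrated with a minimal sketch. This assumes the 8-value quadrilateral is already a rotated rectangle with corners in order; mmrotate's real implementation handles general quadrilaterals (e.g. via a minimum-area rectangle), so treat this only as an intuition aid.

```python
import math

def qbox_to_rbox(quad):
    """Convert a quadrilateral [x1, y1, ..., x4, y4] (assumed to be a
    rotated rectangle) to (cx, cy, w, h, angle_in_radians)."""
    xs, ys = quad[0::2], quad[1::2]
    # center is the mean of the four corners
    cx, cy = sum(xs) / 4, sum(ys) / 4
    # width/height from the lengths of two adjacent edges
    w = math.hypot(xs[1] - xs[0], ys[1] - ys[0])
    h = math.hypot(xs[2] - xs[1], ys[2] - ys[1])
    # angle of the first edge relative to the x-axis
    angle = math.atan2(ys[1] - ys[0], xs[1] - xs[0])
    return cx, cy, w, h, angle
```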
+
+train_pipeline_stage2 = [
+ dict(
+ type='mmdet.LoadImageFromFile',
+ file_client_args={{_base_.file_client_args}}),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(
+ type='mmdet.RandomResize',
+ resize_type='mmdet.Resize',
+ scale=(800, 800),
+ ratio_range=(0.1, 2.0),
+ keep_ratio=True),
+ dict(type='RandomRotate', prob=0.5, angle_range=180),
+ dict(type='mmdet.RandomCrop', crop_size=(800, 800)),
+ dict(type='mmdet.YOLOXHSVRandomAug'),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(type='mmdet.Pad', size=(800, 800), pad_val=dict(img=(114, 114, 114))),
+ dict(type='mmdet.PackDetInputs')
+]
+
+# batch_size = (1 GPU) x (8 samples per GPU) = 8
+train_dataloader = dict(
+ batch_size=8,
+ num_workers=8,
+ dataset=dict(dataset=dict(pipeline=train_pipeline)))
+
+# training schedule: the hrsc dataset is repeated 3 times in
+# `./_base_/hrsc_rr.py`, so the actual number of epochs = 100 * 3 = 300
+max_epochs = 100
+stage2_num_epochs = 10
+
+# the hrsc dataset uses a larger learning rate for better performance
+base_lr = 0.004 / 2
+interval = 20
+
+train_cfg = dict(max_epochs=max_epochs, val_interval=interval)
+
+# learning rate
+param_scheduler = [
+ dict(
+ type='LinearLR',
+ start_factor=1.0e-5,
+ by_epoch=False,
+ begin=0,
+ end=1000),
+ dict(
+ # use cosine lr from epoch 150 to 300 (effective, with the 3x dataset repeat)
+ type='CosineAnnealingLR',
+ eta_min=base_lr * 0.05,
+ begin=max_epochs // 2,
+ end=max_epochs,
+ T_max=max_epochs // 2,
+ by_epoch=True,
+ convert_to_iter_based=True),
+]
+
+custom_hooks = [
+ dict(type='mmdet.NumClassCheckHook'),
+ dict(
+ type='EMAHook',
+ ema_type='mmdet.ExpMomentumEMA',
+ momentum=0.0002,
+ update_buffers=True,
+ priority=49),
+ dict(
+ type='mmdet.PipelineSwitchHook',
+ switch_epoch=max_epochs - stage2_num_epochs,
+ switch_pipeline=train_pipeline_stage2)
+]
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_l-3x-dota.py b/configs/rotated_rtmdet/rotated_rtmdet_l-3x-dota.py
new file mode 100644
index 000000000..7587fbb19
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_l-3x-dota.py
@@ -0,0 +1,79 @@
+_base_ = [
+ './_base_/default_runtime.py', './_base_/schedule_3x.py',
+ './_base_/dota_rr.py'
+]
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth' # noqa
+
+angle_version = 'le90'
+model = dict(
+ type='mmdet.RTMDet',
+ data_preprocessor=dict(
+ type='mmdet.DetDataPreprocessor',
+ mean=[103.53, 116.28, 123.675],
+ std=[57.375, 57.12, 58.395],
+ bgr_to_rgb=False,
+ boxtype2tensor=False,
+ batch_augments=None),
+ backbone=dict(
+ type='mmdet.CSPNeXt',
+ arch='P5',
+ expand_ratio=0.5,
+ deepen_factor=1,
+ widen_factor=1,
+ channel_attention=True,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU'),
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(
+ type='mmdet.CSPNeXtPAFPN',
+ in_channels=[256, 512, 1024],
+ out_channels=256,
+ num_csp_blocks=3,
+ expand_ratio=0.5,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ bbox_head=dict(
+ type='RotatedRTMDetSepBNHead',
+ num_classes=15,
+ in_channels=256,
+ stacked_convs=2,
+ feat_channels=256,
+ angle_version=angle_version,
+ anchor_generator=dict(
+ type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
+ bbox_coder=dict(
+ type='DistanceAnglePointCoder', angle_version=angle_version),
+ loss_cls=dict(
+ type='mmdet.QualityFocalLoss',
+ use_sigmoid=True,
+ beta=2.0,
+ loss_weight=1.0),
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ with_objectness=False,
+ exp_on_reg=True,
+ share_conv=True,
+ pred_kernel_size=1,
+ use_hbbox_loss=False,
+ scale_angle=False,
+ loss_angle=None,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ train_cfg=dict(
+ assigner=dict(
+ type='mmdet.DynamicSoftLabelAssigner',
+ iou_calculator=dict(type='RBboxOverlaps2D'),
+ topk=13),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False),
+ test_cfg=dict(
+ nms_pre=2000,
+ min_bbox_size=0,
+ score_thr=0.05,
+ nms=dict(type='nms_rotated', iou_threshold=0.1),
+ max_per_img=2000),
+)
+
+# batch_size = (2 GPUs) x (4 samples per GPU) = 8
+train_dataloader = dict(batch_size=4, num_workers=4)
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms.py b/configs/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms.py
new file mode 100644
index 000000000..d5395b501
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_l-3x-dota_ms.py
@@ -0,0 +1,79 @@
+_base_ = [
+ './_base_/default_runtime.py', './_base_/schedule_3x.py',
+ './_base_/dota_rr_ms.py'
+]
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth' # noqa
+
+angle_version = 'le90'
+model = dict(
+ type='mmdet.RTMDet',
+ data_preprocessor=dict(
+ type='mmdet.DetDataPreprocessor',
+ mean=[103.53, 116.28, 123.675],
+ std=[57.375, 57.12, 58.395],
+ bgr_to_rgb=False,
+ boxtype2tensor=False,
+ batch_augments=None),
+ backbone=dict(
+ type='mmdet.CSPNeXt',
+ arch='P5',
+ expand_ratio=0.5,
+ deepen_factor=1,
+ widen_factor=1,
+ channel_attention=True,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU'),
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(
+ type='mmdet.CSPNeXtPAFPN',
+ in_channels=[256, 512, 1024],
+ out_channels=256,
+ num_csp_blocks=3,
+ expand_ratio=0.5,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ bbox_head=dict(
+ type='RotatedRTMDetSepBNHead',
+ num_classes=15,
+ in_channels=256,
+ stacked_convs=2,
+ feat_channels=256,
+ angle_version=angle_version,
+ anchor_generator=dict(
+ type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
+ bbox_coder=dict(
+ type='DistanceAnglePointCoder', angle_version=angle_version),
+ loss_cls=dict(
+ type='mmdet.QualityFocalLoss',
+ use_sigmoid=True,
+ beta=2.0,
+ loss_weight=1.0),
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ with_objectness=False,
+ exp_on_reg=True,
+ share_conv=True,
+ pred_kernel_size=1,
+ use_hbbox_loss=False,
+ scale_angle=False,
+ loss_angle=None,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ train_cfg=dict(
+ assigner=dict(
+ type='mmdet.DynamicSoftLabelAssigner',
+ iou_calculator=dict(type='RBboxOverlaps2D'),
+ topk=13),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False),
+ test_cfg=dict(
+ nms_pre=2000,
+ min_bbox_size=0,
+ score_thr=0.05,
+ nms=dict(type='nms_rotated', iou_threshold=0.1),
+ max_per_img=2000),
+)
+
+# batch_size = (2 GPUs) x (4 samples per GPU) = 8
+train_dataloader = dict(batch_size=4, num_workers=4)
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_l-9x-hrsc.py b/configs/rotated_rtmdet/rotated_rtmdet_l-9x-hrsc.py
new file mode 100644
index 000000000..1ae47c013
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_l-9x-hrsc.py
@@ -0,0 +1,105 @@
+_base_ = [
+ './_base_/default_runtime.py', './_base_/schedule_3x.py',
+ './_base_/hrsc_rr.py'
+]
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth' # noqa
+
+angle_version = 'le90'
+model = dict(
+ type='mmdet.RTMDet',
+ data_preprocessor=dict(
+ type='mmdet.DetDataPreprocessor',
+ mean=[103.53, 116.28, 123.675],
+ std=[57.375, 57.12, 58.395],
+ bgr_to_rgb=False,
+ boxtype2tensor=False,
+ batch_augments=None),
+ backbone=dict(
+ type='mmdet.CSPNeXt',
+ arch='P5',
+ expand_ratio=0.5,
+ deepen_factor=1,
+ widen_factor=1,
+ channel_attention=True,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU'),
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(
+ type='mmdet.CSPNeXtPAFPN',
+ in_channels=[256, 512, 1024],
+ out_channels=256,
+ num_csp_blocks=3,
+ expand_ratio=0.5,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ bbox_head=dict(
+ type='RotatedRTMDetSepBNHead',
+ num_classes=1,
+ in_channels=256,
+ stacked_convs=2,
+ feat_channels=256,
+ angle_version=angle_version,
+ anchor_generator=dict(
+ type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
+ bbox_coder=dict(
+ type='DistanceAnglePointCoder', angle_version=angle_version),
+ loss_cls=dict(
+ type='mmdet.QualityFocalLoss',
+ use_sigmoid=True,
+ beta=2.0,
+ loss_weight=1.0),
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ with_objectness=False,
+ exp_on_reg=True,
+ share_conv=True,
+ pred_kernel_size=1,
+ use_hbbox_loss=False,
+ scale_angle=False,
+ loss_angle=None,
+ norm_cfg=dict(type='SyncBN'),
+ act_cfg=dict(type='SiLU')),
+ train_cfg=dict(
+ assigner=dict(
+ type='mmdet.DynamicSoftLabelAssigner',
+ iou_calculator=dict(type='RBboxOverlaps2D'),
+ topk=13),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False),
+ test_cfg=dict(
+ nms_pre=2000,
+ min_bbox_size=0,
+ score_thr=0.05,
+ nms=dict(type='nms_rotated', iou_threshold=0.1),
+ max_per_img=2000),
+)
+
+# training schedule: the hrsc dataset is repeated 3 times in
+# `./_base_/hrsc_rr.py`, so the actual number of epochs = 3 * (3 * 12) = 108
+max_epochs = 3 * 12
+
+# the hrsc dataset uses a larger learning rate for better performance
+base_lr = 0.004 / 2
+
+# learning rate
+param_scheduler = [
+ dict(
+ type='LinearLR',
+ start_factor=1.0e-5,
+ by_epoch=False,
+ begin=0,
+ end=1000),
+ dict(
+ # use cosine lr from epoch 54 to 108 (effective, with the 3x dataset repeat)
+ type='CosineAnnealingLR',
+ eta_min=base_lr * 0.05,
+ begin=max_epochs // 2,
+ end=max_epochs,
+ T_max=max_epochs // 2,
+ by_epoch=True,
+ convert_to_iter_based=True),
+]
+
+# optimizer
+optim_wrapper = dict(optimizer=dict(lr=base_lr))
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms.py b/configs/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms.py
new file mode 100644
index 000000000..be32c8c98
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dota_ms.py
@@ -0,0 +1,17 @@
+_base_ = './rotated_rtmdet_l-3x-dota_ms.py'
+
+coco_ckpt = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet_l_8xb32-300e_coco/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=coco_ckpt)),
+ neck=dict(
+ init_cfg=dict(type='Pretrained', prefix='neck.',
+ checkpoint=coco_ckpt)),
+ bbox_head=dict(
+ init_cfg=dict(
+ type='Pretrained', prefix='bbox_head.', checkpoint=coco_ckpt)))
+
+# batch_size = (2 GPUs) x (4 samples per GPU) = 8
+train_dataloader = dict(batch_size=4, num_workers=4)
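Loading a COCO-trained checkpoint with `init_cfg=dict(type='Pretrained', prefix='backbone.')` (and likewise for `neck.` and `bbox_head.`) effectively selects the sub-state-dict under that prefix and strips it before loading. A minimal sketch of the idea (a hypothetical helper, not MMEngine's actual loader, which also handles download, device mapping, and key mismatch reporting):

```python
def extract_sub_state_dict(state_dict, prefix):
    """Keep only the keys under `prefix` and strip the prefix, so the
    result can be loaded into the corresponding sub-module."""
    return {k[len(prefix):]: v
            for k, v in state_dict.items()
            if k.startswith(prefix)}
```

For example, a checkpoint with keys `backbone.conv.weight` and `neck.lateral.weight` yields `{'conv.weight': ...}` when filtered with `prefix='backbone.'`, which matches the backbone module's own key names.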
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_m-3x-dota.py b/configs/rotated_rtmdet/rotated_rtmdet_m-3x-dota.py
new file mode 100644
index 000000000..b341449f8
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_m-3x-dota.py
@@ -0,0 +1,18 @@
+_base_ = './rotated_rtmdet_l-3x-dota.py'
+
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ deepen_factor=0.67,
+ widen_factor=0.75,
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(in_channels=[192, 384, 768], out_channels=192, num_csp_blocks=2),
+ bbox_head=dict(
+ in_channels=192,
+ feat_channels=192,
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0)))
+
+# batch_size = (1 GPU) x (8 samples per GPU) = 8
+train_dataloader = dict(batch_size=8, num_workers=8)
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms.py b/configs/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms.py
new file mode 100644
index 000000000..eb4326606
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_m-3x-dota_ms.py
@@ -0,0 +1,18 @@
+_base_ = './rotated_rtmdet_l-3x-dota_ms.py'
+
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ deepen_factor=0.67,
+ widen_factor=0.75,
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(in_channels=[192, 384, 768], out_channels=192, num_csp_blocks=2),
+ bbox_head=dict(
+ in_channels=192,
+ feat_channels=192,
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0)))
+
+# batch_size = (1 GPU) x (8 samples per GPU) = 8
+train_dataloader = dict(batch_size=8, num_workers=8)
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_s-3x-dota.py b/configs/rotated_rtmdet/rotated_rtmdet_s-3x-dota.py
new file mode 100644
index 000000000..41cb55b13
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_s-3x-dota.py
@@ -0,0 +1,20 @@
+_base_ = './rotated_rtmdet_l-3x-dota.py'
+
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-s_imagenet_600e.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ deepen_factor=0.33,
+ widen_factor=0.5,
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(in_channels=[128, 256, 512], out_channels=128, num_csp_blocks=1),
+ bbox_head=dict(
+ in_channels=128,
+ feat_channels=128,
+ exp_on_reg=False,
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ ))
+
+# batch_size = (1 GPU) x (8 samples per GPU) = 8
+train_dataloader = dict(batch_size=8, num_workers=8)
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms.py b/configs/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms.py
new file mode 100644
index 000000000..5574e34fb
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_s-3x-dota_ms.py
@@ -0,0 +1,20 @@
+_base_ = './rotated_rtmdet_l-3x-dota_ms.py'
+
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-s_imagenet_600e.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ deepen_factor=0.33,
+ widen_factor=0.5,
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(in_channels=[128, 256, 512], out_channels=128, num_csp_blocks=1),
+ bbox_head=dict(
+ in_channels=128,
+ feat_channels=128,
+ exp_on_reg=False,
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ ))
+
+# batch_size = (1 GPU) x (8 samples per GPU) = 8
+train_dataloader = dict(batch_size=8, num_workers=8)
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_tiny-300e-aug-hrsc.py b/configs/rotated_rtmdet/rotated_rtmdet_tiny-300e-aug-hrsc.py
new file mode 100644
index 000000000..43a8d92f9
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_tiny-300e-aug-hrsc.py
@@ -0,0 +1,92 @@
+_base_ = './rotated_rtmdet_l-300e-aug-hrsc.py'
+
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ deepen_factor=0.167,
+ widen_factor=0.375,
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(in_channels=[96, 192, 384], out_channels=96, num_csp_blocks=1),
+ bbox_head=dict(
+ in_channels=96,
+ feat_channels=96,
+ exp_on_reg=False,
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ ))
+
+train_pipeline = [
+ dict(
+ type='mmdet.LoadImageFromFile',
+ file_client_args={{_base_.file_client_args}}),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(
+ type='mmdet.CachedMosaic',
+ img_scale=(800, 800),
+ pad_val=114.0,
+ max_cached_images=20,
+ random_pop=False),
+ dict(
+ type='mmdet.RandomResize',
+ resize_type='mmdet.Resize',
+ scale=(1600, 1600),
+ ratio_range=(0.5, 2.0),
+ keep_ratio=True),
+ dict(type='RandomRotate', prob=0.5, angle_range=180),
+ dict(type='mmdet.RandomCrop', crop_size=(800, 800)),
+ dict(type='mmdet.YOLOXHSVRandomAug'),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(type='mmdet.Pad', size=(800, 800), pad_val=dict(img=(114, 114, 114))),
+ dict(
+ type='mmdet.CachedMixUp',
+ img_scale=(800, 800),
+ ratio_range=(1.0, 1.0),
+ max_cached_images=10,
+ random_pop=False,
+ pad_val=(114, 114, 114),
+ prob=0.5),
+ dict(type='mmdet.PackDetInputs')
+]
+
+train_pipeline_stage2 = [
+ dict(
+ type='mmdet.LoadImageFromFile',
+ file_client_args={{_base_.file_client_args}}),
+ dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+ dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+ dict(
+ type='mmdet.RandomResize',
+ resize_type='mmdet.Resize',
+ scale=(800, 800),
+ ratio_range=(0.5, 2.0),
+ keep_ratio=True),
+ dict(type='RandomRotate', prob=0.5, angle_range=180),
+ dict(type='mmdet.RandomCrop', crop_size=(800, 800)),
+ dict(type='mmdet.YOLOXHSVRandomAug'),
+ dict(
+ type='mmdet.RandomFlip',
+ prob=0.75,
+ direction=['horizontal', 'vertical', 'diagonal']),
+ dict(type='mmdet.Pad', size=(800, 800), pad_val=dict(img=(114, 114, 114))),
+ dict(type='mmdet.PackDetInputs')
+]
+
+train_dataloader = dict(dataset=dict(dataset=dict(pipeline=train_pipeline)))
+custom_hooks = [
+ dict(type='mmdet.NumClassCheckHook'),
+ dict(
+ type='EMAHook',
+ ema_type='mmdet.ExpMomentumEMA',
+ momentum=0.0002,
+ update_buffers=True,
+ priority=49),
+ dict(
+ type='mmdet.PipelineSwitchHook',
+ switch_epoch=90,
+ switch_pipeline=train_pipeline_stage2)
+]
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota.py b/configs/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota.py
new file mode 100644
index 000000000..fb573fba4
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota.py
@@ -0,0 +1,20 @@
+_base_ = './rotated_rtmdet_l-3x-dota.py'
+
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ deepen_factor=0.167,
+ widen_factor=0.375,
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(in_channels=[96, 192, 384], out_channels=96, num_csp_blocks=1),
+ bbox_head=dict(
+ in_channels=96,
+ feat_channels=96,
+ exp_on_reg=False,
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ ))
+
+# batch_size = (1 GPU) x (8 samples per GPU) = 8
+train_dataloader = dict(batch_size=8, num_workers=8)
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms.py b/configs/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms.py
new file mode 100644
index 000000000..c422eedd1
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_tiny-3x-dota_ms.py
@@ -0,0 +1,20 @@
+_base_ = './rotated_rtmdet_l-3x-dota_ms.py'
+
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ deepen_factor=0.167,
+ widen_factor=0.375,
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(in_channels=[96, 192, 384], out_channels=96, num_csp_blocks=1),
+ bbox_head=dict(
+ in_channels=96,
+ feat_channels=96,
+ exp_on_reg=False,
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ ))
+
+# batch_size = (1 GPU) x (8 samples per GPU) = 8
+train_dataloader = dict(batch_size=8, num_workers=8)
diff --git a/configs/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc.py b/configs/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc.py
new file mode 100644
index 000000000..07b85fb5c
--- /dev/null
+++ b/configs/rotated_rtmdet/rotated_rtmdet_tiny-9x-hrsc.py
@@ -0,0 +1,17 @@
+_base_ = './rotated_rtmdet_l-9x-hrsc.py'
+
+checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth' # noqa
+
+model = dict(
+ backbone=dict(
+ deepen_factor=0.167,
+ widen_factor=0.375,
+ init_cfg=dict(
+ type='Pretrained', prefix='backbone.', checkpoint=checkpoint)),
+ neck=dict(in_channels=[96, 192, 384], out_channels=96, num_csp_blocks=1),
+ bbox_head=dict(
+ in_channels=96,
+ feat_channels=96,
+ exp_on_reg=False,
+ loss_bbox=dict(type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ ))
diff --git a/mmrotate/models/dense_heads/__init__.py b/mmrotate/models/dense_heads/__init__.py
index e88cc2d22..0c4b6ef36 100644
--- a/mmrotate/models/dense_heads/__init__.py
+++ b/mmrotate/models/dense_heads/__init__.py
@@ -9,6 +9,7 @@
from .rotated_fcos_head import RotatedFCOSHead
from .rotated_reppoints_head import RotatedRepPointsHead
from .rotated_retina_head import RotatedRetinaHead
+from .rotated_rtmdet_head import RotatedRTMDetHead, RotatedRTMDetSepBNHead
from .s2a_head import S2AHead, S2ARefineHead
from .sam_reppoints_head import SAMRepPointsHead
@@ -16,5 +17,6 @@
'RotatedRetinaHead', 'OrientedRPNHead', 'RotatedRepPointsHead',
'SAMRepPointsHead', 'AngleBranchRetinaHead', 'RotatedATSSHead',
'RotatedFCOSHead', 'OrientedRepPointsHead', 'R3Head', 'R3RefineHead',
- 'S2AHead', 'S2ARefineHead', 'CFAHead', 'H2RBoxHead'
+ 'S2AHead', 'S2ARefineHead', 'CFAHead', 'H2RBoxHead', 'RotatedRTMDetHead',
+ 'RotatedRTMDetSepBNHead'
]
diff --git a/mmrotate/models/dense_heads/rotated_rtmdet_head.py b/mmrotate/models/dense_heads/rotated_rtmdet_head.py
new file mode 100644
index 000000000..42b7b7770
--- /dev/null
+++ b/mmrotate/models/dense_heads/rotated_rtmdet_head.py
@@ -0,0 +1,861 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import copy
+from typing import List, Optional, Tuple
+
+import torch
+from mmcv.cnn import ConvModule, Scale, is_norm
+from mmdet.models import inverse_sigmoid
+from mmdet.models.dense_heads import RTMDetHead
+from mmdet.models.task_modules import anchor_inside_flags
+from mmdet.models.utils import (filter_scores_and_topk, multi_apply,
+ select_single_mlvl, sigmoid_geometric_mean,
+ unmap)
+from mmdet.structures.bbox import bbox_cxcywh_to_xyxy, cat_boxes, distance2bbox
+from mmdet.utils import (ConfigType, InstanceList, OptConfigType,
+ OptInstanceList, reduce_mean)
+from mmengine import ConfigDict
+from mmengine.model import bias_init_with_prob, constant_init, normal_init
+from mmengine.structures import InstanceData
+from torch import Tensor, nn
+
+from mmrotate.registry import MODELS, TASK_UTILS
+from mmrotate.structures import RotatedBoxes, distance2obb
+
+
+@MODELS.register_module()
+class RotatedRTMDetHead(RTMDetHead):
+ """Detection Head of Rotated RTMDet.
+
+ Args:
+ num_classes (int): Number of categories excluding the background
+ category.
+ in_channels (int): Number of channels in the input feature map.
+ angle_version (str): Angle representations. Defaults to 'le90'.
+ use_hbbox_loss (bool): If True, use horizontal bbox loss, in which
+ case ``loss_angle`` must not be None. Defaults to False.
+ scale_angle (bool): If True, add a scale to the angle pred branch.
+ Defaults to True.
+ angle_coder (:obj:`ConfigDict` or dict): Config of angle coder.
+ loss_angle (:obj:`ConfigDict` or dict, Optional): Config of angle loss.
+ """
+
+ def __init__(self,
+ num_classes: int,
+ in_channels: int,
+ angle_version: str = 'le90',
+ use_hbbox_loss: bool = False,
+ scale_angle: bool = True,
+ angle_coder: ConfigType = dict(type='PseudoAngleCoder'),
+ loss_angle: OptConfigType = None,
+ **kwargs) -> None:
+ self.angle_version = angle_version
+ self.use_hbbox_loss = use_hbbox_loss
+ self.is_scale_angle = scale_angle
+ self.angle_coder = TASK_UTILS.build(angle_coder)
+ super().__init__(
+ num_classes,
+ in_channels,
+ # loss_centerness is unused by this head, but the parent
+ # class constructor requires it
+ loss_centerness=dict(
+ type='mmdet.CrossEntropyLoss',
+ use_sigmoid=True,
+ loss_weight=1.0),
+ **kwargs)
+ if loss_angle is not None:
+ self.loss_angle = MODELS.build(loss_angle)
+ else:
+ self.loss_angle = None
+
+ def _init_layers(self):
+ """Initialize layers of the head."""
+ super()._init_layers()
+ pred_pad_size = self.pred_kernel_size // 2
+ self.rtm_ang = nn.Conv2d(
+ self.feat_channels,
+ self.num_base_priors * self.angle_coder.encode_size,
+ self.pred_kernel_size,
+ padding=pred_pad_size)
+ if self.is_scale_angle:
+ self.scale_angle = Scale(1.0)
+
+ def init_weights(self) -> None:
+ """Initialize weights of the head."""
+ super().init_weights()
+ normal_init(self.rtm_ang, std=0.01)
+
+ def forward(self, feats: Tuple[Tensor, ...]) -> tuple:
+ """Forward features from the upstream network.
+
+ Args:
+ feats (tuple[Tensor]): Features from the upstream network, each is
+ a 4D-tensor.
+
+ Returns:
+ tuple: Usually a tuple of classification scores and bbox prediction
+ - cls_scores (list[Tensor]): Classification scores for all scale
+ levels, each is a 4D-tensor, the channels number is
+ num_base_priors * num_classes.
+ - bbox_preds (list[Tensor]): Box energies / deltas for all scale
+ levels, each is a 4D-tensor, the channels number is
+ num_base_priors * 4.
+ - angle_preds (list[Tensor]): Angle prediction for all scale
+ levels, each is a 4D-tensor, the channels number is
+ num_base_priors * angle_dim.
+ """
+
+ cls_scores = []
+ bbox_preds = []
+ angle_preds = []
+ for idx, (x, scale, stride) in enumerate(
+ zip(feats, self.scales, self.prior_generator.strides)):
+ cls_feat = x
+ reg_feat = x
+
+ for cls_layer in self.cls_convs:
+ cls_feat = cls_layer(cls_feat)
+ cls_score = self.rtm_cls(cls_feat)
+
+ for reg_layer in self.reg_convs:
+ reg_feat = reg_layer(reg_feat)
+
+ if self.with_objectness:
+ objectness = self.rtm_obj(reg_feat)
+ cls_score = inverse_sigmoid(
+ sigmoid_geometric_mean(cls_score, objectness))
+
+ reg_dist = scale(self.rtm_reg(reg_feat).exp()).float() * stride[0]
+ if self.is_scale_angle:
+ angle_pred = self.scale_angle(self.rtm_ang(reg_feat)).float()
+ else:
+ angle_pred = self.rtm_ang(reg_feat).float()
+
+ cls_scores.append(cls_score)
+ bbox_preds.append(reg_dist)
+ angle_preds.append(angle_pred)
+ return tuple(cls_scores), tuple(bbox_preds), tuple(angle_preds)
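The `(l, t, r, b)` distances produced by `rtm_reg` and the decoded angle from `rtm_ang` are later combined into a rotated box by `DistanceAnglePointCoder`. A simplified per-box sketch of that decoding (a hypothetical helper for intuition, not the library's API; the real coder works on batched tensors and the chosen `angle_version`):

```python
import math

def distance_angle_to_obb(px, py, left, top, right, bottom, angle):
    """Decode distances from a prior point to the four box sides, plus an
    angle, into a rotated box (cx, cy, w, h, angle). The offset from the
    point to the box center is rotated by the predicted angle."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # offset of the box center from the prior point, in the box frame
    ox = (right - left) / 2
    oy = (bottom - top) / 2
    # rotate the offset into image coordinates
    cx = px + cos_a * ox - sin_a * oy
    cy = py + sin_a * ox + cos_a * oy
    w = left + right
    h = top + bottom
    return cx, cy, w, h, angle
```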
+
+ def loss_by_feat_single(self, cls_score: Tensor, bbox_pred: Tensor,
+ angle_pred: Tensor, labels: Tensor,
+ label_weights: Tensor, bbox_targets: Tensor,
+ assign_metrics: Tensor, stride: List[int]):
+ """Compute loss of a single scale level.
+
+ Args:
+ cls_score (Tensor): Box scores for each scale level
+ Has shape (N, num_anchors * num_classes, H, W).
+ bbox_pred (Tensor): Decoded bboxes for each scale
+ level with shape (N, num_anchors * 5, H, W) for rbox loss
+ or (N, num_anchors * 4, H, W) for hbox loss.
+ angle_pred (Tensor): Angle prediction for each scale
+ level with shape (N, num_anchors * angle_dim, H, W).
+ labels (Tensor): Labels of each anchors with shape
+ (N, num_total_anchors).
+ label_weights (Tensor): Label weights of each anchor with shape
+ (N, num_total_anchors).
+ bbox_targets (Tensor): BBox regression targets of each anchor with
+ shape (N, num_total_anchors, 4).
+ assign_metrics (Tensor): Assign metrics with shape
+ (N, num_total_anchors).
+ stride (List[int]): Downsample stride of the feature map.
+
+ Returns:
+ dict[str, Tensor]: A dictionary of loss components.
+ """
+ assert stride[0] == stride[1], 'h stride is not equal to w stride!'
+ cls_score = cls_score.permute(0, 2, 3, 1).reshape(
+ -1, self.cls_out_channels).contiguous()
+
+ if self.use_hbbox_loss:
+ bbox_pred = bbox_pred.reshape(-1, 4)
+ else:
+ bbox_pred = bbox_pred.reshape(-1, 5)
+ bbox_targets = bbox_targets.reshape(-1, 5)
+
+ labels = labels.reshape(-1)
+ assign_metrics = assign_metrics.reshape(-1)
+ label_weights = label_weights.reshape(-1)
+ targets = (labels, assign_metrics)
+
+ loss_cls = self.loss_cls(
+ cls_score, targets, label_weights, avg_factor=1.0)
+
+ # FG cat_id: [0, num_classes -1], BG cat_id: num_classes
+ bg_class_ind = self.num_classes
+ pos_inds = ((labels >= 0)
+ & (labels < bg_class_ind)).nonzero().squeeze(1)
+
+ if len(pos_inds) > 0:
+ pos_bbox_targets = bbox_targets[pos_inds]
+ pos_bbox_pred = bbox_pred[pos_inds]
+
+ pos_decode_bbox_pred = pos_bbox_pred
+ pos_decode_bbox_targets = pos_bbox_targets
+ if self.use_hbbox_loss:
+ pos_decode_bbox_targets = bbox_cxcywh_to_xyxy(
+ pos_bbox_targets[:, :4])
+
+ # regression loss
+ pos_bbox_weight = assign_metrics[pos_inds]
+
+ loss_angle = angle_pred.sum() * 0
+ if self.loss_angle is not None:
+ angle_pred = angle_pred.reshape(-1,
+ self.angle_coder.encode_size)
+ pos_angle_pred = angle_pred[pos_inds]
+ pos_angle_target = pos_bbox_targets[:, 4:5]
+ pos_angle_target = self.angle_coder.encode(pos_angle_target)
+ if pos_angle_target.dim() == 2:
+ pos_angle_weight = pos_bbox_weight.unsqueeze(-1)
+ else:
+ pos_angle_weight = pos_bbox_weight
+ loss_angle = self.loss_angle(
+ pos_angle_pred,
+ pos_angle_target,
+ weight=pos_angle_weight,
+ avg_factor=1.0)
+
+ loss_bbox = self.loss_bbox(
+ pos_decode_bbox_pred,
+ pos_decode_bbox_targets,
+ weight=pos_bbox_weight,
+ avg_factor=1.0)
+
+ else:
+ loss_bbox = bbox_pred.sum() * 0
+ pos_bbox_weight = bbox_targets.new_tensor(0.)
+ loss_angle = angle_pred.sum() * 0
+
+ return (loss_cls, loss_bbox, loss_angle, assign_metrics.sum(),
+ pos_bbox_weight.sum(), pos_bbox_weight.sum())
+
+ def loss_by_feat(self,
+ cls_scores: List[Tensor],
+ bbox_preds: List[Tensor],
+ angle_preds: List[Tensor],
+ batch_gt_instances: InstanceList,
+ batch_img_metas: List[dict],
+ batch_gt_instances_ignore: OptInstanceList = None):
+ """Compute losses of the head.
+
+ Args:
+ cls_scores (list[Tensor]): Box scores for each scale level,
+ has shape (N, num_anchors * num_classes, H, W).
+ bbox_preds (list[Tensor]): Box predictions for each scale
+ level with shape (N, num_anchors * 4, H, W) in
+ [t, b, l, r] format.
+ angle_preds (list[Tensor]): Angle predictions for each scale
+ level with shape (N, num_anchors * angle_dim, H, W).
+ batch_gt_instances (list[:obj:`InstanceData`]): Batch of
+ gt_instance. It usually includes ``bboxes`` and ``labels``
+ attributes.
+ batch_img_metas (list[dict]): Meta information of each image, e.g.,
+ image size, scaling factor, etc.
+ batch_gt_instances_ignore (list[:obj:`InstanceData`], Optional):
+ Batch of gt_instances_ignore. It includes ``bboxes`` attribute
+ data that is ignored during training and testing.
+ Defaults to None.
+
+ Returns:
+ dict[str, Tensor]: A dictionary of loss components.
+ """
+ num_imgs = len(batch_img_metas)
+ featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
+ assert len(featmap_sizes) == self.prior_generator.num_levels
+
+ device = cls_scores[0].device
+ anchor_list, valid_flag_list = self.get_anchors(
+ featmap_sizes, batch_img_metas, device=device)
+ flatten_cls_scores = torch.cat([
+ cls_score.permute(0, 2, 3, 1).reshape(num_imgs, -1,
+ self.cls_out_channels)
+ for cls_score in cls_scores
+ ], 1)
+
+ decoded_bboxes = []
+ decoded_hbboxes = []
+ angle_preds_list = []
+ for anchor, bbox_pred, angle_pred in zip(anchor_list[0], bbox_preds,
+ angle_preds):
+ anchor = anchor.reshape(-1, 4)
+ bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(num_imgs, -1, 4)
+ angle_pred = angle_pred.permute(0, 2, 3, 1).reshape(
+ num_imgs, -1, self.angle_coder.encode_size)
+
+ if self.use_hbbox_loss:
+ hbbox_pred = distance2bbox(anchor, bbox_pred)
+ decoded_hbboxes.append(hbbox_pred)
+
+ decoded_angle = self.angle_coder.decode(angle_pred, keepdim=True)
+ bbox_pred = torch.cat([bbox_pred, decoded_angle], dim=-1)
+
+ bbox_pred = distance2obb(
+ anchor, bbox_pred, angle_version=self.angle_version)
+ decoded_bboxes.append(bbox_pred)
+ angle_preds_list.append(angle_pred)
+
+ # flatten_bboxes is rbox, for target assign
+ flatten_bboxes = torch.cat(decoded_bboxes, 1)
+
+ cls_reg_targets = self.get_targets(
+ flatten_cls_scores,
+ flatten_bboxes,
+ anchor_list,
+ valid_flag_list,
+ batch_gt_instances,
+ batch_img_metas,
+ batch_gt_instances_ignore=batch_gt_instances_ignore)
+ (anchor_list, labels_list, label_weights_list, bbox_targets_list,
+ assign_metrics_list) = cls_reg_targets
+
+ if self.use_hbbox_loss:
+ decoded_bboxes = decoded_hbboxes
+
+ (losses_cls, losses_bbox, losses_angle, cls_avg_factors,
+ bbox_avg_factors, angle_avg_factors) = multi_apply(
+ self.loss_by_feat_single, cls_scores, decoded_bboxes,
+ angle_preds_list, labels_list, label_weights_list,
+ bbox_targets_list, assign_metrics_list,
+ self.prior_generator.strides)
+
+ cls_avg_factor = reduce_mean(sum(cls_avg_factors)).clamp_(min=1).item()
+ losses_cls = list(map(lambda x: x / cls_avg_factor, losses_cls))
+
+ bbox_avg_factor = reduce_mean(
+ sum(bbox_avg_factors)).clamp_(min=1).item()
+ losses_bbox = list(map(lambda x: x / bbox_avg_factor, losses_bbox))
+ if self.loss_angle is not None:
+ angle_avg_factors = reduce_mean(
+ sum(angle_avg_factors)).clamp_(min=1).item()
+ losses_angle = list(
+ map(lambda x: x / angle_avg_factors, losses_angle))
+ return dict(
+ loss_cls=losses_cls,
+ loss_bbox=losses_bbox,
+ loss_angle=losses_angle)
+ else:
+ return dict(loss_cls=losses_cls, loss_bbox=losses_bbox)
+
+ def _get_targets_single(self,
+ cls_scores: Tensor,
+ bbox_preds: Tensor,
+ flat_anchors: Tensor,
+ valid_flags: Tensor,
+ gt_instances: InstanceData,
+ img_meta: dict,
+ gt_instances_ignore: Optional[InstanceData] = None,
+ unmap_outputs=True):
+ """Compute regression, classification targets for anchors in a single
+ image.
+
+ Args:
+ cls_scores (Tensor): Box scores for each image.
+ bbox_preds (Tensor): Box energies / deltas for each image.
+ flat_anchors (Tensor): Multi-level anchors of the image, which are
+ concatenated into a single tensor of shape (num_anchors, 4).
+ valid_flags (Tensor): Multi level valid flags of the image,
+ which are concatenated into a single tensor of
+ shape (num_anchors,).
+ gt_instances (:obj:`InstanceData`): Ground truth of instance
+ annotations. It usually includes ``bboxes`` and ``labels``
+ attributes.
+ img_meta (dict): Meta information for current image.
+ gt_instances_ignore (:obj:`InstanceData`, optional): Instances
+ to be ignored during training. It includes ``bboxes`` attribute
+ data that is ignored during training and testing.
+ Defaults to None.
+ unmap_outputs (bool): Whether to map outputs back to the original
+ set of anchors. Defaults to True.
+
+ Returns:
+ tuple: A tuple of targets, where N is the total number of
+ anchors in the image.
+
+ - anchors (Tensor): All anchors in the image with shape (N, 4).
+ - labels (Tensor): Labels of all anchors in the image with shape
+ (N,).
+ - label_weights (Tensor): Label weights of all anchor in the
+ image with shape (N,).
+ - bbox_targets (Tensor): BBox targets of all anchors in the
+ image with shape (N, 5).
+ - norm_alignment_metrics (Tensor): Normalized alignment metrics
+ of all priors in the image with shape (N,).
+ """
+ inside_flags = anchor_inside_flags(flat_anchors, valid_flags,
+ img_meta['img_shape'][:2],
+ self.train_cfg['allowed_border'])
+ if not inside_flags.any():
+ return (None, ) * 5
+ # assign gt and sample anchors
+ anchors = flat_anchors[inside_flags, :]
+
+ pred_instances = InstanceData(
+ scores=cls_scores[inside_flags, :],
+ bboxes=bbox_preds[inside_flags, :],
+ priors=anchors)
+
+ assign_result = self.assigner.assign(pred_instances, gt_instances,
+ gt_instances_ignore)
+
+ sampling_result = self.sampler.sample(assign_result, pred_instances,
+ gt_instances)
+
+ num_valid_anchors = anchors.shape[0]
+ bbox_targets = anchors.new_zeros((*anchors.size()[:-1], 5))
+ labels = anchors.new_full((num_valid_anchors, ),
+ self.num_classes,
+ dtype=torch.long)
+ label_weights = anchors.new_zeros(num_valid_anchors, dtype=torch.float)
+ assign_metrics = anchors.new_zeros(
+ num_valid_anchors, dtype=torch.float)
+
+ pos_inds = sampling_result.pos_inds
+ neg_inds = sampling_result.neg_inds
+ if len(pos_inds) > 0:
+ # point-based
+ pos_bbox_targets = sampling_result.pos_gt_bboxes
+ pos_bbox_targets = pos_bbox_targets.regularize_boxes(
+ self.angle_version)
+ bbox_targets[pos_inds, :] = pos_bbox_targets
+
+ labels[pos_inds] = sampling_result.pos_gt_labels
+ if self.train_cfg['pos_weight'] <= 0:
+ label_weights[pos_inds] = 1.0
+ else:
+ label_weights[pos_inds] = self.train_cfg['pos_weight']
+ if len(neg_inds) > 0:
+ label_weights[neg_inds] = 1.0
+
+ class_assigned_gt_inds = torch.unique(
+ sampling_result.pos_assigned_gt_inds)
+ for gt_inds in class_assigned_gt_inds:
+ gt_class_inds = pos_inds[sampling_result.pos_assigned_gt_inds ==
+ gt_inds]
+ assign_metrics[gt_class_inds] = assign_result.max_overlaps[
+ gt_class_inds]
+
+ # map up to original set of anchors
+ if unmap_outputs:
+ num_total_anchors = flat_anchors.size(0)
+ anchors = unmap(anchors, num_total_anchors, inside_flags)
+ labels = unmap(
+ labels, num_total_anchors, inside_flags, fill=self.num_classes)
+ label_weights = unmap(label_weights, num_total_anchors,
+ inside_flags)
+ bbox_targets = unmap(bbox_targets, num_total_anchors, inside_flags)
+ assign_metrics = unmap(assign_metrics, num_total_anchors,
+ inside_flags)
+ return (anchors, labels, label_weights, bbox_targets, assign_metrics)
+
+ def predict_by_feat(self,
+ cls_scores: List[Tensor],
+ bbox_preds: List[Tensor],
+ angle_preds: List[Tensor],
+ score_factors: Optional[List[Tensor]] = None,
+ batch_img_metas: Optional[List[dict]] = None,
+ cfg: Optional[ConfigDict] = None,
+ rescale: bool = False,
+ with_nms: bool = True) -> InstanceList:
+ """Transform a batch of output features extracted from the head into
+ bbox results.
+ Note: When score_factors is not None, the cls_scores are
+ usually multiplied by it to obtain the real scores used in NMS,
+ such as CenterNess in FCOS and the IoU branch in ATSS.
+ Args:
+ cls_scores (list[Tensor]): Classification scores for all
+ scale levels, each is a 4D-tensor, has shape
+ (batch_size, num_priors * num_classes, H, W).
+ bbox_preds (list[Tensor]): Box energies / deltas for all
+ scale levels, each is a 4D-tensor, has shape
+ (batch_size, num_priors * 4, H, W).
+ angle_preds (list[Tensor]): Angle predictions for each scale
+ level with shape (N, num_points * angle_dim, H, W).
+ score_factors (list[Tensor], optional): Score factor for
+ all scale level, each is a 4D-tensor, has shape
+ (batch_size, num_priors * 1, H, W). Defaults to None.
+ batch_img_metas (list[dict], Optional): Batch image meta info.
+ Defaults to None.
+ cfg (ConfigDict, optional): Test / postprocessing
+ configuration, if None, test_cfg would be used.
+ Defaults to None.
+ rescale (bool): If True, return boxes in original image space.
+ Defaults to False.
+ with_nms (bool): If True, do nms before return boxes.
+ Defaults to True.
+ Returns:
+ list[:obj:`InstanceData`]: Object detection results of each image
+ after the post process. Each item usually contains following keys.
+ - scores (Tensor): Classification scores, has a shape
+ (num_instance, )
+ - labels (Tensor): Labels of bboxes, has a shape
+ (num_instances, ).
+ - bboxes (Tensor): Has a shape (num_instances, 5),
+ the last dimension 5 arrange as (x, y, w, h, t).
+ """
+ assert len(cls_scores) == len(bbox_preds)
+
+ if score_factors is None:
+ # e.g. Retina, FreeAnchor, Foveabox, etc.
+ with_score_factors = False
+ else:
+ # e.g. FCOS, PAA, ATSS, AutoAssign, etc.
+ with_score_factors = True
+ assert len(cls_scores) == len(score_factors)
+
+ num_levels = len(cls_scores)
+
+ featmap_sizes = [cls_scores[i].shape[-2:] for i in range(num_levels)]
+ mlvl_priors = self.prior_generator.grid_priors(
+ featmap_sizes,
+ dtype=cls_scores[0].dtype,
+ device=cls_scores[0].device)
+
+ result_list = []
+
+ for img_id in range(len(batch_img_metas)):
+ img_meta = batch_img_metas[img_id]
+ cls_score_list = select_single_mlvl(
+ cls_scores, img_id, detach=True)
+ bbox_pred_list = select_single_mlvl(
+ bbox_preds, img_id, detach=True)
+ angle_pred_list = select_single_mlvl(
+ angle_preds, img_id, detach=True)
+ if with_score_factors:
+ score_factor_list = select_single_mlvl(
+ score_factors, img_id, detach=True)
+ else:
+ score_factor_list = [None for _ in range(num_levels)]
+
+ results = self._predict_by_feat_single(
+ cls_score_list=cls_score_list,
+ bbox_pred_list=bbox_pred_list,
+ angle_pred_list=angle_pred_list,
+ score_factor_list=score_factor_list,
+ mlvl_priors=mlvl_priors,
+ img_meta=img_meta,
+ cfg=cfg,
+ rescale=rescale,
+ with_nms=with_nms)
+ result_list.append(results)
+ return result_list
+
+ def _predict_by_feat_single(self,
+ cls_score_list: List[Tensor],
+ bbox_pred_list: List[Tensor],
+ angle_pred_list: List[Tensor],
+ score_factor_list: List[Tensor],
+ mlvl_priors: List[Tensor],
+ img_meta: dict,
+ cfg: ConfigDict,
+ rescale: bool = False,
+ with_nms: bool = True) -> InstanceData:
+ """Transform a single image's features extracted from the head into
+ bbox results.
+ Args:
+ cls_score_list (list[Tensor]): Box scores from all scale
+ levels of a single image, each item has shape
+ (num_priors * num_classes, H, W).
+ bbox_pred_list (list[Tensor]): Box energies / deltas from
+ all scale levels of a single image, each item has shape
+ (num_priors * 4, H, W).
+ angle_pred_list (list[Tensor]): Angle predictions from all scale
+ levels of a single image, each item has shape
+ (num_points * angle_dim, H, W).
+ score_factor_list (list[Tensor]): Score factor from all scale
+ levels of a single image, each item has shape
+ (num_priors * 1, H, W).
+ mlvl_priors (list[Tensor]): Each element in the list is
+ the priors of a single level in feature pyramid. In all
+ anchor-based methods, it has shape (num_priors, 4). In
+ all anchor-free methods, it has shape (num_priors, 2)
+ when `with_stride=True`, otherwise it still has shape
+ (num_priors, 4).
+ img_meta (dict): Image meta info.
+ cfg (mmengine.Config): Test / postprocessing configuration,
+ if None, test_cfg would be used.
+ rescale (bool): If True, return boxes in original image space.
+ Defaults to False.
+ with_nms (bool): If True, do nms before return boxes.
+ Defaults to True.
+ Returns:
+ :obj:`InstanceData`: Detection results of each image
+ after the post process.
+ Each item usually contains following keys.
+ - scores (Tensor): Classification scores, has a shape
+ (num_instance, )
+ - labels (Tensor): Labels of bboxes, has a shape
+ (num_instances, ).
+ - bboxes (Tensor): Has a shape (num_instances, 5),
+ the last dimension 5 arrange as (x, y, w, h, t).
+ """
+ if score_factor_list[0] is None:
+ # e.g. Retina, FreeAnchor, etc.
+ with_score_factors = False
+ else:
+ # e.g. FCOS, PAA, ATSS, etc.
+ with_score_factors = True
+
+ cfg = self.test_cfg if cfg is None else cfg
+ cfg = copy.deepcopy(cfg)
+ img_shape = img_meta['img_shape']
+ nms_pre = cfg.get('nms_pre', -1)
+
+ mlvl_bbox_preds = []
+ mlvl_valid_priors = []
+ mlvl_scores = []
+ mlvl_labels = []
+ if with_score_factors:
+ mlvl_score_factors = []
+ else:
+ mlvl_score_factors = None
+ for level_idx, (
+ cls_score, bbox_pred, angle_pred, score_factor, priors) in \
+ enumerate(zip(cls_score_list, bbox_pred_list, angle_pred_list,
+ score_factor_list, mlvl_priors)):
+
+ assert cls_score.size()[-2:] == bbox_pred.size()[-2:]
+
+ bbox_pred = bbox_pred.permute(1, 2, 0).reshape(-1, 4)
+ angle_pred = angle_pred.permute(1, 2, 0).reshape(
+ -1, self.angle_coder.encode_size)
+ if with_score_factors:
+ score_factor = score_factor.permute(1, 2,
+ 0).reshape(-1).sigmoid()
+ cls_score = cls_score.permute(1, 2,
+ 0).reshape(-1, self.cls_out_channels)
+ if self.use_sigmoid_cls:
+ scores = cls_score.sigmoid()
+ else:
+ # remind that we set FG labels to [0, num_class-1]
+ # since mmdet v2.0
+ # BG cat_id: num_class
+ scores = cls_score.softmax(-1)[:, :-1]
+
+ # After https://github.com/open-mmlab/mmdetection/pull/6268/,
+ # this operation keeps fewer bboxes under the same `nms_pre`.
+ # There is no difference in performance for most models. If you
+ # find a slight drop in performance, you can set a larger
+ # `nms_pre` than before.
+ score_thr = cfg.get('score_thr', 0)
+
+ results = filter_scores_and_topk(
+ scores, score_thr, nms_pre,
+ dict(
+ bbox_pred=bbox_pred, angle_pred=angle_pred, priors=priors))
+ scores, labels, keep_idxs, filtered_results = results
+
+ bbox_pred = filtered_results['bbox_pred']
+ angle_pred = filtered_results['angle_pred']
+ priors = filtered_results['priors']
+
+ decoded_angle = self.angle_coder.decode(angle_pred, keepdim=True)
+ bbox_pred = torch.cat([bbox_pred, decoded_angle], dim=-1)
+
+ if with_score_factors:
+ score_factor = score_factor[keep_idxs]
+
+ mlvl_bbox_preds.append(bbox_pred)
+ mlvl_valid_priors.append(priors)
+ mlvl_scores.append(scores)
+ mlvl_labels.append(labels)
+
+ if with_score_factors:
+ mlvl_score_factors.append(score_factor)
+
+ bbox_pred = torch.cat(mlvl_bbox_preds)
+ priors = cat_boxes(mlvl_valid_priors)
+ bboxes = self.bbox_coder.decode(priors, bbox_pred, max_shape=img_shape)
+
+ results = InstanceData()
+ results.bboxes = RotatedBoxes(bboxes)
+ results.scores = torch.cat(mlvl_scores)
+ results.labels = torch.cat(mlvl_labels)
+ if with_score_factors:
+ results.score_factors = torch.cat(mlvl_score_factors)
+
+ return self._bbox_post_process(
+ results=results,
+ cfg=cfg,
+ rescale=rescale,
+ with_nms=with_nms,
+ img_meta=img_meta)
+
+
+@MODELS.register_module()
+class RotatedRTMDetSepBNHead(RotatedRTMDetHead):
+ """Rotated RTMDetHead with separated BN layers and shared conv layers.
+
+ Args:
+ num_classes (int): Number of categories excluding the background
+ category.
+ in_channels (int): Number of channels in the input feature map.
+ share_conv (bool): Whether to share conv layers between stages.
+ Defaults to True.
+ scale_angle (bool): Not supported in RotatedRTMDetSepBNHead.
+ Defaults to False.
+ norm_cfg (:obj:`ConfigDict` or dict): Config dict for normalization
+ layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
+ act_cfg (:obj:`ConfigDict` or dict): Config dict for activation
+ layer. Defaults to dict(type='SiLU').
+ pred_kernel_size (int): Kernel size of prediction layer. Defaults to 1.
+ exp_on_reg (bool): Whether to apply exponential on bbox_pred.
+ Defaults to False.
+ """
+
+ def __init__(self,
+ num_classes: int,
+ in_channels: int,
+ share_conv: bool = True,
+ scale_angle: bool = False,
+ norm_cfg: ConfigType = dict(
+ type='BN', momentum=0.03, eps=0.001),
+ act_cfg: ConfigType = dict(type='SiLU'),
+ pred_kernel_size: int = 1,
+ exp_on_reg: bool = False,
+ **kwargs) -> None:
+ self.share_conv = share_conv
+ self.exp_on_reg = exp_on_reg
+ assert scale_angle is False, \
+ 'scale_angle is not supported in RotatedRTMDetSepBNHead'
+ super().__init__(
+ num_classes,
+ in_channels,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg,
+ pred_kernel_size=pred_kernel_size,
+ scale_angle=False,
+ **kwargs)
+
+ def _init_layers(self) -> None:
+ """Initialize layers of the head."""
+ self.cls_convs = nn.ModuleList()
+ self.reg_convs = nn.ModuleList()
+
+ self.rtm_cls = nn.ModuleList()
+ self.rtm_reg = nn.ModuleList()
+ self.rtm_ang = nn.ModuleList()
+ if self.with_objectness:
+ self.rtm_obj = nn.ModuleList()
+ for n in range(len(self.prior_generator.strides)):
+ cls_convs = nn.ModuleList()
+ reg_convs = nn.ModuleList()
+ for i in range(self.stacked_convs):
+ chn = self.in_channels if i == 0 else self.feat_channels
+ cls_convs.append(
+ ConvModule(
+ chn,
+ self.feat_channels,
+ 3,
+ stride=1,
+ padding=1,
+ conv_cfg=self.conv_cfg,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg))
+ reg_convs.append(
+ ConvModule(
+ chn,
+ self.feat_channels,
+ 3,
+ stride=1,
+ padding=1,
+ conv_cfg=self.conv_cfg,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg))
+ self.cls_convs.append(cls_convs)
+ self.reg_convs.append(reg_convs)
+
+ self.rtm_cls.append(
+ nn.Conv2d(
+ self.feat_channels,
+ self.num_base_priors * self.cls_out_channels,
+ self.pred_kernel_size,
+ padding=self.pred_kernel_size // 2))
+ self.rtm_reg.append(
+ nn.Conv2d(
+ self.feat_channels,
+ self.num_base_priors * 4,
+ self.pred_kernel_size,
+ padding=self.pred_kernel_size // 2))
+ self.rtm_ang.append(
+ nn.Conv2d(
+ self.feat_channels,
+ self.num_base_priors * self.angle_coder.encode_size,
+ self.pred_kernel_size,
+ padding=self.pred_kernel_size // 2))
+ if self.with_objectness:
+ self.rtm_obj.append(
+ nn.Conv2d(
+ self.feat_channels,
+ 1,
+ self.pred_kernel_size,
+ padding=self.pred_kernel_size // 2))
+
+ if self.share_conv:
+ for n in range(len(self.prior_generator.strides)):
+ for i in range(self.stacked_convs):
+ self.cls_convs[n][i].conv = self.cls_convs[0][i].conv
+ self.reg_convs[n][i].conv = self.reg_convs[0][i].conv
+
+ def init_weights(self) -> None:
+ """Initialize weights of the head."""
+ for m in self.modules():
+ if isinstance(m, nn.Conv2d):
+ normal_init(m, mean=0, std=0.01)
+ if is_norm(m):
+ constant_init(m, 1)
+ bias_cls = bias_init_with_prob(0.01)
+ for rtm_cls, rtm_reg, rtm_ang in zip(self.rtm_cls, self.rtm_reg,
+ self.rtm_ang):
+ normal_init(rtm_cls, std=0.01, bias=bias_cls)
+ normal_init(rtm_reg, std=0.01)
+ normal_init(rtm_ang, std=0.01)
+ if self.with_objectness:
+ for rtm_obj in self.rtm_obj:
+ normal_init(rtm_obj, std=0.01, bias=bias_cls)
+
+ def forward(self, feats: Tuple[Tensor, ...]) -> tuple:
+ """Forward features from the upstream network.
+
+ Args:
+ feats (tuple[Tensor]): Features from the upstream network, each is
+ a 4D-tensor.
+
+ Returns:
+ tuple: Usually a tuple of classification scores and bbox prediction
+ - cls_scores (list[Tensor]): Classification scores for all scale
+ levels, each is a 4D-tensor, the channels number is
+ num_base_priors * num_classes.
+ - bbox_preds (list[Tensor]): Box energies / deltas for all scale
+ levels, each is a 4D-tensor, the channels number is
+ num_base_priors * 4.
+ - angle_preds (list[Tensor]): Angle prediction for all scale
+ levels, each is a 4D-tensor, the channels number is
+ num_base_priors * angle_dim.
+ """
+ cls_scores = []
+ bbox_preds = []
+ angle_preds = []
+ for idx, (x, stride) in enumerate(
+ zip(feats, self.prior_generator.strides)):
+ cls_feat = x
+ reg_feat = x
+
+ for cls_layer in self.cls_convs[idx]:
+ cls_feat = cls_layer(cls_feat)
+ cls_score = self.rtm_cls[idx](cls_feat)
+
+ for reg_layer in self.reg_convs[idx]:
+ reg_feat = reg_layer(reg_feat)
+
+ if self.with_objectness:
+ objectness = self.rtm_obj[idx](reg_feat)
+ cls_score = inverse_sigmoid(
+ sigmoid_geometric_mean(cls_score, objectness))
+ if self.exp_on_reg:
+ reg_dist = self.rtm_reg[idx](reg_feat).exp() * stride[0]
+ else:
+ reg_dist = self.rtm_reg[idx](reg_feat) * stride[0]
+
+ angle_pred = self.rtm_ang[idx](reg_feat)
+
+ cls_scores.append(cls_score)
+ bbox_preds.append(reg_dist)
+ angle_preds.append(angle_pred)
+ return tuple(cls_scores), tuple(bbox_preds), tuple(angle_preds)
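When `with_objectness` is set, the forward pass above fuses the classification and objectness logits via `inverse_sigmoid(sigmoid_geometric_mean(cls_score, objectness))`; both utilities come from mmdet. A minimal scalar sketch of the math involved (assuming mmdet's definitions up to numerical clamping; the real ops work elementwise on tensors):

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def inverse_sigmoid(p: float, eps: float = 1e-6) -> float:
    # Clamp away from 0/1 so the logit stays finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_cls_objectness(cls_logit: float, obj_logit: float) -> float:
    # Geometric mean of the two probabilities, mapped back to logit
    # space so downstream code can keep applying sigmoid as usual.
    fused_prob = math.sqrt(sigmoid(cls_logit) * sigmoid(obj_logit))
    return inverse_sigmoid(fused_prob)
```

If the two branches agree, fusion is a no-op; otherwise the fused probability lands between the two branch probabilities.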
diff --git a/mmrotate/models/losses/gaussian_dist_loss.py b/mmrotate/models/losses/gaussian_dist_loss.py
index 782441212..91889bb05 100644
--- a/mmrotate/models/losses/gaussian_dist_loss.py
+++ b/mmrotate/models/losses/gaussian_dist_loss.py
@@ -386,7 +386,10 @@ def forward(self,
reduction_override if reduction_override else self.reduction)
if (weight is not None) and (not torch.any(weight > 0)) and (
reduction != 'none'):
- return (pred * weight).sum()
+ # handle different dim of weight
+ if pred.dim() == weight.dim() + 1:
+ weight = weight.unsqueeze(1)
+ return (pred * weight).sum() # 0, but keeps pred in the autograd graph
if weight is not None and weight.dim() > 1:
assert weight.shape == pred.shape
weight = weight.mean(-1)
diff --git a/mmrotate/models/losses/gaussian_dist_loss_v1.py b/mmrotate/models/losses/gaussian_dist_loss_v1.py
index 1685ae89c..4a9f10601 100644
--- a/mmrotate/models/losses/gaussian_dist_loss_v1.py
+++ b/mmrotate/models/losses/gaussian_dist_loss_v1.py
@@ -213,7 +213,10 @@ def forward(self,
reduction_override if reduction_override else self.reduction)
if (weight is not None) and (not torch.any(weight > 0)) and (
reduction != 'none'):
- return (pred * weight).sum()
+ # handle different dim of weight
+ if pred.dim() == weight.dim() + 1:
+ weight = weight.unsqueeze(1)
+ return (pred * weight).sum() # 0, but keeps pred in the autograd graph
if weight is not None and weight.dim() > 1:
assert weight.shape == pred.shape
weight = weight.mean(-1)
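The two hunks above share one subtlety: when every weight is zero, `(pred * weight).sum()` is returned instead of a literal `0` so that `pred` stays in the autograd graph (otherwise DDP can complain about unused parameters), and the new dim check lets a `(N,)` weight broadcast against a `(N, D)` pred. A plain-Python sketch of that broadcasting behavior (the real code relies on torch broadcasting and autograd):

```python
def zero_weight_loss(pred, weight):
    """Mimic `(pred * weight).sum()` for pred of shape (N, D) and
    weight of shape (N,): align dims, multiply, sum.

    With an all-zero weight the result is 0.0, yet every element of
    `pred` participates in the computation, which is the point of the
    patched branch.
    """
    # Equivalent of weight.unsqueeze(1) followed by broadcasting.
    return sum(p * w for row, w in zip(pred, weight) for p in row)
```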
diff --git a/mmrotate/models/task_modules/coders/distance_angle_point_coder.py b/mmrotate/models/task_modules/coders/distance_angle_point_coder.py
index d456d7202..b92647088 100644
--- a/mmrotate/models/task_modules/coders/distance_angle_point_coder.py
+++ b/mmrotate/models/task_modules/coders/distance_angle_point_coder.py
@@ -95,17 +95,18 @@ def distance2obb(self,
distance,
max_shape=None,
angle_version='oc'):
- distance, angle = distance.split([4, 1], dim=1)
+ distance, angle = distance.split([4, 1], dim=-1)
cos_angle, sin_angle = torch.cos(angle), torch.sin(angle)
+
rot_matrix = torch.cat([cos_angle, -sin_angle, sin_angle, cos_angle],
- dim=1).reshape(-1, 2, 2)
+ dim=-1)
+ rot_matrix = rot_matrix.reshape(*rot_matrix.shape[:-1], 2, 2)
- wh = distance[:, :2] + distance[:, 2:]
- offset_t = (distance[:, 2:] - distance[:, :2]) / 2
- offset_t = offset_t.unsqueeze(2)
- offset = torch.bmm(rot_matrix, offset_t).squeeze(2)
- ctr = points + offset
+ wh = distance[..., :2] + distance[..., 2:]
+ offset_t = (distance[..., 2:] - distance[..., :2]) / 2
+ offset = torch.matmul(rot_matrix, offset_t[..., None]).squeeze(-1)
+ ctr = points[..., :2] + offset
angle_regular = norm_angle(angle, angle_version)
return torch.cat([ctr, wh, angle_regular], dim=-1)
diff --git a/mmrotate/structures/bbox/__init__.py b/mmrotate/structures/bbox/__init__.py
index 98e3a809b..895ade012 100644
--- a/mmrotate/structures/bbox/__init__.py
+++ b/mmrotate/structures/bbox/__init__.py
@@ -4,10 +4,10 @@
rbox2hbox, rbox2qbox)
from .quadri_boxes import QuadriBoxes
from .rotated_boxes import RotatedBoxes
-from .transforms import gaussian2bbox, gt2gaussian, norm_angle
+from .transforms import distance2obb, gaussian2bbox, gt2gaussian, norm_angle
__all__ = [
'QuadriBoxes', 'RotatedBoxes', 'hbox2rbox', 'hbox2qbox', 'rbox2hbox',
'rbox2qbox', 'qbox2hbox', 'qbox2rbox', 'gaussian2bbox', 'gt2gaussian',
- 'norm_angle', 'rbbox_overlaps', 'fake_rbbox_overlaps'
+ 'norm_angle', 'rbbox_overlaps', 'fake_rbbox_overlaps', 'distance2obb'
]
diff --git a/mmrotate/structures/bbox/transforms.py b/mmrotate/structures/bbox/transforms.py
index 6d0d72a12..83c74a620 100644
--- a/mmrotate/structures/bbox/transforms.py
+++ b/mmrotate/structures/bbox/transforms.py
@@ -78,3 +78,35 @@ def gt2gaussian(target):
R = torch.stack([cos_sin * neg, cos_sin[..., [1, 0]]], dim=-2)
return (center, R.matmul(diag).matmul(R.transpose(-1, -2)))
+
+
+def distance2obb(points: torch.Tensor,
+ distance: torch.Tensor,
+ angle_version: str = 'oc'):
+ """Convert distance angle to rotated boxes.
+
+ Args:
+ points (Tensor): Shape (B, N, 2) or (N, 2).
+ distance (Tensor): Distance from the given point to 4
+ boundaries and angle (left, top, right, bottom, angle).
+ Shape (B, N, 5) or (N, 5).
+ angle_version (str): Angle representation. Defaults to 'oc'.
+
+ Returns:
+ Tensor: Decoded rotated boxes (cx, cy, w, h, t) with shape
+ (B, N, 5) or (N, 5).
+ """
+ distance, angle = distance.split([4, 1], dim=-1)
+
+ cos_angle, sin_angle = torch.cos(angle), torch.sin(angle)
+
+ rot_matrix = torch.cat([cos_angle, -sin_angle, sin_angle, cos_angle],
+ dim=-1)
+ rot_matrix = rot_matrix.reshape(*rot_matrix.shape[:-1], 2, 2)
+
+ wh = distance[..., :2] + distance[..., 2:]
+ offset_t = (distance[..., 2:] - distance[..., :2]) / 2
+ offset_t = offset_t.unsqueeze(-1)
+ offset = torch.matmul(rot_matrix, offset_t).squeeze(-1)
+ ctr = points[..., :2] + offset
+
+ angle_regular = norm_angle(angle, angle_version)
+ return torch.cat([ctr, wh, angle_regular], dim=-1)
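A minimal single-box sketch of the decoding above, in pure Python: the width/height come from summing opposite distances, and the center is the anchor point plus the half-difference of those distances rotated by the predicted angle. (The tensor version additionally normalizes the angle with `norm_angle`.)

```python
import math


def distance2obb_single(px, py, l, t, r, b, theta):
    """Decode one (left, top, right, bottom, angle) prediction at the
    point (px, py) into an oriented box (cx, cy, w, h, angle)."""
    w, h = l + r, t + b
    # Offset from the anchor point to the box center, in the box frame...
    ox, oy = (r - l) / 2, (b - t) / 2
    # ...rotated into the image frame by the predicted angle,
    # i.e. [[cos, -sin], [sin, cos]] @ [ox, oy].
    cos_a, sin_a = math.cos(theta), math.sin(theta)
    cx = px + cos_a * ox - sin_a * oy
    cy = py + sin_a * ox + cos_a * oy
    return cx, cy, w, h, theta
```

For an unrotated prediction the center is simply the point shifted by half the left/right and top/bottom imbalance.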
diff --git a/mmrotate/visualization/local_visualizer.py b/mmrotate/visualization/local_visualizer.py
index 3d54e9fce..ea28c6383 100644
--- a/mmrotate/visualization/local_visualizer.py
+++ b/mmrotate/visualization/local_visualizer.py
@@ -81,8 +81,9 @@ def _draw_instances(self, image: np.ndarray, instances: ['InstanceData'],
'or (n, 8), but get `bboxes` with shape being '
f'{bboxes.shape}.')
+ bboxes = bboxes.cpu()
polygons = bboxes.convert_to('qbox').tensor
- polygons = polygons.reshape(-1, 4, 2).numpy()
+ polygons = polygons.reshape(-1, 4, 2)
polygons = [p for p in polygons]
self.draw_polygons(
polygons,
diff --git a/tests/test_models/test_dense_heads/test_rotated_rtmdet_head.py b/tests/test_models/test_dense_heads/test_rotated_rtmdet_head.py
new file mode 100644
index 000000000..76a69270c
--- /dev/null
+++ b/tests/test_models/test_dense_heads/test_rotated_rtmdet_head.py
@@ -0,0 +1,213 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import unittest
+
+import torch
+from mmdet.models import L1Loss
+from mmengine.structures import InstanceData
+from parameterized import parameterized
+
+from mmrotate.models.dense_heads import (RotatedRTMDetHead,
+ RotatedRTMDetSepBNHead)
+from mmrotate.structures import RotatedBoxes
+from mmrotate.utils import register_all_modules
+
+
+class TestRotatedRTMDetHead(unittest.TestCase):
+
+ def setUp(self):
+ register_all_modules()
+
+ @parameterized.expand([(RotatedRTMDetHead, ), (RotatedRTMDetSepBNHead, )])
+ def test_rotated_rtmdet_head_loss(self, head_cls):
+ """Tests rotated rtmdet head loss when truth is empty and non-empty."""
+ if not torch.cuda.is_available():
+ self.skipTest('test requires GPU and torch+cuda')
+
+ angle_version = 'le90'
+ s = 256
+ img_metas = [{
+ 'img_shape': (s, s, 3),
+ 'pad_shape': (s, s, 3),
+ 'scale_factor': 1,
+ }]
+ rtm_head = head_cls(
+ num_classes=4,
+ in_channels=1,
+ feat_channels=1,
+ stacked_convs=1,
+ angle_version=angle_version,
+ anchor_generator=dict(
+ type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16,
+ 32]),
+ bbox_coder=dict(
+ type='DistanceAnglePointCoder', angle_version=angle_version),
+ loss_cls=dict(
+ type='mmdet.QualityFocalLoss',
+ use_sigmoid=True,
+ beta=2.0,
+ loss_weight=1.0),
+ loss_bbox=dict(
+ type='RotatedIoULoss', mode='linear', loss_weight=2.0),
+ with_objectness=False,
+ pred_kernel_size=1,
+ use_hbbox_loss=False,
+ scale_angle=False,
+ loss_angle=None,
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='SiLU'),
+ train_cfg=dict(
+ assigner=dict(
+ type='mmdet.DynamicSoftLabelAssigner',
+ iou_calculator=dict(type='RBboxOverlaps2D'),
+ topk=13),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False)).cuda()
+
+ # Rotated RTMDet head expects multiple levels of features per image
+ feats = (
+ torch.rand(1, 1, s // stride[1], s // stride[0]).cuda()
+ for stride in rtm_head.prior_generator.strides)
+ cls_scores, bbox_preds, angle_preds = rtm_head.forward(feats)
+
+ # Test that empty ground truth encourages the network to
+ # predict background
+ gt_instances = InstanceData()
+ gt_instances.bboxes = torch.empty((0, 5)).cuda()
+ gt_instances.labels = torch.LongTensor([]).cuda()
+
+ empty_gt_losses = rtm_head.loss_by_feat(cls_scores, bbox_preds,
+ angle_preds, [gt_instances],
+ img_metas)
+ # When there is no truth, the cls loss should be nonzero but
+ # the box loss should be zero
+ empty_cls_loss = sum(empty_gt_losses['loss_cls'])
+ empty_box_loss = sum(empty_gt_losses['loss_bbox'])
+ self.assertGreater(empty_cls_loss, 0, 'cls loss should be non-zero')
+ self.assertEqual(
+ empty_box_loss, 0,
+ 'there should be no box loss when there are no true boxes')
+
+ # When truth is non-empty, both the cls and box losses
+ # should be nonzero for random inputs
+ gt_instances = InstanceData()
+ gt_instances.bboxes = RotatedBoxes(
+ torch.Tensor([[130.6667, 86.8757, 100.6326, 70.8874, 0.2]]).cuda())
+ gt_instances.labels = torch.LongTensor([2]).cuda()
+
+ one_gt_losses = rtm_head.loss_by_feat(cls_scores, bbox_preds,
+ angle_preds, [gt_instances],
+ img_metas)
+ onegt_cls_loss = sum(one_gt_losses['loss_cls'])
+ onegt_box_loss = sum(one_gt_losses['loss_bbox'])
+ self.assertGreater(onegt_cls_loss, 0, 'cls loss should be non-zero')
+ self.assertGreater(onegt_box_loss, 0, 'box loss should be non-zero')
+
+ # Test head with angle_loss
+ rtm_head.loss_angle = L1Loss(loss_weight=0.2)
+ with_ang_losses = rtm_head.loss_by_feat(cls_scores, bbox_preds,
+ angle_preds, [gt_instances],
+ img_metas)
+ with_ang_cls_loss = sum(with_ang_losses['loss_cls'])
+ with_ang_box_loss = sum(with_ang_losses['loss_bbox'])
+ with_ang_ang_loss = sum(with_ang_losses['loss_angle'])
+
+ self.assertGreater(with_ang_cls_loss, 0, 'cls loss should be non-zero')
+ self.assertGreater(with_ang_box_loss, 0, 'box loss should be non-zero')
+ self.assertGreater(with_ang_ang_loss, 0,
+ 'angle loss should be non-zero')
+
+ @parameterized.expand([(RotatedRTMDetHead, ), (RotatedRTMDetSepBNHead, )])
+ def test_rotated_rtmdet_head_loss_with_hbb(self, head_cls):
+ """Tests rotated rtmdet head loss when truth is empty and non-empty."""
+ angle_version = 'le90'
+ s = 256
+ img_metas = [{
+ 'img_shape': (s, s, 3),
+ 'pad_shape': (s, s, 3),
+ 'scale_factor': 1,
+ }]
+ rtm_head = head_cls(
+ num_classes=4,
+ in_channels=1,
+ feat_channels=1,
+ stacked_convs=1,
+ angle_version=angle_version,
+ anchor_generator=dict(
+ type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16,
+ 32]),
+ bbox_coder=dict(
+ type='DistanceAnglePointCoder', angle_version=angle_version),
+ loss_cls=dict(
+ type='mmdet.QualityFocalLoss',
+ use_sigmoid=True,
+ beta=2.0,
+ loss_weight=1.0),
+ loss_bbox=dict(type='mmdet.IoULoss', loss_weight=1.0),
+ angle_coder=dict(
+ type='CSLCoder',
+ angle_version='le90',
+ omega=1,
+ window='gaussian',
+ radius=1),
+ loss_angle=dict(
+ type='SmoothFocalLoss', gamma=2.0, alpha=0.25,
+ loss_weight=0.2),
+ with_objectness=False,
+ pred_kernel_size=1,
+ use_hbbox_loss=True,
+ scale_angle=False,
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='SiLU'),
+ train_cfg=dict(
+ assigner=dict(
+ type='mmdet.DynamicSoftLabelAssigner',
+ iou_calculator=dict(type='RBboxOverlaps2D'),
+ topk=13),
+ allowed_border=-1,
+ pos_weight=-1,
+ debug=False))
+
+ feats = (
+ torch.rand(1, 1, s // stride[1], s // stride[0])
+ for stride in rtm_head.prior_generator.strides)
+ cls_scores, bbox_preds, angle_preds = rtm_head.forward(feats)
+
+ # Test that empty ground truth encourages the network to
+ # predict background
+ gt_instances = InstanceData()
+ gt_instances.bboxes = torch.empty((0, 5))
+ gt_instances.labels = torch.LongTensor([])
+
+ empty_gt_losses = rtm_head.loss_by_feat(cls_scores, bbox_preds,
+ angle_preds, [gt_instances],
+ img_metas)
+ # When there is no truth, the cls loss should be nonzero but
+ # the box and angle losses should be zero
+ empty_cls_loss = sum(empty_gt_losses['loss_cls'])
+ empty_box_loss = sum(empty_gt_losses['loss_bbox'])
+ empty_ang_loss = sum(empty_gt_losses['loss_angle'])
+ self.assertGreater(empty_cls_loss, 0, 'cls loss should be non-zero')
+ self.assertEqual(
+ empty_box_loss, 0,
+ 'there should be no box loss when there are no true boxes')
+ self.assertEqual(
+ empty_ang_loss, 0,
+ 'there should be no angle loss when there are no true boxes')
+
+ # When truth is non-empty, the cls, box and angle losses
+ # should all be nonzero for random inputs
+ gt_instances = InstanceData()
+ gt_instances.bboxes = RotatedBoxes(
+ torch.Tensor([[130.6667, 86.8757, 100.6326, 70.8874, 0.2]]))
+ gt_instances.labels = torch.LongTensor([2])
+
+ one_gt_losses = rtm_head.loss_by_feat(cls_scores, bbox_preds,
+ angle_preds, [gt_instances],
+ img_metas)
+ onegt_cls_loss = sum(one_gt_losses['loss_cls'])
+ onegt_box_loss = sum(one_gt_losses['loss_bbox'])
+ onegt_ang_loss = sum(one_gt_losses['loss_angle'])
+ self.assertGreater(onegt_cls_loss, 0, 'cls loss should be non-zero')
+ self.assertGreater(onegt_box_loss, 0, 'box loss should be non-zero')
+ self.assertGreater(onegt_ang_loss, 0, 'angle loss should be non-zero')