[Feature] Add indoor monocular 3d object detector (ImVoxelNet on SUN RGB-D) #1738

Merged
merged 12 commits into from
Aug 17, 2022
4 changes: 4 additions & 0 deletions README.md
@@ -184,6 +184,10 @@ Results and models are available in the [model zoo](docs/en/model_zoo.md).
<li><a href="configs/pgd">PGD (CoRL'2021)</a></li>
<li><a href="configs/monoflex">MonoFlex (CVPR'2021)</a></li>
</ul>
<li><b>Indoor</b></li>
<ul>
<li><a href="configs/imvoxelnet">ImVoxelNet (WACV'2022)</a></li>
</ul>
</td>
<td>
<li><b>Outdoor</b></li>
4 changes: 4 additions & 0 deletions README_zh-CN.md
@@ -184,6 +184,10 @@ MMDetection3D 是一个基于 PyTorch 的目标检测开源工具箱, 下一代
<li><a href="configs/pgd">PGD (CoRL'2021)</a></li>
<li><a href="configs/monoflex">MonoFlex (CVPR'2021)</a></li>
</ul>
<li><b>室内</b></li>
<ul>
<li><a href="configs/imvoxelnet">ImVoxelNet (WACV'2022)</a></li>
</ul>
</td>
<td>
<li><b>室外</b></li>
22 changes: 14 additions & 8 deletions configs/imvoxelnet/README.md
@@ -14,25 +14,31 @@ In this paper, we introduce the task of multi-view RGB-based 3D object detection

## Introduction

We implement a monocular 3D detector ImVoxelNet and provide its results and checkpoints on KITTI dataset.
Results for SUN RGB-D, ScanNet and nuScenes are currently available in ImVoxelNet authors
[repo](https://github.com/saic-vul/imvoxelnet) (based on mmdetection3d).
We implement a monocular 3D detector ImVoxelNet and provide its results and checkpoints on KITTI and SUN RGB-D datasets.
Inference time is given for a single NVidia RTX 3090 GPU. Results for ScanNet and nuScenes are currently available in ImVoxelNet authors [repo](https://github.com/saic-vul/imvoxelnet) (based on mmdetection3d).

## Results and models

### KITTI

| Backbone | Class | Lr schd | Mem (GB) | Inf time (fps) | mAP | Download |
| :---------------------------------------: | :---: | :-----: | :------: | :------------: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [ResNet-50](./imvoxelnet_kitti-3d-car.py) | Car | 3x | | | 17.26 | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvoxelnet/imvoxelnet_4x8_kitti-3d-car/imvoxelnet_4x8_kitti-3d-car_20210830_003014-3d0ffdf4.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvoxelnet/imvoxelnet_4x8_kitti-3d-car/imvoxelnet_4x8_kitti-3d-car_20210830_003014.log.json) |
| Backbone | Class | Lr schd | Mem (GB) | Inf time (fps) | mAP | Download |
| :-------------------------------------------: | :---: | :-----: | :------: | :------------: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [ResNet-50](./imvoxelnet_4x8_kitti-3d-car.py) | Car | 3x | 14.8 | 8.4 | 17.26 | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvoxelnet/imvoxelnet_4x8_kitti-3d-car/imvoxelnet_4x8_kitti-3d-car_20210830_003014-3d0ffdf4.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvoxelnet/imvoxelnet_4x8_kitti-3d-car/imvoxelnet_4x8_kitti-3d-car_20210830_003014.log.json) |

### SUN RGB-D

| Backbone | Lr schd | Mem (GB) | Inf time (fps) | mAP@0.25 | mAP@0.5 | Download |
| :-------------------------------------------------: | :-----: | :------: | :------------: | :------: | :-----: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [ResNet-50](./imvoxelnet_4x2_sunrgbd-3d-10class.py) | 2x | 7.2 | 22.5 | 40.96 | 13.50 | [model](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvoxelnet/imvoxelnet_4x2_sunrgbd-3d-10class/imvoxelnet_4x2_sunrgbd-3d-10class_20220809_184416-29ca7d2e.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvoxelnet/imvoxelnet_4x2_sunrgbd-3d-10class/imvoxelnet_4x2_sunrgbd-3d-10class_20220809_184416.log.json) |

## Citation

```latex
@article{rukhovich2021imvoxelnet,
title={ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection},
author={Danila Rukhovich, Anna Vorontsova, Anton Konushin},
journal={arXiv preprint arXiv:2106.01178},
year={2021}
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={2397--2406},
year={2022}
}
```
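A note on the two AP columns of the SUN RGB-D table: a detection counts as a true positive only if its 3D IoU with a ground-truth box reaches the threshold (0.25 or 0.5). The sketch below illustrates the idea only and is not the evaluation code of mmdetection3d. It assumes axis-aligned boxes given as `(x_min, y_min, z_min, x_max, y_max, z_max)` and ignores the yaw angle that SUN RGB-D boxes carry; `aabb_iou_3d` and `_volume` are hypothetical helper names.

```python
def _volume(box):
    """Volume of an axis-aligned box (x_min, y_min, z_min, x_max, y_max, z_max)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def aabb_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes; rotation is deliberately ignored."""
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        if hi <= lo:  # no overlap along this axis, so no overlap at all
            return 0.0
        inter *= hi - lo
    union = _volume(box_a) + _volume(box_b) - inter
    return inter / union


gt = (0.0, 0.0, 0.0, 2.0, 2.0, 1.0)
close_pred = (0.5, 0.0, 0.0, 2.5, 2.0, 1.0)   # shifted by 0.5 along x
loose_pred = (1.0, 0.0, 0.0, 3.0, 2.0, 1.0)   # shifted by 1.0 along x

for name, pred in (("close", close_pred), ("loose", loose_pred)):
    iou = aabb_iou_3d(gt, pred)
    print(f"{name}: IoU={iou:.2f}, "
          f"match@0.25={iou >= 0.25}, match@0.5={iou >= 0.5}")
```

Here the loosely localized box still matches at 0.25 but not at 0.5, which is the kind of gap that separates the two columns for a monocular detector.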
127 changes: 127 additions & 0 deletions configs/imvoxelnet/imvoxelnet_4x2_sunrgbd-3d-10class.py
@@ -0,0 +1,127 @@
prior_generator = dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[-3.2, -0.2, -2.28, 3.2, 6.2, 0.28]],
rotations=[.0])
model = dict(
type='ImVoxelNet',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=False),
norm_eval=True,
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=4),
neck_3d=dict(
type='IndoorImVoxelNeck',
in_channels=256,
out_channels=128,
n_blocks=[1, 1, 1]),
bbox_head=dict(
type='ImVoxelHead',
n_classes=10,
n_levels=3,
n_channels=128,
n_reg_outs=7,
pts_assign_threshold=27,
pts_center_threshold=18,
prior_generator=prior_generator),
prior_generator=prior_generator,
n_voxels=[40, 40, 16],
coord_type='DEPTH',
train_cfg=dict(),
test_cfg=dict(nms_pre=1000, iou_thr=.25, score_thr=.01))
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

dataset_type = 'SUNRGBDDataset'
data_root = 'data/sunrgbd/'
class_names = ('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser',
'night_stand', 'bookshelf', 'bathtub')

train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations3D'),
dict(
type='Resize',
img_scale=[(512, 384), (768, 576)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['img', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', img_scale=(640, 480), keep_ratio=True),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(
type='DefaultFormatBundle3D',
class_names=class_names,
with_label=False),
dict(type='Collect3D', keys=['img'])
]

data = dict(
samples_per_gpu=4,
workers_per_gpu=4,
train=dict(
type='RepeatDataset',
times=2,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file=data_root + 'sunrgbd_infos_train.pkl',
pipeline=train_pipeline,
classes=class_names,
filter_empty_gt=True,
box_type_3d='Depth')),
val=dict(
type=dataset_type,
data_root=data_root,
ann_file=data_root + 'sunrgbd_infos_val.pkl',
pipeline=test_pipeline,
classes=class_names,
test_mode=True,
box_type_3d='Depth'),
test=dict(
type=dataset_type,
data_root=data_root,
ann_file=data_root + 'sunrgbd_infos_val.pkl',
pipeline=test_pipeline,
classes=class_names,
test_mode=True,
box_type_3d='Depth'))

optimizer = dict(
type='AdamW',
lr=0.0001,
weight_decay=0.0001,
paramwise_cfg=dict(
custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0)}))
optimizer_config = dict(grad_clip=dict(max_norm=35., norm_type=2))
lr_config = dict(policy='step', step=[8, 11])
total_epochs = 12

checkpoint_config = dict(interval=1, max_keep_ckpts=1)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')])
evaluation = dict(interval=1)
dist_params = dict(backend='nccl')
find_unused_parameters = True # only 1 of 4 FPN outputs is used
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
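To make the schedule in this config easier to read, the following sketch prints the learning rate per epoch. It is a plain-Python approximation, not the mmcv hook itself: it assumes the `step` policy multiplies the rate by a decay factor of 0.1 (`GAMMA`, not set in the config) once the zero-based epoch index reaches each entry of `step`, and that no warmup is applied. `lr_at_epoch` is a hypothetical helper name.

```python
BASE_LR = 1e-4           # optimizer lr
STEPS = (8, 11)          # lr_config step
TOTAL_EPOCHS = 12        # total_epochs
REPEAT_TIMES = 2         # RepeatDataset times
BACKBONE_LR_MULT = 0.1   # paramwise_cfg custom_keys['backbone'].lr_mult
GAMMA = 0.1              # assumption: decay factor is not set in the config


def lr_at_epoch(epoch, base_lr=BASE_LR, steps=STEPS, gamma=GAMMA):
    """Learning rate for a zero-based epoch index under step decay."""
    n_decays = sum(epoch >= s for s in steps)
    return base_lr * gamma ** n_decays


for epoch in range(TOTAL_EPOCHS):
    lr = lr_at_epoch(epoch)
    print(f"epoch {epoch:2d}: head lr={lr:.1e}, "
          f"backbone lr={lr * BACKBONE_LR_MULT:.1e}")

# Each epoch iterates the training set REPEAT_TIMES times.
passes_over_data = TOTAL_EPOCHS * REPEAT_TIMES
print("passes over the training set:", passes_over_data)
```

Under these assumptions the 12 configured epochs amount to 24 passes over the SUN RGB-D training split, which appears to be what the `2x` entry in the README table refers to.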
3 changes: 2 additions & 1 deletion configs/imvoxelnet/imvoxelnet_4x8_kitti-3d-car.py
@@ -40,7 +40,8 @@
loss_dir=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
n_voxels=[216, 248, 12],
anchor_generator=dict(
coord_type='LIDAR',
prior_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[-0.16, -39.68, -3.08, 68.96, 39.68, 0.76]],
rotations=[.0]),
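Both ImVoxelNet configs pair `n_voxels` with the `ranges` of the `prior_generator`. Assuming the voxel grid divides that range evenly along each axis, the implied voxel edge length can be checked with a few lines of arithmetic. `voxel_size` is a hypothetical helper written for this note, not a function of mmdetection3d.

```python
def voxel_size(ranges, n_voxels):
    """Edge lengths of one voxel, given (x_min, y_min, z_min, x_max, y_max, z_max)."""
    x_min, y_min, z_min, x_max, y_max, z_max = ranges
    extents = (x_max - x_min, y_max - y_min, z_max - z_min)
    return tuple(extent / n for extent, n in zip(extents, n_voxels))


# Values copied from the two configs in this pull request.
sunrgbd = voxel_size([-3.2, -0.2, -2.28, 3.2, 6.2, 0.28], [40, 40, 16])
kitti = voxel_size([-0.16, -39.68, -3.08, 68.96, 39.68, 0.76], [216, 248, 12])

print("SUN RGB-D voxel size:", [round(v, 3) for v in sunrgbd])
print("KITTI voxel size:   ", [round(v, 3) for v in kitti])
```

With these numbers the indoor config works out to cubic voxels of about 0.16 per side and the KITTI config to about 0.32 per side, in the units of the respective coordinate system.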
27 changes: 22 additions & 5 deletions configs/imvoxelnet/metafile.yml
@@ -1,29 +1,46 @@
Collections:
- Name: ImVoxelNet
Metadata:
Training Data: KITTI
Training Techniques:
- AdamW
Training Resources: 8x Tesla P40
Architecture:
- Anchor3DHead
- ResNet
Paper:
URL: https://arxiv.org/abs/2106.01178
Title: 'ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection'
README: configs/imvoxelnet/README.md
Code:
URL: https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/models/detectors/imvoxelnet.py#L11
Version: v0.15.0
Version: v1.0.0

Models:
- Name: imvoxelnet_4x8_kitti-3d-car
In Collection: ImVoxelNet
Config: configs/imvoxelnet/imvoxelnet_4x8_kitti-3d-car.py
Metadata:
Training Memory (GB): 15.0
Training Data: KITTI
Training Resources: 8x Tesla V100
Training Memory (GB): 14.8
Architecture:
- Anchor3DHead
Results:
- Task: 3D Object Detection
Dataset: KITTI
Metrics:
mAP: 17.26
Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvoxelnet/imvoxelnet_4x8_kitti-3d-car/imvoxelnet_4x8_kitti-3d-car_20210830_003014-3d0ffdf4.pth

- Name: imvoxelnet_4x2_sunrgbd-3d-10class
In Collection: ImVoxelNet
Config: configs/imvoxelnet/imvoxelnet_4x2_sunrgbd-3d-10class.py
Metadata:
Training Data: SUNRGBD
Training Resources: 2x Tesla P40
Training Memory (GB): 7.2
Results:
- Task: 3D Object Detection
Dataset: SUNRGBD
Metrics:
mAP@0.25: 40.96
mAP@0.5: 13.50
Weights: https://download.openmmlab.com/mmdetection3d/v1.0.0_models/imvoxelnet/imvoxelnet_4x2_sunrgbd-3d-10class/imvoxelnet_4x2_sunrgbd-3d-10class_20220809_184416-29ca7d2e.pth
3 changes: 2 additions & 1 deletion mmdet3d/models/dense_heads/__init__.py
@@ -8,6 +8,7 @@
from .fcos_mono3d_head import FCOSMono3DHead
from .free_anchor3d_head import FreeAnchor3DHead
from .groupfree3d_head import GroupFree3DHead
from .imvoxel_head import ImVoxelHead
from .monoflex_head import MonoFlexHead
from .parta2_rpn_head import PartA2RPNHead
from .pgd_head import PGDHead
@@ -22,5 +23,5 @@
'SSD3DHead', 'BaseConvBboxHead', 'CenterHead', 'ShapeAwareHead',
'BaseMono3DDenseHead', 'AnchorFreeMono3DHead', 'FCOSMono3DHead',
'GroupFree3DHead', 'PointRPNHead', 'SMOKEMono3DHead', 'PGDHead',
'MonoFlexHead', 'FCAF3DHead'
'MonoFlexHead', 'FCAF3DHead', 'ImVoxelHead'
]
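The SUN RGB-D config refers to the new head by the string `type='ImVoxelHead'`. In OpenMMLab code bases such strings are resolved through a registry that a class joins when its module is imported, which is why the import added here matters. The sketch below is a toy stand-in written for illustration, not mmcv's `Registry`: `ToyRegistry`, `TOY_HEADS`, and the stub class with its two arguments are all hypothetical.

```python
class ToyRegistry:
    """Minimal name-to-class mapping, loosely modelled on a module registry."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Used as a decorator: registration happens when the class is defined,
        # i.e. when the file that contains it is imported.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        args = dict(cfg)              # copy so the caller's config is untouched
        type_name = args.pop('type')
        if type_name not in self._modules:
            raise KeyError(f"{type_name} is not in the {self.name} registry")
        return self._modules[type_name](**args)


TOY_HEADS = ToyRegistry('head')


@TOY_HEADS.register_module
class ImVoxelHead:
    """Stub standing in for the real head; it only stores two arguments."""

    def __init__(self, n_classes, n_levels):
        self.n_classes = n_classes
        self.n_levels = n_levels


head = TOY_HEADS.build(dict(type='ImVoxelHead', n_classes=10, n_levels=3))
print(type(head).__name__, head.n_classes, head.n_levels)
```

If the class were never imported, `build` would raise the `KeyError` above, which mirrors the "not in the registry" failure users see when a module is missing from `__init__.py`.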