GET STARTED AGAIN

NOTE: This code is based on HRNet-Object-Detection.

Comparison with original code

SyncBN: uses the SyncBN implementation from NVIDIA/apex.
Multi-scale training (pad): training detection models with both SyncBN and multi-scale training collapses to terrible results (mAP=0.0, or no boxes predicted at all). Fortunately, we found a fix: pad the input images of the different scales (600*1000, 800*1333, 1000*1600) to one fixed size, 1000*1600, while keeping their original aspect ratios.
Multi-scale training (SimpleDet version): we've implemented the multi-scale training strategy used in SimpleDet.
Multi-node & multi-gpu training: we've tested training across multiple nodes (ONLY on Azure!); you need to provide a master IP and port. Training without SyncBN reaches normal results, while training with SyncBN fails.

Start

SyncBN

ResNet:

see configs/syncbn/faster_rcnn_r50_fpn_sync_1x.py

change normalize in the model config:

normalize = dict(type='SyncBN', frozen=False)
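The same switch can also be applied to an existing config programmatically. A minimal sketch, assuming the model config is a plain nested dict in which every normalization layer is configured through a key named normalize; the helper to_sync_bn and the example field values are hypothetical, not part of this repo:

```python
import copy


def to_sync_bn(cfg):
    """Return a copy of cfg with every `normalize` dict switched to unfrozen SyncBN.

    Hypothetical helper: assumes normalization layers are configured via a
    `normalize` key, as in the SyncBN config above.
    """
    cfg = copy.deepcopy(cfg)  # never mutate the caller's config

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == 'normalize' and isinstance(value, dict):
                    # Same change as the one-line config edit above.
                    value['type'] = 'SyncBN'
                    value['frozen'] = False
                else:
                    walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                walk(item)

    walk(cfg)
    return cfg


# Illustrative config; only the `normalize` entry matters here.
model = dict(
    type='FasterRCNN',
    backbone=dict(type='ResNet', depth=50,
                  normalize=dict(type='BN', frozen=True)),
)
sync_model = to_sync_bn(model)
print(sync_model['backbone']['normalize'])  # {'type': 'SyncBN', 'frozen': False}
```

Working on a deep copy keeps the original BN config usable, e.g. for a non-SyncBN baseline run.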

HRNet

The HRNet backbone in this repo doesn't support the normalize option (the HRNet in mmdetection does), so SyncBN is enabled through a separate backbone class instead.

see configs/hrnet/faster_rcnn_hrnetv2p_w18_sync_1x.py

change backbone.type to SyncHighResolutionNet.
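As a sketch of that one-field change, assuming the plain backbone is registered as 'HighResolutionNet' and everything else in the backbone config stays the same (use_sync_hrnet is a hypothetical helper, not code from this repo):

```python
import copy


def use_sync_hrnet(cfg):
    """Return a copy of cfg whose HRNet backbone is swapped for the SyncBN variant.

    Hypothetical helper: assumes the plain backbone type is 'HighResolutionNet'.
    """
    cfg = copy.deepcopy(cfg)
    backbone = cfg['backbone']
    if backbone['type'] != 'HighResolutionNet':
        raise ValueError('expected an HRNet backbone, got %r' % backbone['type'])
    backbone['type'] = 'SyncHighResolutionNet'  # all other backbone fields are kept
    return cfg


# Illustrative config; the real HRNet config has many more backbone fields.
model = dict(type='FasterRCNN',
             backbone=dict(type='HighResolutionNet', extra=dict()))
print(use_sync_hrnet(model)['backbone']['type'])  # SyncHighResolutionNet
```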

results

Backbone	LR Schd	SyncBN	mAP
HRNetV2-W18	1x	No	36.1
HRNetV2-W18	1x	Yes	37.2

relevant files

mmdet/models/backbones/hrnet_sync.py
mmdet/models/utils/norm.py

Multi-scale training (pad)

see configs/hrnet/faster_rcnn_hrnetv2p_w18_syncbn_16batch_mstrain_pad_1x.py

set the maximum padding size, pad_size
set the scales for multi-scale training (img_scale).

data = dict(
    imgs_per_gpu=4,
    workers_per_gpu=8,
    pad_size=(1600, 1024),
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'images/train2017.zip',
        img_scale=[(1600, 1000), (1000, 600), (1333, 800)],
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=True,
        with_label=True),
    # val=dict(...) and test=dict(...) omitted
)
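Why one fixed canvas can hold all three scales can be checked with a little arithmetic. The sketch below assumes each img_scale entry is a (long edge, short edge) bound and that an image is rescaled by the largest factor keeping both edges within that bound, which is, to our understanding, how mmcv's aspect-preserving rescale works; it also assumes pad_size is read as (long side, short side). rescaled_size is a hypothetical name, not a function in this repo.

```python
import random


def rescaled_size(h, w, scale):
    """Aspect-preserving target (height, width) for an image of size (h, w).

    scale is read as a (long edge, short edge) bound, like the img_scale
    entries above: the image is scaled by the largest factor that keeps its
    long side <= long edge and its short side <= short edge.
    """
    long_edge, short_edge = max(scale), min(scale)
    factor = min(long_edge / max(h, w), short_edge / min(h, w))
    return int(h * factor + 0.5), int(w * factor + 0.5)


img_scales = [(1600, 1000), (1000, 600), (1333, 800)]
pad_size = (1600, 1024)  # assumed to mean (long side, short side) of the canvas

h, w = 480, 640  # a typical landscape COCO image
scale = random.choice(img_scales)  # one scale is sampled per training image
print(scale, rescaled_size(h, w, scale))

# Whatever scale is drawn, the result fits inside the same fixed canvas.
for s in img_scales:
    new_h, new_w = rescaled_size(h, w, s)
    assert max(new_h, new_w) <= pad_size[0] and min(new_h, new_w) <= pad_size[1]
```

For this image the three scales give 1000*1333, 600*800 and 800*1067 (height*width). The same holds for any input size, since the long side never exceeds 1600 and the short side never exceeds 1000.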

results

Backbone	LR Schd	SyncBN	MSTrain	mAP	model
HRNetV2-W18	1x	No	Yes	36.5
HRNetV2-W18	1x	Yes	Yes	37.6	model
HRNetV2-W18	2x	Yes	Yes	39.4	model
HRNetV2-W32	1x	Yes	Yes	41.0	model
HRNetV2-W32	2x	Yes	Yes	42.6	model

relevant files

mmdet/datasets/loader/collate.py
mmdet/datasets/loader/build_loader.py
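The idea behind the fixed padding is to pad every batch to the same canvas instead of to the largest image in that batch. A minimal numpy sketch of that idea, not the code in collate.py: collate_fixed is a hypothetical name, pad_size is read as (long side, short side), and all images in a batch are assumed to share one orientation (in mmdetection, GroupSampler groups images by aspect ratio).

```python
import numpy as np


def collate_fixed(imgs, pad_size):
    """Stack CHW images into one (N, C, H, W) batch padded to a fixed size.

    Hypothetical sketch: pad_size = (long side, short side); every image in
    the batch is assumed to have the same orientation as the first one.
    """
    long_side, short_side = pad_size
    channels, h0, w0 = imgs[0].shape
    # Landscape batches get a wide canvas, portrait batches a tall one.
    canvas_h, canvas_w = (short_side, long_side) if w0 >= h0 else (long_side, short_side)
    batch = np.zeros((len(imgs), channels, canvas_h, canvas_w), dtype=imgs[0].dtype)
    for i, img in enumerate(imgs):
        _, h, w = img.shape
        if h > canvas_h or w > canvas_w:
            raise ValueError('image %dx%d exceeds the pad size' % (h, w))
        batch[i, :, :h, :w] = img  # top-left aligned, zeros elsewhere
    return batch


imgs = [np.ones((3, 800, 1067), np.float32), np.ones((3, 600, 800), np.float32)]
batch = collate_fixed(imgs, pad_size=(1600, 1024))
print(batch.shape)  # (2, 3, 1024, 1600)
```

With batch-max padding the padded shape would change from batch to batch depending on which scales were drawn; here it is the same for every landscape batch.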

Multi-scale training (SimpleDet version)

see configs/hrnet/faster_rcnn_hrnetv2p_w18_randresizecrop_1x.py

set the maximum padding size, pad_size (the config below uses pad_size=(1216, 800))
set the scales for multi-scale training (rand_resize_crop.scales).
add the extra data augmentation (extra_aug)

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=4,
    pad_size=(1216, 800),
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017.zip',
        img_scale=(1200, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=1,
        extra_aug=dict(
            rand_resize_crop=dict(
                scales=[[1400, 600], [1400, 800], [1400, 1000]],
                size=[1200, 800]
            )),
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=True,
        with_label=True),
    # val=dict(...) and test=dict(...) omitted
)
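One plausible reading of the rand_resize_crop settings, written as a numpy sketch: pick one entry of scales as a (long edge, short edge) bound, rescale with the aspect ratio kept, then take a random crop of at most size (read here as width, height) and shift and clip the boxes to match. This is our hypothetical interpretation, not the code in extra_aug.py or SimpleDet, and both function names are made up.

```python
import random

import numpy as np


def rand_resize_crop_params(h, w, scales, size, rng=random):
    """Choose a scale and a random crop window for an (h, w) image.

    scales: list of (long edge, short edge) bounds; size: (crop_w, crop_h).
    Returns (factor, new_h, new_w, window) with window = (x0, y0, crop_w, crop_h).
    """
    long_edge, short_edge = rng.choice(scales)
    factor = min(long_edge / max(h, w), short_edge / min(h, w))
    new_h, new_w = int(h * factor + 0.5), int(w * factor + 0.5)
    crop_w, crop_h = min(size[0], new_w), min(size[1], new_h)  # never crop past the image
    x0 = rng.randint(0, new_w - crop_w)  # randint includes both end points
    y0 = rng.randint(0, new_h - crop_h)
    return factor, new_h, new_w, (x0, y0, crop_w, crop_h)


def crop_boxes(boxes, factor, window):
    """Rescale (x1, y1, x2, y2) boxes, move them into the crop window and clip them."""
    x0, y0, crop_w, crop_h = window
    b = boxes * factor - np.array([x0, y0, x0, y0], dtype=boxes.dtype)
    b[:, 0::2] = b[:, 0::2].clip(0, crop_w)
    b[:, 1::2] = b[:, 1::2].clip(0, crop_h)
    keep = (b[:, 2] > b[:, 0]) & (b[:, 3] > b[:, 1])  # drop boxes cropped away entirely
    return b[keep]


rng = random.Random(0)
factor, new_h, new_w, window = rand_resize_crop_params(
    480, 640, [[1400, 600], [1400, 800], [1400, 1000]], [1200, 800], rng)
boxes = np.array([[10, 20, 300, 400], [600, 440, 640, 480]], dtype=np.float32)
# The image itself would be resized to (new_h, new_w) and sliced as
# img[y0:y0 + crop_h, x0:x0 + crop_w].
print(new_h, new_w, window)
print(crop_boxes(boxes, factor, window))
```

Under this reading the crop is never larger than 1200*800, so every training image fits inside the pad_size=(1216, 800) canvas from the config.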

results

Backbone	LR Schd	SyncBN	MSTrain	mAP	model
HRNetV2-W18	1x	No	Yes	36.3
HRNetV2-W18	1x	Yes	Yes	37.6	model

relevant files

mmdet/datasets/loader/collate.py
mmdet/datasets/loader/build_loader.py
mmdet/datasets/zip_dataset.py
mmdet/datasets/extra_aug.py
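Both configs point img_prefix at a .zip file, which zip_dataset.py handles. A minimal stdlib sketch of reading images straight from such an archive, assuming every image is stored under a member name ending in its COCO file_name; ZipImageReader is a hypothetical class, not the one in zip_dataset.py:

```python
import io
import zipfile


class ZipImageReader:
    """Read raw image bytes out of a .zip archive such as train2017.zip.

    Hypothetical sketch: assumes every image is stored under a member name
    that ends with its COCO file_name, e.g. 'train2017/000000000009.jpg'.
    """

    def __init__(self, zip_file):
        self.zip_file = zip_file  # path or binary file-like object
        self._zip = None
        self._index = None

    def _archive(self):
        # Opened on first use, e.g. inside each DataLoader worker process.
        if self._zip is None:
            self._zip = zipfile.ZipFile(self.zip_file, 'r')
            # Map bare file names to full member names once.
            self._index = {name.rsplit('/', 1)[-1]: name
                           for name in self._zip.namelist()}
        return self._zip

    def read(self, file_name):
        archive = self._archive()
        return archive.read(self._index[file_name])


# Self-contained demo with an in-memory archive instead of the real train2017.zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('train2017/000000000009.jpg', b'not really a jpeg')
reader = ZipImageReader(buf)
print(reader.read('000000000009.jpg'))  # b'not really a jpeg'
```

The returned bytes could then be decoded, e.g. with cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR).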

Multi-node & multi-gpu training

Note: tested ONLY on Azure!

Set environment variables

export NCCL_SOCKET_IFNAME=eth0  # set the network interface
export NCCL_DEBUG=INFO  # print more NCCL information
export NCCL_IB_DISABLE=1  # disable InfiniBand (IB)

Running (each node runs the same command with a different node_rank)

python -m torch.distributed.launch --nproc_per_node=4 --nnodes=2 --node_rank=0 tools/train.py <CONFIG-FILE> --launcher pytorch --master_ip_port tcp://xx.xx.xx.xx:1234
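Each node starts --nproc_per_node processes, and torch.distributed.launch numbers them node by node, so the global rank is node_rank * nproc_per_node + local_rank. A minimal sketch of that arithmetic, assuming the repo's --master_ip_port value ends up as the TCP init_method of torch.distributed.init_process_group; dist_init_kwargs and the address 10.0.0.1 are hypothetical, see mmdet/apis/env.py for the actual code:

```python
def dist_init_kwargs(master_ip_port, nproc_per_node, nnodes, node_rank, local_rank):
    """Keyword arguments one training process could pass to
    torch.distributed.init_process_group (hypothetical helper).

    Node 0 owns ranks 0..nproc_per_node-1, node 1 the next block, and so on.
    """
    if not master_ip_port.startswith('tcp://'):
        raise ValueError('expected tcp://<ip>:<port>, got %r' % master_ip_port)
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError('local_rank out of range')
    return dict(
        backend='nccl',
        init_method=master_ip_port,  # every node points at the same master
        world_size=nnodes * nproc_per_node,
        rank=node_rank * nproc_per_node + local_rank,
    )


# Second node (node_rank=1), third GPU (local_rank=2) of the command above:
kwargs = dist_init_kwargs('tcp://10.0.0.1:1234', nproc_per_node=4, nnodes=2,
                          node_rank=1, local_rank=2)
print(kwargs['world_size'], kwargs['rank'])  # 8 6
# In a real job: torch.distributed.init_process_group(**kwargs)
```

This is also why every node must get a distinct node_rank: two nodes with the same value would claim the same block of global ranks.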

relevant files

mmdet/apis/env.py
