This is the official PyTorch implementation of ASAG (ICCV 2023).
- Recent sparse detectors with multiple decoder layers (e.g., six) achieve promising performance but suffer from long inference time due to their complex heads. Previous works have explored using dense priors as initialization and built one-decoder-layer detectors. Although they gain remarkable acceleration, their performance still lags behind their six-decoder-layer counterparts by a large margin. In this work, we aim to bridge this performance gap while retaining fast speed. We find that the architectural discrepancy between dense and sparse detectors leads to feature conflict, hampering the performance of one-decoder-layer detectors. Thus we propose the Adaptive Sparse Anchor Generator (ASAG), which predicts dynamic anchors on patches rather than grids in a sparse way, alleviating the feature conflict problem. For each image, ASAG dynamically selects which feature maps and which locations to predict on, forming a fully adaptive way to generate image-specific anchors. Further, a simple and effective Query Weighting method eases the training instability caused by this adaptiveness. Extensive experiments show that our method outperforms dense-initialized detectors and achieves a better speed-accuracy trade-off.
- Our ASAG starts by predicting dynamic anchors from fixed feature maps and then adaptively explores larger feature maps using Adaptive Probing, which runs in a top-down, coarse-to-fine manner. Large feature maps can even be discarded manually for more efficient inference.
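Below is a minimal conceptual sketch of the coarse-to-fine probing idea (not the code in this repo): locations are first scored on the coarsest pyramid level, and only the most promising ones are expanded into 2x2 patches on the next finer level. The function name, the toy norm-based score, and the patch-expansion rule are illustrative assumptions.

```python
import torch

def adaptive_probing(pyramid, keep_per_level=50):
    """pyramid: list of feature maps [coarsest, ..., finest], each of shape (C, H, W).
    Returns a list of (level, y, x) locations selected for anchor prediction."""
    coarsest = pyramid[0]
    scores = coarsest.norm(dim=0)                      # toy per-location score, shape (H, W)
    k = min(keep_per_level, scores.numel())
    flat_idx = scores.flatten().topk(k).indices
    ys = torch.div(flat_idx, scores.shape[1], rounding_mode="floor")
    xs = flat_idx % scores.shape[1]
    selected = [(0, int(y), int(x)) for y, x in zip(ys, xs)]

    for lvl in range(1, len(pyramid)):
        finer = pyramid[lvl]
        candidates = []
        # each kept location on the coarser level probes a 2x2 patch on the finer level
        for _, y, x in [s for s in selected if s[0] == lvl - 1]:
            for dy in (0, 1):
                for dx in (0, 1):
                    ny, nx = 2 * y + dy, 2 * x + dx
                    if ny < finer.shape[1] and nx < finer.shape[2]:
                        candidates.append((lvl, ny, nx))
        # keep only the most promising probed locations, again with the toy score
        candidates.sort(key=lambda p: float(finer[:, p[1], p[2]].norm()), reverse=True)
        selected += candidates[:keep_per_level]
    return selected

# toy usage: a 4-level pyramid, coarsest first (e.g., P6 -> P3)
pyramid = [torch.randn(256, 8 * 2 ** i, 8 * 2 ** i) for i in range(4)]
print(len(adaptive_probing(pyramid)))
```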
| # | name | backbone | epoch | #queries | box AP | Where in Our Paper |
|---|------|----------|-------|----------|--------|--------------------|
| 1 | ASAG-A | R50 | 12 | 107 | 42.6 | Table 2 |
| 2 | ASAG-A | R50 | 12 | 329 | 43.6 | Table 2 |
| 3 | ASAG-A | R50 | 36 | 102 | 45.3 | Table 4 |
| 4 | ASAG-A | R50 | 36 | 312 | 46.3 | Table 4 |
| 5 | ASAG-A | R101 | 36 | 296 | 47.5 | Table 4 |
| 6 | ASAG-S | R50 | 36 | 100 | 43.9 | Table 3 & 4 |
| 7 | ASAG-S | R50 | 36 | 312 | 45.0 | Table 3 & 4 |
| 8 | ASAG-A-dn | R50 | 12 | 106 | 43.1 | Table A-1 |
| 9 | ASAG-A-crosscl | R50 | 12 | 103 | 43.8 | |
Notes:
- All the checkpoints and logs can be found in Google Drive / Baidu (pwd: asag)
- Results in the above table are tested on the COCO dataset.
- In ASAG, we use 4 parallel decoders, most of which perform similarly (within ~0.2 AP).
- To test speed, users need to slightly modify the code (a timing sketch is given after this list), including:
  - using only one decoder: `--num_decoder_layers 1`
  - using the `fast_inference` api rather than `forward` in `models/anchor_generator.py`
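For reference, a minimal timing harness one might use after making these changes; `model` and `images` are placeholders for the modified detector (single decoder, `fast_inference` path) and a batch of GPU inputs, not names defined by this repo.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, images, warmup=10, iters=100):
    """Average end-to-end FPS for a fixed input batch on GPU."""
    model.eval()
    for _ in range(warmup):            # warm up CUDA kernels and caches
        model(images)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(images)
    torch.cuda.synchronize()
    return iters / (time.time() - start)

# example: print(measure_fps(model, images))
```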
Download and extract COCO 2017 train and val images with annotations from here.
We expect the directory structure to be the following:
path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
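Optionally, a quick sanity check (not part of the repo) that the layout above is in place; the path below is a placeholder for your own COCO path.

```python
from pathlib import Path

coco = Path("path/to/coco")  # replace with YOUR_COCO_PATH
for sub in ("annotations", "train2017", "val2017"):
    assert (coco / sub).is_dir(), f"missing directory: {coco / sub}"
print("COCO directory layout looks correct")
```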
- To prevent users from confusing different ImageNet pretrained checkpoints, we require users to download the corresponding version of the checkpoint from TorchVision manually (i.e. R50v1 and R101v1).
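One way to fetch and save these weights with the torchvision version listed below, where `pretrained=True` yields the v1 ImageNet weights; the output file name is only an example, and passing the saved file via `--pretrained_checkpoint` is an assumption to verify against the repo's loading code.

```python
import torch
import torchvision

# R50v1 ImageNet weights (use torchvision.models.resnet101 for R101v1)
backbone = torchvision.models.resnet50(pretrained=True)
torch.save(backbone.state_dict(), "resnet50_v1_imagenet.pth")
# then, e.g., --pretrained_checkpoint resnet50_v1_imagenet.pth
```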
- Our environment
- NVIDIA RTX 3090
- Python: 3.7.12
- PyTorch: 1.10.2+cu113
- torchvision: 0.11.3+cu113
ASAG-A (1x, R50, 100 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100.pth --used_head aux_2
ASAG-A (1x, R50, 300 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_300.pth --used_head aux_2 --num_query 300
ASAG-A (3x, R50, 100 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --training_schedule 3x
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_3x_100.pth --used_head main
ASAG-A (3x, R50, 300 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_3x_300.pth --used_head aux_2 --num_query 300
ASAG-A (3x, R101, 300 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet101 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet101 --eval --resume ASAG_A_r101_3x_300.pth --used_head aux_2 --num_query 300
ASAG-S (3x, R50, 100 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --training_schedule 3x --decoder_type SparseRCNN
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --decoder_type SparseRCNN --resume ASAG_S_r50_3x_100.pth --used_head aux_2
ASAG-S (3x, R50, 300 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x --decoder_type SparseRCNN
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_S_r50_3x_300.pth --used_head aux_2 --num_query 300 --decoder_type SparseRCNN
ASAG-A+dn (1x, R50, 100 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --use_dn --fix_noise_scale
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100_dn.pth --used_head aux_2
- Taking ASAG-A (1x, R50, 100 queries) as an example.
- `--used_inference_level` can be chosen from `['P3P6', 'P4P6', 'P5P6']`.

  python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100.pth --used_head aux_2 --used_inference_level P5P6
| # | name | AP(↑) | mMR(↓) | R(↑) | Where in Our Paper |
|---|------|-------|--------|------|--------------------|
| 1 | Deformable DETR | 86.7 | 54.0 | 92.5 | Table 6 |
| 2 | Sparse RCNN | 89.2 | 48.3 | 95.9 | Table 6 |
| 3 | ASAG-S | 91.3 | 43.5 | 96.9 | Table 6 |
- We also run ASAG-S on the CrowdHuman dataset with R50, 50 epochs, and an average number of anchors within 500.
- Data preparation: after downloading the dataset, users should first convert the annotations to the COCO format by running `crowdhumantools/convert_crowdhuman_to_coco.py`. Before running it, please make sure the file paths in it are correct. The expected directory structure is:

  path/to/crowdhuman/
    annotations/       # annotation json files
    CrowdHuman_train/  # train images
    CrowdHuman_val/    # val images
- Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --dataset_file crowdhuman --coco_path YOUR_CROWDHUMAN_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --decoder_type SparseRCNN
- Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --dataset_file crowdhuman --coco_path YOUR_CROWDHUMAN_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_S_crowdhuman.pth --used_head aux_0 --decoder_type SparseRCNN
| # | backbone | AP | APs | APm | APl |
|---|----------|----|-----|-----|-----|
| 1 | torchvision R50 | 42.6 | 25.9 | 45.8 | 56.9 |
| 2 | CrossCL R50 | 43.8 | 26.1 | 47.4 | 59.3 |
- We run ASAG-A with our self-supervised pretrained backbone CrossCL under the 1x schedule, which can boost ASAG by 1.2 AP.
- The pretrained backbone can be found in Google Drive / Baidu (pwd: asag).
- Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint crosscl_resnet50.pth
- Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100_crosscl.pth --used_head aux_2
ASAG is released under the Apache 2.0 license. Please see the LICENSE file for more information.
If you find our work helpful for your research, please consider citing the following BibTeX entries.
@inproceedings{fu2023asag,
title={ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation},
author={Fu, Shenghao and Yan, Junkai and Gao, Yipeng and Xie, Xiaohua and Zheng, Wei-Shi},
booktitle={ICCV},
year={2023},
}
@inproceedings{yan2023cross,
title={Self-supervised Cross-stage Regional Contrastive Learning for Object Detection},
author={Yan, Junkai and Yang, Lingxiao and Gao, Yipeng and Zheng, Wei-Shi},
booktitle={ICME},
year={2023},
}
Our ASAG is heavily inspired by many outstanding prior works, including
Thanks to the authors of the above projects for open-sourcing their implementations!