This is the official PyTorch implementation of ASAG (ICCV 2023).
- Recent sparse detectors with multiple decoder layers (e.g., six) achieve promising performance but suffer from long inference time due to their complex heads. Previous works have explored using dense priors as initialization and built one-decoder-layer detectors. Although they gain remarkable acceleration, their performance still lags behind their six-decoder-layer counterparts by a large margin. In this work, we aim to bridge this performance gap while retaining fast speed. We find that the architectural discrepancy between dense and sparse detectors leads to feature conflict, hampering the performance of one-decoder-layer detectors. Thus we propose the Adaptive Sparse Anchor Generator (ASAG), which predicts dynamic anchors on patches rather than grids in a sparse way, alleviating the feature conflict problem. For each image, ASAG dynamically selects which feature maps and which locations to predict on, forming a fully adaptive way to generate image-specific anchors. Further, a simple and effective Query Weighting method eases the training instability caused by this adaptiveness. Extensive experiments show that our method outperforms dense-initialized detectors and achieves a better speed-accuracy trade-off.
- Our ASAG starts by predicting dynamic anchors from fixed feature maps and then adaptively explores larger feature maps using Adaptive Probing, which runs in a top-down, coarse-to-fine manner. Large feature maps can even be discarded manually for more efficient inference.
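Below is a minimal conceptual sketch of the coarse-to-fine probing idea (not the code in this repo): locations are first scored on the coarsest pyramid level, and only the most promising ones are expanded into 2x2 patches on the next finer level. The function name, the toy norm-based score, and the patch-expansion rule are illustrative assumptions.

```python
import torch

def adaptive_probing(pyramid, keep_per_level=50):
    """pyramid: list of feature maps [coarsest, ..., finest], each of shape (C, H, W).
    Returns a list of (level, y, x) locations selected for anchor prediction."""
    coarsest = pyramid[0]
    scores = coarsest.norm(dim=0)                      # toy per-location score, shape (H, W)
    k = min(keep_per_level, scores.numel())
    flat_idx = scores.flatten().topk(k).indices
    ys = torch.div(flat_idx, scores.shape[1], rounding_mode="floor")
    xs = flat_idx % scores.shape[1]
    selected = [(0, int(y), int(x)) for y, x in zip(ys, xs)]

    for lvl in range(1, len(pyramid)):
        finer = pyramid[lvl]
        candidates = []
        # each kept location on the coarser level probes a 2x2 patch on the finer level
        for _, y, x in [s for s in selected if s[0] == lvl - 1]:
            for dy in (0, 1):
                for dx in (0, 1):
                    ny, nx = 2 * y + dy, 2 * x + dx
                    if ny < finer.shape[1] and nx < finer.shape[2]:
                        candidates.append((lvl, ny, nx))
        # keep only the most promising probed locations, again with the toy score
        candidates.sort(key=lambda p: float(finer[:, p[1], p[2]].norm()), reverse=True)
        selected += candidates[:keep_per_level]
    return selected

# toy usage: a 4-level pyramid, coarsest first (e.g., P6 -> P3)
pyramid = [torch.randn(256, 8 * 2 ** i, 8 * 2 ** i) for i in range(4)]
print(len(adaptive_probing(pyramid)))
```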
| # | name | backbone | epoch | #queries | box AP | Where in Our Paper |
|---|------|----------|-------|----------|--------|--------------------|
| 1 | ASAG-A | R50 | 12 | 107 | 42.6 | Table 2 |
| 2 | ASAG-A | R50 | 12 | 329 | 43.6 | Table 2 |
| 3 | ASAG-A | R50 | 36 | 102 | 45.3 | Table 4 |
| 4 | ASAG-A | R50 | 36 | 312 | 46.3 | Table 4 |
| 5 | ASAG-A | R101 | 36 | 296 | 47.5 | Table 4 |
| 6 | ASAG-S | R50 | 36 | 100 | 43.9 | Table 3 & 4 |
| 7 | ASAG-S | R50 | 36 | 312 | 45.0 | Table 3 & 4 |
| 8 | ASAG-A-dn | R50 | 12 | 106 | 43.1 | Table A-1 |
| 9 | ASAG-A-crosscl | R50 | 12 | 103 | 43.8 | |
Notes:
- All the checkpoints and logs can be found in Google Drive / Baidu (pwd: asag)
- Results in the above table are tested on the COCO dataset.
- In ASAG, we use 4 parallel decoders, most of which perform similarly (within ~0.2 AP).
- To test speed, users need to slightly modify the code (a timing sketch is given after this list), including:
  - using only one decoder: `--num_decoder_layers 1`
  - using the `fast_inference` api rather than `forward` in `models/anchor_generator.py`
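For reference, a minimal timing harness one might use after making these changes; `model` and `images` are placeholders for the modified detector (single decoder, `fast_inference` path) and a batch of GPU inputs, not names defined by this repo.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, images, warmup=10, iters=100):
    """Average end-to-end FPS for a fixed input batch on GPU."""
    model.eval()
    for _ in range(warmup):            # warm up CUDA kernels and caches
        model(images)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(images)
    torch.cuda.synchronize()
    return iters / (time.time() - start)

# example: print(measure_fps(model, images))
```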
Download and extract COCO 2017 train and val images with annotations from here.
We expect the directory structure to be the following:
path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
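Optionally, a quick sanity check (not part of the repo) that the layout above is in place; the path below is a placeholder for your own COCO path.

```python
from pathlib import Path

coco = Path("path/to/coco")  # replace with YOUR_COCO_PATH
for sub in ("annotations", "train2017", "val2017"):
    assert (coco / sub).is_dir(), f"missing directory: {coco / sub}"
print("COCO directory layout looks correct")
```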
- To prevent users from confusing different ImageNet pretrained checkpoints, we require users to download the corresponding version of the checkpoint from TorchVision manually (i.e. R50v1 and R101v1).
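One way to fetch and save these weights with the torchvision version listed below, where `pretrained=True` yields the v1 ImageNet weights; the output file name is only an example, and passing the saved file via `--pretrained_checkpoint` is an assumption to verify against the repo's loading code.

```python
import torch
import torchvision

# R50v1 ImageNet weights (use torchvision.models.resnet101 for R101v1)
backbone = torchvision.models.resnet50(pretrained=True)
torch.save(backbone.state_dict(), "resnet50_v1_imagenet.pth")
# then, e.g., --pretrained_checkpoint resnet50_v1_imagenet.pth
```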
- Our environment
- NVIDIA RTX 3090
- Python: 3.7.12
- PyTorch: 1.10.2+cu113
- torchvision: 0.11.3+cu113
ASAG-A (1x, R50, 100 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100.pth --used_head aux_2
ASAG-A (1x, R50, 300 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_300.pth --used_head aux_2 --num_query 300
ASAG-A (3x, R50, 100 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --training_schedule 3x
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_3x_100.pth --used_head main
ASAG-A (3x, R50, 300 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_3x_300.pth --used_head aux_2 --num_query 300
ASAG-A (3x, R101, 300 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet101 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet101 --eval --resume ASAG_A_r101_3x_300.pth --used_head aux_2 --num_query 300
ASAG-S (3x, R50, 100 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --training_schedule 3x --decoder_type SparseRCNN
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --decoder_type SparseRCNN --resume ASAG_S_r50_3x_100.pth --used_head aux_2
ASAG-S (3x, R50, 300 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --num_query 300 --training_schedule 3x --decoder_type SparseRCNN
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_S_r50_3x_300.pth --used_head aux_2 --num_query 300 --decoder_type SparseRCNN
ASAG-A+dn (1x, R50, 100 queries)
Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --use_dn --fix_noise_scale
Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100_dn.pth --used_head aux_2
- Taking ASAG-A (1x, R50, 100 queries) as an example.
- `--used_inference_level` can be chosen from `['P3P6', 'P4P6', 'P5P6']`.

  python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100.pth --used_head aux_2 --used_inference_level P5P6
| # | name | AP(↑) | mMR(↓) | R(↑) | Where in Our Paper |
|---|------|-------|--------|------|--------------------|
| 1 | Deformable DETR | 86.7 | 54.0 | 92.5 | Table 6 |
| 2 | Sparse RCNN | 89.2 | 48.3 | 95.9 | Table 6 |
| 3 | ASAG-S | 91.3 | 43.5 | 96.9 | Table 6 |
- We also run ASAG-S on the CrowdHuman dataset with R50, 50 epochs, and an average number of anchors within 500.
- Data preparation: after downloading the dataset, users should first convert the annotations to the COCO format by running `crowdhumantools/convert_crowdhuman_to_coco.py`. Before running it, please make sure the file paths in it are correct. The expected directory structure is:

  path/to/crowdhuman/
    annotations/       # annotation json files
    CrowdHuman_train/  # train images
    CrowdHuman_val/    # val images
- Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --dataset_file crowdhuman --coco_path YOUR_CROWDHUMAN_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint YOUR_DOWNLOADED_CHECKPOINT --decoder_type SparseRCNN
- Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --dataset_file crowdhuman --coco_path YOUR_CROWDHUMAN_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_S_crowdhuman.pth --used_head aux_0 --decoder_type SparseRCNN
| # | backbone | AP | APs | APm | APl |
|---|----------|----|-----|-----|-----|
| 1 | torchvision R50 | 42.6 | 25.9 | 45.8 | 56.9 |
| 2 | CrossCL R50 | 43.8 | 26.1 | 47.4 | 59.3 |
- We run ASAG-A with our self-supervised pretrained backbone CrossCL under the 1x schedule, which can boost ASAG by 1.2 AP.
- The pretrained backbone can be found in Google Drive / Baidu (pwd: asag).
- Training
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --pretrained_checkpoint crosscl_resnet50.pth
- Inference
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --coco_path YOUR_COCO_PATH --batch_size 4 --output_dir output --backbone resnet50 --eval --resume ASAG_A_r50_1x_100_crosscl.pth --used_head aux_2
ASAG is released under the Apache 2.0 license. Please see the LICENSE file for more information.
If you find our work helpful for your research, please consider citing the following BibTeX entries.
@inproceedings{fu2023asag,
title={ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation},
author={Fu, Shenghao and Yan, Junkai and Gao, Yipeng and Xie, Xiaohua and Zheng, Wei-Shi},
booktitle={ICCV},
year={2023},
}
@inproceedings{yan2023cross,
title={Self-supervised Cross-stage Regional Contrastive Learning for Object Detection},
author={Yan, Junkai and Yang, Lingxiao and Gao, Yipeng and Zheng, Wei-Shi},
booktitle={ICME},
year={2023},
}
Our ASAG is heavily inspired by many outstanding prior works, including
Thanks to the authors of the above projects for open-sourcing their implementations!