This repository is the official implementation of Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement, accepted to CVPR 2024 (score 553). Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen.
💖 If Salience-DETR is helpful for your research or projects, please star this repository. Thanks! 🤗
✨Highlights
- We offer an in-depth analysis of the scale bias and query redundancy issues of two-stage DETR-like methods.
- We present a hierarchical filtering mechanism that reduces computational complexity under salience supervision. The proposed salience supervision helps capture fine-grained object contours even with only bounding box annotations.
- Salience DETR achieves +4.0%, +0.2%, and +4.4% AP on three challenging defect detection tasks, and comparable performance (49.2 AP) with only about 70% of the FLOPs on COCO 2017.
🔎Visualization
- Queries in the two-stage selection of existing DETR-like methods are usually redundant and exhibit scale bias (left).
- Salience supervision helps capture object contours even with only bounding box annotations, for both defect detection and object detection tasks (right).
- [2024-07-18] We release Relation-DETR, a general and strong object detection model that achieves 40+% AP using only 2 epochs and surpasses most SOTA methods, including DDQ-DETR, StableDINO, Rank-DETR, and MS-DETR. Code and checkpoints are available.
- [2024-04-19] Salience DETR with FocalNet-Large achieves 56.8 AP on COCO val2017; the config and checkpoint are available!
- [2024-04-08] Updated the config and checkpoint of Salience DETR with the ConvNeXt-L backbone trained on COCO 2017 (12 epochs).
- [2024-04-01] Salience DETR with the Swin-L backbone achieves 56.5 AP on COCO 2017 (12 epochs). The model config and checkpoint are available.
- [2024-03-26] We release the code of Salience DETR and pretrained weights on COCO 2017 for Salience DETR with the ResNet50 backbone.
- [2024-02-29] Salience DETR is accepted to CVPR 2024, and the code will be released in this repo. Stay tuned!
Model | backbone | mAP | AP50 | AP75 | APS | APM | APL | Download |
---|---|---|---|---|---|---|---|---|
Salience DETR | ResNet50 | 50.0 | 67.7 | 54.2 | 33.3 | 54.4 | 64.4 | config / checkpoint |
Salience DETR | ConvNeXt-L | 54.2 | 72.4 | 59.1 | 38.8 | 58.3 | 69.6 | config / checkpoint |
Salience DETR | Swin-L(IN-22K) | 56.5 | 75.0 | 61.5 | 40.2 | 61.2 | 72.8 | config / checkpoint |
Salience DETR | FocalNet-L(IN-22K) | 57.3 | 75.5 | 62.3 | 40.9 | 61.8 | 74.5 | config / checkpoint |
Model | backbone | mAP | AP50 | AP75 | APS | APM | APL | Download |
---|---|---|---|---|---|---|---|---|
Salience DETR | ResNet50 | 51.2 | 68.9 | 55.7 | 33.9 | 55.5 | 65.6 | config / checkpoint |
- Clone the repository locally:
git clone https://github.com/xiuqhou/Salience-DETR.git
cd Salience-DETR/
- Create a conda environment and activate it:
conda create -n salience_detr python=3.8
conda activate salience_detr
- Install PyTorch and Torchvision following the instructions at https://pytorch.org/get-started/locally/. The code requires python>=3.8, torch>=1.11.0, torchvision>=0.12.0.
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
- Install the other dependencies with:
conda install --file requirements.txt -c conda-forge
That's all. You don't need to compile CUDA operators manually, since they are loaded automatically on the first run.
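To confirm that an environment meets the version requirements listed above, a small check along the following lines may help. This is a minimal sketch and not part of the repository: `MINIMUMS`, `parse_version`, `meets_minimum` and `check_environment` are hypothetical names, and the version parsing is deliberately simple (it only reads the leading numeric components of each version string).

```python
import re
import sys
from importlib import metadata

# Minimum versions quoted in the installation notes above.
MINIMUMS = {"torch": "1.11.0", "torchvision": "0.12.0"}


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into a tuple such as (2, 1, 0)."""
    public = text.split("+", 1)[0]  # drop a local suffix such as '+cu118'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '1.11' compares equal to '1.11.0'
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Map each requirement name to True (satisfied) or False (missing or too old)."""
    report = {"python": sys.version_info >= (3, 8)}
    for name, minimum in MINIMUMS.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

Running the script inside the activated `salience_detr` environment prints one line per requirement.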
Please download COCO 2017 or prepare your own datasets into data/, and organize them as follows. You can use tools/visualize_datasets.py to visualize the dataset annotations and verify their correctness.
coco/
  ├── train2017/
  ├── val2017/
  └── annotations/
        ├── instances_train2017.json
        └── instances_val2017.json
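Before training, it can be useful to check that the folder really matches the layout above. The following is a minimal sketch and not a script shipped with the repository: `EXPECTED_ENTRIES` and `missing_coco_entries` are hypothetical names, and `data/coco` is assumed as the dataset root.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_ENTRIES = (
    "train2017",
    "val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
)


def missing_coco_entries(coco_root):
    """Return the expected entries that do not exist under coco_root."""
    root = Path(coco_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_coco_entries("data/coco")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```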
Example for visualization
python tools/visualize_datasets.py \
--coco-img data/coco/val2017 \
--coco-ann data/coco/annotations/instances_val2017.json \
--show-dir visualize_dataset/
We use the accelerate package to natively handle multiple GPUs; use CUDA_VISIBLE_DEVICES to specify the GPU(s). If it is not specified, the script uses all available GPUs on the node for training.
CUDA_VISIBLE_DEVICES=0 accelerate launch main.py # train with 1 GPU
CUDA_VISIBLE_DEVICES=0,1 accelerate launch main.py # train with 2 GPUs
Before starting training, modify the parameters in configs/train_config.py.
A simple example of a training config
from torch import optim
from datasets.coco import CocoDetection
from transforms import presets
from optimizer import param_dict
# Commonly changed training configurations
num_epochs = 12 # train epochs
batch_size = 2 # total_batch_size = #GPU x batch_size
num_workers = 4 # workers for pytorch DataLoader
pin_memory = True # whether pin_memory for pytorch DataLoader
print_freq = 50 # frequency to print logs
starting_epoch = 0
max_norm = 0.1 # clip gradient norm
output_dir = None # path to save checkpoints, default for None: checkpoints/{model_name}
find_unused_parameters = False # useful for debugging distributed training
# define dataset for train
coco_path = "data/coco" # /PATH/TO/YOUR/COCODIR
train_transform = presets.detr # see transforms/presets to choose a transform
train_dataset = CocoDetection(
    img_folder=f"{coco_path}/train2017",
    ann_file=f"{coco_path}/annotations/instances_train2017.json",
    transforms=train_transform,
    train=True,
)
test_dataset = CocoDetection(
    img_folder=f"{coco_path}/val2017",
    ann_file=f"{coco_path}/annotations/instances_val2017.json",
    transforms=None,  # the eval_transform is integrated in the model
)
# model config to train
model_path = "configs/salience_detr/salience_detr_resnet50_800_1333.py"
# specify a checkpoint folder to resume, or a pretrained ".pth" to finetune, for example:
# checkpoints/salience_detr_resnet50_800_1333/train/2024-03-22-09_38_50
# checkpoints/salience_detr_resnet50_800_1333/train/2024-03-22-09_38_50/best_ap.pth
resume_from_checkpoint = None
learning_rate = 1e-4 # initial learning rate
optimizer = optim.AdamW(lr=learning_rate, weight_decay=1e-4, betas=(0.9, 0.999))
lr_scheduler = optim.lr_scheduler.MultiStepLR(milestones=[10], gamma=0.1)
# This defines parameter groups with different learning rates
param_dicts = param_dict.finetune_backbone_and_linear_projection(lr=learning_rate)
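To make the effect of `milestones=[10]`, `gamma=0.1` and `batch_size = 2` in the config above more concrete, the sketch below computes the resulting learning rate per epoch and the total batch size in plain Python. It is an illustration only: `multistep_lr` and `total_batch_size` are hypothetical helpers, and the schedule assumes the scheduler is stepped once per epoch starting from epoch 0.

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate used during `epoch` (0-based) under a MultiStepLR-style decay."""
    # The rate is multiplied by gamma once for every milestone already reached.
    passed = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** passed


def total_batch_size(num_gpus, batch_size_per_gpu):
    """Effective batch size, following the comment in the config above."""
    return num_gpus * batch_size_per_gpu


if __name__ == "__main__":
    for epoch in range(12):
        lr = multistep_lr(1e-4, milestones=[10], gamma=0.1, epoch=epoch)
        print(f"epoch {epoch:2d}: lr = {lr:.1e}")
    print("total batch size on 8 GPUs:", total_batch_size(8, 2))
```

Under these assumptions, the first ten epochs run at the initial learning rate and the last two at one tenth of it.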
To evaluate a model with one or more GPUs, specify CUDA_VISIBLE_DEVICES, dataset, model and checkpoint.
CUDA_VISIBLE_DEVICES=<gpu_ids> accelerate launch test.py --coco-path /path/to/coco --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth
Optional parameters are as follows; see test.py for the full list:
- --show-dir: path to save detection visualization results.
- --result: file in which to save numeric detection results; the name must end with .json.
An example for evaluation
To evaluate salience_detr_resnet50_800_1333 on coco using 8 GPUs, save predictions to result.json and visualize results to visualization/:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch test.py \
--coco-path data/coco \
--model-config configs/salience_detr/salience_detr_resnet50_800_1333.py \
--checkpoint https://github.com/xiuqhou/Salience-DETR/releases/download/v1.0.0/salience_detr_resnet50_800_1333_coco_1x.pth \
--result result.json \
--show-dir visualization/
Evaluate a json result file
To evaluate the JSON result file obtained above, specify --result and omit the model arguments (--model-config and --checkpoint).
CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --coco-path /path/to/coco --result /path/to/result.json
Optional parameters (see test.py for the full list):
- --show-dir: path to save detection visualization results.
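For a quick look at a saved result file outside of test.py, something like the sketch below could be used. It rests on an assumption that is not confirmed by this README: that result.json follows the standard COCO detection results format, i.e. a list of records with `image_id`, `category_id`, `bbox` and `score`. `load_detections` and `summarize_detections` are hypothetical helper names.

```python
import json
from collections import Counter


def load_detections(path):
    """Load a result file; assumed to be a JSON list of detection records."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_detections(detections, score_threshold=0.5):
    """Count detections per category_id whose score is at least score_threshold."""
    kept = [d for d in detections if d["score"] >= score_threshold]
    return Counter(d["category_id"] for d in kept)


if __name__ == "__main__":
    # Toy records in the assumed format; replace with load_detections("result.json").
    sample = [
        {"image_id": 1, "category_id": 1, "bbox": [10.0, 20.0, 50.0, 80.0], "score": 0.92},
        {"image_id": 1, "category_id": 90, "bbox": [5.0, 5.0, 20.0, 10.0], "score": 0.31},
        {"image_id": 2, "category_id": 1, "bbox": [0.0, 0.0, 40.0, 60.0], "score": 0.77},
    ]
    for category_id, count in sorted(summarize_detections(sample).items()):
        print(f"category {category_id}: {count} detections")
```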
Use inference.py to perform inference on images. You should specify the image directory using --image-dir.
python inference.py --image-dir /path/to/images --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth --show-dir /path/to/dir
An example for inference on an image folder
To perform inference on the images under images/ and save visualizations to visualization/:
python inference.py \
--image-dir images/ \
--model-config configs/salience_detr/salience_detr_resnet50_800_1333.py \
--checkpoint checkpoint.pth \
--show-dir visualization/
See inference.ipynb for inference on a single image and visualization.
To test the inference speed, memory cost and number of parameters of a model, use tools/benchmark_model.py.
python tools/benchmark_model.py --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py
To train on your own datasets, there are a few things to do before training:
- Prepare your datasets in COCO annotation format, and modify coco_path in configs/train_config.py accordingly.
- Open the model configs under configs/salience_detr and set num_classes to at least max_category_id + 1 of your dataset. For example, from the following annotation in instances_val2017.json, we can see that the maximum category_id is 90 for COCO, so we set num_classes = 91.
{"supercategory": "indoor","id": 90,"name": "toothbrush"}
You can simply set num_classes to a large enough number if you are not sure what to set. (For example, num_classes = 92 or num_classes = 365 also works for COCO.)
- If necessary, modify other parameters in the model configs under configs/salience_detr and in train_config.py.
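The `max_category_id + 1` rule can be read directly from an annotation file. The sketch below is a minimal illustration, assuming a COCO-format file with a top-level `categories` list; `min_num_classes` and `min_num_classes_from_file` are hypothetical helpers, not functions from this repository.

```python
import json


def min_num_classes(categories):
    """Smallest valid num_classes: the largest category id plus one."""
    return max(category["id"] for category in categories) + 1


def min_num_classes_from_file(ann_file):
    """Read the 'categories' list of a COCO-format annotation file."""
    with open(ann_file, "r", encoding="utf-8") as handle:
        return min_num_classes(json.load(handle)["categories"])


if __name__ == "__main__":
    # Two COCO categories, including the toothbrush entry quoted above.
    categories = [
        {"supercategory": "person", "id": 1, "name": "person"},
        {"supercategory": "indoor", "id": 90, "name": "toothbrush"},
    ]
    print(min_num_classes(categories))  # prints 91
```

For a custom dataset, `min_num_classes_from_file("data/coco/annotations/instances_train2017.json")` (path assumed from the layout used in this README) gives the lower bound for `num_classes`; any larger value also works.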
For advanced users who want to deploy our model, we provide a script to export an ONNX file.
# --simplify: use onnxsim to simplify the exported ONNX file
# --verify: verify the error between the ONNX model and the PyTorch model
python tools/pytorch2onnx.py \
--model-config /path/to/model.py \
--checkpoint /path/to/checkpoint.pth \
--save-file /path/to/save.onnx \
--simplify \
--verify
For inference using the ONNX file, see ONNXDetector in tools/pytorch2onnx.py.
If you find our work helpful for your research, please consider citing:
@InProceedings{Hou_2024_CVPR,
author = {Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong},
title = {Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {17574-17583}
}
@inproceedings{hou2024relation,
title={Relation DETR: Exploring Explicit Position Relation Prior for Object Detection},
author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong and Lan, Xuguang},
booktitle={European conference on computer vision},
year={2024},
organization={Springer}
}