Team DETR: Guide Queries as a Professional Team in Detection Transformers

Authors: Tian Qiu, Linyun Zhou, Wenxiang Xu, Lechao Cheng, Zunlei Feng, Mingli Song
Affiliation: Zhejiang University
Paper Link: [arXiv] / [IEEE ICIP]

Abstract

Recently proposed DETR variants have made tremendous progress in various scenarios thanks to their streamlined processes and remarkable performance. However, the learned queries usually explore the global context to generate the final set prediction, resulting in redundant burdens and unfaithful results. More specifically, a query is commonly responsible for objects of different scales and positions, which challenges the query itself and causes spatial resource competition among queries. To alleviate this issue, we propose Team DETR, which leverages query collaboration and position constraints to embrace objects of interest more precisely. We also dynamically cater to each query member's prediction preference, offering the query better scale and spatial priors. In addition, the proposed Team DETR is flexible enough to be adapted to other existing DETR variants without increasing parameters or computation. Extensive experiments on the COCO dataset show that Team DETR achieves remarkable gains, especially for small and large objects.

Framework

The proposed Team DETR builds on the basic architecture of DAB-DETR. A CNN backbone extracts image features, which are then fused by the transformer encoder. The decoder uses a set of learned queries to match objects in the image features. Each query is represented as an anchor box (x, y, w, h) that is dynamically updated with the offset (Δx, Δy, Δw, Δh) predicted by each decoder layer. Building on this, we introduce query teamwork: the queries are grouped, and each group is responsible for objects within a specific scale range. To avoid resource competition, the management area of each query is limited. Furthermore, the prediction preference of each query is dynamically extracted, and its anchor is updated accordingly.
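As we understand DAB-DETR's decoder, the predicted offset is not added to the box directly: it is added in inverse-sigmoid (logit) space and mapped back with a sigmoid, which keeps the refined anchor normalized to [0, 1]. The sketch below shows that update rule with plain Python floats instead of tensors. The function names are ours, not the repository's, and the query teamwork additions are deliberately left out.

```python
import math


def inverse_sigmoid(x: float, eps: float = 1e-3) -> float:
    """Logit of x, with x clamped to [eps, 1 - eps] for numerical stability."""
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def refine_anchor(anchor, offset):
    """Apply one decoder layer's offset (dx, dy, dw, dh) to a normalized (x, y, w, h) anchor.

    The offset is added in logit space and squashed back with a sigmoid,
    so every coordinate of the refined anchor stays strictly inside (0, 1).
    """
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, offset))


# Toy example: a zero offset keeps the anchor; a positive dx moves its center to the right.
anchor = (0.5, 0.5, 0.2, 0.3)
for layer_offset in [(0.0, 0.0, 0.0, 0.0), (0.4, 0.0, 0.0, 0.0)]:
    anchor = refine_anchor(anchor, layer_offset)
    print([round(v, 3) for v in anchor])
```

Working in logit space means a decoder layer can output unbounded offsets without ever producing a box outside the image.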

Model Zoo

Without increasing parameters or computation, our query teamwork can be easily integrated into DAB-based DETRs, including DAB-DETR, DN-DETR, and the single-stage DINO.

[Model Zoo on Baidu Netdisk] (extraction code: team)

[Model Zoo in Google Drive]

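| Model | w/ Team DETR | Epochs | AP | APs | APm | APl | Params | Checkpoint & log |
|---|---|---|---|---|---|---|---|---|
| DN-DETR-R50 | | 12 | 37.3 | 17.2 | 40.1 | 55.6 | 44M | |
| DN-DETR-R50 | ✓ | 12 | 37.7 | 18.0 | 40.0 | 56.8 | 44M | Baidu Netdisk / Google Drive |
| DAB-DETR-R50 | | 12 | 33.7 | 15.3 | 36.5 | 49.7 | 44M | |
| DAB-DETR-R50 | ✓ | 12 | 35.3 | 17.3 | 37.5 | 52.9 | 44M | Baidu Netdisk / Google Drive |
| DAB-DETR-R50 | | 50 | 42.2 | 22.5 | 45.9 | 60.2 | 44M | |
| DAB-DETR-R50 | ✓ | 50 | 43.0 | 24.4 | 46.1 | 62.6 | 44M | Baidu Netdisk / Google Drive |
| DAB-DETR-R101 | | 12 | 36.1 | 17.3 | 39.5 | 52.5 | 63M | |
| DAB-DETR-R101 | ✓ | 12 | 37.4 | 18.4 | 40.3 | 55.5 | 63M | Baidu Netdisk / Google Drive |
| DAB-DETR-R101 | | 50 | 43.3 | 24.0 | 47.1 | 61.2 | 63M | |
| DAB-DETR-R101 | ✓ | 50 | 44.1 | 25.0 | 47.1 | 63.7 | 63M | Baidu Netdisk / Google Drive |
| DINO-4scale-1stage-R50 | | 12 | 44.5 | 24.2 | 48.0 | 61.2 | 47M | |
| DINO-4scale-1stage-R50 | ✓ | 12 | 46.3 | 28.6 | 48.9 | 61.2 | 47M | Baidu Netdisk / Google Drive |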

Note: The result of DAB-DETR-R50 w/ Team DETR under the 50-epoch setting differs from the one reported in the paper: we lost the original checkpoint, so the checkpoint provided here was retrained.
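To make the gains in the table concrete, the snippet below recomputes the per-metric difference between each baseline and its Team DETR counterpart. All numbers are copied from the Model Zoo table; nothing new is measured.

```python
# (AP, APs, APm, APl) copied from the Model Zoo table: (baseline, with Team DETR).
RESULTS = {
    "DN-DETR-R50 12ep": ((37.3, 17.2, 40.1, 55.6), (37.7, 18.0, 40.0, 56.8)),
    "DAB-DETR-R50 12ep": ((33.7, 15.3, 36.5, 49.7), (35.3, 17.3, 37.5, 52.9)),
    "DAB-DETR-R50 50ep": ((42.2, 22.5, 45.9, 60.2), (43.0, 24.4, 46.1, 62.6)),
    "DAB-DETR-R101 12ep": ((36.1, 17.3, 39.5, 52.5), (37.4, 18.4, 40.3, 55.5)),
    "DAB-DETR-R101 50ep": ((43.3, 24.0, 47.1, 61.2), (44.1, 25.0, 47.1, 63.7)),
    "DINO-4scale-1stage-R50 12ep": ((44.5, 24.2, 48.0, 61.2), (46.3, 28.6, 48.9, 61.2)),
}
METRICS = ("AP", "APs", "APm", "APl")


def gains(baseline, team):
    """Per-metric improvement, rounded to one decimal to match the table's precision."""
    return tuple(round(t - b, 1) for b, t in zip(baseline, team))


for name, (baseline, team) in RESULTS.items():
    deltas = gains(baseline, team)
    print(f"{name:28s} " + "  ".join(f"{m} {d:+.1f}" for m, d in zip(METRICS, deltas)))
```

In every row, the largest single gain is in APs or APl, although DINO's APl is unchanged.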

Usage

Installation

Our code contains three projects, Team-DAB-DETR, Team-DN-DETR, and Team-DINO, built on DAB-DETR, DN-DETR, and DINO, respectively. No extra dependencies are needed, so each project can be installed the same way as its base codebase:

[Installation of DAB-DETR]

[Installation of DN-DETR]

[Installation of DINO]

Our experimental environment is Python 3.7 with PyTorch 1.11.0+cu113. We strongly recommend PyTorch >= 1.11.0 because it consumes less GPU memory.
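If you are unsure which version is installed, a small check such as the following compares torch.__version__ with the recommended minimum. The parsing is a simplified sketch that keeps only the numeric major.minor.patch prefix; packaging.version is the more robust choice if it is available.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings like '1.11.0+cu113' or '2.1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_minimum(version: str, minimum: str = "1.11.0") -> bool:
    """True if `version` is at least `minimum` (local tags like +cu113 are ignored)."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "OK" if meets_minimum(torch.__version__) else "older than the recommended 1.11.0"
        print(f"torch {torch.__version__}: {status}")
```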

Dataset

COCO 2017 is used to validate our method. The expected directory structure is:

COCODIR/
  ├── train2017/
  ├── val2017/
  └── annotations/
      ├── instances_train2017.json
      └── instances_val2017.json
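Before launching training or evaluation, it can help to confirm that COCODIR matches this layout. A minimal sketch (this helper is ours, not part of the repository):

```python
from pathlib import Path

# Expected COCO 2017 entries, relative to COCODIR (mirrors the tree above).
REQUIRED = [
    "train2017",
    "val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
]


def missing_entries(coco_dir) -> list:
    """Return the required paths that do not exist under coco_dir."""
    root = Path(coco_dir)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


if __name__ == "__main__":
    import sys

    missing = missing_entries(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("COCO layout OK" if not missing else f"missing: {missing}")
```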

Evaluation

You can evaluate our pre-trained models (Baidu Netdisk / Google Drive) or your own checkpoints.

By default, the queries are divided into three groups with proportions of 65%, 20%, and 15%, responsible for objects with relative scales in (0, 0.2], (0.2, 0.4], and (0.4, 1], respectively. --q_splits sets the proportion of each group. --matcher accepts two options: ori (the original HungarianMatcher) and team (TeamHungarianMatcher).
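The mapping from --q_splits to query groups can be illustrated as follows. This sketch assumes 300 queries (DAB-DETR's default) and one contiguous index range per group; how the repository actually rounds and orders the groups is not stated here, so treat split_queries as a hypothetical helper.

```python
def split_queries(num_queries: int, q_splits=(65, 20, 15)):
    """Turn percentage splits into contiguous [start, end) query index ranges.

    Each group gets floor(num_queries * pct / total) queries; any remainder from
    rounding goes to the last group so that every query belongs to exactly one group.
    """
    total = sum(q_splits)
    sizes = [num_queries * pct // total for pct in q_splits]
    sizes[-1] += num_queries - sum(sizes)
    ranges, start = [], 0
    for size in sizes:
        ranges.append((start, start + size))
        start += size
    return ranges


# With 300 queries: 195 for small objects, 60 for medium, 45 for large.
print(split_queries(300))  # [(0, 195), (195, 255), (255, 300)]
```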

Note: Evaluation results vary slightly with batch size. We used 2 GPUs, and the per-GPU batch size used to train each model is marked in the checkpoint filename (e.g., b8, b6). To reproduce our reported results with the checkpoints we provide, keep the same setting.
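To match the setting, read the per-GPU batch size from the checkpoint name and pass it as --batch_size. The filename below is hypothetical, and the regular expression only assumes that the tag appears as a separate token such as _b8.

```python
import re

NUM_GPUS = 2  # the released models were trained on 2 GPUs


def per_gpu_batch_size(checkpoint_name: str) -> int:
    """Read the per-GPU batch size from a tag such as 'b8' or 'b6' in the filename."""
    match = re.search(r"(?:^|[_\-.])b(\d+)(?=[_\-.]|$)", checkpoint_name)
    if match is None:
        raise ValueError(f"no batch-size tag found in {checkpoint_name!r}")
    return int(match.group(1))


# Hypothetical filename; check the real names in the Model Zoo.
name = "team_dab_detr_r50_e12_b8.pth"
bs = per_gpu_batch_size(name)
print(f"--batch_size {bs} (total batch {bs * NUM_GPUS} on {NUM_GPUS} GPUs)")
```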

# Team-DAB-DETR and Team-DN-DETR
# multi-gpu
python -m torch.distributed.launch --nproc_per_node=2 main.py \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/your/checkpoint \
  --output_dir /path/to/your/output/dir \
  --batch_size 8 \
  --matcher team \
  --q_splits 65 20 15 \
  --eval

# single-gpu
python main.py \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/your/checkpoint \
  --output_dir /path/to/your/output/dir \
  --batch_size 8 \
  --matcher team \
  --q_splits 65 20 15 \
  --eval

# --------------------------------------------

# Team-DINO
# You need to write config and .sh files in advance.
# multi-gpu
bash scripts/DINO_4scale_1stage_team_r50_e12_eval.sh /path/to/your/COCODIR /path/to/your/output/dir /path/to/your/checkpoint

Training

By default, the queries are divided into three groups with proportions of 65%, 20%, and 15%, responsible for objects with relative scales in (0, 0.2], (0.2, 0.4], and (0.4, 1], respectively. --q_splits sets the proportion of each group. --matcher accepts two options: ori (the original HungarianMatcher) and team (TeamHungarianMatcher). To change the scale range each group is responsible for, modify matcher.py for Team-DAB-DETR and Team-DN-DETR, or the config file for Team-DINO.
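Conceptually, each ground-truth box is routed to the group whose scale range contains it. The sketch assumes that relative scale means sqrt(w * h) of the normalized box, which is our guess; the actual measure and boundaries live in matcher.py or the Team-DINO config, so check there before relying on it.

```python
import bisect
import math

# Upper bounds of the default groups: (0, 0.2], (0.2, 0.4], (0.4, 1].
SCALE_BOUNDS = (0.2, 0.4, 1.0)


def relative_scale(w: float, h: float) -> float:
    """ASSUMPTION: relative scale = sqrt(w * h) of a box normalized to [0, 1].

    The repository's matcher.py (or the Team-DINO config) defines the real measure.
    """
    return math.sqrt(w * h)


def scale_group(w: float, h: float, bounds=SCALE_BOUNDS) -> int:
    """Index of the group whose right-inclusive range (lo, hi] contains the box's scale."""
    s = relative_scale(w, h)
    # bisect_left returns the first i with s <= bounds[i], matching right-inclusive ranges;
    # min() guards against scales above the last bound.
    return min(bisect.bisect_left(bounds, s), len(bounds) - 1)


print(scale_group(0.1, 0.1), scale_group(0.3, 0.3), scale_group(0.8, 0.8))
```

Changing the responsible ranges then amounts to editing the bounds, for example four groups with bounds=(0.1, 0.3, 0.6, 1.0), while keeping --q_splits at the same number of groups.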

# Team-DAB-DETR and Team-DN-DETR
# multi-gpu (12-epoch setting / 1x setting)
python -m torch.distributed.launch --nproc_per_node=2 main.py \
  --coco_path /path/to/your/COCODIR \
  --output_dir /path/to/your/output/dir \
  --batch_size 8 \
  --epochs 12 \
  --lr_drop 8 \
  --matcher team \
  --q_splits 65 20 15

# multi-gpu (50-epoch setting)
python -m torch.distributed.launch --nproc_per_node=2 main.py \
  --coco_path /path/to/your/COCODIR \
  --output_dir /path/to/your/output/dir \
  --batch_size 8 \
  --epochs 50 \
  --lr_drop 40 \
  --matcher team \
  --q_splits 65 20 15

# single-gpu (12-epoch setting / 1x setting)
python main.py \
  --coco_path /path/to/your/COCODIR \
  --output_dir /path/to/your/output/dir \
  --batch_size 8 \
  --epochs 12 \
  --lr_drop 8 \
  --matcher team \
  --q_splits 65 20 15

# single-gpu (50-epoch setting)
python main.py \
  --coco_path /path/to/your/COCODIR \
  --output_dir /path/to/your/output/dir \
  --batch_size 8 \
  --epochs 50 \
  --lr_drop 40 \
  --matcher team \
  --q_splits 65 20 15

# --------------------------------------------

# Team-DINO
# You need to write config and .sh files in advance.
# multi-gpu
bash scripts/DINO_4scale_1stage_team_r50_e12.sh /path/to/your/COCODIR /path/to/your/output/dir
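The --lr_drop values correspond to a step schedule. Assuming the DETR-style setup these codebases inherit (torch.optim.lr_scheduler.StepLR with step_size set to --lr_drop, the default gamma of 0.1, one scheduler step per epoch, and a base --lr of 1e-4), the learning rate over training can be sketched as:

```python
def step_lr(epoch: int, base_lr: float = 1e-4, lr_drop: int = 8, gamma: float = 0.1) -> float:
    """Learning rate during `epoch` (0-indexed) under a per-epoch StepLR schedule.

    Equivalent to StepLR(optimizer, step_size=lr_drop, gamma=gamma):
    the rate is multiplied by gamma once every lr_drop epochs.
    """
    return base_lr * gamma ** (epoch // lr_drop)


for epochs, lr_drop in [(12, 8), (50, 40)]:
    schedule = [step_lr(e, lr_drop=lr_drop) for e in range(epochs)]
    n_high = sum(1 for lr in schedule if lr == schedule[0])
    print(f"{epochs} epochs, --lr_drop {lr_drop}: {n_high} epochs at {schedule[0]:g}, "
          f"{epochs - n_high} at {schedule[-1]:g}")
```

Under these assumptions, the 1x setting trains 8 epochs at the base rate and 4 at one tenth of it, and the 50-epoch setting trains 40 and 10.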

How to integrate query teamwork into your model

The query teamwork contains three parts: scale-wise grouping, position constraint, and preference extraction.
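To illustrate the first part, scale-wise grouping, the sketch below restricts matching so that a ground-truth box can only be assigned to queries of the group responsible for its scale. It is our simplified reading of what --matcher team does, not the authors' TeamHungarianMatcher: it uses brute-force search instead of the Hungarian algorithm, takes a precomputed toy cost matrix, and ignores the position constraint and preference extraction entirely.

```python
from itertools import permutations


def best_assignment(cost, rows, cols):
    """Brute-force minimum-cost one-to-one matching between `rows` (queries) and
    `cols` (ground truths). Fine for toy sizes; real matchers use the Hungarian
    algorithm (e.g. scipy.optimize.linear_sum_assignment)."""
    if not rows or not cols:
        return []
    if len(cols) <= len(rows):
        candidates = ([(r, c) for r, c in zip(p, cols)] for p in permutations(rows, len(cols)))
    else:
        candidates = ([(r, c) for r, c in zip(rows, p)] for p in permutations(cols, len(rows)))
    return min(candidates, key=lambda pairs: sum(cost[r][c] for r, c in pairs))


def team_match(cost, query_group, gt_group):
    """HYPOTHETICAL sketch: match each group's queries only to that group's ground truths.

    cost[q][g]: matching cost between query q and ground truth g
    query_group: group index of each query (e.g. derived from --q_splits)
    gt_group: group index of each ground truth (e.g. derived from its relative scale)
    """
    matches = []
    for group in sorted(set(query_group)):
        rows = [q for q, k in enumerate(query_group) if k == group]
        cols = [g for g, k in enumerate(gt_group) if k == group]
        matches += best_assignment(cost, rows, cols)
    return sorted(matches)


# Toy example: queries 0-1 and GTs 0-1 are "small"; query 2 and GT 2 are "large".
cost = [
    [0.1, 0.9, 0.2],  # query 0
    [0.5, 0.2, 0.9],  # query 1
    [0.3, 0.6, 0.8],  # query 2
]
print("unrestricted:", sorted(best_assignment(cost, [0, 1, 2], [0, 1, 2])))
print("team:", team_match(cost, query_group=[0, 0, 1], gt_group=[0, 0, 1]))
```

With unrestricted matching, query 0 (a small-object query) takes the large object GT 2 because that minimizes the total cost; with grouping, GT 2 goes to query 2, the only query in the large-object group.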

For details, refer to our code. Relative to the original DAB-DETR / DN-DETR / DINO source code, every change is clearly marked with "# qt ...". The changes touch main.py, DABDETR.py (where applicable), *transformer.py, matcher.py, and engine.py.
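To list every change quickly, you can search a project for the "# qt" marker. A minimal sketch (a plain grep works just as well):

```python
import sys
from pathlib import Path


def find_qt_markers(root, pattern="# qt", suffix=".py"):
    """Yield (file, line number, line) for every line under `root` containing `pattern`."""
    root = Path(root)
    if not root.is_dir():
        return
    for path in sorted(root.rglob(f"*{suffix}")):
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if pattern in line:
                    yield path, lineno, line.rstrip()


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python find_qt.py PROJECT_DIR   (e.g. Team-DAB-DETR)")
    else:
        for path, lineno, line in find_qt_markers(sys.argv[1]):
            print(f"{path}:{lineno}: {line.strip()}")
```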

Links

Our Team DETR is based on the basic architecture of DAB-DETR and is flexible enough to be adapted to DAB-based DETRs:

  • DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
    Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang
    International Conference on Learning Representations (ICLR) 2022
    [Paper] [Code]
  • DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
    Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022
    [Paper] [Code]
  • DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
    Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum
    International Conference on Learning Representations (ICLR) 2023
    [Paper] [Code]

License

Team DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Copyright (c) QIU Tian and ZJU-VIPA Lab. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Citation

If you find the paper useful in your research, please consider citing:

@inproceedings{qiu2023teamdetr,
  author={Qiu, Tian and Zhou, Linyun and Xu, Wenxiang and Cheng, Lechao and Feng, Zunlei and Song, Mingli},
  booktitle={IEEE International Conference on Image Processing (ICIP)}, 
  title={Team DETR: Guide Queries as a Professional Team in Detection Transformers}, 
  year={2023},
  pages={450-454},
  doi={10.1109/ICIP49359.2023.10222890}
}