Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection and Object Detection
by Xubin Zhong, Changxing Ding, Zijian Li, and Shaoli Huang.
This repository contains the official implementation of the paper "Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection", which has been accepted to ECCV 2022.
To the best of our knowledge, HQM is the first approach that promotes the robustness of DETR-based models from the perspective of hard example mining. Moreover, HQM is plug-and-play and can be readily applied to many DETR-based HOI detection methods.
An efficient implementation of GBS on CDN is available at /code_path/CDN/exp/train_hico.sh. With GBS added, CDN-S achieves 32.29 mAP within 60 epochs.
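For intuition, here is a minimal sketch of the idea behind GBS (shifting a ground-truth box to build a hard-positive query): the box is randomly shifted and rescaled, and the result is kept only if its IoU with the original box stays above a threshold, so that it still matches the ground truth. The function names, the (cx, cy, w, h) box format, and all hyper-parameter values below are illustrative assumptions, not the exact implementation in this repository.

```python
import torch
from torchvision.ops import box_iou

def cxcywh_to_xyxy(box):
    # Convert a (cx, cy, w, h) box to (x1, y1, x2, y2) for IoU computation.
    cx, cy, w, h = box.unbind(-1)
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

def shift_gt_box(gt_box, iou_threshold=0.5, max_tries=20):
    # gt_box: float tensor (cx, cy, w, h). iou_threshold and max_tries are
    # illustrative hyper-parameters, not the values used in the paper.
    for _ in range(max_tries):
        noise = (torch.rand(4) - 0.5) * 0.4                # uniform in [-0.2, 0.2)
        shifted = gt_box.clone()
        shifted[:2] = gt_box[:2] + noise[:2] * gt_box[2:]  # shift the center
        shifted[2:] = gt_box[2:] * (1.0 + noise[2:])       # jitter width/height
        iou = box_iou(cxcywh_to_xyxy(gt_box)[None],
                      cxcywh_to_xyxy(shifted)[None])[0, 0]
        if iou >= iou_threshold:
            return shifted                                 # a valid hard positive
    return gt_box                                          # fall back to the GT box
```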
Our implementation depends on external libraries such as NumPy and PyTorch, and our experiments run on 8 NVIDIA RTX 2080 Ti GPUs. You can resolve the dependencies with the following commands.
pip install numpy
pip install -r requirements.txt
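As a quick sanity check that PyTorch sees all eight GPUs before launching distributed training, you can run a generic check like this (not part of the repository):

```python
import torch

print(torch.__version__)            # installed PyTorch version
print(torch.cuda.is_available())    # should print True
print(torch.cuda.device_count())    # should print 8 on the setup above
```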
The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.
Instead of using the original annotation files, we use the annotation files provided by the PPDM authors. They can be downloaded from here and have to be placed as follows.
HQM
 |─ data
 │   └─ hico_20160224_det
 |       |─ annotations
 |       |   |─ trainval_hico.json
 |       |   |─ test_hico.json
 |       |   └─ corre_hico.npy
 :       :
 |─ params
 │   └─ detr-r50-pre.pth
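Before training, you can verify the layout with a small Python check run from the repository root (the script is illustrative and not part of the repository):

```python
from pathlib import Path

# Files expected by the directory layout above.
required = [
    "data/hico_20160224_det/annotations/trainval_hico.json",
    "data/hico_20160224_det/annotations/test_hico.json",
    "data/hico_20160224_det/annotations/corre_hico.npy",
    "params/detr-r50-pre.pth",
]
missing = [p for p in required if not Path(p).exists()]
print("All files in place." if not missing else f"Missing: {missing}")
```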
The annotation files and pre-trained weights can be downloaded here. After the preparation, you can start training with the following command.
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env \
main.py \
--hoi \
--dataset_file hico_gt \
--model_name HQM \
--hoi_path data/hico_20160224_det/ \
--num_obj_classes 80 \
--num_verb_classes 117 \
--backbone resnet50 \
--set_cost_bbox 2.5 \
--set_cost_giou 1 \
--bbox_loss_coef 2.5 \
--giou_loss_coef 1 \
--find_unused_parameters \
--AJL
You can conduct the evaluation with trained parameters as follows. The trained parameters are available here.
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env \
main.py \
--hoi \
--dataset_file hico_gt \
--model_name HQM \
--hoi_path data/hico_20160224_det/ \
--num_obj_classes 80 \
--num_verb_classes 117 \
--backbone resnet50 \
--set_cost_bbox 2.5 \
--set_cost_giou 1 \
--bbox_loss_coef 2.5 \
--giou_loss_coef 1 \
--find_unused_parameters \
--AJL \
--eval \
--resume params/checkpoint_best.pth
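The --resume flag loads the trained weights. If you want to sanity-check the downloaded checkpoint first, a generic PyTorch inspection works; the key names printed (e.g. 'model', 'epoch') are typical of DETR-style training code and an assumption here:

```python
import torch

# Load on CPU so no GPU is needed for the inspection.
ckpt = torch.load("params/checkpoint_best.pth", map_location="cpu")
print(list(ckpt.keys()))  # e.g. ['model', 'optimizer', 'lr_scheduler', 'epoch', ...]
```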
The results should look like the following:
"test_mAP": 0.313470564574163, "test_mAP rare": 0.26546478777620686, "test_mAP non-rare": 0.32780995244887723
test_mAP, test_mAP rare, and test_mAP non-rare are the results of the default full, rare, and non-rare settings, respectively.
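For concreteness: HICO-DET's rare setting averages AP over the 138 HOI categories with fewer than 10 training instances, and the non-rare setting over the remaining 462 of the 600 categories. Below is a minimal sketch of how such a summary could be derived from per-category APs; the function and its inputs are illustrative, not this repository's evaluation code.

```python
import numpy as np

def summarize_hico_map(per_class_ap, rare_mask):
    # per_class_ap: shape (600,), AP for each HICO-DET HOI category.
    # rare_mask:    boolean, shape (600,), True for the 138 rare categories.
    per_class_ap = np.asarray(per_class_ap, dtype=float)
    rare_mask = np.asarray(rare_mask, dtype=bool)
    return {
        "test_mAP": per_class_ap.mean(),
        "test_mAP rare": per_class_ap[rare_mask].mean(),
        "test_mAP non-rare": per_class_ap[~rare_mask].mean(),
    }
```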
HOI Detection on HICO-DET (mAP).
Method | Full (D) | Rare (D) | Non-rare (D) | Full (KO) | Rare (KO) | Non-rare (KO)
---|---|---|---|---|---|---
HOTR + HQM (ResNet50) | 25.69 | 24.70 | 25.98 | 28.24 | 27.35 | 28.51
QPIC + HQM (ResNet50) | 31.34 | 26.54 | 32.78 | 34.09 | 29.63 | 35.42
CDN-S + HQM (ResNet50) | 32.47 | 28.15 | 33.76 | 35.17 | 30.73 | 36.50
D: Default, KO: Known object
HOI Detection on V-COCO.
Method | Scenario 1
---|---
Ours (ResNet50) | 63.6
Object Detection on COCO.
Method | AP | AP_0.5 | AP_0.75 | AP_S | AP_M | AP_L
---|---|---|---|---|---|---
SMCA | 35.08 | 56.47 | 35.91 | 15.14 | 38.01 | 54.51
SMCA + HQM | 36.48 | 57.02 | 38.19 | 16.48 | 40.62 | 54.91
Please consider citing the following papers if they help your research.
@inproceedings{zhong_eccv2022,
  author    = {Zhong, Xubin and Ding, Changxing and Li, Zijian and Huang, Shaoli},
  title     = {Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection},
  booktitle = {ECCV},
  year      = {2022},
}

@inproceedings{Qu_2022_CVPR,
  author    = {Qu, Xian and Ding, Changxing and Li, Xingao and Zhong, Xubin and Tao, Dacheng},
  title     = {Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2022},
  pages     = {19558--19567}
}

@inproceedings{zhang2022accelerating,
  author    = {Zhang, Gongjie and Luo, Zhipeng and Yu, Yingchen and Cui, Kaiwen and Lu, Shijian},
  title     = {Accelerating DETR Convergence via Semantic-Aligned Matching},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {949--958},
  year      = {2022}
}