[CodeCamp2023-683]Support grounding dino #10907

Merged on Sep 18, 2023 (21 commits)


YanxingLiu
Contributor

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

GLIP: Grounded Language-Image Pre-training

Abstract

In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object detection on novel categories, we propose to also perform evaluations on referring expression comprehension for objects specified with attributes. Grounding DINO performs remarkably well on all three settings, including benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. Grounding DINO achieves a 52.5 AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO. It sets a new record on the ODinW zero-shot benchmark with a mean 26.1 AP.
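The abstract names language-guided query selection without spelling it out. As a rough illustration only: the idea, as described in the Grounding DINO paper, is to score each image token by its best similarity to any text token and keep the top-scoring tokens as decoder queries. The sketch below uses plain Python lists and dot products; `select_queries` and the toy feature values are hypothetical and are not MMDetection identifiers.

```python
# Sketch of language-guided query selection on plain Python lists.
# All names and values here are illustrative, not MMDetection code.

def select_queries(image_feats, text_feats, num_queries):
    """Return indices of the image tokens most similar to any text token."""
    scores = []
    for img in image_feats:
        # Dot-product similarity of this image token to every text token.
        sims = [sum(a * b for a, b in zip(img, txt)) for txt in text_feats]
        # Keep only the best-matching text token for this image token.
        scores.append(max(sims))
    # Rank image tokens by their best text similarity, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


# Four toy image tokens and two toy text tokens, each of dimension 2.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.25, 0.25], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.0, 2.0]]
top = select_queries(image_feats, text_feats, num_queries=2)
```

In a real model the same computation would run on batched tensors, and the selected indices would be used to initialize the decoder queries.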

Installation

cd $MMDETROOT

# source installation
pip install -r requirements/multimodal.txt

# or mim installation
mim install mmdet[multimodal]

Usage

cd $MMDETROOT

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

# Convert the official checkpoint to MMDetection format.
WEIGHT_FILE=weights/groundingdino_swint_ogc_mmdet.pth
python projects/GroundingDINO/tools/model_converters/groundingdino_to_mmdet.py \
    groundingdino_swint_ogc.pth \
    $WEIGHT_FILE
# The converted checkpoint is written to $WEIGHT_FILE, relative to $MMDETROOT.

# Run the image demo with a text prompt of ' . '-separated category names.
python demo/image_demo.py \
    demo/demo.jpg \
    projects/GroundingDINO/configs/groundingdino/groundingdino_swin-t.py \
    --weights $WEIGHT_FILE \
    --texts 'bench . car .'
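
The `--texts` value in the demo command lists category names separated by ` . ` and ends with ` .`. A minimal sketch of building such a string from a Python list follows; `build_text_prompt` is a hypothetical helper written for illustration, not part of MMDetection.

```python
def build_text_prompt(category_names):
    """Join category names into a ' . '-separated prompt ending with ' .'."""
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [name.strip() for name in category_names if name.strip()]
    if not cleaned:
        return ""
    return " . ".join(cleaned) + " ."


prompt = build_text_prompt(["bench", "car"])
print(prompt)
```

The resulting string can then be passed as the `--texts` argument shown above.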

Results and Models

| Model | Backbone | COCO mAP | Pre-Train Data | Config | Download |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO-T | Swin-T | 48.5 | O365, GoldG, Cap4M | config | model |
| Grounding DINO-B | Swin-B | 56.9 | COCO, O365, GoldG, Cap4M, OpenImage, ODinW-35, RefCOCO | config | model |

Note:

  1. The zero-shot model weights are the official weights, converted with the conversion script. We have not retrained the model yet.

@hhaAndroid hhaAndroid merged commit 073626f into open-mmlab:dev-3.x Sep 18, 2023
@YanxingLiu YanxingLiu changed the title Support grounding dino [CodeCamp2023-683]Support grounding dino Sep 18, 2023
@CDchenlin

CDchenlin commented Sep 23, 2023

Does Grounding DINO support fine-tuning?

@YanxingLiu
Contributor Author

YanxingLiu commented Sep 25, 2023

We have not tested training, because the original repository did not release its training code. If you want to fine-tune the model, you may need to modify some files; pull request #10954 is a useful reference. Thank you for your interest.

yumion pushed a commit to yumion/mmdetection that referenced this pull request Jan 31, 2024
@SoulProficiency

What are the minimum hardware requirements for fine-tuning Grounding DINO on the COCO dataset (FP32, all parameters trainable, batch size ≥ 32)?
