Enhancing Open-Vocabulary Object Detection through Region-Word and Region-Vision Matching

Installation

This project is based on MMDetection 3.x.

It requires the following OpenMMLab packages:

MMEngine >= 0.6.0
MMCV >= v2.0.0rc4
MMDetection >= v3.0.0rc6
LVIS API
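
To see which of these packages are already present in an environment, a short check along the following lines may help. It is only a sketch: the distribution names mmengine, mmcv and mmdet are assumed to be the pip names of the packages listed above, and the helper installed_versions is hypothetical, not part of this repository.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed pip distribution names for the OpenMMLab packages listed above.
    for name, version in installed_versions(["mmengine", "mmcv", "mmdet"]).items():
        print(f"{name}: {version or 'not installed'}")
```

The script only reports versions; comparing them against the minimums above is left to the reader.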

Usage

Obtain CLIP Checkpoints

Our method is implemented with CLIP's ViT-B-32 model. Download the model's state_dict from Google Drive and place it under checkpoints/ as clip_vitb32.pth.

Training and Testing

Data preparation

Prepare the data following MMDetection. Download the JSON files for OV-COCO from Google Drive and place them under data/coco/yichen. The resulting directory structure looks like this:

checkpoints/
├── clip_vitb32.pth
data/
├── coco
│   ├── annotations
│   │   ├── instances_{train,val}2017.json
│   ├── yichen
│   │   ├── instances_train2017_base.json
│   │   ├── instances_val2017_base.json
│   │   ├── instances_val2017_novel.json
│   │   ├── captions_train2017_tags_allcaps.json
│   ├── train2017
│   ├── val2017
│   ├── test2017

Alternatively, generate the JSON files with the following scripts:

python tools/pre_processors/keep_coco_base.py \
      --json_path data/coco/annotations/instances_train2017.json \
      --out_path data/coco/yichen/instances_train2017_base.json

python tools/pre_processors/keep_coco_base.py \
      --json_path data/coco/annotations/instances_val2017.json \
      --out_path data/coco/yichen/instances_val2017_base.json

python tools/pre_processors/keep_coco_novel.py \
      --json_path data/coco/annotations/instances_val2017.json \
      --out_path data/coco/yichen/instances_val2017_novel.json
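
For readers unfamiliar with this kind of preprocessing, the sketch below illustrates one way a COCO annotation file could be restricted to a subset of categories. It is not the code in tools/pre_processors: the function names keep_categories and filter_file are hypothetical, the category names to keep must be supplied by the caller, and images without remaining annotations are left in the file, which may differ from what the actual scripts do.

```python
import json


def keep_categories(coco, keep_names):
    """Return a copy of a COCO-style dict restricted to the named categories."""
    keep_names = set(keep_names)
    categories = [c for c in coco["categories"] if c["name"] in keep_names]
    keep_ids = {c["id"] for c in categories}
    annotations = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
    filtered = dict(coco)  # shallow copy; images and other keys are kept as-is
    filtered["categories"] = categories
    filtered["annotations"] = annotations
    return filtered


def filter_file(json_path, out_path, keep_names):
    """Read a COCO annotation file, filter it, and write the result."""
    with open(json_path) as f:
        coco = json.load(f)
    with open(out_path, "w") as f:
        json.dump(keep_categories(coco, keep_names), f)
```

The input dict is not modified, so the same loaded annotations can be filtered once for the base split and once for the novel split.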

The JSON file for caption supervision, captions_train2017_tags_allcaps.json, is generated following Detic. Place it under data/coco/yichen.
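
Before starting a run, it can be useful to confirm that everything is where the configs expect it. The following is a minimal sketch that checks the paths from the directory listing above, assuming it is run from the repository root; the helper missing_paths is hypothetical and not part of this repository.

```python
from pathlib import Path

# Paths taken from the directory listing above, relative to the repository root.
EXPECTED = [
    "checkpoints/clip_vitb32.pth",
    "data/coco/annotations/instances_train2017.json",
    "data/coco/annotations/instances_val2017.json",
    "data/coco/yichen/instances_train2017_base.json",
    "data/coco/yichen/instances_val2017_base.json",
    "data/coco/yichen/instances_val2017_novel.json",
    "data/coco/yichen/captions_train2017_tags_allcaps.json",
    "data/coco/train2017",
    "data/coco/val2017",
    "data/coco/test2017",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    for path in missing_paths("."):
        print(f"missing: {path}")
```

An empty output means every listed file and directory was found.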

Training

RWM training

Train the detector based on FasterRCNN+ResNet50C4.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 \
./tools/train.py configs/rwvm/ov_coco/rwvm_kd_faster_rcnn_r50_caffe_c4_90k.py --launcher pytorch

RVM training

Train the detector based on FasterRCNN+ResNet50C4.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 \
./tools/train.py configs/rwvm/ov_coco/rwvm_kd_faster_rcnn_r50_caffe_c4_90k.py --launcher pytorch
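
The commands above are written for four GPUs. When running on a different set of devices, CUDA_VISIBLE_DEVICES and --nproc_per_node need to agree. The sketch below shows one way to derive both from a single list of GPU ids; build_launch is a hypothetical helper, and the config path in the example assumes the command is run from the repository root.

```python
import os
import shlex


def build_launch(config, gpu_ids):
    """Return (env, argv) for a distributed training run on the given GPU ids."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # Expose only the requested devices to the launched processes.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpu_ids))
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",  # one process per visible GPU
        "./tools/train.py", config,
        "--launcher", "pytorch",
    ]
    return env, argv


if __name__ == "__main__":
    # Assumed relative config path; adjust to the config you intend to train.
    env, argv = build_launch(
        "configs/rwvm/ov_coco/rwvm_kd_faster_rcnn_r50_caffe_c4_90k.py", [0, 1]
    )
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
```

The script only prints the command. To actually start training, the pair can be passed to subprocess.run(argv, env=env, check=True).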

Testing

OV-COCO

The implementation based on MMDetection 3.x achieves better results than those reported in the paper. To test a model, run:

python ./tools/test.py \
path/to/the/cfg/file path/to/the/checkpoint

Acknowledgment

We thank the authors and contributors of BARON and MMDetection.
