JongMokKim/mix-unmix


MUM : Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection (CVPR2022)

This is the PyTorch implementation of our paper:
MUM : Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv]

Installation & Setup

We follow the installation process of the official Unbiased Teacher repo (https://github.com/facebookresearch/unbiased-teacher).

Download the code

  • For your convenience, we provide the code and model weights as a zip archive.

Prerequisites

  • Linux or macOS with Python ≥ 3.6
  • PyTorch ≥ 1.5 and torchvision that matches the PyTorch installation.

Build Detectron2 from Source

  • We found that the latest (v0.6) release of Detectron2 raises errors with our code.
  • Therefore, please install the matching v0.5 release of Detectron2 as follows:
# get the Detectron2 v0.5 package
wget https://github.com/facebookresearch/detectron2/archive/refs/tags/v0.5.zip

# unzip
unzip v0.5.zip

# install
python -m pip install -e detectron2-0.5
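
After the install step, it can help to confirm that the v0.5 series is the one Python actually imports. The snippet below is a minimal sketch, not part of this repository: the helper name `is_supported_detectron2` is hypothetical, and it assumes only that the installed package exposes `detectron2.__version__`.

```python
import importlib.util


def is_supported_detectron2(version: str) -> bool:
    """Return True when `version` belongs to the 0.5 series (hypothetical helper)."""
    # Drop a local build suffix such as "+cu111" before comparing.
    release = version.split("+")[0]
    return release.split(".")[:2] == ["0", "5"]


def report() -> None:
    """Print whether the importable Detectron2 matches the version this README expects."""
    if importlib.util.find_spec("detectron2") is None:
        print("detectron2 is not installed")
        return
    import detectron2

    installed = detectron2.__version__
    status = "OK" if is_supported_detectron2(installed) else "unsupported, expected v0.5"
    print(f"detectron2 {installed}: {status}")


if __name__ == "__main__":
    report()
```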

Install other requirements

pip install -r requirements.txt

Dataset download

  1. Download the COCO and VOC datasets

  2. Organize the datasets as follows:

mix-unmix/
└── datasets/
    ├── coco/
    │   ├── train2017/
    │   ├── val2017/
    │   └── annotations/
    │       ├── instances_train2017.json
    │       └── instances_val2017.json
    ├── VOC2007
    │   ├── Annotations
    │   ├── ImageSets
    │   └── JPEGImages
    └── VOC2012
        ├── Annotations
        ├── ImageSets
        └── JPEGImages
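
A quick way to catch a misplaced folder before training is to check the tree above programmatically. This is a small sketch under stated assumptions: the `EXPECTED` list is copied from the layout above, `missing_entries` is a hypothetical helper, and the script is assumed to run from the `mix-unmix/` directory.

```python
from pathlib import Path

# Paths relative to the datasets/ folder, copied from the layout above.
EXPECTED = [
    "coco/train2017",
    "coco/val2017",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "VOC2007/Annotations",
    "VOC2007/ImageSets",
    "VOC2007/JPEGImages",
    "VOC2012/Annotations",
    "VOC2012/ImageSets",
    "VOC2012/JPEGImages",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("datasets")
    if missing:
        print("Missing dataset entries:")
        for rel in missing:
            print("  datasets/" + rel)
    else:
        print("Dataset layout looks complete.")
```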

Evaluation

  • Performance table and model weights (the weight files are already included in the zip file)

| Backbone | Protocol | AP50 | AP50:95 | Model Weights |
|---|---|---|---|---|
| R50-FPN | COCO-Standard 1% | 40.06 | 21.89 | link |
| R50-FPN | COCO-Additional | 63.30 | 42.11 | link |
| R50-FPN | VOC07 (VOC12) | 78.94 | 50.22 | link |
| R50-FPN | VOC07 (VOC12 / COCO20cls) | 80.45 | 52.31 | link |
| Swin | COCO-Standard 0.5% | 34.25 | 16.52 | link |

  • Run Evaluation w/ R50 in COCO
python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/coco.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth
  • Run Evaluation w/ R50 in VOC
python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/voc.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth

Train

We use 4 GPUs (A6000 or V100 32GB) to achieve the paper results.

  • Train MUM under 1% COCO supervision (ResNet-50)
python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/coco.yaml
  • Train MUM with VOC07 as the labeled set and VOC12 as the unlabeled set
python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/voc.yaml

Swin

  • Download the ImageNet-pretrained Swin-T weights from link
  • Move the pretrained weights into the weights folder
mv swin_tiny_patch4_window7_224.pth weights/
  • Run Evaluation w/ Swin in COCO
python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/coco_swin.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth
      
  • Train under 0.5% COCO-supervision
python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/coco_swin.yaml

Mix/UnMix code block

Mixing code block

  • Generate mix mask (bs: batch size, ng: number of images per mixing group, nt: number of tiles per side)
mask = torch.argsort(torch.rand(bs // ng, ng, nt, nt), dim=1).cuda()
img_mask = mask.view(bs // ng, ng, 1, nt, nt)
img_mask = img_mask.repeat_interleave(3, dim=2)
img_mask = img_mask.repeat_interleave(h // nt, dim=3)
img_mask = img_mask.repeat_interleave(w // nt, dim=4)
  • Mixing image tiles
img_tiled = images.tensor.view(bs // ng, ng, c, h, w)
img_tiled = torch.gather(img_tiled, dim=1, index=img_mask)
img_tiled = img_tiled.view(bs, c, h, w)
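
The snippets above rely on variables defined elsewhere in the training code. The following is a self-contained CPU sketch of the same mixing steps on a toy batch, so the effect of the mask is easy to inspect. The function names `expand_mask` and `mix_tiles` and the toy sizes are illustrative assumptions, not names from this repository. Each toy image is filled with its own index, so every pixel of a mixed image shows which source image the tile came from.

```python
import torch


def expand_mask(mask, c, h, w):
    """Expand a (bs//ng, ng, nt, nt) tile mask into a (bs//ng, ng, c, h, w) gather index."""
    g, ng, nt, _ = mask.shape
    idx = mask.reshape(g, ng, 1, nt, nt)
    idx = idx.repeat_interleave(c, dim=2)        # same source image for every channel
    idx = idx.repeat_interleave(h // nt, dim=3)  # stretch each tile to its pixel height
    idx = idx.repeat_interleave(w // nt, dim=4)  # stretch each tile to its pixel width
    return idx


def mix_tiles(x, mask):
    """Shuffle tiles among the ng images of each group according to `mask`."""
    bs, c, h, w = x.shape
    g, ng = mask.shape[:2]
    grouped = x.reshape(g, ng, c, h, w)
    mixed = torch.gather(grouped, dim=1, index=expand_mask(mask, c, h, w))
    return mixed.reshape(bs, c, h, w)


torch.manual_seed(0)
bs, ng, nt = 4, 4, 2   # toy sizes: one group of four images, 2x2 tiles
c, h, w = 3, 4, 4

# Image i is filled with the constant value i.
images = torch.arange(bs, dtype=torch.float32).reshape(bs, 1, 1, 1).expand(bs, c, h, w).contiguous()

# For every tile position, argsort of random noise gives a permutation of the group.
mask = torch.argsort(torch.rand(bs // ng, ng, nt, nt), dim=1)
mixed = mix_tiles(images, mask)

# Channel 0 of the first mixed image: each 2x2 tile holds the index of its source image.
print(mixed[0, 0])
```

Because the mask is a permutation along the group dimension at every tile position, each source tile appears exactly once among the mixed images of its group.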

Unmixing code block

  • Generate inverse mask to unmix (here c, h, w are the channels and spatial size of the feature map, not of the image)
inv_mask = torch.argsort(mask, dim=1).cuda()
feat_mask = inv_mask.view(bs//ng,ng,1,nt,nt)
feat_mask = feat_mask.repeat_interleave(c,dim=2)
feat_mask = feat_mask.repeat_interleave(h//nt, dim=3)
feat_mask = feat_mask.repeat_interleave(w//nt, dim=4)
  • Unmixing feature tiles
feat_tiled = feat.view(bs//ng,ng,c,h,w)
feat_tiled = torch.gather(feat_tiled, dim=1, index=feat_mask)
feat_tiled = feat_tiled.view(bs,c,h,w)
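
To see why `argsort` of the mask undoes the mixing, the sketch below runs a full round trip on CPU. It is an illustration under assumptions, not the repository's training code: `gather_tiles` is a hypothetical helper, and a 2x2 average pooling stands in for the backbone. Pooling is chosen because its windows never cross tile borders here, so the unmixed features match the features of the unmixed images exactly; a real backbone has receptive fields that span tiles, so that equality does not hold for it.

```python
import torch
import torch.nn.functional as F


def gather_tiles(x, tile_mask):
    """Rearrange tiles of x (bs, c, h, w) within each group using a (bs//ng, ng, nt, nt) mask."""
    bs, c, h, w = x.shape
    g, ng, nt, _ = tile_mask.shape
    idx = tile_mask.reshape(g, ng, 1, nt, nt)
    idx = idx.repeat_interleave(c, dim=2)
    idx = idx.repeat_interleave(h // nt, dim=3)
    idx = idx.repeat_interleave(w // nt, dim=4)
    out = torch.gather(x.reshape(g, ng, c, h, w), dim=1, index=idx)
    return out.reshape(bs, c, h, w)


torch.manual_seed(0)
bs, ng, nt = 8, 4, 2                 # toy sizes: two groups of four images, 2x2 tiles
images = torch.rand(bs, 3, 8, 8)

mask = torch.argsort(torch.rand(bs // ng, ng, nt, nt), dim=1)
inv_mask = torch.argsort(mask, dim=1)  # inverse permutation at every tile position

mixed = gather_tiles(images, mask)

# Stand-in "backbone" (assumption): stride-2 average pooling, giving 4x4 feature maps.
feat = F.avg_pool2d(mixed, kernel_size=2)

# The same tile mask is expanded at feature resolution to unmix the features.
unmixed = gather_tiles(feat, inv_mask)
reference = F.avg_pool2d(images, kernel_size=2)

print(torch.allclose(unmixed, reference))
print(torch.equal(gather_tiles(mixed, inv_mask), images))
```

The same tile-level mask serves both steps; only the per-tile pixel size (`h // nt`, `w // nt`) and the channel count change between image and feature resolution.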

Acknowledgements

We use the official Unbiased Teacher code as our baseline, and the timm repository for the Swin Transformer implementation.
