Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, arxiv

PaddlePaddle training/validation code and pretrained models for Swin Detection.

The official PyTorch implementation is here.

This implementation is developed by PaddleViT.

Swin Model Overview

Update

Update (2021-09-15): Code is released and Mask R-CNN ported weights are uploaded.

Model Zoo

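| Model      | Backbone  | box_mAP | Weights            |
|------------|-----------|---------|--------------------|
| Mask R-CNN | Swin-T 1x | 43.7    | google/baidu(qev7) |
| Mask R-CNN | Swin-T 3x | 46.0    | google/baidu(m8fg) |
| Mask R-CNN | Swin-S 3x | 48.4    | google/baidu(hdw5) |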
  • The results are evaluated on the COCO validation set.
  • 1x/3x refers to the learning-rate schedule ('Lr Schd') in the official repo.
  • Backbone model weights can be found in Swin Transformer Classification here.

Notebooks

We provide a few notebooks in aistudio to help you get started:

*(coming soon)*

Requirements

Data

COCO2017 dataset is used in the following folder structure:

COCO dataset folder
├── annotations
│   ├── captions_train2017.json
│   ├── captions_val2017.json
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
├── train2017
│   ├── 000000000009.jpg
│   ├── 000000000025.jpg
│   ├── 000000000030.jpg
│   ├── 000000000034.jpg
│   ...
└── val2017
    ├── 000000000139.jpg
    ├── 000000000285.jpg
    ├── 000000000632.jpg
    ├── 000000000724.jpg
    ...

More details about the COCO dataset can be found here and on the official COCO dataset site.
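
Before launching evaluation or training, it can help to confirm that a local copy matches the layout above. The following is a minimal sketch, not part of this repository: the function name `find_missing_coco_entries` is hypothetical, and it only checks for the annotation files and image folders listed in the tree, not for individual images.

```python
from pathlib import Path

# Entries taken from the folder structure listed above.
EXPECTED_ANNOTATIONS = [
    "captions_train2017.json",
    "captions_val2017.json",
    "instances_train2017.json",
    "instances_val2017.json",
    "person_keypoints_train2017.json",
    "person_keypoints_val2017.json",
]
EXPECTED_IMAGE_DIRS = ["train2017", "val2017"]


def find_missing_coco_entries(root):
    """Return relative paths from the expected layout that are absent under root."""
    root = Path(root)
    missing = []
    for name in EXPECTED_ANNOTATIONS:
        if not (root / "annotations" / name).is_file():
            missing.append(f"annotations/{name}")
    for name in EXPECTED_IMAGE_DIRS:
        if not (root / name).is_dir():
            missing.append(name)
    return missing
```

An empty list means every expected entry was found; otherwise the returned paths show what to download or move.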

Usage

To use the model with pretrained weights, download the .pdparams weight file and change the related file paths in the following Python scripts. The model config files are located in ./configs/.

For example, assuming the downloaded weight file is stored in ./mask_rcnn_swin_tiny_patch4_window7.pdparams, the swin_t_maskrcnn model can be used in Python as follows:

import paddle
from config import get_config
from swin_det import build_swin_det
# config files in ./configs/
config = get_config('./configs/swin_t_maskrcnn.yaml')
# build model
model = build_swin_det(config)
# load pretrained weights
model_state_dict = paddle.load('./mask_rcnn_swin_tiny_patch4_window7.pdparams')
model.set_dict(model_state_dict)
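
When a weight file does not match the config it is loaded into, comparing parameter names before calling set_dict can make the mismatch easier to diagnose. The sketch below is an illustration under assumptions: `compare_state_dict_keys` is a hypothetical helper, not a function of this repository or of PaddlePaddle, and it works on plain key collections so it can be tried without a GPU.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, each sorted.

    missing: keys the model expects but the checkpoint lacks.
    unexpected: keys in the checkpoint that the model does not define.
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Possible use with the objects from the snippet above (requires paddle):
# missing, unexpected = compare_state_dict_keys(
#     model.state_dict().keys(), model_state_dict.keys())
```

Two empty lists indicate that the parameter names line up; this does not check tensor shapes.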

Evaluation

To evaluate Swin detection model performance on COCO2017 with a single GPU, run the following script from the command line:

sh run_eval.sh

or

CUDA_VISIBLE_DEVICES=0 \
python main_single_gpu.py \
    -cfg=./configs/swin_t_maskrcnn.yaml \
    -dataset=coco \
    -batch_size=4 \
    -data_path=/path/to/dataset/coco/val \
    -eval \
    -pretrained=/path/to/pretrained/model/mask_rcnn_swin_tiny_patch4_window7  # .pdparams is NOT needed

Run evaluation using multiple GPUs:

sh run_eval_multi.sh

or

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
    -cfg=./configs/swin_t_maskrcnn.yaml \
    -dataset=coco \
    -batch_size=4 \
    -data_path=/path/to/dataset/coco/val \
    -eval \
    -pretrained=/path/to/pretrained/model/mask_rcnn_swin_tiny_patch4_window7  # .pdparams is NOT needed
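
The comments in the commands above state that the .pdparams extension is not needed in the -pretrained path, whereas the Usage snippet passes the full file name to paddle.load. When the same path is reused in both places, a small helper can normalize it. This is a sketch only: `strip_pdparams_suffix` is a hypothetical name, and the assumption that the scripts append the extension themselves is inferred from those comments rather than confirmed.

```python
def strip_pdparams_suffix(path):
    """Drop a trailing '.pdparams' so the path matches the form used with -pretrained."""
    suffix = ".pdparams"
    if path.endswith(suffix):
        return path[: -len(suffix)]
    return path
```

Paths that already lack the extension are returned unchanged, so the helper is safe to apply unconditionally.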

Training

To train the Swin detection model on COCO2017 with a single GPU, run the following script from the command line:

sh run_train.sh

or

CUDA_VISIBLE_DEVICES=1 \
python main_single_gpu.py \
    -cfg=./configs/swin_t_maskrcnn.yaml \
    -dataset=coco \
    -batch_size=2 \
    -data_path=/path/to/dataset/coco/train \
    -pretrained=/path/to/pretrained/model/swin_tiny_patch4_window7_224  # .pdparams is NOT needed

The -pretrained argument sets the pretrained backbone weights, which can be found in Swin classification here.

Run training using multiple GPUs:

sh run_train_multi.sh

or

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
    -cfg=./configs/swin_t_maskrcnn.yaml \
    -dataset=coco \
    -batch_size=2 \
    -data_path=/path/to/dataset/coco/train \
    -pretrained=/path/to/pretrained/model/swin_tiny_patch4_window7_224  # .pdparams is NOT needed

As in the single-GPU case, the -pretrained argument sets the pretrained backbone weights.
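
The evaluation and training commands differ only in the script name, the visible GPUs, the batch size, the data path, the weights, and the -eval flag. For launching several runs from Python, the commands could be assembled programmatically. The following is a minimal sketch assuming only the flags shown in this README; `build_command` and `run` are hypothetical helpers, not part of the repository.

```python
import os
import subprocess


def build_command(script, cfg, data_path, batch_size,
                  pretrained=None, evaluate=False, dataset="coco"):
    """Assemble the argument list for main_single_gpu.py or main_multi_gpu.py."""
    argv = [
        "python", script,
        f"-cfg={cfg}",
        f"-dataset={dataset}",
        f"-batch_size={batch_size}",
        f"-data_path={data_path}",
    ]
    if evaluate:
        argv.append("-eval")
    if pretrained is not None:
        argv.append(f"-pretrained={pretrained}")
    return argv


def run(argv, gpus):
    """Run argv with CUDA_VISIBLE_DEVICES restricted to the given GPU ids."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpus))
    return subprocess.run(argv, env=env, check=True)
```

For instance, `run(build_command("main_multi_gpu.py", "./configs/swin_t_maskrcnn.yaml", "/path/to/dataset/coco/train", 2, pretrained="/path/to/pretrained/model/swin_tiny_patch4_window7_224"), gpus=[0, 1, 2, 3])` would mirror the multi-GPU training command above.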

Visualization

coming soon

Reference

@article{liu2021swin,
  title={Swin transformer: Hierarchical vision transformer using shifted windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}