The official codebase for IFSeg: Image-free Semantic Segmentation via Vision-Language Model (CVPR 2023)
This codebase is largely derived from OFA.
- Python 3.9.15
- PyTorch 1.12.1+cu116
- torchvision 0.13.1+cu116
- mmsegmentation v0.28.0
Install PyTorch and torchvision
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
Install mmsegmentation
pip install openmim
mim install mmcv-full==1.6.2
git clone https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation && git checkout v0.28.0 && pip install -v -e .
Install other dependencies
pip install -e ./custom_fairseq/
pip install -r requirements.txt
To ensure efficient data processing, we do not store images as many individual small files; instead, we encode them as base64 strings, following the procedure described in the OFA dataset preparation guide.
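As a rough sketch of this encoding step (the helper names and the exact column layout here are illustrative, not the precise OFA TSV format), an image file can be turned into a base64 string and written as one tab-separated row like this:

```python
import base64
from pathlib import Path


def encode_image_to_base64(image_path):
    """Read raw image bytes from disk and return an ASCII base64 string."""
    return base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")


def write_tsv_row(out_file, image_id, image_path, extra=""):
    """Append one tab-separated row: id, base64-encoded image, extra metadata.

    The actual column order/contents should follow the OFA data prep guide;
    this sketch only shows the base64 step.
    """
    b64 = encode_image_to_base64(image_path)
    out_file.write(f"{image_id}\t{b64}\t{extra}\n")
```

Decoding on the loading side is the reverse: `base64.b64decode(...)` recovers the original image bytes.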
1. Prepare the Dataset and download pretrained checkpoint
Download the COCO-Stuff images and annotations from https://cocodataset.org and build unseen_val2017.tsv and fineseg_refined_val2017.tsv with the example notebooks convert_segmentation_coco_unseen_split.ipynb and convert_segmentation_coco.ipynb.
For ADE20K, download images and annotations from https://groups.csail.mit.edu/vision/datasets/ADE20K/ and build "validation.tsv" with the example notebook convert_segmentation_ade.ipynb.
The pretrained OFA checkpoint is available at https://github.com/OFA-Sys/OFA/blob/main/checkpoints.md. Specifically, we require the OFA-Base model checkpoint, ofa_base.pt.
We recommend organizing your workspace directory as follows:
IFSeg/
├── criterions/
├── data/
├── dataset/
│ ├── coco/unseen_val2017.tsv; fineseg_refined_val2017.tsv
│ └── ade/ade_valid.tsv
├── custom_fairseq/
├── models/
├── run_scripts/
│ ├── IFSeg/coco_unseen.sh
│ ├── IFSeg/ade.sh
│ └── IFSeg/coco_fine.sh
├── tasks/
├── train.py
├── trainer.py
├── convert_segmentation_coco_unseen_split.ipynb
├── convert_segmentation_coco.ipynb
├── convert_segmentation_ade.ipynb
├── visualize_segmentation_web.ipynb
└── utils/
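As a sketch, the dataset portion of this layout can be prepared with the commands below (directory names only; the .tsv files themselves come from the conversion notebooks above):

```shell
# Create the expected dataset directories inside the IFSeg workspace.
mkdir -p IFSeg/dataset/coco IFSeg/dataset/ade

# After running the conversion notebooks, place the generated .tsv files here:
#   IFSeg/dataset/coco/unseen_val2017.tsv
#   IFSeg/dataset/coco/fineseg_refined_val2017.tsv
#   IFSeg/dataset/ade/ade_valid.tsv
```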
2. Finetuning and Inference Scripts
For running the image-free experiment for 15 unseen COCO categories (Table 1), refer to run_scripts/IFSeg/coco_unseen.sh
For running the image-free experiment for 150 ADE categories (Table 2), refer to run_scripts/IFSeg/ade.sh
For running the image-free experiment for 171 COCO-stuff categories (Table 3), refer to run_scripts/IFSeg/coco_fine.sh
To obtain the web image visualization, follow the directions in visualize_segmentation_web.ipynb
The pre-trained checkpoint for the visualization can be downloaded from https://drive.google.com/file/d/167sIrrSsBTRQlrVHYMKYoWA5A9r04eAD/view?usp=sharing
You may also produce your own checkpoint with novel semantic categories. For example, starting from an example script in ./run_scripts/IFSeg, modify category_list and num_seg_tokens to match your segmentation setting.
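As an illustration (the names category_list and num_seg_tokens come from the run scripts, but the concrete category values below are hypothetical), a custom setting might look like:

```python
# Hypothetical custom category set; in the actual run scripts these are
# set as script variables/arguments rather than defined in Python.
category_list = ["sky", "building", "tree", "road", "person"]

# Each semantic category gets its own segmentation token, so
# num_seg_tokens should equal the number of categories.
num_seg_tokens = len(category_list)
```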
Please cite our paper if you find it helpful :)
@inproceedings{yun2023ifseg,
title = {IFSeg: Image-free Semantic Segmentation via Vision-Language Model},
author = {Sukmin Yun and
Seong Hyeon Park and
Paul Hongsuck Seo and
Jinwoo Shin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}