Official PyTorch implementation for NamedMask. Details can be found in the paper.
[paper] [poster] [project page]
Please find our demo built with Hugging Face and Gradio.
Please download datasets of interest first by visiting the following links:
- Cityscapes
- CoCA
- COCO2017
- VOC2012
- (Optional) ImageNet2012 (for an index dataset used in training)
Note that Cityscapes and ImageNet2012 require you to sign up for an account. ImageNet2012 is only needed if you want to train NamedMask yourself.
We advise you to put the downloaded dataset(s) into the following directory structure for ease of implementation:
{your_dataset_directory}
├──cityscapes
│ ├──gtFine
│ ├──leftImg8bit
├──coca
│ ├──binary
│ ├──image
├──coco2017
│ ├──annotations
│ ├──train2017
│ ├──val2017
├──ImageNet2012
│ ├──train
│ ├──val
├──ImageNet-S
│ ├──ImageNetS50
│ ├──ImageNetS300
│ ├──ImageNetS919
├──VOCdevkit
│ ├──VOC2012
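Before moving on, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the NamedMask code base: the function name `missing_dirs` and the `EXPECTED_LAYOUT` table are hypothetical, and the sketch assumes that VOC2012 sits inside VOCdevkit.

```python
from pathlib import Path

# Expected sub-directories per dataset, taken from the tree above.
# Assumption: VOC2012 is a sub-directory of VOCdevkit.
EXPECTED_LAYOUT = {
    "cityscapes": ["gtFine", "leftImg8bit"],
    "coca": ["binary", "image"],
    "coco2017": ["annotations", "train2017", "val2017"],
    "ImageNet2012": ["train", "val"],
    "ImageNet-S": ["ImageNetS50", "ImageNetS300", "ImageNetS919"],
    "VOCdevkit": ["VOC2012"],
}


def missing_dirs(dataset_root, datasets=None):
    """Return the expected directories that do not exist under dataset_root.

    datasets restricts the check to the datasets you actually downloaded;
    by default every dataset in EXPECTED_LAYOUT is checked.
    """
    root = Path(dataset_root)
    names = datasets if datasets is not None else list(EXPECTED_LAYOUT)
    missing = []
    for name in names:
        for sub in EXPECTED_LAYOUT[name]:
            path = root / name / sub
            if not path.is_dir():
                missing.append(str(path))
    return missing
```

For example, `missing_dirs("{your_dataset_directory}", ["coco2017", "VOCdevkit"])` returns an empty list when both datasets are laid out as shown.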
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c conda-forge tqdm
conda install -c conda-forge matplotlib
conda install -c anaconda ujson
conda install -c conda-forge pyyaml
conda install -c conda-forge pycocotools
conda install -c anaconda scipy
pip install opencv-python
pip install git+https://github.com/openai/CLIP.git
Please note that the required version of each package may vary depending on your local environment.
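To check that the installation succeeded, you can look up each package by its import name. This is a small sketch using only the standard library; the list `REQUIRED_MODULES` and the function `find_missing_modules` are hypothetical names, and the import names (e.g. `yaml` for pyyaml, `cv2` for opencv-python, `clip` for the OpenAI CLIP package) are the usual ones for these packages rather than something stated in this README.

```python
import importlib.util

# Import names for the packages installed above. A few differ from the
# install names: pyyaml -> yaml, opencv-python -> cv2, pytorch -> torch.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "tqdm", "matplotlib",
    "ujson", "yaml", "pycocotools", "scipy", "cv2", "clip",
]


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    # find_spec only locates the module; it does not import it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


print("missing modules:", find_missing_modules() or "none")
```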
NamedMask is trained with pseudo-labels from either an unsupervised saliency detector (e.g., SelfMask) or category experts that refine the predictions made by the saliency network. For this reason, pseudo-labels need to be generated before training NamedMask. You can skip this part if you only want to run inference with the pre-trained weights provided below.
To compute pseudo-masks for images of the categories in Cityscapes, COCO2017, CoCA, or VOC2012, we provide a dictionary file (.json format) for each benchmark. The file maps each category to a list of 500 ImageNet2012 image paths retrieved by CLIP (with ViT-L/14@336px architecture). It has the following structure:
{
"category_a": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
"category_b": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
...
}
You need to change {your_imagenet_dir} before loading this file for the following steps (by default, it is set to /home/cs-shin1/datasets/ImageNet2012).
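One way to change the path prefix is a short script that rewrites every path in the dictionary file. The sketch below assumes the structure shown above (category name mapped to a list of paths) and the default prefix stated above; the function names `rewrite_image_paths` and `rewrite_file` are hypothetical and not part of the repository.

```python
import json

# Default prefix of the paths in the provided files, as stated above.
DEFAULT_PREFIX = "/home/cs-shin1/datasets/ImageNet2012"


def rewrite_image_paths(category_to_paths, new_prefix, old_prefix=DEFAULT_PREFIX):
    """Return a copy of the mapping with old_prefix replaced by new_prefix.

    Paths that do not start with old_prefix are left unchanged.
    """
    new_prefix = new_prefix.rstrip("/")
    rewritten = {}
    for category, paths in category_to_paths.items():
        rewritten[category] = [
            new_prefix + p[len(old_prefix):] if p.startswith(old_prefix) else p
            for p in paths
        ]
    return rewritten


def rewrite_file(src_fp, dst_fp, new_prefix, old_prefix=DEFAULT_PREFIX):
    """Load a dictionary file, rewrite its paths, and save the result to dst_fp."""
    with open(src_fp) as f:
        mapping = json.load(f)
    with open(dst_fp, "w") as f:
        json.dump(rewrite_image_paths(mapping, new_prefix, old_prefix), f)
```

Writing to a separate `dst_fp` keeps the downloaded file intact, so the rewrite can be repeated if the dataset is moved again.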
Please download the dictionary file for the benchmark on which you want to evaluate and put it in the ImageNet2012 directory.
Then, open selfmask.sh in the scripts directory and change:
DIR_ROOT={your_working_directory}
DIR_DATASET={your_ImageNet2012_directory}
CATEGORY_TO_P_IMAGES_FP={your_category_to_p_images_fp} # this should point to a json file you downloaded above
Then run:
bash selfmask.sh
This will generate pseudo-masks for images retrieved by CLIP (with ViT-L/14@336px architecture) from the ImageNet2012 training set.
The pseudo-masks will be saved at {your_ImageNet2012_directory}/train_pseudo_masks_selfmask.
If you want to skip this process, please download the pre-computed pseudo-masks and uncompress them in {your_ImageNet2012_directory}/train_pseudo_masks_selfmask:
- pseudo-masks from SelfMask (~89 MB)
Optionally, if you want to refine the pseudo-masks with a category expert (after finishing the above step), open the expert_$DATASET_NAME_category.sh file and configure DIR_ROOT and CATEGORY_TO_P_IMAGES_FP as appropriate. Then run:
bash expert_$DATASET_NAME_category.sh
Currently, we only provide code for training experts of the VOC2012 categories.
The pseudo-masks will be saved at {your_ImageNet2012_directory}/train_pseudo_masks_experts.
If you want to skip this process, please download the pre-computed pseudo-masks:
- Cityscapes pseudo-masks from category experts (~ 6.5 MB)
- CoCA pseudo-masks from category experts (~ 36 MB)
- COCO2017 pseudo-masks from category experts (~ 36 MB)
- VOC2012 pseudo-masks from category experts (~ 11 MB)
Please uncompress the .zip file in {your_ImageNet2012_directory}/train_pseudo_masks_experts.
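After generating or uncompressing pseudo-masks, a quick sanity check is to count the files that ended up in the target directory. The sketch below makes no assumption about how the masks are named or stored; it only groups files by extension. The function name `count_files_by_suffix` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_files_by_suffix(directory):
    """Count files under directory (recursively), grouped by lowercased suffix."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist or is not a directory")
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())
```

For example, calling it on {your_ImageNet2012_directory}/train_pseudo_masks_experts should report a non-zero count. An empty result suggests that the archive was extracted somewhere else.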
Once pseudo-masks are created (or downloaded and uncompressed), set a path to the directory that contains the pseudo-masks in a configuration file.
For example, to train a model with pseudo-masks from experts for the VOC2012 categories, open the voc_val_n500_cp2_ex.yaml file and change the following arguments as appropriate:
category_to_p_images_fp: {your_category_to_p_images_fp} # this should point to a json file you downloaded above
dir_ckpt: {your_dir_ckpt} # this should point to a checkpoint directory
dir_train_dataset: {your_dir_train_dataset} # this should point to ImageNet2012 directory (as an index dataset)
dir_val_dataset: {your_dir_val_dataset} # this should point to a benchmark directory
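A typo in one of these four paths usually only surfaces once training starts, so it can be worth checking them up front. The following sketch is not part of the repository: `check_config_paths` and `load_config` are hypothetical helpers, and the sketch assumes the four keys appear at the top level of the YAML file and that the checkpoint directory does not have to exist yet.

```python
from pathlib import Path

# The four arguments listed above: the first is a file, the rest are directories.
PATH_KEYS = {
    "category_to_p_images_fp": "file",
    "dir_ckpt": "dir",
    "dir_train_dataset": "dir",
    "dir_val_dataset": "dir",
}


def check_config_paths(cfg):
    """Return a list of human-readable problems found in the path arguments."""
    problems = []
    for key, kind in PATH_KEYS.items():
        value = cfg.get(key)
        # A value that still contains "{" is treated as an unfilled placeholder.
        if not value or "{" in str(value):
            problems.append(f"{key}: not set")
            continue
        if key == "dir_ckpt":
            # Assumption: the checkpoint directory may be created later.
            continue
        path = Path(str(value))
        exists = path.is_file() if kind == "file" else path.is_dir()
        if not exists:
            problems.append(f"{key}: {path} is not an existing {kind}")
    return problems


def load_config(yaml_fp):
    """Load a YAML configuration file into a dict (requires PyYAML)."""
    import yaml  # imported lazily so check_config_paths works without PyYAML

    with open(yaml_fp) as f:
        return yaml.safe_load(f)
```

A typical use would be `check_config_paths(load_config("voc_val_n500_cp2_ex.yaml"))`, which returns an empty list when all four arguments look usable.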
Then, run
bash voc_val_n500_cp2_sr10100_ex.sh
Note that the model is evaluated at regular iteration intervals during training, and the final weights are saved in your checkpoint directory.
To evaluate a model with pre-trained weights on a benchmark, e.g., VOC2012, please run (after customising the four arguments above)
bash voc_val_n500_cp2_sr10100_ex.sh $PATH_TO_WEIGHTS
We provide the pre-trained weights of NamedMask:
benchmark | split | IoU (%) | pixel accuracy (%) | link |
---|---|---|---|---|
Cityscapes (object) | val | 18.2 | 93.0 | weights (~102 MB) |
CoCA | - | 27.4 | 82.0 | weights (~102 MB) |
COCO2017 | val | 27.7 | 76.4 | weights (~102 MB) |
ImageNet-S50 | test | 47.5 | - | weights (~102 MB) |
ImageNet-S300 | test | 33.1 | - | weights (~103 MB) |
ImageNet-S919 | test | 23.1 | - | weights (~103 MB) |
VOC2012 | val | 59.3 | 89.2 | weights (~102 MB) |
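For readers unfamiliar with the two metrics in the table, the sketch below shows how mean IoU and pixel accuracy are commonly computed from a confusion matrix. It is an illustration in plain Python, not the evaluation code used to produce the numbers above. It assumes that the IoU column is the mean over classes and that label 255 marks pixels to ignore (a common convention, e.g. in VOC2012). The function names are hypothetical.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a num_classes x num_classes matrix from flattened label sequences.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground-truth label equals ignore_index are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou_and_pixel_accuracy(cm):
    """Return (mean IoU, pixel accuracy), both in percent."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    miou = 100.0 * sum(ious) / len(ious) if ious else 0.0
    acc = 100.0 * correct / total if total else 0.0
    return miou, acc
```

In practice the confusion matrix is accumulated over all images of a split before the two metrics are computed, so that each class is weighted by its pixels across the whole dataset.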
We additionally offer the pre-trained weights of the category experts for 20 classes in VOC2012:
category | link |
---|---|
aeroplane | weights (~102 MB) |
bicycle | weights (~102 MB) |
bird | weights (~102 MB) |
boat | weights (~102 MB) |
bottle | weights (~102 MB) |
bus | weights (~102 MB) |
car | weights (~102 MB) |
cat | weights (~102 MB) |
chair | weights (~102 MB) |
cow | weights (~102 MB) |
dining table | weights (~102 MB) |
dog | weights (~102 MB) |
horse | weights (~102 MB) |
motorbike | weights (~102 MB) |
person | weights (~102 MB) |
potted plant | weights (~102 MB) |
sheep | weights (~102 MB) |
sofa | weights (~102 MB) |
train | weights (~102 MB) |
tv/monitor | weights (~102 MB) |
@inproceedings{shin2023namedmask,
title = {NamedMask: Distilling Segmenters from Complementary Foundation Models},
author = {Shin, Gyungin and Xie, Weidi and Albanie, Samuel},
booktitle = {CVPRW},
year = {2023}
}
We borrowed the code for SelfMask and DeepLabv3+ from
If you have any questions about our code/implementation, please contact us at gyungin [at] robots [dot] ox [dot] ac [dot] uk.