PyTorch implementation of the unsupervised object discovery method LOST. More details can be found in the paper:
Localizing Objects with Self-Supervised Transformers and no Labels, BMVC 2021 [arXiv]
by Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Renaud Marlet and Jean Ponce
If you use the LOST code or framework in your research, please consider citing:
@inproceedings{LOST,
title = {Localizing Objects with Self-Supervised Transformers and no Labels},
author = {Oriane Sim\'eoni and Gilles Puy and Huy V. Vo and Simon Roburin and Spyros Gidaris and Andrei Bursuc and Patrick P\'erez and Renaud Marlet and Jean Ponce},
journal = {Proceedings of the British Machine Vision Conference (BMVC)},
month = {November},
year = {2021}
}
This code was implemented with Python 3.7, PyTorch 1.7.1 and CUDA 10.2. Please install PyTorch first, then install the additional dependencies with the following command:
pip install -r requirements.txt
This method builds on the DINO paper. The DINO framework can be installed using the following commands:
git clone https://github.com/facebookresearch/dino.git
cd dino;
touch __init__.py
echo -e "import sys\nfrom os.path import dirname, join\nsys.path.insert(0, join(dirname(__file__), '.'))" >> __init__.py; cd ../;
The code was developed against commit ba9edd1 of the DINO repository (if a more recent version breaks something, please go back to this commit).
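To make the purpose of the generated __init__.py more concrete, here is a minimal Python sketch of what those three lines appear to do: importing the package puts its own directory on sys.path, so modules inside it can then be imported by their bare names. The package name dino_demo and the module helper_demo are hypothetical stand-ins, not files from the DINO repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The three lines that the `echo -e ... >> __init__.py` command writes.
INIT_SOURCE = (
    "import sys\n"
    "from os.path import dirname, join\n"
    "sys.path.insert(0, join(dirname(__file__), '.'))\n"
)


def import_flat_module_via_package(root: Path) -> int:
    """Build a stand-in package and import one of its modules by bare name."""
    pkg = root / "dino_demo"  # hypothetical stand-in for the cloned dino/ folder
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    # Hypothetical stand-in for a module that sibling files import without a package prefix.
    (pkg / "helper_demo.py").write_text("ANSWER = 42\n")

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    importlib.import_module("dino_demo")  # runs __init__.py, which extends sys.path
    helper = importlib.import_module("helper_demo")  # resolvable only because of that
    return helper.ANSWER


with tempfile.TemporaryDirectory() as tmp:
    answer = import_flat_module_via_package(Path(tmp))
print(answer)
```

Without the sys.path line in __init__.py, the second import in this sketch would fail with ModuleNotFoundError.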
The following scripts apply LOST to an image given by the image_path parameter and visualize the predictions (pred), the maps of Figure 2 in the paper (fms) and the seed expansion (seed_expansion). Box predictions are also stored in the output directory given by the output_dir parameter.
python main_lost.py --image_path examples/VOC07_000236.jpg --visualize pred
python main_lost.py --image_path examples/VOC07_000236.jpg --visualize fms
python main_lost.py --image_path examples/VOC07_000236.jpg --visualize seed_expansion
Following are the different steps to reproduce the results of LOST presented in the paper.
Please download the PASCAL VOC07 and PASCAL VOC12 datasets (link) and put the data in the folder datasets. There should be two subfolders: datasets/VOC2007 and datasets/VOC2012. To apply LOST and compute corloc results (VOC07 61.9, VOC12 64.0), please launch:
python main_lost.py --dataset VOC07 --set trainval
python main_lost.py --dataset VOC12 --set trainval
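For readers unfamiliar with the metric, corloc (correct localization) is commonly defined as the percentage of images in which the predicted box overlaps at least one ground-truth box with an intersection-over-union of at least 0.5. The sketch below illustrates that definition with plain Python; the function names and the dictionary layout are assumptions made for illustration, not the interface of main_lost.py.

```python
from typing import Dict, List, Sequence

Box = Sequence[float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(preds: Dict[str, Box], gts: Dict[str, List[Box]], thr: float = 0.5) -> float:
    """Percentage of images whose single predicted box matches any ground-truth box."""
    hits = 0
    for image_id, gt_boxes in gts.items():
        pred = preds.get(image_id)
        if pred is not None and any(iou(pred, gt) >= thr for gt in gt_boxes):
            hits += 1
    return 100.0 * hits / len(gts)


# Toy data (hypothetical): one correct localization out of two images.
toy_preds = {"img_a": (0, 0, 10, 10), "img_b": (0, 0, 10, 10)}
toy_gts = {"img_a": [(1, 1, 10, 10)], "img_b": [(20, 20, 30, 30)]}
print(corloc(toy_preds, toy_gts))
```

Only one box per image enters the computation, which is why a method that outputs a single box, such as LOST, can be scored directly.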
Please download the COCO dataset and put the data in datasets/COCO. Following previous works, results are reported with the 2014 annotations. The following command line gives results on the 20k-image subset of the COCO dataset (corloc 50.7) used in previous literature. Note that these 20k images are a subset of the train set.
python main_lost.py --dataset COCO20k --set train
We have tested the method on different setups of the ViT model; corloc results are presented in the following table (more can be found in the paper).
arch | pre-training | VOC07 | VOC12 | COCO20k |
---|---|---|---|---|
ViT-S/16 | DINO | 61.9 | 64.0 | 50.7 |
ViT-S/8 | DINO | 55.5 | 57.0 | 49.5 |
ViT-B/16 | DINO | 60.1 | 63.3 | 50.0 |
ResNet50 | DINO | 36.8 | 42.7 | 26.5 |
ResNet50 | Imagenet | 33.5 | 39.1 | 25.5 |
The results above on the VOC07 dataset can be obtained by launching:
python main_lost.py --dataset VOC07 --set trainval #VIT-S/16
python main_lost.py --dataset VOC07 --set trainval --patch_size 8 #VIT-S/8
python main_lost.py --dataset VOC07 --set trainval --arch vit_base #VIT-B/16
python main_lost.py --dataset VOC07 --set trainval --arch resnet50 #Resnet50/DINO
python main_lost.py --dataset VOC07 --set trainval --arch resnet50_imagenet #Resnet50/imagenet
In this work, we additionally use LOST predictions to train object detection models without any human supervision. We explore two scenarios: class-agnostic (CAD) and (pseudo) class-aware training of object detectors (OD). The next sections present the different steps to reproduce our results.
We use the detectron2 framework to train a Faster R-CNN model with LOST predictions as pseudo-gt. The code was developed with version v0.5 of the framework. In order to reproduce our results, please install detectron2 using the following commands. In case of failure, you can find the installation corresponding to your version of PyTorch/CUDA here.
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install detectron2==0.5
Set global variables for ease of usage.
export LOST=$(pwd)
cd detectron2; export D2=$(pwd);
Then please link the LOST-specific files into the detectron2 framework as follows:
ln -s $LOST/tools/*.py $D2/tools/. # Link LOST tools into D2
mkdir $D2/configs/LOST
ln -s $LOST/tools/configs/* $D2/configs/LOST/. # Link LOST configs into D2
Before launching a training, the data must be formatted to fit the detectron2 and COCO styles. The following command lines do this formatting for boxes predicted with LOST.
cd $D2;
# Format DINO weights to fit detectron2
wget https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth -P ./data # Download the model from DINO
python tools/convert_pretrained_to_detectron_format.py --input ./data/dino_resnet50_pretrain.pth --output ./data/dino_RN50_pretrain_d2_format.pkl
# Format pseudo-boxes data to fit detectron2
python tools/prepare_voc_LOST_CAD_pseudo_boxes_in_detectron2_format.py --year 2007 --pboxes $LOST/data/LOST_predictions/LOST_VOC07.pkl
# Format VOC data to fit COCO style
python tools/prepare_voc_data_in_coco_style.py --is_CAD --voc07_dir $LOST/datasets/VOC2007 --voc12_dir $LOST/datasets/VOC2012
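As a rough idea of what fitting the COCO style involves: PASCAL VOC stores boxes as corners (xmin, ymin, xmax, ymax), while COCO annotations store them as [x, y, width, height] together with fields such as image_id, category_id, area and iscrowd. The sketch below shows that conversion for a single box. The helper names and the use of a single category id of 1 for the class-agnostic case are assumptions made for illustration; the actual output of the scripts above may differ.

```python
from typing import Dict, List, Sequence


def xyxy_to_xywh(box: Sequence[float]) -> List[float]:
    """Convert a corner-format box to COCO's [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def to_coco_annotation(
    box_xyxy: Sequence[float],
    image_id: int,
    ann_id: int,
    category_id: int = 1,  # assumption: one generic "object" class for class-agnostic data
) -> Dict:
    """Build one COCO-style annotation entry from a corner-format box."""
    x, y, w, h = xyxy_to_xywh(box_xyxy)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],
        "area": w * h,
        "iscrowd": 0,
    }


# Hypothetical pseudo-box for image 236.
annotation = to_coco_annotation([10, 20, 50, 80], image_id=236, ann_id=0)
print(annotation)
```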
The next command line launches a CAD training with 4 GPUs on the VOC2007 dataset. The batch size is set to 16; 4 to 8 GPUs may be needed depending on your machines. Please make sure to set the argument MODEL.WEIGHTS to the correct path of the DINO weights.
python tools/train_net_for_LOST_CAD.py --num-gpus 4 --config-file ./configs/LOST/RN50_DINO_FRCNN_VOC07_CAD.yaml DATALOADER.NUM_WORKERS 8 OUTPUT_DIR ./outputs/RN50_DINO_FRCNN_VOC07_CAD MODEL.WEIGHTS ./data/dino_RN50_pretrain_d2_format.pkl
Inference results of the model will be stored in $OUTPUT_DIR/inference. In order to produce results on the train+val dataset, please use the following command:
python tools/train_net_for_LOST_CAD.py --resume --eval-only --num-gpus 4 --config-file ./configs/LOST/RN50_DINO_FRCNN_VOC07_CAD.yaml DATALOADER.NUM_WORKERS 6 MODEL.WEIGHTS ./outputs/RN50_DINO_FRCNN_VOC07_CAD/model_final.pth OUTPUT_DIR ./outputs/RN50_DINO_FRCNN_VOC07_CAD/ DATASETS.TEST '("voc_2007_trainval_CAD_coco_style", )'
cd $LOST;
python main_corloc_evaluation.py --dataset VOC07 --set trainval --type_pred detectron --pred_file $D2/outputs/RN50_DINO_FRCNN_VOC07_CAD/inference/coco_instances_results.json
The following command lines train a detector in a class-agnostic fashion on the COCO20k subset of the COCO dataset.
cd $D2;
# Format pseudo-boxes data to fit detectron2
python tools/prepare_coco_LOST_CAD_pseudo_boxes_in_detectron2_format.py --pboxes $LOST/outputs/COCO20k_train/LOST-vit_small16_k/preds.pkl
# Generate COCO20k CAD gt annotations
python tools/prepare_coco_CAD_gt.py --coco_dir $LOST/datasets/COCO
# Train detector (evaluation done on COCO20k CAD training set)
python tools/train_net_for_LOST_CAD.py --num-gpus 4 --config-file ./configs/LOST/RN50_DINO_FRCNN_COCO20k_CAD.yaml DATALOADER.NUM_WORKERS 8 OUTPUT_DIR ./outputs/RN50_DINO_FRCNN_COCO20k_CAD MODEL.WEIGHTS ./data/dino_RN50_pretrain_d2_format.pkl
# Corloc evaluation (main_corloc_evaluation.py lives in the LOST folder)
cd $LOST;
python main_corloc_evaluation.py --dataset COCO20k --type_pred detectron --pred_file $D2/outputs/RN50_DINO_FRCNN_COCO20k_CAD/inference/coco_instances_results.json
We provide predictions of a class-agnostic Faster R-CNN model trained using LOST boxes as pseudo-gt; they are stored in the folder data/CAD_predictions. To run the corloc evaluation, please launch the following scripts. Note that in this evaluation, only the box with the highest confidence score is considered per image.
python main_corloc_evaluation.py --dataset VOC07 --set trainval --type_pred detectron --pred_file data/CAD_predictions/LOST_plus_CAD_VOC07.json
python main_corloc_evaluation.py --dataset VOC12 --set trainval --type_pred detectron --pred_file data/CAD_predictions/LOST_plus_CAD_VOC12.json
python main_corloc_evaluation.py --dataset COCO20k --set train --type_pred detectron --pred_file data/CAD_predictions/LOST_plus_CAD_COCO20k.json
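The "highest confidence box per image" rule can be sketched as follows, assuming the prediction file uses the standard COCO results layout (a JSON list of entries with image_id, category_id, bbox and score). The function name and the toy detections are hypothetical; this is not the code of main_corloc_evaluation.py.

```python
import json
from typing import Dict, List


def top_box_per_image(detections: List[dict]) -> Dict[int, dict]:
    """Keep, for every image_id, only the detection with the highest score."""
    best: Dict[int, dict] = {}
    for det in detections:
        image_id = det["image_id"]
        if image_id not in best or det["score"] > best[image_id]["score"]:
            best[image_id] = det
    return best


# Toy stand-in for the content of a COCO-style results file (hypothetical values).
raw = json.dumps([
    {"image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10], "score": 0.40},
    {"image_id": 1, "category_id": 1, "bbox": [5, 5, 20, 20], "score": 0.90},
    {"image_id": 2, "category_id": 1, "bbox": [3, 3, 8, 8], "score": 0.55},
])
selected = top_box_per_image(json.loads(raw))
for image_id, det in sorted(selected.items()):
    print(image_id, det["bbox"], det["score"])
```

With a real file, the `raw` string would be replaced by the result of reading the JSON file from disk.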
The following table presents the obtained corloc results.
method | VOC07 | VOC12 | COCO20k |
---|---|---|---|
LOST | 61.9 | 64.0 | 50.7 |
LOST+CAD | 65.7 | 70.4 | 57.5 |
The following steps train a class-aware detector using LOST pseudo-boxes for the VOC07 dataset. We provide LOST boxes corresponding to the VOC07 dataset in $LOST/data/LOST_predictions/LOST_VOC07.pkl.
cd $LOST;
# Cluster features of LOST boxes
python cluster_for_OD.py --pred_file $LOST/data/LOST_predictions/LOST_VOC07.pkl --nb_clusters 20 --dataset VOC07 --set trainval
cd $D2;
# Format DINO weights to fit detectron2
wget https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth -P ./data # Download the model from DINO
python tools/convert_pretrained_to_detectron_format.py --input ./data/dino_resnet50_pretrain.pth --output ./data/dino_RN50_pretrain_d2_format.pkl
# Prepare the clustered LOST pseudo-box data for training
python tools/prepare_voc_LOST_OD_pseudo_boxes_in_detectron2_format.py --year 2007 --pboxes $LOST/data/LOST_predictions/LOST_VOC07_clustered_20clu.pkl
# Format VOC data to fit COCO style
python tools/prepare_voc_data_in_coco_style.py --voc07_dir $LOST/datasets/VOC2007 --voc12_dir $LOST/datasets/VOC2012
# Train the detector on the VOC2007 trainval set -- please be aware that no Hungarian matching is used during training, so validation results are not meaningful (will be close to 0). Please use the command below in order to evaluate results correctly.
python tools/train_net_for_LOST_OD.py --num-gpus 8 --config-file ./configs/LOST/RN50_DINO_FRCNN_VOC07_OD.yaml DATALOADER.NUM_WORKERS 8 OUTPUT_DIR ./outputs/RN50_DINO_FRCNN_VOC07_OD MODEL.WEIGHTS ./data/dino_RN50_pretrain_d2_format.pkl
# Evaluate the detector results using Hungarian matching -- this reproduces the results from the paper
cd $LOST;
python tools/evaluate_unsupervised_detection_voc.py --results ./detectron2/outputs/RN50_DINO_FRCNN_VOC07_OD/inference/coco_instances_results.json
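The reason a matching step is needed is that cluster indices carry no semantic meaning: pseudo-class 3 has no reason to coincide with ground-truth class 3, so each cluster must first be assigned to one class. The sketch below illustrates such a one-to-one assignment on a toy score matrix. It is not the evaluation script: the score matrix is hypothetical, and an exhaustive search over permutations is used here in place of the Hungarian algorithm, which yields the same optimum on small inputs. For 20 clusters an exhaustive search is infeasible, and a solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def best_assignment(score: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """Return the cluster -> class assignment with the highest total score.

    score[i][j] is some agreement measure between cluster i and class j
    (hypothetical here). Exhaustive search: only suitable for tiny matrices.
    """
    n = len(score)
    best_perm = max(
        permutations(range(n)),
        key=lambda perm: sum(score[i][perm[i]] for i in range(n)),
    )
    total = sum(score[i][best_perm[i]] for i in range(n))
    return list(best_perm), total


# Toy 3x3 example: rows are clusters, columns are ground-truth classes.
toy_score = [
    [1, 8, 2],
    [7, 2, 1],
    [2, 3, 6],
]
assignment, total = best_assignment(toy_score)
print(assignment, total)
```

In this toy case cluster 0 is mapped to class 1, cluster 1 to class 0 and cluster 2 to class 2.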
We use the R50-C4 model of Detectron2 with a ResNet50 pre-trained with DINO self-supervision.
Details:
- mini-batches of size 16 across 8 GPUs using SyncBatchNorm
- extra BatchNorm layer for the RoI head after conv5, i.e., Res5ROIHeadsExtraNorm layer in Detectron2
- frozen first two convolutional blocks of ResNet-50, i.e., conv1 and conv2 in Detectron2
- learning rate is first warmed-up for 100 steps to 0.02 and then reduced by a factor of 10 after 18K and 22K training steps
- we use in total 24K training steps for all the experiments, except when training class-agnostic detectors on the pseudo-boxes of the VOC07 trainval set, in which case we use 10K steps.
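The learning-rate schedule listed above can be written as a small function. This is a sketch under one assumption not stated in the list: the warm-up is taken to be linear and to start from zero. The function name and argument names are hypothetical and do not correspond to Detectron2 configuration keys.

```python
from typing import Sequence


def learning_rate(
    step: int,
    base_lr: float = 0.02,
    warmup_steps: int = 100,
    milestones: Sequence[int] = (18_000, 22_000),
    gamma: float = 0.1,
) -> float:
    """Learning rate at a given training step.

    Assumes a linear warm-up from 0 to base_lr over `warmup_steps`, followed by
    a division by 10 (gamma = 0.1) once each milestone has been reached.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    n_decays = sum(step >= m for m in milestones)
    return base_lr * gamma ** n_decays


for s in (0, 50, 100, 17_999, 18_000, 22_000, 23_999):
    print(s, learning_rate(s))
```

For the 10K-step class-agnostic VOC07 runs mentioned above, the milestones of the actual configuration are not given in this list, so they are left out of the sketch.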
LOST is released under the Apache 2.0 license.