This repository is the official implementation of our ICLR 2022 paper "Learning Transferable Reward for Query Object Localization with Policy Adaptation".
Requirements:
- Python 3
- PyTorch >= 1.5
- Weights & Biases
- scikit-learn (sklearn)
- RoI pool/align layer (to be added separately)
- ImageNet-pretrained VGG16 model
To generate corrupted MNIST data with varied backgrounds, pass the background name via --bg_name (five backgrounds are supported: clean, clutter, patch, gaussian_noise, impulse_noise). For example, to generate patched MNIST, run:
python datasets/generate_data.py --bg_name patch
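The corruptions themselves are produced by datasets/generate_data.py, whose internals are not shown here. Purely as an illustration, the sketch below shows what two of the noise backgrounds could look like in pure Python: additive Gaussian noise and impulse (salt-and-pepper) noise applied to an 84x84 canvas (matching the --img_size 84 used for the agent) before a 28x28 digit is pasted on top. The function names and the noise parameters (sigma, amount) are hypothetical and not taken from the repository.

```python
import random


def add_gaussian_noise(img, sigma=0.2, rng=None):
    """Add zero-mean Gaussian noise to a 2D image with values in [0, 1], then clip to [0, 1]."""
    if rng is None:
        rng = random.Random(0)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def add_impulse_noise(img, amount=0.1, rng=None):
    """Salt-and-pepper noise: each pixel is replaced by 0 or 1 with probability `amount`."""
    if rng is None:
        rng = random.Random(0)
    out = []
    for row in img:
        new_row = []
        for p in row:
            if rng.random() < amount:
                new_row.append(rng.choice((0.0, 1.0)))
            else:
                new_row.append(p)
        out.append(new_row)
    return out


def paste_digit(canvas, digit, top, left):
    """Return a copy of `canvas` with `digit` pasted at (top, left), keeping the brighter pixel."""
    out = [row[:] for row in canvas]
    for i, row in enumerate(digit):
        for j, p in enumerate(row):
            out[top + i][left + j] = max(out[top + i][left + j], p)
    return out


if __name__ == "__main__":
    canvas = add_impulse_noise([[0.0] * 84 for _ in range(84)], amount=0.05)
    img = paste_digit(canvas, [[1.0] * 28 for _ in range(28)], top=10, left=20)
    # Ground-truth box of the pasted digit as (x1, y1, x2, y2).
    print((20, 10, 48, 38))
```

Applying the noise to the canvas before pasting keeps the corruption in the background; whether the repository also corrupts the digit itself is not stated here.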
- Pretrain RoI encoder and projection head. Below is an example of training on digit 3.
CUDA_VISIBLE_DEVICES=0 python pretrain_encoder_ordinal.py \
--savename pretrain_mnist_ae_randpatch \
--digit 3 --bg_name patch --samples_per_class 10 \
--sample_size 50 --batch_size 50
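The script name pretrain_encoder_ordinal.py suggests an ordinal objective over sampled RoIs. A minimal sketch, assuming that candidate boxes are jittered around the ground truth, ranked by IoU, and that a box with higher IoU should embed closer to the exemplar. The helper names, the jitter scheme, the margin, and the reading of --sample_size as the number of boxes per image are all assumptions; the actual loss is defined in the paper and in pretrain_encoder_ordinal.py.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_boxes(gt, n, jitter, rng):
    """Hypothetical sampler: jitter each ground-truth corner uniformly, keep valid boxes."""
    boxes = []
    while len(boxes) < n:
        x1, y1, x2, y2 = (c + rng.uniform(-jitter, jitter) for c in gt)
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2))
    return boxes


def ordinal_hinge_loss(ious, dists, margin=0.1):
    """Pairwise ranking loss: if box i has higher IoU than box j, its embedding distance
    to the exemplar should be smaller than box j's by at least `margin`."""
    loss, pairs = 0.0, 0
    for i in range(len(ious)):
        for j in range(len(ious)):
            if ious[i] > ious[j]:
                loss += max(0.0, dists[i] - dists[j] + margin)
                pairs += 1
    return loss / pairs if pairs else 0.0
```

In a real training loop, `dists` would come from the RoI encoder and projection head rather than being supplied by hand.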
- Train the localization agent.
CUDA_VISIBLE_DEVICES=0 python train_agent.py \
--savename agent_mnist_ae_randpatch \
--pretrained pretrain_mnist_ae_randpatch/best.pth.tar \
--img_size 84 --bg_name patch --sample_size 50 \
--digit 3 --num_act 10 --batch_size 50 --steps_ag 200
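The agent is trained with the pretrained encoder passed via --pretrained, which is where the learned reward comes from. A minimal sketch, assuming the reward compares how far the RoI embedding is from an exemplar prototype before and after an action (positive if it moved closer). The prototype-as-mean choice, the binary option, and all function names are assumptions for illustration; see the paper and train_agent.py for the actual definition.

```python
import math


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def prototype(exemplar_embs):
    """Hypothetical prototype: element-wise mean of the exemplar embeddings."""
    n = len(exemplar_embs)
    return [sum(col) / n for col in zip(*exemplar_embs)]


def embedding_reward(prev_emb, curr_emb, proto, binary=True):
    """Reward an action by how much it moved the box embedding toward the prototype.

    With binary=True the reward is +1 for a strict improvement and -1 otherwise;
    with binary=False the raw decrease in distance is returned.
    """
    delta = l2(prev_emb, proto) - l2(curr_emb, proto)
    if binary:
        return 1.0 if delta > 0 else -1.0
    return delta
```

Because such a reward only needs the encoder and exemplars, not box labels, it can in principle be computed on a new domain as well, which is what test-time adaptation relies on.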
- Test-time adaptation (here, from patch backgrounds on digit 3 to clutter backgrounds on digit 2).
CUDA_VISIBLE_DEVICES=0 python adapt_agent.py \
--savename adapt_mnist_randpatch2clutter \
--pretrained_agent agent_mnist_ae_randpatch/best.pth.tar \
--pretrained pretrain_mnist_ae_randpatch/last.pth.tar \
--bg_name clutter --digit 2 --num_act 10 --batch_size 512
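To adapt the same trained agent to several target backgrounds, the command above can be wrapped in a small launcher. This sketch reuses only the flags and checkpoint paths shown in the command; the helper names, the savename pattern for non-clutter targets, and the dry-run default are hypothetical conveniences.

```python
import os
import shlex
import subprocess


def adapt_command(target_bg, digit=2,
                  savename_prefix="adapt_mnist_randpatch2",
                  agent_ckpt="agent_mnist_ae_randpatch/best.pth.tar",
                  encoder_ckpt="pretrain_mnist_ae_randpatch/last.pth.tar"):
    """Build the adapt_agent.py argument list for one target background."""
    return ["python", "adapt_agent.py",
            "--savename", f"{savename_prefix}{target_bg}",
            "--pretrained_agent", agent_ckpt,
            "--pretrained", encoder_ckpt,
            "--bg_name", target_bg, "--digit", str(digit),
            "--num_act", "10", "--batch_size", "512"]


def run_sweep(targets, gpu="0", dry_run=True):
    """Print (and, unless dry_run, execute) one adaptation run per target background."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    cmds = [adapt_command(t) for t in targets]
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)
    return cmds


if __name__ == "__main__":
    run_sweep(["clutter", "clean", "gaussian_noise", "impulse_noise"])
```

Running sequentially with check=True stops the sweep at the first failed run, which makes a broken checkpoint path obvious early.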
For CUB, below is an example of generating gull_59_64.json. The other file lists are located in datasets/cub_files.
python datasets/generate_cub_filelist.py
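A rough sketch of building such a file list from the standard CUB-200-2011 metadata files (images.txt, image_class_labels.txt, bounding_boxes.txt, each keyed by image id). It assumes that "59_64" in gull_59_64.json denotes the inclusive class-id range 59 to 64; the JSON schema written here ("path", "class_id", "bbox") is hypothetical and may differ from what generate_cub_filelist.py produces.

```python
import json


def parse_by_id(lines):
    """Parse 'image_id value ...' lines into {image_id: [values]}."""
    out = {}
    for line in lines:
        parts = line.split()
        if parts:
            out[int(parts[0])] = parts[1:]
    return out


def build_filelist(images_lines, labels_lines, bbox_lines, first_cls, last_cls):
    """Select images whose class id lies in [first_cls, last_cls] and attach their box.

    Boxes follow CUB's bounding_boxes.txt convention: x, y, width, height.
    """
    images = parse_by_id(images_lines)
    labels = parse_by_id(labels_lines)
    boxes = parse_by_id(bbox_lines)
    entries = []
    for img_id, (path,) in sorted(images.items()):
        cls = int(labels[img_id][0])
        if first_cls <= cls <= last_cls:
            x, y, w, h = map(float, boxes[img_id])
            entries.append({"path": path, "class_id": cls, "bbox": [x, y, w, h]})
    return entries


def to_json(entries):
    """Serialize the selected entries (e.g. to write a gull_59_64.json-style file)."""
    return json.dumps(entries, indent=2)
```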
- Pretrain the RoI encoder and projection head on warbler (CUB).
CUDA_VISIBLE_DEVICES=0 python pretrain_encoder_ordinal.py \
--savename pretrain_vgg_cub_warb15 \
--dataset cub --backbone vgg16 --bg_name warbler \
--dim 1024 --batch_size 50 --img_size 224 --lamb 1.0
- Train the localization agent.
CUDA_VISIBLE_DEVICES=0 python train_agent.py \
--savename agent_vgg_cub_warb15 \
--pretrained pretrain_vgg_cub_warb15/last.pth.tar \
--dataset cub --dim 1024 --dim_ag 512 \
--bg_name warbler --backbone vgg16 --img_size 224 \
--min_box_side 40 --batch_size 50
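The --min_box_side flag suggests that box actions are constrained so the box never shrinks below a minimum side length. The sketch below uses a hypothetical, commonly used action set (translations, scaling, aspect-ratio changes, stop) and rejects any action that would leave the image or violate the minimum side; the repository's actual actions (configured via --num_act) and step size alpha are not documented here and may differ.

```python
def apply_action(box, action, img_size=224, alpha=0.2, min_side=40):
    """Apply one hypothetical box action to (x1, y1, x2, y2).

    Moves are proportional to the current box size (alpha). If the result would
    leave the image or have a side shorter than min_side, the original box is kept.
    """
    x1, y1, x2, y2 = box
    dw, dh = alpha * (x2 - x1), alpha * (y2 - y1)
    deltas = {
        "left": (-dw, 0, -dw, 0),
        "right": (dw, 0, dw, 0),
        "up": (0, -dh, 0, -dh),
        "down": (0, dh, 0, dh),
        "bigger": (-dw / 2, -dh / 2, dw / 2, dh / 2),
        "smaller": (dw / 2, dh / 2, -dw / 2, -dh / 2),
        "fatter": (0, dh / 2, 0, -dh / 2),   # shrink height, keep width
        "taller": (dw / 2, 0, -dw / 2, 0),   # shrink width, keep height
        "stop": (0, 0, 0, 0),
    }
    d = deltas[action]  # unknown actions raise KeyError
    new = (x1 + d[0], y1 + d[1], x2 + d[2], y2 + d[3])
    inside = new[0] >= 0 and new[1] >= 0 and new[2] <= img_size and new[3] <= img_size
    big_enough = (new[2] - new[0]) >= min_side and (new[3] - new[1]) >= min_side
    return new if inside and big_enough else tuple(box)
```

Rejecting an invalid action (rather than clipping it) keeps every transition well defined; a real agent might instead mask such actions before sampling.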
- Adapt the agent from warbler to wren.
CUDA_VISIBLE_DEVICES=0 python adapt_agent.py \
--savename adapt_cub_warbler2wren \
--pretrained_agent agent_vgg_cub_warb15/best.pth.tar \
--pretrained pretrain_vgg_cub_warb15/last.pth.tar \
--bg_name wren --dataset cub --backbone vgg16 \
--dim 1024 --dim_ag 512 --img_size 224 \
--min_box_side 40 --batch_size 64
- Pretrain the RoI encoder and projection head on dog (COCO).
CUDA_VISIBLE_DEVICES=0 python pretrain_encoder_ordinal.py \
--savename pretrain_vgg_coco_dog \
--dataset coco --sel_cls dog --backbone vgg16 \
--dim 1024 --batch_size 75 --img_size 224 --lamb 1.0
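The --sel_cls flag selects a single COCO category by name. As an illustration of what that selection involves, the sketch below filters a COCO-format annotation dictionary (the standard "images", "categories", "annotations" layout with [x, y, width, height] boxes) down to one category, skipping crowd annotations. How the repository itself loads and filters COCO is not shown here, so the function name and the crowd-skipping choice are assumptions.

```python
def boxes_for_category(coco, name):
    """Map file_name -> list of [x, y, w, h] boxes for one category name.

    `coco` is a dict in the standard COCO annotation format. Crowd annotations
    (iscrowd == 1) are skipped. Raises KeyError if the category name is unknown.
    """
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == name}
    if not cat_ids:
        raise KeyError(f"unknown category: {name}")
    per_image = {}
    for ann in coco["annotations"]:
        if ann["category_id"] in cat_ids and not ann.get("iscrowd", 0):
            per_image.setdefault(ann["image_id"], []).append(ann["bbox"])
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    return {files[i]: boxes for i, boxes in per_image.items()}
```

The same call with "cat" would give the target-class images used when adapting from dog to cat.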
- Train the localization agent.
CUDA_VISIBLE_DEVICES=0 python train_agent.py \
--savename agent_vgg_coco_dog \
--pretrained pretrain_vgg_coco_dog/last.pth.tar \
--dataset coco --backbone vgg16 \
--dim 1024 --dim_ag 512 --sel_cls dog \
--img_size 224 --min_box_side 40 --batch_size 50
- Adapt the agent from dog to cat.
CUDA_VISIBLE_DEVICES=0 python adapt_agent.py \
--savename adapt_coco_dog2cat \
--pretrained_agent agent_vgg_coco_dog/best.pth.tar \
--pretrained pretrain_vgg_coco_dog/last.pth.tar \
--dataset coco --backbone vgg16 --sel_cls cat \
--dim 1024 --dim_ag 512 --img_size 224 \
--min_box_side 40 --batch_size 64
@inproceedings{li2022learning,
  title={Learning Transferable Reward for Query Object Localization with Policy Adaptation},
  author={Tingfeng Li and Shaobo Han and Martin Renqiang Min and Dimitris N. Metaxas},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=92tYQiil17}
}