Yibin Wang*, Yuchao Feng, Jianwei Zheng†
(†corresponding author)
Zhejiang University of Technology
BMVC 2024
We provide models for TERSE (CVPR 2019) [arXiv], PlaceNet (ECCV 2020) [arXiv], GracoNet (ECCV 2022) [arXiv], CA-GAN (ICME 2023, Oral) [paper], and our CSANet:
| method | FID ↓ | LPIPS ↑ | model & logs |
| --- | --- | --- | --- |
| TERSE | 46.88 | 0 | baidu disk (code: zkk8) |
| PlaceNet | 37.01 | 0.161 | baidu disk (code: rap8) |
| GracoNet | 28.10 | 0.207 | baidu disk (code: cayr) |
| CA-GAN | 23.21 | 0.268 | baidu disk (code: 90yf) |
| CSANet | 20.88 | 0.274 | baidu disk (code: l0e6) |
Install Python 3.6 and PyTorch 1.9.1 (requires CUDA >= 10.2):

```sh
conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=10.2 -c pytorch
```
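As a quick sanity check (a minimal sketch, assuming the environment above is active), verify that PyTorch was installed correctly and can see your GPU:

```sh
# Print the PyTorch version and whether CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```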
Download and extract the OPA dataset from the official link: google drive. We expect the directory structure to be the following:

```
<PATH_TO_OPA>
  background/       # background images
  foreground/       # foreground images with masks
  composite/        # composite images with masks
  train_set.csv     # train annotation
  test_set.csv      # test annotation
```
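Optionally, confirm the layout before preprocessing (the paths below simply mirror the tree above):

```sh
# The three image directories should exist, and the CSV should start with a header row
ls -d <PATH_TO_OPA>/background <PATH_TO_OPA>/foreground <PATH_TO_OPA>/composite
head -n 2 <PATH_TO_OPA>/train_set.csv
```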
Then, run the preprocessing script:

```sh
python tool/preprocess.py --data_root <PATH_TO_OPA>
```
You will see some new files and directories:

```
<PATH_TO_OPA>
  com_pic_testpos299/       # test-set positive composite images (resized to 299)
  train_data.csv            # transformed train annotation
  train_data_pos.csv        # train annotation for positive samples
  test_data.csv             # transformed test annotation
  test_data_pos.csv         # test annotation for positive samples
  test_data_pos_unique.csv  # test annotation for positive samples with different fg/bg pairs
```
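To confirm that preprocessing succeeded, you can count the generated rows and images (this assumes nothing beyond the files listed above):

```sh
# Row counts include the CSV header line
wc -l <PATH_TO_OPA>/train_data.csv <PATH_TO_OPA>/test_data_pos_unique.csv
ls <PATH_TO_OPA>/com_pic_testpos299 | wc -l
```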
To train CSANet on a single 3090 GPU with batch size 32 for 18 epochs, run:

```sh
python main.py --data_root <PATH_TO_OPA> --expid <YOUR_EXPERIMENT_NAME>
```
To reproduce the baseline models, simply replace `main.py` with `main_terse.py` / `main_placenet.py` / `main_graconet.py` / `main_CA-GAN.py` for training (see the sketch after this paragraph).
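For example, a minimal shell loop that trains all four baselines in sequence (assuming the baseline scripts accept the same arguments as `main.py`; the experiment names here are placeholders):

```sh
# Train each baseline with the same data root; ${script#main_} strips the
# "main_" prefix to form a per-baseline experiment name
for script in main_terse main_placenet main_graconet main_CA-GAN; do
    python ${script}.py --data_root <PATH_TO_OPA> --expid ${script#main_}
done
```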
To monitor the losses during training, use TensorBoard:

```sh
tensorboard --logdir result/<YOUR_EXPERIMENT_NAME>/tblog --port <YOUR_SPECIFIED_PORT>
```
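If you train on a remote server, a standard SSH port forward (the hostname below is a placeholder) lets you open the dashboard in a local browser:

```sh
# Forward the remote TensorBoard port to localhost
ssh -L <YOUR_SPECIFIED_PORT>:localhost:<YOUR_SPECIFIED_PORT> user@remote-host
```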
To predict composite images with a trained CSANet model, run:

```sh
python infer.py --data_root <PATH_TO_OPA> --expid <YOUR_EXPERIMENT_NAME> --epoch <EPOCH_TO_EVALUATE> --eval_type eval
python infer.py --data_root <PATH_TO_OPA> --expid <YOUR_EXPERIMENT_NAME> --epoch <EPOCH_TO_EVALUATE> --eval_type evaluni --repeat 10
```

Here `eval` generates one composite per test sample, while `evaluni` samples each unique fg/bg pair `--repeat` times (used for the LPIPS diversity evaluation).
To infer the baseline models, simply replace `infer.py` with `infer_terse.py` / `infer_placenet.py` / `infer_graconet.py` / `infer_CA-GAN.py`.
You can also use our provided models directly. For example, to infer with our best CSANet model, please 1) download the `CSANet.zip` given above, 2) place it under `result` and uncompress it:

```sh
mv path/to/your/downloaded/CSANet.zip result/CSANet.zip
cd result
unzip CSANet.zip
cd ..
```
and 3) run:

```sh
python infer.py --data_root <PATH_TO_OPA> --expid CSANet --epoch 18 --eval_type eval
python infer.py --data_root <PATH_TO_OPA> --expid CSANet --epoch 18 --eval_type evaluni --repeat 10
```
The procedure for inferring our provided baseline models is similar. Remember to use `--epoch 11` for TERSE and GracoNet, `--epoch 9` for PlaceNet, and `--epoch 15` for CA-GAN, as in the loop below.
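For reference, here is a sketch that runs `eval`-type inference for all four provided baselines at their released epochs (it assumes the uncompressed folders under `result` keep the names `TERSE`, `PlaceNet`, `GracoNet`, and `CA-GAN`):

```sh
# Each entry is: script suffix, experiment (folder) name, released epoch
for entry in "terse TERSE 11" "placenet PlaceNet 9" "graconet GracoNet 11" "CA-GAN CA-GAN 15"; do
    set -- $entry
    python infer_$1.py --data_root <PATH_TO_OPA> --expid $2 --epoch $3 --eval_type eval
done
```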
To evaluate the FID score, run:

```sh
sh script/eval_fid.sh <YOUR_EXPERIMENT_NAME> <EPOCH_TO_EVALUATE> <PATH_TO_OPA/com_pic_testpos299>
```

To evaluate the LPIPS score, run:

```sh
sh script/eval_lpips.sh <YOUR_EXPERIMENT_NAME> <EPOCH_TO_EVALUATE>
```
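If you want to cross-check the FID number independently, the standalone `pytorch-fid` package can compare two image folders. Note that the directory of generated composites below is a hypothetical path (check where `infer.py` writes its outputs in your setup), and preprocessing differences may shift the score slightly:

```sh
pip install pytorch-fid
# result/<YOUR_EXPERIMENT_NAME>/eval is a placeholder output path -- adjust it
python -m pytorch_fid result/<YOUR_EXPERIMENT_NAME>/eval <PATH_TO_OPA>/com_pic_testpos299
```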
Some of the evaluation code in this repo is borrowed and modified from Faster-RCNN-VG, OPA, FID-Pytorch, GracoNet, and Perceptual Similarity. We thank the authors for their great work.
If you find CSANet useful or relevant to your research, please kindly cite our paper:

```
@inproceedings{face-diffuser,
  title={High-fidelity Person-centric Subject-to-Image Synthesis},
  author={Wang, Yibin and Feng, Yuchao and Zheng, Jianwei},
  booktitle={BMVC},
  pages={1--13},
  year={2024}
}
```
If you have any technical comments or questions, please open a new issue or feel free to contact Yibin Wang.