This repository provides the official PyTorch implementation for the following paper:
**StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3**

Haonan Qiu, Yuming Jiang, Hang Zhou, Wayne Wu, and Ziwei Liu

arXiv, 2022.

From MMLab@NTU, affiliated with S-Lab, Nanyang Technological University, and SenseTime Research.
[Project Page] | [Paper] | [Demo Video]
- [07/2022] Paper and demo video are released.
- [07/2022] Code is released.
Clone this repo:

```bash
git clone https://github.com/arthur-qiu/StyleFaceV.git
cd StyleFaceV
```
Dependencies:

All dependencies for defining the environment are provided in `environment/stylefacev.yaml`. We recommend using Anaconda to manage the Python environment:

```bash
conda env create -f ./environment/stylefacev.yaml
conda activate stylefacev
```
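Before launching the longer commands below, it can be worth confirming that PyTorch sees your GPU. A minimal check (illustrative only; not part of this repo):

```python
# Minimal environment sanity check (illustrative; not part of this repo).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```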
Image Data: Unaligned FFHQ

Video Data: RAVDESS

Download the processed video data via this Google Drive or process the data via this repo, then put all the data at the path `../data`.

Transform the video data into `.png` form:

```bash
python scripts/vid2img.py
```
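If you process a new dataset yourself instead of downloading the provided one, the frame-extraction step boils down to dumping each video into numbered `.png` frames. A minimal sketch with OpenCV, using hypothetical paths (the repo's `scripts/vid2img.py` is the authoritative version):

```python
# Illustrative video-to-frame extraction, assuming OpenCV (pip install opencv-python).
# Paths are hypothetical examples; scripts/vid2img.py is the authoritative implementation.
import os
import cv2

def vid2img(video_path: str, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the stream
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.png"), frame)
        idx += 1
    cap.release()

vid2img("../data/raw_videos/example.mp4", "../data/actor_align_512_png/example")
```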
Pretrained models can be downloaded from this Google Drive. Unzip the file and put the contents under the project root with the following structure:

```
pretrained_models
├── network-snapshot-005000.pkl  # StyleGAN3 checkpoint finetuned on both RAVDESS and unaligned FFHQ
├── wing.ckpt                    # face alignment model from https://github.com/protossw512/AdaptiveWingLoss
├── motion_net.pth               # trained motion sampler
├── pre_net.pth
└── pre_pose_net.pth
checkpoints/stylefacev
├── latest_net_FE.pth       # appearance extractor + recomposition
├── latest_net_FE_lm.pth    # first half of the pose extractor
└── latest_net_FE_pose.pth  # second half of the pose extractor
```
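To verify that the finetuned StyleGAN3 checkpoint loads correctly, you can sample a single image from it. A minimal sketch, assuming the NVlabs StyleGAN3 utilities (`dnnlib`, `legacy`) from the codebase this project builds on are importable:

```python
# Smoke test for the finetuned StyleGAN3 checkpoint. Assumes the NVlabs
# StyleGAN3 helpers (dnnlib, legacy) are on the Python path, as in the
# StyleGAN3 codebase this project builds on.
import torch
import dnnlib
import legacy

with dnnlib.util.open_url("pretrained_models/network-snapshot-005000.pkl") as f:
    G = legacy.load_network_pkl(f)["G_ema"].to("cuda")

z = torch.randn(1, G.z_dim, device="cuda")
w = G.mapping(z, None)    # map z to the W+ latent space
img = G.synthesis(w)      # (1, 3, H, W), values roughly in [-1, 1]
print(img.shape)
```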
Sampling:

Generate face videos with the pretrained models:

```bash
python test.py --dataroot ../data/actor_align_512_png --name stylefacev \
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model sample \
    --model_names FE,FE_pose,FE_lm --rnn_path pretrained_models/motion_net.pth \
    --n_frames_G 60 --num_test=64 --results_dir './sample_results/'
```
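The sampler writes individual frames; to inspect the results as actual videos you can stitch the frames together. A sketch using imageio (an assumption, not a repo dependency; the exact output layout under `./sample_results/` may differ):

```python
# Stitch sampled frames into an .mp4 for inspection. Assumes imageio plus its
# ffmpeg plugin (pip install imageio imageio-ffmpeg). The frame directory below
# is a hypothetical example of where test.py placed its outputs.
import glob
import imageio

frames = sorted(glob.glob("./sample_results/stylefacev/*.png"))
with imageio.get_writer("sample.mp4", fps=30) as writer:
    for path in frames:
        writer.append_data(imageio.imread(path))
```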
Training:

If you want to use new datasets, please finetune the StyleGAN3 model first.

Pose pretraining: this stage is trained purely on image data and helps convergence.

```bash
python train.py --dataroot ../data/actor_align_512_png --name stylepose \
    --network_pkl=pretrained_models/network-snapshot-005000.pkl \
    --model stylevpose --n_epochs 5 --n_epochs_decay 5
```
Pre-training:

```bash
python train.py --dataroot ../data/actor_align_512_png --name stylefacev_pre \
    --network_pkl=pretrained_models/network-snapshot-005000.pkl \
    --model stylepre --pose_path checkpoints/stylevpose/latest_net_FE.pth
```

You can also use `pre_net.pth` and `pre_pose_net.pth` from the `pretrained_models` folder:

```bash
python train.py --dataroot ../data/actor_align_512_png --name stylefacev_pre \
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model stylepre \
    --pre_path pretrained_models/pre_net.pth --pose_path pretrained_models/pre_pose_net.pth
```
Adversarial training:

```bash
python train.py --dataroot ../data/actor_align_512_png --name stylefacev \
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model stylefacevadv \
    --pose_path pretrained_models/pre_pose_net.pth \
    --pre_path checkpoints/stylefacev_pre/latest_net_FE.pth \
    --n_epochs 50 --n_epochs_decay 50 --lr 0.0002
```
Motion training: train the motion sampler.

```bash
python train.py --dataroot ../data/actor_align_512_png --name motion \
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model stylernn \
    --pre_path checkpoints/stylefacev/latest_net_FE.pth \
    --pose_path checkpoints/stylefacev/latest_net_FE_pose.pth \
    --lm_path checkpoints/stylefacev/latest_net_FE_lm.pth \
    --n_frames_G 30
```

If you do not have a 32GB GPU, reduce `n_frames_G` (e.g., 12 for a 16GB GPU), or add supervision only on the pose representations:

```bash
python train.py --dataroot ../data/actor_align_512_png --name motion \
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model stylernns \
    --pose_path checkpoints/stylefacev/latest_net_FE_pose.pth \
    --lm_path checkpoints/stylefacev/latest_net_FE_lm.pth \
    --n_frames_G 30
```
If you find this work useful for your research, please consider citing our paper:
```bibtex
@misc{qiu2022stylefacev,
  title     = {StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3},
  author    = {Qiu, Haonan and Jiang, Yuming and Zhou, Hang and Wu, Wayne and Liu, Ziwei},
  year      = {2022},
  publisher = {arXiv},
  doi       = {10.48550/arXiv.2208.07862},
  url       = {https://arxiv.org/abs/2208.07862}
}
```