NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior, CVPR 2024 Workshop on AI for Content Creation (AI4CC)
We have confirmed that the code runs under the following configuration:
Python 3.7.16 / CUDA 11.7 / NVIDIA RTX 3090
git clone https://github.com/rlgnswk/NeRFFaceSpeech_Code.git
cd NeRFFaceSpeech_Code/
conda env create -f environment.yml
conda activate nerffacespeech
cd Deep3DFaceRecon_pytorch
git clone https://github.com/NVlabs/nvdiffrast
cd nvdiffrast
pip install .
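Once the environment is active, a quick sanity check (a minimal sketch; adjust to your setup) can confirm that the toolchain matches the configuration tested above:

python --version   # expect Python 3.7.16
nvcc --version     # expect CUDA 11.7
python -c "import torch; print(torch.cuda.is_available())"   # expect True on the RTX 3090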
cd ../..
mkdir pretrained_networks
Download SadTalker_V0.0.2_256.safetensors from https://github.com/OpenTalker/SadTalker/releases and place it in NeRFFaceSpeech_Code/pretrained_networks/sad_talker_pretrained/.
Download https://huggingface.co/wsj1995/sadTalker/blob/af80749f8c9af3702fbd0272df14ff086986a1de/BFM09_model_info.mat and place it in NeRFFaceSpeech_Code/pretrained_networks/BFM_for_3DMM-Fitting-Pytorch/BFM/.
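If you prefer fetching these files from the command line, the sketch below is one way to do it. The SadTalker release tag and the Hugging Face resolve URL are assumptions, so verify them against the pages above:

mkdir -p pretrained_networks/sad_talker_pretrained
mkdir -p pretrained_networks/BFM_for_3DMM-Fitting-Pytorch/BFM
# Release tag v0.0.2-rc is an assumption; check the SadTalker releases page.
wget -P pretrained_networks/sad_talker_pretrained \
    https://github.com/OpenTalker/SadTalker/releases/download/v0.0.2-rc/SadTalker_V0.0.2_256.safetensors
# Same file as the blob URL above, fetched via Hugging Face's resolve endpoint.
wget -P pretrained_networks/BFM_for_3DMM-Fitting-Pytorch/BFM \
    https://huggingface.co/wsj1995/sadTalker/resolve/af80749f8c9af3702fbd0272df14ff086986a1de/BFM09_model_info.mat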
Thanks to @nitinmukesh for the reports.
python StyleNeRF/main_NeRFFaceSpeech_audio_driven_from_z.py \
--outdir=out_test_z --trunc=0.7 \
--network=pretrained_networks/ffhq_1024.pkl \
--test_data="test_data/test_audio/AdamSchiff_0.wav" \
--seeds=6;
The inversion process for a real image takes some time.
python StyleNeRF/main_NeRFFaceSpeech_audio_driven_from_image.py \
--outdir=out_test_real --trunc=0.7 \
--network=pretrained_networks/ffhq_1024.pkl \
--test_data="test_data/test_audio/AdamSchiff_0.wav" \
--test_img="test_data/test_img/32.png";
The first command below varies the head pose only.
The second command varies both the head pose and the expression according to the driving video frames (in that case, the audio input is used only for the initial frame).
The driving frames should be pose-predictable, i.e., head poses must be estimable from them; if you need to extract frames from a video, see the ffmpeg sketch after the two commands.
python StyleNeRF/main_NeRFFaceSpeech_audio_driven_w_given_poses.py \
--outdir=out_test_given_pose --trunc=0.7 \
--network=pretrained_networks/ffhq_1024.pkl \
--test_data="test_data/test_audio/AdamSchiff_0.wav" \
--test_img="test_data/test_img/AustinScott0_0_cropped.jpg"\
--motion_guide_img_folder="driving_frames";
python StyleNeRF/main_NeRFFaceSpeech_video_driven.py \
--outdir=out_test_video_driven --trunc=0.7 \
--network=pretrained_networks/ffhq_1024.pkl \
--test_data="test_data/test_audio/AdamSchiff_0.wav" \
--test_img="test_data/test_img/DougJones_0_cropped.jpg"\
--motion_guide_img_folder="driving_frames";
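If your driving motion comes as a video file rather than individual frames, a sketch like the following extracts frames into the driving_frames folder used above. The input filename and frame rate are placeholders, not values from this repo:

mkdir -p driving_frames
# 25 fps is an assumption; match it to your driving video if needed.
ffmpeg -i driving_video.mp4 -vf fps=25 driving_frames/%05d.png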
If you want to use your own audio and image data, the images must follow the StyleNeRF format, and the audio must follow the Wav2Lip/SadTalker format.
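For new audio, a conversion sketch like this usually suffices. The 16 kHz mono WAV target is an assumption based on common Wav2Lip/SadTalker preprocessing, so check those repos for the exact requirement:

# my_audio.mp3 is a placeholder input; the output lands next to the sample audio.
ffmpeg -i my_audio.mp3 -ar 16000 -ac 1 test_data/test_audio/my_audio.wav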
Post-processing (thanks to @nitinmukesh)
GFPGAN can be applied as a post-processing step; it is used with other talking-head methods as well and can help produce better results. Please refer to the issue!
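A minimal sketch of running GFPGAN on the rendered frames, assuming the official repo and its pretrained weights; the input/output paths below are placeholders, not paths produced by this repo:

git clone https://github.com/TencentARC/GFPGAN.git
cd GFPGAN
pip install -r requirements.txt
python setup.py develop
# -v selects the GFPGAN model version, -s the upscale factor.
python inference_gfpgan.py -i ../out_test_real/frames -o ../out_test_real_restored -v 1.3 -s 1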
The proposed method may not work well due to accumulated errors, such as landmark prediction errors and inversion (reconstruction) errors.
This project is intended for research and educational purposes only. Misuse of the technology for deceptive practices is strictly discouraged.
We thank the authors of StyleNeRF, PTI, Wav2Lip, SadTalker, Deep3DFaceRecon, and 3DMM-Fitting-Pytorch for sharing their code and baselines.
@misc{kim2024nerffacespeech,
title={NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior},
author={Gihoon Kim and Kwanggyoon Seo and Sihun Cha and Junyong Noh},
year={2024},
eprint={2405.05749},
archivePrefix={arXiv},
primaryClass={cs.CV}}
@inproceedings{kim2024nerffacespeechcvprw,
title={NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior},
author={Gihoon Kim and Kwanggyoon Seo and Sihun Cha and Junyong Noh},
booktitle={IEEE Computer Vision and Pattern Recognition Workshops},
year={2024}}