This is the official implementation of the paper "A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation."
Run the commands below from the project root directory.
pip install -r requirements.txt
pip install -e .
Here, we provide an example of training with the GreatestHits dataset.
Download the dataset from the official webpage and run ./src/va_core/dataset/greatesthits/script_create_segmented_mp4.sh
to obtain preprocessed video clips. Alternatively, you can download the preprocessed dataset from this link.
In the default setting, the mp4 files are assumed to be stored in ./src/va_core/dataset/greatesthits/vis-data-256-segment-8fps-crop.
If you use another directory, you need to modify flist_*.txt in src/va_core/dataset/greatesthits accordingly.
Each line of the text file should contain the path to an mp4 file and the corresponding caption, separated by a comma, as shown below:
./src/va_core/dataset/greatesthits/vis-data-256-segment-8fps-crop/2015-09-23-16-13-51-97_denoised_thumb-003.mp4,"A person is hitting a drumstick on a drum that is surrounded by leaves and foliage, creating a rhythmic sound in the midst of a natural environment."
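For reference, the snippet below is a minimal Python sketch (not part of the training code) that parses such a file list and loads one clip with torchvision to check the path and format. The file name flist_train.txt is only an assumption; substitute whichever flist_*.txt you edited.

import csv
from torchvision.io import read_video

# Assumed file name; use the flist_*.txt you actually edited.
flist = "./src/va_core/dataset/greatesthits/flist_train.txt"

# Each row is "<path to mp4>,<caption>"; captions are quoted, so they may contain commas.
with open(flist, newline="") as f:
    rows = [row for row in csv.reader(f) if row]
print(f"{len(rows)} clips listed")

# Load the first clip to sanity-check the path and format.
path, caption = rows[0]
video, audio, info = read_video(path, pts_unit="sec")  # video: (T, H, W, C) uint8 tensor
print(video.shape, audio.shape, info)  # info includes "video_fps" and "audio_fps"
print(caption)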
To train the model, run:
./sh_scripts/script_train_greatesthits.sh
You can download the trained model from this link.
Then, set $MODEL_DIR in the following script appropriately and run it to generate samples:
./sh_scripts/script_gen_greatesthits.sh
To compute LanguageBind scores, you need to clone its official repo as shown below:
cd third_party
git clone https://github.com/PKU-YuanGroup/LanguageBind
To compute FAD, you need to clone audioldm_eval as shown below:
cd third_party
git clone https://github.com/haoheliu/audioldm_eval
Then, run the evaluation script:
./sh_scripts/script_eval.sh
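For reference, FAD can also be computed directly from Python with audioldm_eval. The sketch below follows the usage pattern shown in the audioldm_eval README; the generated/reference directories are placeholders for your generated and ground-truth audio, and the exact arguments may differ across versions.

import torch
from audioldm_eval import EvaluationHelper

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder directories: generated audio vs. ground-truth reference audio.
generated_dir = "outputs/audio"
reference_dir = "data/reference_audio"

# Evaluate at 16 kHz; the returned metrics include FAD.
evaluator = EvaluationHelper(16000, device)
metrics = evaluator.main(generated_dir, reference_dir)
print(metrics)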
This repository is built on the following open-source projects: