# DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

Official PyTorch implementation.
🎧 Listen to audio examples on the demo page.
🔥 Update: SoloAudio is now available! This advanced diffusion-transformer-based model extracts target sounds from free-text queries.
# Setup
- Download the checkpoints and dataset from this 🤗 link.
- Install the dependencies listed in requirement.txt.
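A minimal environment setup might look like the following (the environment name and Python version are illustrative; only requirement.txt is specified by the repo):

```shell
# Create and activate an isolated environment (name is an assumption)
conda create -n dpm-tse python=3.10 -y
conda activate dpm-tse

# Install the pinned dependencies shipped with the repo
pip install -r requirement.txt
```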
# Training
```shell
python src/train_ddim_cls.py \
  --data-path 'data/fsd2018/' \
  --autoencoder-path 'ckpts/first_stage.pt' \
  --autoencoder-config 'ckpts/vae.yaml' \
  --diffusion-config 'src/config/DiffTSE_cls_v_b_1000.yaml'
```
# Inference
```shell
python src/tse.py \
  --device 'cuda' \
  --mixture 'example.wav' \
  --target_sound 'Applause' \
  --autoencoder-path 'ckpts/first_stage.pt' \
  --autoencoder-config 'ckpts/vae.yaml' \
  --diffusion-config 'src/config/DiffTSE_cls_v_b_1000.yaml' \
  --diffusion-ckpt 'ckpts/base_v_1000.pt'
```
# Citation
If you find the code useful for your research, please consider citing:
```bibtex
@inproceedings{hai2024dpm,
  title={DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction},
  author={Hai, Jiarui and Wang, Helin and Yang, Dongchao and Thakkar, Karan and Dehak, Najim and Elhilali, Mounya},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1196--1200},
  year={2024},
  organization={IEEE}
}
```
# Acknowledgments
We borrow code from the following repos: