This repository contains the official implementation of "FlowSep: Language-Queried Sound Separation with Rectified Flow Matching".
We introduce FlowSep, a novel generative model based on Rectified Flow Matching for language-queried sound separation tasks. Specifically, FlowSep learns linear flow trajectories from Gaussian noise to the target source features within a pre-trained latent space. During inference, the mel-spectrogram is reconstructed from the generated latent vector. FlowSep outperforms SoTA models across multiple benchmarks; check out the separated audio examples on the Demo Page!
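For readers new to rectified flow matching, here is a minimal PyTorch sketch of the training objective described above: sample a point on the straight line between Gaussian noise and the target latent, then regress the line's constant velocity. The names `velocity_model`, `z_target`, and `text_emb` are placeholders, not the actual modules in this repository.

```python
import torch

def rectified_flow_loss(velocity_model, z_target, text_emb):
    """Illustrative rectified-flow-matching loss on VAE latents.

    z_target: target source latent from the pre-trained VAE (placeholder).
    text_emb: embedding of the language query (placeholder).
    """
    noise = torch.randn_like(z_target)                          # z_0 ~ N(0, I)
    t = torch.rand(z_target.shape[0], device=z_target.device)   # t ~ U(0, 1)
    t_ = t.view(-1, *([1] * (z_target.dim() - 1)))              # broadcast over latent dims
    z_t = (1.0 - t_) * noise + t_ * z_target                    # point on the linear path
    v_target = z_target - noise                                 # constant velocity of that path
    v_pred = velocity_model(z_t, t, text_emb)                   # model predicts the velocity
    return torch.mean((v_pred - v_target) ** 2)
```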
Clone the repository and cd into it:
git clone [email protected]:Audio-AGI/FlowSep.git && \
cd FlowSep
For the environment setup, please refer to AudioLDM and BigVGAN, as our code is modified from these systems.
First, please download the checkpoint from here and place it at model_logs/pretrained/v2_100k.ckpt.
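As a quick sanity check that the checkpoint is in the right place and loads correctly, you can inspect it with a few lines of PyTorch. This is only a sketch; the assumption that the weights sit under a "state_dict" key follows common Lightning-style checkpoints and is not guaranteed here.

```python
import torch

ckpt_path = "model_logs/pretrained/v2_100k.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")
# Assumption: Lightning-style checkpoints keep the weights under "state_dict".
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"Loaded {ckpt_path} with {len(state_dict)} entries")
```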
We provide an easy-to-use inference script; simply run:
python3 lass_inference.py
You can separate a different clip (for example, one of the mixtures in the mixed folder under metadata-master) by passing the text query and audio path on the command line:
python3 lass_inference.py --text 'text_of_the_audio' --audio 'path_to_the_audio'
You can find more demo audios in the mixed folder under metadata-master; the file names are the text queries of each audio sample.
All results are saved under the lass_result folder; by default, the mixed input audio is also saved in the mixed folder at the same path. You can disable this with:
python3 lass_inference.py --no_mixed
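If you want to separate every demo mixture in one go, a small wrapper can loop over the mixed folder and reuse each file name as its text query. This is only a sketch: it assumes the demo files are .wav and that underscores in the file names stand in for spaces.

```python
import subprocess
from pathlib import Path

mixed_dir = Path("metadata-master/mixed")           # demo mixtures; file names are the text queries
for audio_path in sorted(mixed_dir.glob("*.wav")):  # assumption: demo files are .wav
    text_query = audio_path.stem.replace("_", " ")  # assumption: underscores stand for spaces
    subprocess.run(
        ["python3", "lass_inference.py", "--text", text_query, "--audio", str(audio_path)],
        check=True,
    )
```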
We provide a simple data structure on AudioCaps for training FlowSep. Please first download the dataset from AudioCaps.
Place the data under the metadata-master folder and make sure the names and paths match the JSON structure in the processed folder.
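One way to confirm that your local copy lines up with the manifest is to walk the JSON entries and check that every referenced file exists. The snippet below is a sketch: the manifest name, the top-level "data" key, and the per-entry "wav" key are assumptions, so adjust them to match the actual files in the processed folder.

```python
import json
from pathlib import Path

root = Path("metadata-master")
manifest = Path("processed/audiocaps_train.json")  # hypothetical name; use your actual manifest

with open(manifest) as f:
    entries = json.load(f)["data"]  # assumption: entries live under a top-level "data" key

missing = [e["wav"] for e in entries if not (root / e["wav"]).exists()]  # assumption: "wav" holds a relative path
print(f"{len(missing)} of {len(entries)} referenced files are missing")
```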
Download the pretrained VAE checkpoint from here and place it under the model_logs/checkpoint folder.
To train FlowSep from scratch, simply run:
python3 train_latent_diffusion.py
We provide some configurations in 'train_latent_diffusion.py' to set up wandb and the cache paths for downloaded models.
Datasets and model configs can be modified in the YAML file under lass-config. You can fine-tune FlowSep by setting the 'reload_from_ckpt' value in the config file.
To evaluate the model, set 'pretrained_ckpt' in the config file and simply run:
python3 val_latent_diffusion.py
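Both config fields mentioned above ('reload_from_ckpt' for fine-tuning and 'pretrained_ckpt' for evaluation) can be set by editing the YAML by hand or with a short script such as the one below. The config file name is illustrative, and the sketch assumes both keys sit at the top level of the YAML; adjust if they are nested.

```python
import yaml

config_path = "lass-config/flowsep.yaml"  # illustrative name; use the config you train with

with open(config_path) as f:
    config = yaml.safe_load(f)

# Point fine-tuning and evaluation at your checkpoints (key names taken from this README;
# assumption: they sit at the top level of the config).
config["reload_from_ckpt"] = "model_logs/pretrained/v2_100k.ckpt"
config["pretrained_ckpt"] = "model_logs/pretrained/v2_100k.ckpt"

with open(config_path, "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```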
If you find this tool useful, please consider citing:
@article{yuan2024flowsep,
title={FlowSep: Language-Queried Sound Separation with Rectified Flow Matching},
author={Yuan, Yi and Liu, Xubo and Liu, Haohe and Plumbley, Mark D and Wang, Wenwu},
journal={arXiv preprint arXiv:2409.07614},
year={2024}
}