This repository contains the official implementation of "FlowSep: Language-Queried Sound Separation with Rectified Flow Matching".
We introduce FlowSep, a novel generative model based on Rectified Flow Matching for language-queried sound separation tasks. Specifically, FlowSep learns linear flow trajectories from Gaussian noise to the target source features within a pre-trained latent space. During inference, the mel-spectrogram is reconstructed from the generated latent vector. FlowSep outperforms SoTA models across multiple benchmarks; check out the separated audio examples on the Demo Page!
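For readers new to rectified flow matching, here is a minimal PyTorch sketch of the training objective described above: sample a point on the straight line between Gaussian noise and the target latent, then regress the line's constant velocity. The names `velocity_model`, `z_target`, and `text_emb` are placeholders, not the actual modules in this repository.

```python
import torch

def rectified_flow_loss(velocity_model, z_target, text_emb):
    """Illustrative rectified-flow-matching loss on VAE latents.

    z_target: target source latent from the pre-trained VAE (placeholder).
    text_emb: embedding of the language query (placeholder).
    """
    noise = torch.randn_like(z_target)                          # z_0 ~ N(0, I)
    t = torch.rand(z_target.shape[0], device=z_target.device)   # t ~ U(0, 1)
    t_ = t.view(-1, *([1] * (z_target.dim() - 1)))              # broadcast over latent dims
    z_t = (1.0 - t_) * noise + t_ * z_target                    # point on the linear path
    v_target = z_target - noise                                 # constant velocity of that path
    v_pred = velocity_model(z_t, t, text_emb)                   # model predicts the velocity
    return torch.mean((v_pred - v_target) ** 2)
```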
Clone the repository and cd into it:
git clone [email protected]:Audio-AGI/FlowSep.git && \
cd FlowSep
For the environment setup, please refer to AudioLDM and BigVGAN, as our code is modified from these systems.
First, please download the checkpoint from here and place it at model_logs/pretrained/v2_100k.ckpt.
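As a quick sanity check that the checkpoint is in the right place and loads correctly, you can inspect it with a few lines of PyTorch. This is only a sketch; the assumption that the weights sit under a "state_dict" key follows common Lightning-style checkpoints and is not guaranteed here.

```python
import torch

ckpt_path = "model_logs/pretrained/v2_100k.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")
# Assumption: Lightning-style checkpoints keep the weights under "state_dict".
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"Loaded {ckpt_path} with {len(state_dict)} entries")
```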
We provide an easy-to-use inference script; simply run:
python3 lass_inference.py
You can separate a different clip (for example, one of the mixtures in the mixed folder under metadata-master) by passing the text query and audio path on the command line:
python3 lass_inference.py --text 'text_of_the_audio' --audio 'path_to_the_audio'
You can find more demo audios in the mixed folder under metadata-master; the file names are the text queries of each audio sample.
All results are saved under the lass_result folder; by default, the mixed input audio is also saved in the mixed folder at the same path. You can disable this with:
python3 lass_inference.py --no_mixed
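If you want to separate every demo mixture in one go, a small wrapper can loop over the mixed folder and reuse each file name as its text query. This is only a sketch: it assumes the demo files are .wav and that underscores in the file names stand in for spaces.

```python
import subprocess
from pathlib import Path

mixed_dir = Path("metadata-master/mixed")           # demo mixtures; file names are the text queries
for audio_path in sorted(mixed_dir.glob("*.wav")):  # assumption: demo files are .wav
    text_query = audio_path.stem.replace("_", " ")  # assumption: underscores stand for spaces
    subprocess.run(
        ["python3", "lass_inference.py", "--text", text_query, "--audio", str(audio_path)],
        check=True,
    )
```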
We provide a simple data structure on AudioCaps for training FlowSep. Please first download the dataset from AudioCaps.
Place the data under the metadata-master folder and make sure the names and paths match the JSON structure in the processed folder.
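One way to confirm that your local copy lines up with the manifest is to walk the JSON entries and check that every referenced file exists. The snippet below is a sketch: the manifest name, the top-level "data" key, and the per-entry "wav" key are assumptions, so adjust them to match the actual files in the processed folder.

```python
import json
from pathlib import Path

root = Path("metadata-master")
manifest = Path("processed/audiocaps_train.json")  # hypothetical name; use your actual manifest

with open(manifest) as f:
    entries = json.load(f)["data"]  # assumption: entries live under a top-level "data" key

missing = [e["wav"] for e in entries if not (root / e["wav"]).exists()]  # assumption: "wav" holds a relative path
print(f"{len(missing)} of {len(entries)} referenced files are missing")
```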
Download the pretrained VAE checkpoint from here and place it under the model_logs/checkpoint folder.
To train FlowSep from scratch, simply run:
python3 train_latent_diffusion.py
We provide some configurations in 'train_latent_diffusion.py' to set up wandb and the cache paths for downloaded models.
Datasets and model configs can be modified in the YAML file under lass-config. You can fine-tune FlowSep by setting the 'reload_from_ckpt' value in the config file.
To evaluate the model, set 'pretrained_ckpt' in the config file and simply run:
python3 val_latent_diffusion.py
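Both config fields mentioned above ('reload_from_ckpt' for fine-tuning and 'pretrained_ckpt' for evaluation) can be set by editing the YAML by hand or with a short script such as the one below. The config file name is illustrative, and the sketch assumes both keys sit at the top level of the YAML; adjust if they are nested.

```python
import yaml

config_path = "lass-config/flowsep.yaml"  # illustrative name; use the config you train with

with open(config_path) as f:
    config = yaml.safe_load(f)

# Point fine-tuning and evaluation at your checkpoints (key names taken from this README;
# assumption: they sit at the top level of the config).
config["reload_from_ckpt"] = "model_logs/pretrained/v2_100k.ckpt"
config["pretrained_ckpt"] = "model_logs/pretrained/v2_100k.ckpt"

with open(config_path, "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```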
If you find this tool useful, please consider citing:
@article{yuan2024flowsep,
title={FlowSep: Language-Queried Sound Separation with Rectified Flow Matching},
author={Yuan, Yi and Liu, Xubo and Liu, Haohe and Plumbley, Mark D and Wang, Wenwu},
journal={arXiv preprint arXiv:2409.07614},
year={2024}
}