🚀 Feature Request: Audio mix and degrader #823

Closed
VictorBeraldo opened this issue Jul 23, 2020 · 6 comments

Comments

@VictorBeraldo

🚀 Feature

Add a mixing feature to torchaudio that combines signals at a desired SNR.

Motivation

This feature will be useful for audio data augmentation, since it allows noise to be added to the audio signal.

Alternatives

There are a lot of Python packages that contain this feature, but it would be very useful to have it inside torchaudio.

Additional context

@VictorBeraldo VictorBeraldo changed the title Audio mix and degrader 🚀 Feature Request: Audio mix and degrader Jul 23, 2020
@mthrok
Collaborator

mthrok commented Jul 23, 2020

Hi @VictorBeraldo

I love this idea, and I was actually thinking about a new augmentation (degradation) module here (although it turned out that miniaudio leaks memory, so we cannot do it right away).

Do you have an interface in mind, or an existing package to draw inspiration from? That would help us discuss requirements, pros and cons.

@VictorBeraldo
Author

VictorBeraldo commented Jul 24, 2020

Hello @mthrok!
I found two of them here: https://github.com/emilio-molina/audio_degrader and https://github.com/sleekEagle/audio_processing.
The first is more complete, with a lot of complex features, and the second has exactly what I'd like to have in torchaudio.

@mthrok
Collaborator

mthrok commented Jul 27, 2020

Hi @VictorBeraldo

Thanks for the links. I skimmed through them and here are my thoughts:

  • audio_degrader
    • sox
      torchaudio has a direct binding to libsox, with which you can run most sox effects directly on a Tensor or on a file, so this is already covered.
    • ffmpeg
      It seems that the MP3 codec is the only functionality unique to this (other things like speed change and filtering are covered). This is what I was thinking of doing with miniaudio.
    • rubberband
      I am not familiar with this library. Will take a look again later (but I have a feeling that libsox can do this too)
  • audio_processing
    • Both functions look valuable, and I believe they should not be difficult to add to torchaudio, though we need to settle on the interface (and module name/location).

I guess we can start with functions that add noise to a Tensor object.
Some parameters for this would include the SNR and maybe the type of noise.

What do you think?
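
As a rough illustration of the kind of noise-adding function discussed above, here is a minimal sketch in plain PyTorch. It is not taken from this thread or from torchaudio: the function name `add_noise_at_snr`, its parameters, and the `(..., time)` shape convention are all assumptions made for the example. It scales the noise so that the ratio of signal power to noise power matches the requested SNR in dB.

```python
import math

import torch


def add_noise_at_snr(signal: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Mix `noise` into `signal` at the requested SNR (in dB).

    Hypothetical helper for illustration only; not a torchaudio API.
    Both tensors are assumed to have shape (..., time) with equal length.
    """
    if signal.shape[-1] != noise.shape[-1]:
        raise ValueError("signal and noise must have the same number of samples")
    # Average power of each waveform along the time axis.
    signal_power = signal.pow(2).mean(dim=-1, keepdim=True)
    noise_power = noise.pow(2).mean(dim=-1, keepdim=True).clamp_min(1e-12)
    # SNR_dB = 10 * log10(P_signal / (scale^2 * P_noise)); solve for scale.
    scale = torch.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise


torch.manual_seed(0)
t = torch.arange(16000) / 16000.0
clean = torch.sin(2 * math.pi * 440.0 * t).unsqueeze(0)  # (1, time), 440 Hz tone
noise = torch.randn_like(clean)                          # white Gaussian noise
noisy = add_noise_at_snr(clean, noise, snr_db=10.0)

# Measure the SNR that was actually achieved from the added residual.
residual = noisy - clean
measured_snr_db = 10 * torch.log10(clean.pow(2).mean() / residual.pow(2).mean())
print(f"measured SNR: {measured_snr_db.item():.2f} dB")
```

Taking the noise as an input Tensor, rather than generating it inside the function, would keep the "type of noise" question separate from the mixing step; that is one possible design, not a decision made in this thread.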

@dataprowl

Hello, @VictorBeraldo

The links that you've provided are great resources. I went through them and found that I'm working on something very similar, so I thought I would contribute my own effort.

Please do check out the PR for this request. It is a simple utility designed to degrade an audio Tensor with noise at a specified SNR. I've also added a few separate functions that add white and red noise to the input audio.

Happy to help and collaborate to improve it further! Feel free to suggest any changes or enhancements. I'm all ears!
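
The PR itself is not reproduced in this thread, so the following is only a guess at what white and red noise generators might look like in plain PyTorch. The names `white_noise`, `red_noise`, and `lag1_autocorr` are hypothetical. White noise is drawn as independent Gaussian samples; red (Brownian) noise is approximated as the cumulative sum of white noise, which concentrates energy at low frequencies.

```python
import torch


def white_noise(num_samples: int, generator=None) -> torch.Tensor:
    """Gaussian white noise: independent samples, flat power spectrum on average."""
    return torch.randn(num_samples, generator=generator)


def red_noise(num_samples: int, generator=None) -> torch.Tensor:
    """Red (Brownian) noise approximated as a random walk over white noise."""
    walk = torch.cumsum(torch.randn(num_samples, generator=generator), dim=0)
    walk = walk - walk.mean()                         # remove the DC offset
    return walk / walk.abs().max().clamp_min(1e-12)   # peak-normalise to [-1, 1]


def lag1_autocorr(x: torch.Tensor) -> float:
    """Correlation between neighbouring samples: near 0 for white, near 1 for red."""
    x = x - x.mean()
    return ((x[:-1] * x[1:]).sum() / x.pow(2).sum()).item()


gen = torch.Generator().manual_seed(0)
white = white_noise(16000, generator=gen)
red = red_noise(16000, generator=gen)
print(f"white lag-1 autocorrelation: {lag1_autocorr(white):.3f}")
print(f"red   lag-1 autocorrelation: {lag1_autocorr(red):.3f}")
```

Either Tensor could then be scaled and added to a waveform of the same length to reach a target SNR.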

@mthrok
Collaborator

mthrok commented Mar 6, 2021

I have added a tutorial on how one can use torchaudio for augmentation.
https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html#adding-background-noise

I think the next step will be to define the interface for a noise mixture function.

@mthrok
Collaborator

mthrok commented Jul 31, 2023

After about 2 years, we have a good collection of augmentation utilities, so I will close this issue. If you are looking for a specific audio utility to be added to torchaudio, I encourage you to open a new issue. Thanks,

@mthrok mthrok closed this as completed Jul 31, 2023
mpc001 pushed a commit to mpc001/audio that referenced this issue Aug 4, 2023