How to detect noise, breaks, and multiple speakers in an audio clip? #96

Open
TomSuen opened this issue Dec 27, 2024 · 10 comments

@TomSuen

TomSuen commented Dec 27, 2024

Hello, I am not from the audio field. I have removed the BGM and reverberation from a reference audio clip to a certain extent, but the result is still poor when I feed it into voice cloning. Is there a better way to detect whether the reference audio contains noise, distortion, or multiple people speaking?
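
As a rough starting point for the question above, here is a minimal sketch that screens a clip for two of the problems mentioned: hard clipping (one common form of distortion) and silent breaks. It assumes the audio has already been loaded as a list of float samples normalized to [-1, 1]. The function names and all thresholds are illustrative assumptions, not part of any tool discussed in this thread, and the sketch does not attempt to detect multiple speakers.

```python
import math


def clipping_ratio(samples, threshold=0.999):
    """Fraction of samples at or near full scale (|x| >= threshold)."""
    if not samples:
        return 0.0
    clipped = sum(1 for x in samples if abs(x) >= threshold)
    return clipped / len(samples)


def frame_rms(samples, frame_len):
    """RMS level of consecutive, non-overlapping frames."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def find_breaks(samples, sample_rate, frame_ms=20, silence_rms=0.01,
                min_break_ms=200):
    """Return (start_sec, end_sec) spans where the level stays below silence_rms.

    The thresholds are guesses and will likely need tuning per recording.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    levels = frame_rms(samples, frame_len)
    min_frames = max(1, int(min_break_ms / frame_ms))
    breaks = []
    run_start = None
    # A sentinel "loud" frame at the end closes a trailing silent run.
    for i, level in enumerate(levels + [float("inf")]):
        if level < silence_rms:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                breaks.append((run_start * frame_len / sample_rate,
                               i * frame_len / sample_rate))
            run_start = None
    return breaks
```

A high clipping ratio or many long breaks would suggest the clip is a poor reference. Judging residual noise or overlapping speakers would need a dedicated model, which this sketch does not cover.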

@youngercloud

youngercloud commented Dec 27, 2024

Just drop a note here. Possibly, you might be interested in this project.

@GUUser91

GUUser91 commented Dec 27, 2024

I use https://github.com/resemble-ai/resemble-enhance and https://github.com/Rikorose/DeepFilterNet for audio denoising (resemble-enhance can also remove background music), the moises.ai Pro plan to remove sound effects and background music, and the Audacity plugin Acon Digital DeVerberate 3 to reduce reverb.
If you don't want to pay for the moises.ai Pro plan, you can use the BandIt Plus model in https://github.com/ZFTurbo/Music-Source-Separation-Training/
Here's a sample for the BandIt Plus Model.
Input audio.
https://vocaroo.com/1kW5ouqlmNEJ
https://vocaroo.com/1jdwZKVDNTpd
Output audio
https://vocaroo.com/1e4Hdj50P1py
https://vocaroo.com/19dlBzlnb5sv

Here's a sample for the moises.ai Pro plan.
Input audio
https://vocaroo.com/1ipKWeiyr3o6
https://vocaroo.com/1gpMlZZXepYx
https://vocaroo.com/1m5ruL0qGGZR
https://vocaroo.com/1l04xg3Cp9t7
Output audio
https://vocaroo.com/1hkFYE87glzr
https://vocaroo.com/16as9N3eRtvV
https://vocaroo.com/1a13ldM1aTmT
https://vocaroo.com/15AuBf7DS0lv

@AlonDan

AlonDan commented Dec 28, 2024

Dear @youngercloud and @GUUser91, would you be so kind as to recommend:

  • A "Speaker Separation" tool with a simple GUI, similar to UVR (Ultimate Vocal Remover) or Gradio, that is easy to install on Windows?

  • The same for audio enhancement.

With UVR I can separate music and vocals, and even remove reverb / echo,
but I can't separate multiple speakers.
Sadly, I don't think it currently includes a Gradio app GUI (I'm just a simple user), which is why I'm asking for an alternative solution.

Thanks ahead! 🙏

@GUUser91

@AlonDan
Resemble Enhance and DeepFilterNet have Gradio spaces:
https://huggingface.co/spaces/ResembleAI/resemble-enhance
https://huggingface.co/spaces/hshr/DeepFilterNet2

As for speaker separation, the only application I know of that has this feature is SpectraLayers 11, but it costs money and it can't separate voices talking over each other.

There's also a UVR fork that has a Gradio demo:
https://github.com/Eddycrack864/UVR5-UI

@AlonDan

AlonDan commented Dec 28, 2024

Thanks for the detailed reply 🙏

Sorry I wasn't clear enough: I'm looking for a local installation, not online / cloud.
I already have UVR 5.6 installed, but I can't see any models that do what I described.

Can you recommend any local Gradio / GUI solutions for separating multiple speakers?

@GUUser91

@AlonDan
That doesn't exist as far as I know. You're better off just clipping the audio with ffmpeg. That's what I do to gather voices.
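
For anyone unfamiliar with clipping via ffmpeg, here is a minimal sketch of what that can look like when scripted. It only cuts out a time span in which a single speaker talks and does not separate overlapping voices. The helper names, file names, and timestamps are hypothetical; `-i`, `-ss`, and `-t` are standard ffmpeg options, and ffmpeg is assumed to be installed and on the PATH.

```python
import subprocess


def build_clip_command(src, dst, start_sec, duration_sec):
    """Build an ffmpeg command that extracts one span of src into dst."""
    return [
        "ffmpeg",
        "-i", src,                  # input file
        "-ss", str(start_sec),      # where the clip starts, in seconds
        "-t", str(duration_sec),    # how long the clip is, in seconds
        dst,                        # output file (format inferred from extension)
    ]


def clip(src, dst, start_sec, duration_sec):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_clip_command(src, dst, start_sec, duration_sec),
                   check=True)
```

For example, `clip("interview.wav", "speaker_a.wav", 12.5, 8)` would write the 8 seconds starting at 12.5 s to a new file. The span boundaries still have to be found by listening or by other means.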

@AlonDan

AlonDan commented Dec 28, 2024

I didn't know that it's possible to separate multiple overlapping speakers via ffmpeg... that's new to me!😮
But I would still rather use a GUI for these things; working with command-line commands would take me years.


Does anyone know how to get the
"BS-RoFormer" and "Mel-RoFormer" models to run locally via Gradio or within UVR? (They're not in the built-in download list.)

These two are not for separating multiple speakers, but they give the best enhancement/cleanup I've tested so far compared to Kim2 and other MDX models.

I wonder if there is a local Gradio app just like THIS ONE. It's so simple yet very useful, but I'm looking for a local installation.

I'm still looking for multi-speaker separation like THIS wonderful project, but running locally. If anyone finds something like that, please share.
Thanks ahead! 🙏

@GUUser91

GUUser91 commented Dec 28, 2024

@AlonDan
ffmpeg can't separate voices talking over each other, if that's what you mean.

This fork has BS-RoFormer and Mel-RoFormer, I believe:
https://github.com/Eddycrack864/UVR5-UI

Resemble Enhance has a local Gradio demo and can separate vocals from background music:
https://github.com/resemble-ai/resemble-enhance

@AlonDan

AlonDan commented Dec 28, 2024

Thank you @GUUser91 for your kind help. 🙏

I'll give UVR5-UI a chance. It seems to support the models I mentioned, and I hope the installation is as simple as it looks (a batch file).

I hope that one day this project will have a simple GUI to try locally. It's very impressive!

@TomSuen
Author

TomSuen commented Jan 7, 2025

> Just drop a note here. Possibly, you might be interested in this project.

Thx, that's a good project!


4 participants