VAD and whisper-timestamped #30

Closed
Jeronymous opened this issue Nov 16, 2023 · 3 comments

Comments

@Jeronymous

First, thank you. I am super happy to see whisper-timestamped used in such a good project.
Having Whisper streamed in real time is a super feature!

I see here that VAD is not available when using the whisper-timestamped backend:

def use_vad(self):
    raise NotImplemented("Feature use_vad is not implemented for whisper_timestamped backend.")

But VAD IS implemented in whisper-timestamped (it was even before faster-whisper integrated it). It's currently based on SILERO (same as what was done in faster-whisper).
Am I missing a sticking point? (Maybe the fact that things required for VAD are not by default in the requirements?)
I can contribute if help is needed on this.

(VAD is important to prevent some hallucinations of Whisper models, and make timestamps more accurate)

Also, I want to mention:
After being disappointed with weird results on some files, I opened a branch to replace SILERO with AUDITOK: linto-ai/whisper-timestamped#78 (see the linked issue for an illustration of possible "hallucinations" of Silero).
I have had good experience with Auditok. I was hoping for some user feedback to confirm this before merging into master. But since none is coming, maybe we just need to establish a benchmark to confirm the improvement.

@Gldkslfmsd
Collaborator

Hi, thanks for the feedback.
Yes, I know that VAD is in whisper_timestamped. I put NotImplemented because I primarily use and focus on the faster-whisper backend. Feel free to implement it. It should be easy: pass a parameter to a function, analogously to

self.transcribe_kargs["vad_filter"] = True
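For illustration, the same pattern on the whisper_timestamped side might look roughly like the sketch below. It is not the project's code: the class is a trimmed-down, hypothetical stand-in, the transcribe function is injected so the snippet runs without the library, and the keyword name `vad` is an assumption about what the backend's transcribe function accepts.

```python
from typing import Any, Callable, Dict


class WhisperTimestampedBackend:
    """Hypothetical, trimmed-down stand-in for the whisper_timestamped backend."""

    def __init__(self, transcribe_fn: Callable[..., Any]):
        # transcribe_fn is injected so the sketch runs without the real library;
        # in practice it would be the backend's own transcribe function.
        self.transcribe_fn = transcribe_fn
        self.transcribe_kargs: Dict[str, Any] = {}

    def use_vad(self) -> None:
        # Assumption: the transcribe function accepts a `vad` keyword argument.
        self.transcribe_kargs["vad"] = True

    def transcribe(self, audio: Any) -> Any:
        # Forward the collected keyword arguments on every call.
        return self.transcribe_fn(audio, **self.transcribe_kargs)
```

Calling `use_vad()` once would then switch VAD on for every later `transcribe` call, mirroring how `vad_filter` is set for faster-whisper.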

SILERO vs AUDITOK is a topic for another issue. I don't have feedback.

@Gldkslfmsd
Collaborator

But I realized that VAD is currently used inefficiently: on every update it is run over the whole buffer. It could instead be used to cut silence out of the buffer, so that the next update is faster. This could be improved.
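A minimal sketch of that idea, assuming the VAD has already produced speech spans as (start, end) sample indices. The function name `cut_silence` and the span format are hypothetical, not part of either backend. It keeps only the speech samples and records offsets, so timestamps computed on the trimmed buffer could be mapped back to the original audio.

```python
from typing import List, Tuple


def cut_silence(
    buffer: List[float], speech_spans: List[Tuple[int, int]]
) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Drop samples outside the speech spans.

    Returns the trimmed buffer and a list of
    (index in trimmed buffer, index in original buffer) pairs,
    one per kept span.
    """
    kept: List[float] = []
    offsets: List[Tuple[int, int]] = []
    for start, end in speech_spans:
        # Clamp the span to the buffer and skip empty spans.
        start, end = max(0, start), min(len(buffer), end)
        if start >= end:
            continue
        offsets.append((len(kept), start))
        kept.extend(buffer[start:end])
    return kept, offsets
```

With a trimmed buffer, each update would have less audio to re-process; the offsets are what would let word timestamps be shifted back to their original positions.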

@Gldkslfmsd
Collaborator

SILERO vs AUDITOK is a topic for another issue. I don't have feedback.

@Jeronymous, please open an issue about this if you have test results to share.
