-
Thanks for the effort, Alex, great project! Regarding VAD, I'm wondering whether there is any plan to release more sophisticated top-level segmentation logic on top of the frame-level pretrained model. Say, 4 commonly used options are:
I see the current code does have a ring-buffer cache (for top-level average smoothing); this is definitely somewhere that could be improved to support more complicated segmentation strategies and controls. I think those things would make silero-vad friendlier to end users from non-speech backgrounds. Besides that, support for multimedia input formats other than well-formatted 16 kHz, 16-bit PCM/WAV (say, mp3 or even mp4/mkv) would help other projects use silero-vad as a batteries-included module. Finally, thanks again for sharing silero-vad :)
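To make the suggestion a bit more concrete, here is a rough sketch of what a top-level segmenter over frame-level speech probabilities could look like. It is not silero-vad's actual code or API: the function name `segment_speech`, all parameters, and the default values (30 ms frames, a 5-frame window, 0.5/0.3 thresholds) are assumptions made up for illustration. It combines ring-buffer average smoothing with two-threshold hysteresis and a minimum segment duration.

```python
from collections import deque
from typing import Iterable, List, Tuple


def segment_speech(
    probs: Iterable[float],
    frame_ms: int = 30,          # assumed frame hop, not taken from silero-vad
    window: int = 5,             # ring-buffer length for average smoothing
    on_threshold: float = 0.5,   # smoothed probability needed to open a segment
    off_threshold: float = 0.3,  # smoothed probability below which it closes
    min_speech_ms: int = 90,     # segments shorter than this are dropped
) -> List[Tuple[int, int]]:
    """Turn per-frame speech probabilities into (start_ms, end_ms) segments."""
    ring = deque(maxlen=window)  # oldest value falls out automatically
    segments: List[Tuple[int, int]] = []
    start = None                 # frame index where the open segment began
    n_frames = 0

    for i, p in enumerate(probs):
        n_frames = i + 1
        ring.append(p)
        smoothed = sum(ring) / len(ring)

        if start is None and smoothed >= on_threshold:
            start = i
        elif start is not None and smoothed < off_threshold:
            if (i - start) * frame_ms >= min_speech_ms:
                segments.append((start * frame_ms, i * frame_ms))
            start = None

    # Close a segment that is still open when the input ends.
    if start is not None and (n_frames - start) * frame_ms >= min_speech_ms:
        segments.append((start * frame_ms, n_frames * frame_ms))
    return segments


if __name__ == "__main__":
    # Synthetic probabilities: silence, speech, silence.
    frame_probs = [0.0] * 5 + [1.0] * 10 + [0.0] * 10
    print(segment_speech(frame_probs))
```

Using separate on/off thresholds keeps a segment from flickering when the smoothed probability hovers around a single cut-off. Note that averaging delays both edges by a few frames, so a real implementation would probably want to pad or back-date the boundaries.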
-
Please share ideas on how / what to improve.
For example, some obvious ones: