
Python script for detecting silences with Silero-VAD and transcribing with the Whisper AI model.


JRWSP/SileroVAD_for_Whisper-cpp

If this project helps you reduce development time, you can buy me a cup of coffee or some beers so I can code more :)

BTC: bc1q2zpmmlz7ujwx2ghsgw5j7umv8wmpchplemvhtu
ETH: 0x80e98FcfED62970e35a57d2F1fefed7C89d5DaF4

Buy Me A Coffee

SileroVAD for Whisper-cpp

On CUDA-capable devices, running Whisper with Silero-VAD is easy with Faster-Whisper. Whisper.cpp is an alternative for running Whisper on AMD GPUs, but it does not implement any VAD.

This repo contains a Python script that pre-processes the input file with Silero-VAD and splits it into chunks before passing them to any speech-to-text model. It then reconstructs the full transcription from the chunks' results.
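The split-and-rebuild idea can be sketched in plain Python. This is a minimal illustration, not the repo's actual script: it assumes the VAD output is a list of `{"start": ..., "end": ...}` dicts measured in samples at 16 kHz (the default output format of Silero-VAD's `get_speech_timestamps`), and the names `build_chunks`, `shift_subtitles`, `srt_timestamp`, `max_gap_s` and `max_len_s` are hypothetical. Subtitle times returned for a chunk are relative to that chunk, so each one is shifted by the chunk's start offset to land back on the original timeline.

```python
from dataclasses import dataclass

SAMPLE_RATE = 16_000  # assumed: VAD timestamps are sample indices at 16 kHz


@dataclass
class Chunk:
    start: float  # seconds in the original file
    end: float


def build_chunks(speech, max_gap_s=1.0, max_len_s=30.0, sr=SAMPLE_RATE):
    """Merge VAD speech segments separated by short silences into chunks.

    A new chunk starts when the silence before a segment is longer than
    max_gap_s, or when extending the current chunk would exceed max_len_s.
    """
    chunks = []
    for seg in speech:
        start, end = seg["start"] / sr, seg["end"] / sr
        if chunks:
            last = chunks[-1]
            if start - last.end <= max_gap_s and end - last.start <= max_len_s:
                last.end = end  # bridge the short silence
                continue
        chunks.append(Chunk(start, end))
    return chunks


def shift_subtitles(chunks, chunk_results):
    """Move (start, end, text) entries timed relative to their chunk back
    onto the original file's timeline by adding the chunk's start offset."""
    merged = []
    for chunk, results in zip(chunks, chunk_results):
        for start, end, text in results:
            merged.append((chunk.start + start, chunk.start + end, text))
    return merged


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

Merging across short pauses keeps sentences together, while the illustrative 30 s cap matches the length of Whisper's input window; the real script may use different thresholds and relies on the srt/pysrt packages for the subtitle files.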

Dependencies

  • pytorch (the CPU build is enough for VAD)
  • onnxruntime
  • argparse (part of the Python standard library)
  • ffmpeg
  • json (part of the Python standard library)
  • moviepy
  • srt
  • pysrt

Installation

To use the Bash script, Whisper.cpp must be installed.

  1. For Mac/Linux, download Whisper.cpp and make it executable. For Windows, download the CLI version of Whisperer.
  2. Put it in the same directory as VAD_Whisper-cpp and whisper_with_VAD.sh (or whisper_with_VAD.ps1).
  3. Make the script executable by typing chmod +x whisper_with_VAD.sh.
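Steps 2 and 3 can be checked before running anything. The sketch below uses a hypothetical helper, `missing_setup`, which assumes all files sit in one directory and reports which required files are missing or lack the execute bit. Only whisper_with_VAD.sh is checked by default; pass your Whisper.cpp executable's filename in `required`, since that name is not fixed by this README.

```python
import os
from pathlib import Path


def missing_setup(directory, required=("whisper_with_VAD.sh",)):
    """Return the names in `required` that are missing from `directory`
    or are not executable (the equivalent of forgetting `chmod +x`)."""
    problems = []
    for name in required:
        path = Path(directory) / name
        # A file counts as ready only if it exists and the current user may execute it.
        if not path.is_file() or not os.access(path, os.X_OK):
            problems.append(name)
    return problems
```

An empty list means the directory matches the layout the script expects; this check targets Mac/Linux, where execute permissions apply.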

If you don't want to use the script, see the manual implementation.

Usage

For Mac/Linux users who can run shell scripts, simply run:

./whisper_with_VAD.sh -f INPUT_FILE.mp4 -m MODEL_PATH

For example, to use the small model, replace MODEL_PATH with whisper.cpp/models/ggml-small.bin.
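For each chunk, the pipeline comes down to cutting audio with ffmpeg and transcribing it with Whisper.cpp. The sketch below only builds those two command lines. It assumes chunk boundaries in seconds, a 16 kHz mono WAV target, and a Whisper.cpp executable path you supply; the helper names are hypothetical. `-m`, `-f`, `-osrt` and `-of` are Whisper.cpp CLI options, with `-of` taking an output path without the extension.

```python
from pathlib import Path

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio


def ffmpeg_extract_command(src, start, end, dst, sr=SAMPLE_RATE):
    """ffmpeg args that cut `end - start` seconds of src, beginning at start, into a mono WAV."""
    return [
        "ffmpeg", "-y",              # overwrite dst if it already exists
        "-i", str(src),
        "-ss", f"{start:.3f}",       # seek after -i: slower but frame-accurate
        "-t", f"{end - start:.3f}",  # chunk duration in seconds
        "-ar", str(sr), "-ac", "1",  # resample to 16 kHz mono
        str(dst),
    ]


def whisper_cpp_command(binary, model_path, chunk_wav, out_dir):
    """Whisper.cpp args that transcribe one chunk into <out_dir>/<stem>.srt."""
    out_base = Path(out_dir) / Path(chunk_wav).stem  # -of gets no extension
    return [
        str(binary),
        "-m", str(model_path),
        "-f", str(chunk_wav),
        "-osrt",                     # write subtitles in SRT format
        "-of", str(out_base),
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The resulting per-chunk `.srt` files still need their timestamps offset by each chunk's start before being merged into one transcript.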

For Windows users, I find it convenient to use the GPU-ready version through Whisperer. Download its CLI version and use the PowerShell script:

.\whisper_with_VAD.ps1 -f INPUT_FILE.mp4 -m MODEL_PATH
