If this project helps you save development time, you can buy me a cup of coffee or some beers so I can code more :)
BTC: bc1q2zpmmlz7ujwx2ghsgw5j7umv8wmpchplemvhtu
ETH: 0x80e98FcfED62970e35a57d2F1fefed7C89d5DaF4
On CUDA-capable devices, running Whisper with Silero-VAD is easy with Faster-Whisper. Whisper.cpp is an alternative for running Whisper on AMD GPUs, but it does not implement any VAD.
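For reference, here is a minimal sketch of what the Faster-Whisper route looks like with its built-in Silero-VAD filter. The model size, device, and file name are placeholders, not values from this repo:

```python
# A minimal sketch, not this repo's code: Faster-Whisper with its built-in
# Silero-VAD filter. Model size, device, and file name are placeholders.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "INPUT_FILE.mp4",
    vad_filter=True,  # pre-filter non-speech with Silero-VAD
    vad_parameters=dict(min_silence_duration_ms=500),
)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```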
This repo contains a Python script that pre-processes an input file with Silero-VAD and splits it into chunks before passing them to any voice-to-text model, then reconstructs the full transcription from the chunks' results.
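In outline, the pre-processing step looks like the sketch below. This is a simplified illustration rather than the exact script in this repo; the 16 kHz sample rate and the file names are assumptions:

```python
# Simplified illustration of the pre-processing idea, not the exact script
# in this repo. Assumes the audio was already extracted to 16 kHz WAV
# (e.g. with ffmpeg); file names are placeholders.
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, save_audio, read_audio, _, _ = utils

SAMPLE_RATE = 16000
wav = read_audio("input.wav", sampling_rate=SAMPLE_RATE)

# Each entry is {"start": sample_index, "end": sample_index}.
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLE_RATE)

# Save each speech region as its own chunk and remember its offset in seconds,
# so per-chunk transcripts can later be shifted back onto the original timeline.
offsets = []
for i, ts in enumerate(speech_timestamps):
    save_audio(f"chunk_{i}.wav", wav[ts["start"]:ts["end"]], sampling_rate=SAMPLE_RATE)
    offsets.append(ts["start"] / SAMPLE_RATE)
```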
- pytorch (only the CPU build is needed for VAD)
- onnxruntime
- argparse
- ffmpeg
- json
- moviepy
- srt
- pysrt
To use the Bash script, Whisper.cpp needs to be installed:
- For Mac/Linux, download Whisper.cpp and make it executable. For Windows, download the CLI version of Whisperer.
- Put it in the same directory as `VAD_Whisper-cpp` and `whisper_with_VAD.sh` (or `whisper_with_VAD.ps1`).
- Make the script executable with `chmod +x whisper_with_VAD.sh`.
If you don't want to use the scripts, see the manual implementation.
For Mac/Linux users who can run shell scripts, simply run

```bash
./whisper_with_VAD.sh -f INPUT_FILE.mp4 -m MODEL_PATH
```

For example, to use the `small` model, replace `MODEL_PATH` with `whisper.cpp/models/ggml-small.bin`.
For Windows users, I find it convenient to use the GPU-ready version through Whisperer. Download its CLI version and use the PowerShell script:

```powershell
.\whisper_with_VAD.ps1 -f INPUT_FILE.mp4 -m MODEL_PATH
```
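Reassembling the per-chunk results into one transcript amounts to shifting each chunk's timestamps by that chunk's offset in the original file. A minimal sketch using the `srt` library (file names and offsets here are hypothetical):

```python
# A minimal sketch, not this repo's code: merge per-chunk SRT results back
# onto the original timeline. File names and offsets here are hypothetical;
# the real offsets come from the VAD chunking step.
from datetime import timedelta
import srt

chunks = [("chunk_0.srt", 0.0), ("chunk_1.srt", 42.7)]  # (file, offset in seconds)

merged = []
for path, offset in chunks:
    with open(path, encoding="utf-8") as f:
        for sub in srt.parse(f.read()):
            sub.start += timedelta(seconds=offset)
            sub.end += timedelta(seconds=offset)
            merged.append(sub)

with open("full_transcript.srt", "w", encoding="utf-8") as f:
    f.write(srt.compose(srt.sort_and_reindex(merged)))
```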