WhisperCLI is a command-line tool that transcribes audio and video files using OpenAI's Whisper model. It converts speech to text, supports multiple languages, and can process a specific segment of a file.
- Support for various audio and video file formats (MP3, MP4, WAV, etc.)
- Choice of Whisper model (tiny, base, small, medium, large)
- Transcription of specific time intervals
- Support for multiple languages
- Silent mode with an option for detailed logs
- Save results to a file
- Simple command-line interface
- Python 3.7+
- PyTorch
- Transformers
- Pydub
- NumPy
- Clone the repository:
git clone https://github.com/Hole-code/WhisperCLI.git
cd WhisperCLI
- Install the dependencies:
pip install -r requirements.txt
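After installation, a quick check can confirm that the listed libraries are importable. The sketch below is not part of WhisperCLI: the function name `missing_dependencies` is hypothetical, and the import names (`torch`, `transformers`, `pydub`, `numpy`) are assumed to be the usual module names for the packages listed above.

```python
import importlib.util
import shutil

# Usual import names for the libraries listed in the requirements (assumed).
REQUIRED_MODULES = ("torch", "transformers", "pydub", "numpy")


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the names of modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All Python packages found.")
    # ffmpeg is an external program, not a Python package, so look it up on PATH.
    if shutil.which("ffmpeg") is None:
        print("ffmpeg not found on PATH; some formats may fail to load.")
```

An empty list means every module was found; any names returned can be installed individually with pip.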
To use WhisperCLI as a system utility, follow these steps:
- Create a whispercli file with the following content:
#!/bin/bash
python3 /full/path/to/whisper_cli.py "$@"
Replace /full/path/to/whisper_cli.py with the actual path to your script.
- Make the file executable:
chmod +x whispercli
- Move the file to a directory that is in the system PATH:
sudo mv whispercli /usr/local/bin/
- Now you can run the utility from any directory by simply typing whispercli:
whispercli -n path/to/audio_file.mp3
Note: Moving the file into /usr/local/bin requires administrative rights, which is why the command uses sudo.
Basic usage:
whispercli -n path/to/your/audio_file.mp3
- -n, --name: Path to the audio file (required)
- -s, --start: Start time for transcription (format: SS, MM:SS, or HH:MM:SS)
- -e, --end: End time for transcription (format: SS, MM:SS, or HH:MM:SS)
- -l, --language: Expected language of the audio (e.g., 'en' for English)
- -m, --model: Whisper model size (tiny, base, small, medium, large)
- -o, --output: Output file name to save the transcription
- -v, --verbose: Output detailed logs
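These options map naturally onto Python's `argparse`. The following is a minimal sketch of how such a parser could be declared, not the actual source of `whisper_cli.py`; the function name `build_parser`, the help strings, and the absence of defaults are assumptions.

```python
import argparse

# Model sizes as listed in this README.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="whispercli",
        description="Transcribe audio and video files with Whisper.",
    )
    parser.add_argument("-n", "--name", required=True,
                        help="path to the input file")
    parser.add_argument("-s", "--start",
                        help="start time (SS, MM:SS, or HH:MM:SS)")
    parser.add_argument("-e", "--end",
                        help="end time (SS, MM:SS, or HH:MM:SS)")
    parser.add_argument("-l", "--language",
                        help="expected language code, e.g. 'en'")
    # The tool's real default model is not stated here, so none is set.
    parser.add_argument("-m", "--model", choices=MODEL_SIZES,
                        help="Whisper model size")
    parser.add_argument("-o", "--output",
                        help="file to write the transcription to")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print detailed logs")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Restricting `--model` with `choices` makes an unsupported size fail at parse time with a usage message, before any model is loaded.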
- Transcribe the entire file:
whispercli -n audio.mp3
- Transcribe a specific segment:
whispercli -n video.mp4 -s 5:30 -e 10:00
- Specify language and model:
whispercli -n audio.wav -l en -m medium
- Save the result to a file:
whispercli -n audio.mp3 -o transcription.txt
- Use all options with detailed logs:
whispercli -n long_video.mp4 -s 1:00:00 -e 1:30:00 -l en -m large -o result.txt -v
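The `-s` and `-e` values in these examples use the SS, MM:SS, or HH:MM:SS formats. One way to convert such a string to seconds is sketched below; `parse_timestamp` is a hypothetical helper, and its validation rules (non-negative whole numbers, at most three fields) are assumptions rather than the tool's documented behavior.

```python
def parse_timestamp(text):
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' to a number of seconds."""
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected SS, MM:SS, or HH:MM:SS, got {text!r}")
    seconds = 0
    for part in parts:
        if not part.isdecimal():
            raise ValueError(f"invalid time field {part!r} in {text!r}")
        # Each field to the left is worth 60 times the one to its right.
        seconds = seconds * 60 + int(part)
    return seconds


if __name__ == "__main__":
    for value in ("45", "5:30", "1:30:00"):
        print(value, "->", parse_timestamp(value), "seconds")
```

If the audio is loaded with Pydub, whose `AudioSegment` objects are sliced in milliseconds, the result would be multiplied by 1000 before slicing.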
- Working with MP3 files may require installing ffmpeg.
- Large Whisper models may require significant computational resources.
- On the first run, the script will download the selected Whisper model, which may take some time.
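Since the tool depends on Transformers, the model size presumably selects a checkpoint on the Hugging Face Hub. The sketch below assumes the `openai/whisper-<size>` naming of OpenAI's published checkpoints; whether WhisperCLI resolves names this way is an assumption, and `checkpoint_for` is a hypothetical helper.

```python
# Model sizes as listed in this README.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def checkpoint_for(size):
    """Map a model size to a Hub checkpoint name (assumed naming scheme)."""
    if size not in MODEL_SIZES:
        raise ValueError(
            f"unknown model size {size!r}; choose from {', '.join(MODEL_SIZES)}"
        )
    return f"openai/whisper-{size}"


if __name__ == "__main__":
    for size in MODEL_SIZES:
        print(size, "->", checkpoint_for(size))
```

An identifier like this could be passed to `transformers.pipeline("automatic-speech-recognition", model=...)`, which downloads and caches the weights on first use. That would account for the delay on the first run and for larger sizes taking longer to fetch.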
We welcome contributions! If you have ideas for improvements or have found a bug, please create an issue or submit a pull request.
This project is licensed under the MIT License. For more details, see the LICENSE file.
- OpenAI for creating the Whisper model
- The developers of PyTorch, Transformers, and Pydub libraries