This repository has been archived by the owner on Dec 13, 2022. It is now read-only.

Releases: fcakyon/pywhisper

v1.0.6

26 Oct 08:05
4a6508d

Full Changelog: 1.0.5...1.0.6

v1.0.5

15 Oct 19:56
964b9dc

Full Changelog: 1.0.4...1.0.5

v1.0.4

02 Oct 11:02
1687a46

Full Changelog: 1.0.3...1.0.4

v1.0.3

27 Sep 10:28
1f8ff3d

Full Changelog: 1.0.2...1.0.3

v1.0.2

25 Sep 13:02
1fc82b7

What's Changed

  • include latest updates from openai/whisper by @fcakyon in #8

Full Changelog: 1.0.1...1.0.2

v1.0.1

25 Sep 09:45
518bce2

bugfix release


Full Changelog: 1.0.0...1.0.1

v1.0.0

24 Sep 23:41
6423cb3

pywhisper

openai/whisper + extra features

extra features

  • no need to install the ffmpeg CLI separately; pip install is enough
  • SRT export (see the sketch after this list)
  • progress bar for transcribe
  • continuous integration and package testing via GitHub Actions
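
As an illustration of the SRT export feature, here is a minimal sketch that writes an .srt file by hand from the segments returned by transcribe(), assuming pywhisper keeps openai/whisper's segment dicts with start, end, and text keys (the packaged exporter may differ):

import pywhisper

def to_timestamp(seconds):
    # format seconds as an SRT timestamp, e.g. 00:01:02,345
    ms = int(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

model = pywhisper.load_model("base")
result = model.transcribe("audio.mp3")

# one numbered cue per segment, in standard SRT layout
with open("audio.srt", "w", encoding="utf-8") as f:
    for i, segment in enumerate(result["segments"], start=1):
        f.write(f"{i}\n")
        f.write(f"{to_timestamp(segment['start'])} --> {to_timestamp(segment['end'])}\n")
        f.write(f"{segment['text'].strip()}\n\n")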

setup

pip install pywhisper

You may also need Rust installed, in case tokenizers does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to set up a Rust development environment.
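
For example, on Linux or macOS the standard rustup installer sets up a Rust toolchain (this is rustup's official one-liner, not anything pywhisper-specific):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh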

command-line usage

The following command will transcribe speech in audio files, using the medium model:

pywhisper audio.flac audio.mp3 audio.wav --model medium

The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

pywhisper japanese.wav --language Japanese

Adding --task translate will translate the speech into English:

pywhisper japanese.wav --language Japanese --task translate

Run the following to view all available options:

pywhisper --help

See tokenizer.py for the list of all available languages.
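
Since pywhisper tracks openai/whisper, the same list should also be reachable from Python; a quick sketch, assuming the tokenizer module's LANGUAGES dict (mapping language codes to names) is carried over from upstream:

import pywhisper.tokenizer

# LANGUAGES maps codes to names, e.g. "en" -> "english"
for code, name in sorted(pywhisper.tokenizer.LANGUAGES.items()):
    print(code, name)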

python usage

Transcription can also be performed within Python:

import pywhisper

model = pywhisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
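
transcribe() also returns per-segment timing, and the CLI's --language and --task flags have keyword-argument equivalents; a sketch, assuming pywhisper keeps openai/whisper's transcribe() signature:

import pywhisper

model = pywhisper.load_model("base")

# keyword arguments are forwarded to the decoder, mirroring the CLI flags
result = model.transcribe("japanese.wav", language="Japanese", task="translate")

# each segment carries start/end times in seconds within the audio
for segment in result["segments"]:
    print(f"[{segment['start']:7.2f} -> {segment['end']:7.2f}] {segment['text']}")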

Below is an example usage of pywhisper.detect_language() and pywhisper.decode(), which provide lower-level access to the model.

import pywhisper

model = pywhisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = pywhisper.load_audio("audio.mp3")
audio = pywhisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = pywhisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = pywhisper.DecodingOptions()
result = pywhisper.decode(model, mel, options)

# print the recognized text
print(result.text)
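
DecodingOptions accepts further fields to steer decoding. Continuing from the example above, a brief sketch, assuming pywhisper keeps openai/whisper's DecodingOptions, where fp16 defaults to True and is best disabled on CPU:

# force English output and full-precision inference for CPU runs
options = pywhisper.DecodingOptions(language="en", fp16=False)
result = pywhisper.decode(model, mel, options)
print(result.text)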


Full Changelog: https://github.com/fcakyon/pywhisper/commits/1.0.0