New transcription implementation using Whisper #1823

lfcnassif · 2023-08-18T23:25:26Z

As asked in #1335, we can offer Whisper to users and let them decide whether to pay the performance cost. I'm still not sure which would be better: Faster-Whisper or Whisper-JAX.

lfcnassif · 2023-08-20T03:39:46Z

I have some sad news regarding Whisper-JAX. I managed to run it on Linux, but it took a bit more than 4h to transcribe my 29h test data set using the Whisper medium model on one RTX 3090. It also used a lot of GPU memory: about 19GB to load the medium model, while standard Whisper uses about 11GB and Faster-Whisper about 5GB, both for the larger model. Faster-Whisper took about 3h to do the same job using the medium model.

So, given the much higher memory usage and somewhat slower performance of Whisper-JAX, at least on the hardware we have, Faster-Whisper seems the better option.
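To put the two timings on a common scale, the snippet below converts them into a speed-up over real time. It is only a rough sketch: the hour figures are the approximate ones quoted above ("a bit more than 4h" is rounded down to 4), and the function name is purely illustrative.

```python
def realtime_speedup(audio_hours: float, wall_clock_hours: float) -> float:
    """Return hours of audio transcribed per hour of wall-clock time."""
    if wall_clock_hours <= 0:
        raise ValueError("wall_clock_hours must be positive")
    return audio_hours / wall_clock_hours


# Approximate figures from the test above (29h data set, medium model, one GPU).
AUDIO_HOURS = 29.0
timings = {"Whisper-JAX": 4.0, "Faster-Whisper": 3.0}

for name, hours in timings.items():
    print(f"{name}: ~{realtime_speedup(AUDIO_HOURS, hours):.1f}x real time")
```

With these rounded inputs, Faster-Whisper comes out roughly a third faster than Whisper-JAX on the same hardware.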

lfcnassif · 2023-08-20T03:42:39Z

PS: JAX support on Windows is also experimental and CPU only.

joasource · 2023-09-29T01:40:52Z

You probably already know, but Whisper runs very smoothly with PyTorch using CUDA 11.6. In fact, the best GUI implementation I've seen is this one: https://grisk.itch.io/whisper-gui.

I'm eagerly awaiting Whisper on IPED.

lfcnassif · 2023-09-29T16:11:52Z

We plan to integrate Whisper in version 4.2.0, to be released in a few months. If you can't wait, there is an initial draft of the code here:
#1335 (comment)

rafael844 · 2023-10-06T13:14:59Z

I tested this whisper-gui and it's surprisingly fast, but I don't think it is open source.

lfcnassif · 2024-04-12T20:56:14Z

Starting to work on this...

joasource · 2024-05-23T02:01:22Z

> Starting to work on this...

Wonderful. Any release date forecast?

lfcnassif · 2024-05-23T02:09:13Z

> Wonderful. Any release date forecast?

Hopefully next month.

lfcnassif · 2024-05-25T21:39:42Z

For those interested, a snapshot with this feature will be created here in a few minutes:
https://github.com/sepinf-inc/IPED/actions/runs/9238362650

lfcnassif added the enhancement label Aug 18, 2023

lfcnassif self-assigned this Apr 12, 2024

lfcnassif added a commit that referenced this issue Apr 12, 2024

'#1823: new audio transcription params for Whisper, rename old ones

a22017a

lfcnassif added a commit that referenced this issue Apr 12, 2024

'#1823: load new transcription parameters

8663142

lfcnassif added a commit that referenced this issue Apr 12, 2024

'#1823: rename RemoteWav2Vec2TranscriptTask to RemoteAudioTranscriptTask

3095e66

lfcnassif added a commit that referenced this issue Apr 12, 2024

'#1823: make private methods protected, make inner class package visible

3e4ba41

lfcnassif added a commit that referenced this issue Apr 12, 2024

'#1823: new Whisper process python service

20aeff9

lfcnassif added a commit that referenced this issue Apr 12, 2024

'#1823: new WhisperTranscriptTask communicating with the python process

d2e1d70

lfcnassif mentioned this issue Apr 12, 2024

#1823 whisper transcription #2165

Merged

lfcnassif added a commit that referenced this issue Apr 13, 2024

'#1823: fix a typo

147cdf1

lfcnassif added a commit that referenced this issue Apr 13, 2024

'#1823: convert UI language to whisper supported language format

71e125a

lfcnassif added a commit that referenced this issue Apr 13, 2024

'#1823: allow language auto detection configuration

2d0332f

lfcnassif added a commit that referenced this issue Apr 13, 2024

'#1823: uses a much smaller dependency to get number of GPUs

7177a91

lfcnassif added a commit that referenced this issue Apr 13, 2024

'#1823: rename remote transcript classes to be implementation decoupled

71df157

lfcnassif added a commit that referenced this issue Apr 13, 2024

'#1823: makes remote transcription load implementation class from config

53f80a6

lfcnassif added a commit that referenced this issue Apr 13, 2024

'#1823: update config file comments

cc7b495

lfcnassif added a commit that referenced this issue Apr 15, 2024

'#1823: use float16, not int8, for better precision and ~50% more speed

b6ec69d

lfcnassif added a commit that referenced this issue Apr 15, 2024

'#1823: use numpy.mean instead of numpy.average (by @gfd2020)

c559910

lfcnassif added a commit that referenced this issue Apr 16, 2024

'#1823: fix commit b6ec69d: uses float16 just for gpu, int8 for cpu

06cc625
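The two commits above (b6ec69d and 06cc625) settle on float16 for GPU and int8 for CPU. A minimal sketch of that choice with Faster-Whisper might look like the following; the helper names are hypothetical and this is not the code from the commits, only an illustration of the `compute_type` argument of `faster_whisper.WhisperModel`.

```python
def pick_compute_type(device: str) -> str:
    """Pick float16 on GPU and int8 on CPU, as described in the commits above."""
    return "float16" if device == "cuda" else "int8"


def load_model(model_name: str = "medium", device: str = "cpu"):
    """Load a Faster-Whisper model with the precision chosen for the device."""
    # Imported lazily so pick_compute_type() works without the library installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        model_name,
        device=device,
        compute_type=pick_compute_type(device),
    )
```

Keeping the decision in a small pure function makes it easy to replace with a configurable value later.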

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: change code to use WhisperX instead of Faster-Whisper

f094f73

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: don't break audios in 59s to benefit from batching long audios

67e5342

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: fix probability computation when there are no results

231b85d
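The commit above does not show what the fix looks like. As an illustration only, the sketch below guards an average over per-segment probabilities against an empty result list. It assumes each segment carries an average log-probability (Faster-Whisper segments expose one as `avg_logprob`); the function name and the 0.0 fallback are assumptions, not the project's actual behaviour.

```python
import math
from typing import Iterable


def mean_probability(avg_logprobs: Iterable[float]) -> float:
    """Average exp(logprob) over segments; return 0.0 if there are none."""
    probs = [math.exp(lp) for lp in avg_logprobs]
    if not probs:
        # An unguarded mean over an empty list would divide by zero
        # (numpy.mean would return nan with a warning instead).
        return 0.0
    return sum(probs) / len(probs)
```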

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: update library name in error message

ca30e57

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: fix fallback code to use the same lib, add a warning message

4f7fcf3

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: redirect warmless console messages to log

8b23384

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: externalize batchSize and precision (compute_type) params

abb74fb

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: change default precision from float32 to int8

0c68009

lfcnassif added a commit that referenced this issue Apr 28, 2024

'#1823: update comments with JonatasGrosman's fine tuned large-v2 model

2861ea6

lfcnassif added a commit that referenced this issue Apr 29, 2024

'#1823: update python package to include needed docopt-0.6.2 lib

dd206f8

lfcnassif added a commit that referenced this issue Apr 30, 2024

'#1823: add a better error message if FFmpeg is not found on PATH

4737172

lfcnassif added a commit that referenced this issue May 25, 2024

'#1823: support both whisperx and faster_whisper, try whisperx first

25654b8
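One way to "try whisperx first" is to probe for installed packages in order of preference. The sketch below does that with the standard library only; the function name and return convention are hypothetical and not taken from the commit.

```python
import importlib.util
from typing import Optional, Sequence


def pick_backend(
    candidates: Sequence[str] = ("whisperx", "faster_whisper"),
) -> Optional[str]:
    """Return the first importable module name from candidates, or None."""
    for name in candidates:
        # find_spec() returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = pick_backend()
if backend is None:
    print("Neither whisperx nor faster_whisper is installed")
else:
    print(f"Using {backend}")
```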

lfcnassif added a commit that referenced this issue May 25, 2024

'#1823: update error message about missing libraries

a60330a

lfcnassif added a commit that referenced this issue May 25, 2024

'#1823: update config files comments

067fc8f

lfcnassif added a commit that referenced this issue May 25, 2024

'#1823: log warning instead of aborting if FFmpeg in not on PATH

f8b3f5f

lfcnassif closed this as completed in #2165 May 25, 2024

lfcnassif added a commit that referenced this issue May 27, 2024

'#1823: abort with error message if whisperx found and ffmpeg not found

548215b
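The rule in this last commit (abort only when WhisperX is the backend in use and FFmpeg is missing) can be sketched with `shutil.which`. The function name, the error type and the message are assumptions made for illustration; the `which` parameter exists only so the check can be exercised without touching the real PATH.

```python
import shutil
from typing import Callable, Optional


def check_ffmpeg(
    backend: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> None:
    """Raise if WhisperX is selected but no ffmpeg executable is on PATH."""
    if backend == "whisperx" and which("ffmpeg") is None:
        raise RuntimeError(
            "FFmpeg not found on PATH; it is required when using WhisperX"
        )
```

For any other backend the function returns silently, which matches the commit title's condition that the abort applies only when whisperx is found.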

github-project-automation bot added this to 4.2 Aug 29, 2024

github-project-automation bot moved this to Done in 4.2 Aug 29, 2024
