Announcing the large-v2 model #661
-
I have run more tests. In some particular areas v2 is better. For example, say I have dictated a method name like AsyncYieldReturn: model v2 can capture it that way, while model v1 transcribes it as "async yield return". However, in some cases I found model v1 better. I hope you train the model on more data and improve it even further. One more thing: do the optimal parameters stay the same? Model v2 gave terrible results with beam_size 10 and best_of 10 here: #657
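For reference, those decoding options are regular `transcribe()` keyword arguments; a minimal sketch (the audio path is a placeholder):

```python
import whisper

model = whisper.load_model("large")

# beam_size is used for beam-search decoding (temperature == 0);
# best_of is used when sampling with temperature > 0. transcribe()
# keeps whichever applies at each temperature in its fallback loop.
result = model.transcribe("audio.mp3", beam_size=10, best_of=10)  # placeholder path
print(result["text"])
```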
-
I'm using the desktop version of Whisper, running the ggml-large.bin model. I assume that large-v2 is more up to date, but I can't find where to download it. I'm not as technically astute as most of the people I see commenting on Hugging Face and elsewhere. I would appreciate a simpler way of locating and downloading the latest models.
-
How can I get speaker diarization working in Node.js?
-
We are pleased to announce the `large-v2` model. This model has been trained for 2.5 times more epochs, with SpecAugment, stochastic depth, and BPE dropout for regularization. Other than the training procedure, the model architecture and size remained the same as the original `large` model, which is now renamed to `large-v1`.
The new large model shows improved performance in transcription, translation, as well as language identification compared to the `large-v1` model. The new model is a lot more "on trend" with the smaller models in the scaling curves.

The `large-v2` model on average shows about 5% relative error reduction in English and about 10% in other languages, but please note that it may behave differently depending on the individual audio and in some cases perform worse than `large-v1`.
If you have already installed the `whisper` package, you can upgrade it to the latest version with pip.
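Assuming the package was installed from the GitHub repository, a typical upgrade command would look like (a sketch, not necessarily the exact command from the post):

```
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
```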
After upgrading, calling `whisper.load_model("large")` will load the new `large-v2` model.
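For illustration, a minimal usage sketch after upgrading (the file name `audio.mp3` is a placeholder):

```python
import whisper

# "large" now resolves to the new large-v2 weights;
# pass "large-v1" explicitly to keep the previous model.
model = whisper.load_model("large")

result = model.transcribe("audio.mp3")  # placeholder path
print(result["text"])
```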
The model is also available on Hugging Face Transformers. More details and results can be found in the updated paper.
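On the Hugging Face side, a minimal sketch using the `transformers` pipeline API, assuming the `openai/whisper-large-v2` checkpoint:

```python
from transformers import pipeline

# Load the large-v2 checkpoint from the Hugging Face Hub.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")

# "audio.mp3" is a placeholder path to a local recording.
print(asr("audio.mp3")["text"])
```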