Announcing the large-v2 model #661
-
I have run more tests. In some particular areas v2 is better. For example, say I have dictated a method name like AsyncYieldReturn: model v2 can capture it that way, while model v1 transcribes it as "async yield return". However, in some cases I found model v1 better. I hope you train the model on more data and improve it even further. One more thing: do the optimal parameters stay the same? Model v2 gave terrible results with beam_size 10 and best_of 10 here: #657
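For reference, those decoding options are regular `transcribe()` keyword arguments; a minimal sketch (the audio path is a placeholder):

```python
import whisper

model = whisper.load_model("large")

# beam_size is used for beam-search decoding (temperature == 0);
# best_of is used when sampling with temperature > 0. transcribe()
# keeps whichever applies at each temperature in its fallback loop.
result = model.transcribe("audio.mp3", beam_size=10, best_of=10)  # placeholder path
print(result["text"])
```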
-
I'm using the desktop version of Whisper, running the ggml-large.bin model. I assume that large-v2 is more up to date, but I can't find where to download it. I'm not as technically astute as most of the people I see commenting on Hugging Face and elsewhere. I would appreciate a simpler way of locating and downloading the latest models.
-
How can I get speaker diarization working in Node.js?
-
We are pleased to announce the `large-v2` model. This model has been trained for 2.5 times more epochs, with SpecAugment, stochastic depth, and BPE dropout for regularization. Other than the training procedure, the model architecture and size remained the same as the original `large` model, which is now renamed to `large-v1`.
The new large model shows improved performance in transcription, translation, as well as language identification compared to the `large-v1` model. The new model is a lot more "on trend" with the smaller models in the scaling curves.

The `large-v2` model on average shows about 5% relative error reduction in English and about 10% in other languages, but please note that it may behave differently depending on the individual audio and in some cases perform worse than `large-v1`.
If you have already installed the `whisper` package, you can upgrade it to the latest version with pip.
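Assuming the package was installed from the GitHub repository, a typical upgrade command would look like (a sketch, not necessarily the exact command from the post):

```
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
```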
After upgrading, calling `whisper.load_model("large")` will load the new `large-v2` model.
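For illustration, a minimal usage sketch after upgrading (the file name `audio.mp3` is a placeholder):

```python
import whisper

# "large" now resolves to the new large-v2 weights;
# pass "large-v1" explicitly to keep the previous model.
model = whisper.load_model("large")

result = model.transcribe("audio.mp3")  # placeholder path
print(result["text"])
```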
The model is also available on Hugging Face Transformers. More details and results can be found in the updated paper.
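On the Hugging Face side, a minimal sketch using the `transformers` pipeline API, assuming the `openai/whisper-large-v2` checkpoint:

```python
from transformers import pipeline

# Load the large-v2 checkpoint from the Hugging Face Hub.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")

# "audio.mp3" is a placeholder path to a local recording.
print(asr("audio.mp3")["text"])
```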