-
Before fine-tuning it might be useful to experiment using …
-
I’m curious about other folks’ thoughts here too. I haven’t had great luck in longer files with the initial prompt. I was thinking of adding a post-processing step that runs the outputs through spaCy or NLTK to extract named entities, then uses regex to replace the misrecognized ones.
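Not anyone's actual pipeline, just a minimal sketch of what that post-processing step could look like with spaCy. It assumes an English model is installed (`python -m spacy download en_core_web_sm`) and that `known_names` is a hand-maintained mapping from spellings Whisper tends to produce to the correct domain terms; the names in the mapping are made up, and whether spaCy tags a given misspelling as an entity depends on the model.

```python
import re
import spacy

# Hypothetical mapping from spellings Whisper tends to produce
# to the correct domain-specific terms.
known_names = {
    "fonder lion": "von der Leyen",
    "acme corp": "AcmeCorp",
}

nlp = spacy.load("en_core_web_sm")

def fix_entities(transcript: str) -> str:
    """Replace misrecognized named entities with known correct spellings."""
    doc = nlp(transcript)
    fixed = transcript
    for ent in doc.ents:
        # Only touch person/org entities; leave the rest of the text alone.
        if ent.label_ in {"PERSON", "ORG"}:
            replacement = known_names.get(ent.text.lower())
            if replacement:
                # Case-insensitive, whole-entity replacement.
                fixed = re.sub(re.escape(ent.text), replacement, fixed,
                               flags=re.IGNORECASE)
    return fixed

print(fix_entities("Today Fonder Lion met with Acme Corp."))
```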
-
Thanks a lot!
-
Completely anecdotal: in a non-English transcript today (large model, beam search) the surname "von der Leyen" was transcribed as "FonderLion". Running it again with …
-
+1. I agree that having a list of custom vocab would be incredibly useful, likely for most applications. Use cases for ASR are often domain-specific. In my case, I'm using it for dictation, and being able to specify the names of people I work with and team lingo would be super helpful.
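There isn't a dedicated custom-vocab list in Whisper today, but the `initial_prompt` argument to `transcribe()` in the openai/whisper package can be used to nudge decoding toward specific spellings. A minimal sketch under that assumption; the vocab entries, file path, and prompt wording below are placeholders:

```python
import whisper

# Hypothetical domain vocab: names of people and team lingo we want
# Whisper to spell correctly. Seeding the decoder with them via
# initial_prompt biases it toward these spellings (no guarantee).
custom_vocab = ["von der Leyen", "AcmeCorp", "kubeflow", "Priyanka Nair"]

model = whisper.load_model("small")
result = model.transcribe(
    "dictation.wav",  # placeholder audio file
    initial_prompt="Glossary: " + ", ".join(custom_vocab),
)
print(result["text"])
```

One caveat, echoing the comment above about longer files: the initial prompt directly conditions only the first audio window, so its effect tends to fade over long recordings.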
-
I am interested in domain-specific fine-tuning.
In my audio, there is a certain number of brand names, person names, and domain-specific jargon. Whisper generally transcribes the text fine but sometimes gets the specific vocab wrong.
Is this the right approach to fix this:
1. Take my audio samples for which I have GT (ground-truth) transcriptions.
2. Run Whisper on them and get the generated text.
3. Compare the generated text with the GT transcripts and gather those that mismatch.
4. Fine-tune Whisper ONLY on the audio samples and GT transcripts for which a mismatch was found.
So in short, fine-tune only on the corrected errors, not on the entire corpus? I imagine that would reduce the fine-tuning time.
Is there any caveat to my approach?
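No view either way on the caveats, but here is a rough sketch of the selection step being described: transcribe each sample, normalize both sides, and keep only the (audio, ground-truth) pairs where the output disagrees, e.g. above some error threshold. The file layout, normalization, and threshold are assumptions, and `jiwer` is just one common way to compute WER.

```python
import whisper
import jiwer

model = whisper.load_model("small")

# Hypothetical dataset: list of (audio_path, ground_truth_text) pairs.
dataset = [
    ("clips/0001.wav", "Order placed with AcmeCorp for von der Leyen."),
    ("clips/0002.wav", "Schedule the kubeflow pipeline for Monday."),
]

# Normalize casing, punctuation, and whitespace before comparing.
normalize = jiwer.Compose([
    jiwer.ToLowerCase(),
    jiwer.RemovePunctuation(),
    jiwer.RemoveMultipleSpaces(),
    jiwer.Strip(),
])

mismatched = []
for audio_path, gt_text in dataset:
    hyp = model.transcribe(audio_path)["text"]
    # Keep only samples where Whisper's output disagrees with the
    # ground truth after normalization (threshold is arbitrary here).
    wer = jiwer.wer(normalize(gt_text), normalize(hyp))
    if wer > 0.0:
        mismatched.append((audio_path, gt_text))

print(f"{len(mismatched)} of {len(dataset)} samples selected for fine-tuning")
```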