Automatically adds "Thank you" #1592
Comments
That's hallucination.
Interesting, thanks for sharing. Is this fixable in the model? I'm stripping it programmatically for now.
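For anyone wanting the same stopgap, here is a minimal sketch of that kind of post-processing. It assumes the transcript is available as a plain string; the function name `strip_trailing_thanks` and the phrase list are hypothetical and not part of whisper.cpp.

```python
import re
from typing import Sequence

def strip_trailing_thanks(text: str, phrases: Sequence[str] = ("thank you",)) -> str:
    """Remove one or more hallucinated phrases from the END of a transcript.

    Hypothetical helper: only a trailing run of the given phrases (plus any
    following whitespace or '.'/'!' punctuation) is removed. Occurrences
    elsewhere in the text are left untouched.
    """
    alternatives = "|".join(re.escape(p) for p in phrases)
    pattern = re.compile(
        r"(?:\s*\b(?:" + alternatives + r")\b[\s.!]*)+$",
        re.IGNORECASE,
    )
    return pattern.sub("", text).rstrip()
```

Note that this also removes a genuine "Thank you" spoken at the very end of a recording, so it is a workaround rather than a fix.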
As I've mentioned in that openai whisper thread, I got rid of these with the --suppress_tokens command line switch. It will cause you to get descriptions of sound events instead of "Thank you". EDIT: Around the line 4600 or so (there are similar lines for other tokens there).
Can you give more details on where to add the line? I don't know C++.
Could you explain how removing the BEG token (begin time stamps) helps in reducing hallucinations?
Well, if I've understood this correctly, suppressing the non-speech tokens causes the BEG token to emerge somehow (rather than the NOT/no-timestamps token), and that's what causes these hallucinations. The workaround that I used for whisper/whisper-timestamped was to allow non-speech tokens. I suppose this could all be fixed in the training data too, but that's something we plebs don't get to see. EDIT: OK, it was a nice theory, but it doesn't hold up (for whisper.cpp). PR1588 has some samples for testing.
Hmmm, do these hallucinated tokens always have low probability? Another idea I haven't seen mentioned is that prompting can sometimes help (for short clips?).
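One way to test the low-probability idea is to trim low-confidence words from the end of a word-by-word transcript. The sketch below assumes each word comes with a probability in [0, 1]; the `(text, probability)` layout, the function name, and the 0.5 threshold are all hypothetical, and whether hallucinated tokens really score low is exactly the open question.

```python
from typing import List, Tuple

# Hypothetical layout: one (text, probability) pair per transcribed word.
Word = Tuple[str, float]

def drop_low_confidence_tail(words: List[Word], min_p: float = 0.5) -> List[Word]:
    """Drop the trailing run of words whose probability is below min_p.

    Only the END of the transcript is trimmed. Low-probability words that are
    followed by a confident word are kept, since the reported problem appears
    when there is no audio at the end.
    """
    end = len(words)
    # Walk backwards until a word at or above the threshold is found.
    while end > 0 and words[end - 1][1] < min_p:
        end -= 1
    return words[:end]
```

If the hallucinated "Thank you" turns out to have high probability on the samples, this filter does nothing, which would itself answer the question.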
Testing the large v3 model with word-by-word transcript output: when there is no audio at the end, it always adds "Thank you".