See this past discussion for some other Whisper options to try. @shonokin, any other suggestions based on your testing?
Hi, thank you for the samples. I guess you can do the same in Python by manipulating suppress_tokens[]. EDIT: Note that --suppress_tokens "" does not work in this case, for some reason. It could depend on the model, though.
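For anyone wanting to try the Python route, here is a minimal sketch of what manipulating suppress_tokens could look like with the openai-whisper package. The helper names `build_suppress_tokens` and `transcribe_with_suppression`, the model name "small", and the choice to suppress every token of the unwanted phrase are my own assumptions for illustration, not something confirmed in this thread. Suppressing tokens is blunt: each listed token is blocked everywhere in the output, so legitimate uses of those tokens disappear too.

```python
from typing import Callable, Iterable, List


def build_suppress_tokens(
    encode: Callable[[str], List[int]],
    phrases: Iterable[str],
    include_defaults: bool = True,
) -> List[int]:
    """Collect the token ids of unwanted phrases into one list.

    `encode` is any text -> token-id function (for example a Whisper
    tokenizer's `encode`). When `include_defaults` is True the list starts
    with -1, which Whisper expands to its default non-speech symbol set.
    """
    ids: List[int] = [-1] if include_defaults else []
    for phrase in phrases:
        for token_id in encode(phrase):
            if token_id not in ids:  # keep the list free of duplicates
                ids.append(token_id)
    return ids


def transcribe_with_suppression(audio_path: str, phrases: Iterable[str],
                                model_name: str = "small") -> dict:
    """Hypothetical wrapper: transcribe Japanese audio while suppressing phrases.

    Requires `pip install openai-whisper`; imported lazily so the helper
    above can be used without the package installed.
    """
    import whisper
    from whisper.tokenizer import get_tokenizer

    model = whisper.load_model(model_name)
    tokenizer = get_tokenizer(model.is_multilingual)
    suppress = build_suppress_tokens(tokenizer.encode, phrases)
    # Extra keyword arguments to transcribe() are forwarded as decoding options.
    return model.transcribe(audio_path, language="ja", suppress_tokens=suppress)
```

Something like `transcribe_with_suppression("sample.wav", ["ご視聴ありがとうございました"])` would then be the call to experiment with ("sample.wav" is a placeholder path). Whether this actually removes the hallucinated line probably depends on the model, so treat it as a starting point rather than a fix.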
Issue Description:
I am using the Whisper model to recognize Japanese speech. However, most of the time, it is returning the transcription "ご視聴ありがとうございました" (which translates to "Thank you for watching"). This result is incorrect for the input speech I am testing. Additionally, the model is taking a considerable amount of time to return the transcription.
Steps to Reproduce:
Expected Behavior:
Audio files: 11543.zip
Actual Behavior:
Code Snippet:
Environment: