[feat(whisper)] Add recognize_whisper #625
Add a recognizer for https://github.com/openai/whisper
Thanks!
```python
def test_whisper_chinese(self):
    r = sr.Recognizer()
    with sr.AudioFile(self.AUDIO_FILE_ZH) as source:
        audio = r.record(source)
    self.assertEqual(r.recognize_whisper(audio, model="small", language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")
```
`model="small"` is required.
✍️ When I specify `model="base"` (the default value), this test fails because the audio is recognized incorrectly:
```
======================================================================
FAIL: test_whisper_chinese (test_recognition.TestRecognition)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../speech_recognition-pr/tests/test_recognition.py", line 98, in test_whisper_chinese
    self.assertEqual(r.recognize_whisper(audio, language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")
AssertionError: "�<|translate|> I'm sorry." != '砸自己的腳'
- �<|translate|> I'm sorry.
+ 砸自己的腳
```
```python
# recognize speech using whisper
try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language="english"))
except sr.UnknownValueError:
    print("Whisper could not understand audio")
```
It works!🎉 Thanks.
```
$ python examples/microphone_recognition.py
Say something!
/.../speech_recognition-pr/venv/lib/python3.9/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Whisper thinks you said Hello whisper
```
Thanks a lot for your great PR.
Whisper works with SpeechRecognition😃
I am very sorry for my late review.
I would like to merge this once the MUST comment is addressed.
@joy-void-joy Can you respond to that comment?
If that is difficult for you, it is no problem: I'll fix the MUST comment myself and merge this PR tonight (JST).
Let's discuss the non-MUST comments after the merge.
```python
    **transcribe_options
)

if show_dict:
```
nit: a conditional expression (`x if C else y`) would make this more concise, but that is a matter of preference.
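For illustration, a minimal sketch of the nit, assuming (this is not visible in the excerpt above) that the `if show_dict:` branch returns the full Whisper result dict and the `else` branch returns only `result["text"]`. The function names are hypothetical.

```python
from typing import Any, Dict, Union

Result = Dict[str, Any]


def finish_if_else(result: Result, show_dict: bool) -> Union[Result, str]:
    # Statement form: two explicit branches.
    if show_dict:
        return result
    else:
        return result["text"]


def finish_conditional(result: Result, show_dict: bool) -> Union[Result, str]:
    # Expression form suggested in the nit: `x if C else y`.
    return result if show_dict else result["text"]
```

Both forms behave identically; the choice is purely stylistic, which is why it is only a nit.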
```python
assert isinstance(audio_data, AudioData), "Data must be audio data"
import whisper

if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:
```
✍️ memo: `or` short-circuits.
https://docs.python.org/3/reference/expressions.html#boolean-operations
> The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

So the model is loaded in three cases:
- When a non-empty `dict` is passed as `load_options`.
- When `load_options` is `None` or `{}` and the instance has no `whisper_model` attribute.
- When `load_options` is `None` or `{}`, the instance has a `whisper_model` attribute, but it has no entry for `model`.
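To make the three cases concrete, here is a minimal sketch of the same condition in a hypothetical `ModelCache` class. The loader is injected so the sketch runs without Whisper installed; in real use it would be something like `whisper.load_model` (an assumption about how the PR loads models).

```python
from typing import Any, Callable, Dict, Optional


class ModelCache:
    """Hypothetical stand-in for how a Recognizer might cache Whisper models by name."""

    def __init__(self, loader: Callable[..., Any]) -> None:
        self._loader = loader  # e.g. whisper.load_model (assumption)

    def get(self, model: str, load_options: Optional[Dict[str, Any]] = None) -> Any:
        # `or` evaluates left to right and stops at the first truthy operand:
        # 1. non-empty load_options -> reload, later operands are never evaluated;
        # 2. no whisper_model attribute yet -> load;
        # 3. only now is self.whisper_model.get(...) evaluated, which is safe
        #    because the attribute is known to exist; a missing entry -> load.
        if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:
            self.whisper_model = getattr(self, "whisper_model", {})
            self.whisper_model[model] = self._loader(model, **(load_options or {}))
        return self.whisper_model[model]
```

Because of the short-circuit, case 3 never raises `AttributeError` even on the first call, when `whisper_model` does not exist yet.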
It seems the unit tests failed because whisper was not pip-installed.
https://github.com/Uberi/speech_recognition/actions/runs/3405126020/jobs/5662866082
https://github.com/Uberi/speech_recognition/actions/runs/3405145980/jobs/5662903599
It seems that
Thanks for the review! Do you need help with anything? I think the FP16 warning should be fixed in the new version of whisper. I'd rather we didn't specify `fp16=False` when a GPU is available, as there is a significant slowdown on CPU.
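One way to honour that preference on the caller side is to request FP16 only when CUDA is available. This is a minimal sketch, assuming `recognize_whisper` forwards extra keyword arguments to Whisper's `transcribe` (as the `**transcribe_options` in the diff suggests) and that PyTorch may or may not be installed; `build_transcribe_options` is a hypothetical helper name.

```python
from typing import Any, Dict


def cuda_available() -> bool:
    """True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def build_transcribe_options(**overrides: Any) -> Dict[str, Any]:
    # Ask for FP16 only on GPU. On CPU, Whisper falls back to FP32 anyway and
    # emits the "FP16 is not supported on CPU" warning, so fp16=False there
    # simply silences that warning without changing the computation.
    options: Dict[str, Any] = {"fp16": cuda_available()}
    options.update(overrides)  # explicit caller choices take precedence
    return options
```

Hypothetical usage: `r.recognize_whisper(audio, language="english", **build_transcribe_options())`.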
Thanks for your reply.
I agree. If you have an idea to make it even slightly better, please send us a pull request.
Solves #624 by adding `recognize_whisper` to `Recognizer`.
This works by writing the audio to a temporary file, because of the input format Whisper expects.
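A minimal sketch of the tempfile approach, assuming the caller already has WAV bytes (for example from `AudioData.get_wav_data()`) and a callable that accepts a file path (such as a loaded Whisper model's `transcribe`). `with_wav_tempfile` is a hypothetical helper, not the code from this PR.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_wav_tempfile(wav_bytes: bytes, consume: Callable[[str], T]) -> T:
    """Write WAV bytes to a temporary .wav file, pass its path to `consume`, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(wav_bytes)
        # The file is closed before `consume` runs, so it can be reopened by
        # path on every OS (an open NamedTemporaryFile cannot be on Windows).
        return consume(path)
    finally:
        os.remove(path)
```

Hypothetical usage: `text = with_wav_tempfile(audio.get_wav_data(), lambda p: model.transcribe(p)["text"])`.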
Usage example: