[feat(whisper)] Add recognize_whisper #625

Merged on Nov 6, 2022 (5 commits)

Conversation

joy-void-joy
Contributor

@joy-void-joy joy-void-joy commented Sep 28, 2022

Solve #624 by adding recognize_whisper to Recognizer.

This works by writing the audio to a tempfile, because of the input format Whisper expects.

Usage example:

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

print("Got it, now to recognize it...")

try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language='english'))
except sr.UnknownValueError:
    print("Whisper could not understand audio")
except sr.RequestError as e:
    print("Whisper error; {0}".format(e))
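
The tempfile approach mentioned above could look roughly like the following. This is only a sketch under assumptions, not the PR's implementation: transcribe_via_tempfile, make_silent_wav, and fake_transcribe are hypothetical names, and fake_transcribe stands in for a real transcription call so the example runs without whisper installed. In SpeechRecognition the WAV bytes would presumably come from AudioData.get_wav_data().

```python
import io
import os
import tempfile
import wave


def transcribe_via_tempfile(wav_bytes, transcribe):
    """Write WAV bytes to a temporary file and pass its path to `transcribe`.

    `transcribe` is any callable that accepts a file path (an assumption made
    for this sketch; it stands in for a path-based transcription call).
    """
    # A temporary directory avoids reopening an already-open file by name,
    # which is not portable across platforms.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "audio.wav")
        with open(path, "wb") as f:
            f.write(wav_bytes)
        return transcribe(path)


def make_silent_wav(n_frames=1600, rate=16000):
    """Build a short mono 16-bit WAV of silence, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def fake_transcribe(path):
    """Hypothetical stand-in: reads the file and reports its frame count."""
    with wave.open(path, "rb") as w:
        return {"text": "", "frames": w.getnframes()}


result = transcribe_via_tempfile(make_silent_wav(), fake_transcribe)
```

The temporary directory and the file in it are removed as soon as the with block exits, so the transcription has to finish inside it.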

@ftnext
Collaborator

ftnext commented Sep 28, 2022

Thanks!
I'll check this later.

@ftnext ftnext self-assigned this Sep 28, 2022
def test_whisper_chinese(self):
    r = sr.Recognizer()
    with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
    self.assertEqual(r.recognize_whisper(audio, model="small", language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")
Collaborator

model="small" is required.

✍️ When I specify model="base" (the default value), this test fails because of incorrect recognition.

======================================================================
FAIL: test_whisper_chinese (test_recognition.TestRecognition)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../speech_recognition-pr/tests/test_recognition.py", line 98, in test_whisper_chinese
    self.assertEqual(r.recognize_whisper(audio, language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")
AssertionError: "�<|translate|> I'm sorry." != '砸自己的腳'
- �<|translate|> I'm sorry.
+ 砸自己的腳


# recognize speech using whisper
try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language="english"))
Collaborator

It works!🎉 Thanks.

$ python examples/microphone_recognition.py
Say something!
/.../speech_recognition-pr/venv/lib/python3.9/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Whisper thinks you said  Hello whisper

Collaborator

@ftnext ftnext left a comment

Thanks a lot for your great PR.
Whisper works with SpeechRecognition😃

I am very sorry for my late review.
I would like to merge this once the MUST comment is addressed.

@joy-void-joy Can you respond to that comment?
If it's difficult for you, that is no problem.
I'll fix the MUST comment and merge this PR tonight (JST).

Let's discuss comments other than MUST after merge.

README.rst (outdated, resolved)
    **transcribe_options
)

if show_dict:
Collaborator

nit: I find that a conditional expression (x if C else y) would make this more concise, but that is just my preference.
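
A rough illustration of the nit above, not the PR's actual code: the body of the if show_dict: branch is not shown in this thread, so the sketch assumes it returns the full result dict when show_dict is true and only the "text" field otherwise. select_output is a hypothetical helper name.

```python
def select_output(result, show_dict=False):
    """Return the whole result dict or just its text (assumed behavior)."""
    # Conditional expression: `x if C else y` replaces a four-line if/else.
    return result if show_dict else result["text"]


sample = {"text": " Hello whisper", "language": "en"}
text_only = select_output(sample)
full = select_output(sample, show_dict=True)
```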

speech_recognition/__init__.py (resolved)
assert isinstance(audio_data, AudioData), "Data must be audio data"
import whisper

if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:
Collaborator

@ftnext ftnext Nov 6, 2022

✍️ memo: the or operator short-circuits.

https://docs.python.org/3/reference/expressions.html#boolean-operations

The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

  • When a non-empty dict is passed as load_options, the model is loaded.
  • When load_options is None or {} and the instance has no whisper_model attribute, the model is loaded.
  • When load_options is None or {} and the instance has a whisper_model attribute that does not contain the name model, the model is loaded.
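
The three cases above can be checked with a small stand-in. This is only a sketch: ModelCache and fake_loader are hypothetical names, and fake_loader substitutes for the real model loading call so that the example runs without whisper installed. The condition itself is copied from the line under review; the body of the branch is an assumption.

```python
def fake_loader(name, **options):
    """Hypothetical stand-in for a model loader; returns a description."""
    return (name, tuple(sorted(options.items())))


class ModelCache:
    """Stand-in for the Recognizer, keeping loaded models in a dict."""

    def __init__(self, loader):
        self.loader = loader
        self.load_count = 0  # counts how often the loader actually ran

    def get_model(self, model="base", load_options=None):
        # `or` short-circuits: hasattr() is only reached when load_options is
        # falsy, and .get() only when the whisper_model attribute exists.
        if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:
            self.whisper_model = getattr(self, "whisper_model", {})
            self.whisper_model[model] = self.loader(model, **(load_options or {}))
            self.load_count += 1
        return self.whisper_model[model]


cache = ModelCache(fake_loader)
cache.get_model("base")                     # no attribute yet: loads
cache.get_model("base")                     # cached: no load
cache.get_model("small")                    # name not cached: loads
cache.get_model("base", {"device": "cpu"})  # non-empty options: reloads
cache.get_model("base", {})                 # empty dict is falsy: no load
```

One consequence of this condition is that passing any non-empty load_options reloads the model on every call, even when the same options were used before.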

@ftnext ftnext mentioned this pull request Nov 6, 2022
5 tasks
@ftnext
Collaborator

ftnext commented Nov 6, 2022

It seems the unit tests failed because whisper was not pip installed.
I'll fix the unittest workflow file to install it.

ModuleNotFoundError: No module named 'whisper'

https://github.com/Uberi/speech_recognition/actions/runs/3405126020/jobs/5662866082

@ftnext
Collaborator

ftnext commented Nov 6, 2022

FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

https://github.com/Uberi/speech_recognition/actions/runs/3405145980/jobs/5662903599

It seems that ffmpeg needs to be installed in the ubuntu-latest runner.
FYI: https://github.com/actions/runner-images/tree/main/images/linux
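
Both CI failures above come from missing prerequisites: the whisper Python module and the ffmpeg executable. A minimal sketch for detecting them up front, using only the standard library; missing_whisper_prereqs is a hypothetical helper, not part of SpeechRecognition or the workflow file.

```python
import importlib.util
import shutil


def missing_whisper_prereqs():
    """Return a list of human-readable names for prerequisites not found."""
    missing = []
    # find_spec returns None when a top-level module cannot be imported.
    if importlib.util.find_spec("whisper") is None:
        missing.append("whisper (Python module)")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg (executable on PATH)")
    return missing


problems = missing_whisper_prereqs()
if problems:
    print("Missing prerequisites: " + ", ".join(problems))
```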

@joy-void-joy
Contributor Author

Thanks for the review. Do you need help with anything? I think the fp16 bug should be fixed in the new version of whisper. I'd rather we didn't specify fp16=False if a GPU is available, as there is a significant slowdown on CPU.

@ftnext
Collaborator

ftnext commented Nov 9, 2022

Thanks for your reply.

I'd rather we didn't specify fp16=False if a GPU is available as there is a significant slowdown on CPU

I agree.
I already merged #630, which sets fp16=torch.cuda.is_available(), and I believe the current implementation is similar to your idea.

If you have an idea to make it even slightly better, please send us a pull request.
Pull requests are always welcome!
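
For reference, the fp16 choice discussed above can be sketched as follows. This is an assumption-laden illustration, not the code merged in #630: default_fp16 is a hypothetical helper, and the ImportError fallback is added here only so the snippet runs where torch is not installed.

```python
def default_fp16():
    """Use half precision only when a CUDA GPU is available."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without torch, fall back to FP32.
        return False
    return torch.cuda.is_available()


use_fp16 = default_fp16()
print("fp16 enabled:", use_fp16)
```

On a CPU-only machine this evaluates to False, which matches the FP32 fallback that the "FP16 is not supported on CPU" warning earlier in this thread describes.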

Labels
whisper Features related to Whisper