Adds Parameter use_enhanced and model to GoogleCloudSpeech (Fix #734) #735

Merged: 8 commits, Dec 18, 2024
8 changes: 7 additions & 1 deletion reference/library-reference.rst
@@ -227,7 +227,7 @@ Returns the most likely transcription if ``show_all`` is false (the default). Ot

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.

``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json: Union[str, None] = None, language: str = "en-US", preferred_phrases: Union[Iterable[str], None] = None, show_all: bool = False) -> Union[str, Dict[str, Any]]``
``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json: Union[str, None] = None, language: str = "en-US", preferred_phrases: Union[Iterable[str], None] = None, show_all: bool = False, **api_params) -> Union[str, Dict[str, Any]]``
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API.
@@ -238,6 +238,12 @@ The recognition language is determined by ``language``, which is a BCP-47 langua

If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.

``api_params`` are optional keyword arguments specific to the Cloud Speech API; they are added to the recognition config sent with the request. For more information see <https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.RecognitionConfig>

``use_enhanced`` is a boolean option. If ``use_enhanced`` is true and ``model`` is not set, an appropriate enhanced model is chosen if one exists for the audio. If ``use_enhanced`` is true and an enhanced version of the specified model does not exist, the speech is recognized using the standard version of the specified model.

The ``model`` option, which can be used with or without ``use_enhanced``, selects the model best suited to your domain. If no model is explicitly specified, the API auto-selects one based on the other recognition parameters.

Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection.
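
As a minimal sketch of how the two options documented above might be assembled before the call: the helper name ``build_api_params`` is hypothetical and not part of the library; only the keyword names ``use_enhanced`` and ``model`` come from this documentation.

```python
def build_api_params(use_enhanced=None, model=None):
    """Hypothetical helper: collect only the options the caller actually set."""
    params = {}
    if use_enhanced is not None:
        params["use_enhanced"] = use_enhanced
    if model is not None:
        params["model"] = model
    return params


params = build_api_params(use_enhanced=True, model="phone_call")

# With a Recognizer ``r`` and an AudioData ``audio`` (neither is created here),
# the options would then be forwarded as keyword arguments:
#   text = r.recognize_google_cloud(audio, language="en-US", **params)
```

Leaving an option out of the dictionary, rather than passing ``None``, keeps it out of the request so the API's own default applies.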
23 changes: 21 additions & 2 deletions speech_recognition/__init__.py
@@ -693,7 +693,7 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g
if hypothesis is not None: return hypothesis.hypstr
raise UnknownValueError() # no transcriptions available

def recognize_google_cloud(self, audio_data, credentials_json=None, language="en-US", preferred_phrases=None, show_all=False):
def recognize_google_cloud(self, audio_data, credentials_json=None, language="en-US", preferred_phrases=None, show_all=False, **api_params):
"""
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API.

@@ -703,6 +703,17 @@ def recognize_google_cloud(self, audio_data, credentials_json=None, language="en

If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.

``api_params`` are optional keyword arguments specific to the Cloud Speech API; they are added to the recognition config sent with the request. For more information see <https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.RecognitionConfig>

``use_enhanced`` is a boolean option. If ``use_enhanced`` is true and ``model`` is not set,
an appropriate enhanced model is chosen if one exists for the audio.
If ``use_enhanced`` is true and an enhanced version of the specified model does not exist,
the speech is recognized using the standard version of the specified model.

The ``model`` option, which can be used with or without ``use_enhanced``, selects the model best
suited to your domain. If no model is explicitly specified,
the API auto-selects one based on the other recognition parameters.

Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection.
@@ -713,6 +724,13 @@ def recognize_google_cloud(self, audio_data, credentials_json=None, language="en
assert isinstance(language, str), "``language`` must be a string"
assert preferred_phrases is None or all(isinstance(preferred_phrases, (type(""), type(u""))) for preferred_phrases in preferred_phrases), "``preferred_phrases`` must be a list of strings"

# Validate the commonly used api_params
if "use_enhanced" in api_params:
assert isinstance(api_params["use_enhanced"], bool), "``use_enhanced`` must be a boolean when used"

if "model" in api_params:
assert api_params["model"] in (None, "latest_long", "latest_short", "command_and_search", "phone_call", "video", "default", "medical_conversation", "medical_dictation"), "``model`` must be None or one of 'latest_long', 'latest_short', 'command_and_search', 'phone_call', 'video', 'default', 'medical_conversation', or 'medical_dictation'"
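
Because ``assert`` statements are skipped when Python runs with ``-O``, the same checks can also be written without them. The following is only a sketch of that alternative, not what this PR does; ``ALLOWED_MODELS`` and ``find_api_param_problems`` are hypothetical names, and the model list is copied from the assertion above.

```python
# Model names copied from the assertion in the diff; None means "not specified".
ALLOWED_MODELS = (
    None, "latest_long", "latest_short", "command_and_search",
    "phone_call", "video", "default",
    "medical_conversation", "medical_dictation",
)


def find_api_param_problems(api_params):
    """Return a list of problem descriptions; an empty list means the params look valid."""
    problems = []
    if "use_enhanced" in api_params and not isinstance(api_params["use_enhanced"], bool):
        problems.append("``use_enhanced`` must be a boolean when used")
    if "model" in api_params and api_params["model"] not in ALLOWED_MODELS:
        names = ", ".join(repr(m) for m in ALLOWED_MODELS if m is not None)
        problems.append("``model`` must be None or one of: " + names)
    return problems
```

A caller could raise ``ValueError`` when the returned list is non-empty, which keeps the validation active regardless of interpreter flags.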

try:
import socket

@@ -735,7 +753,8 @@ def recognize_google_cloud(self, audio_data, credentials_json=None, language="en
config = {
'encoding': speech.RecognitionConfig.AudioEncoding.FLAC,
'sample_rate_hertz': audio_data.sample_rate,
'language_code': language
'language_code': language,
**api_params,
}
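
The effect of unpacking ``**api_params`` last in the dictionary literal can be sketched in isolation. ``build_config`` is a hypothetical stand-in for the relevant lines of the method, and the string ``"FLAC"`` is a placeholder for ``speech.RecognitionConfig.AudioEncoding.FLAC`` so that the example runs without the Google client library installed.

```python
def build_config(sample_rate, language, **api_params):
    """Hypothetical stand-in for the config construction shown in the diff."""
    return {
        "encoding": "FLAC",  # placeholder for the real AudioEncoding enum value
        "sample_rate_hertz": sample_rate,
        "language_code": language,
        # Unpacked last, so any key repeated here replaces the value set above.
        **api_params,
    }


config = build_config(16000, "en-US", use_enhanced=True, model="phone_call")
overridden = build_config(16000, "en-US", language_code="de-DE")
```

In a dictionary literal, later keys win, so a caller who passes ``language_code`` (or ``encoding``, or ``sample_rate_hertz``) through ``api_params`` would silently override the value derived from the named arguments.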
if preferred_phrases is not None:
config['speechContexts'] = [speech.SpeechContext(