Merged Uberi#395 of Uberi/speech_recognition
embie27 committed Dec 18, 2018
1 parent b013364 commit 1ab9790
Showing 5 changed files with 81 additions and 21 deletions.
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
@@ -254,7 +254,7 @@ The "bt_audio_service_open" error means that you have a Bluetooth audio device,

For errors of the form "ALSA lib [...] Unknown PCM", see `this StackOverflow answer <http://stackoverflow.com/questions/7088672/pyaudio-working-but-spits-out-error-messages-each-time>`__. Basically, to get rid of an error of the form "Unknown PCM cards.pcm.rear", simply comment out ``pcm.rear cards.pcm.rear`` in ``/usr/share/alsa/alsa.conf``, ``~/.asoundrc``, and ``/etc/asound.conf``.
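The fix above amounts to prefixing the offending ``pcm.*`` definitions with ``#``. As a rough sketch of that transformation (the helper name and the idea of editing programmatically are mine, not the project's):

```python
def comment_out_pcm(config_text, name="pcm.rear"):
    # Prefix lines defining the offending PCM device with '#',
    # leaving every other line of the ALSA config untouched.
    out = []
    for line in config_text.splitlines():
        if line.strip().startswith(name):
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out)
```

Try it on a copy of the file first; the real files to edit are ``/usr/share/alsa/alsa.conf``, ``~/.asoundrc``, and ``/etc/asound.conf``.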

For "jack server is not running or cannot be started" or "connect(2) call to /dev/shm/jack-1000/default/jack_0 failed (err=No such file or directory)" or "attempt to connect to server failed", these are caused by ALSA trying to connect to JACK, and can be safely ignored. I'm not aware of any simple way to turn those messages off at this time, besides [entirely disabling printing while starting the microphone](https://github.com/Uberi/speech_recognition/issues/182#issuecomment-266256337).
For "jack server is not running or cannot be started" or "connect(2) call to /dev/shm/jack-1000/default/jack_0 failed (err=No such file or directory)" or "attempt to connect to server failed", these are caused by ALSA trying to connect to JACK, and can be safely ignored. I'm not aware of any simple way to turn those messages off at this time, besides `entirely disabling printing while starting the microphone <https://github.com/Uberi/speech_recognition/issues/182#issuecomment-266256337>`__.

On OS X, I get a ``ChildProcessError`` saying that it couldn't find the system FLAC converter, even though it's installed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion make-release.sh
@@ -8,5 +8,5 @@ set -o pipefail # for a pipeline, if any of the commands fail with a non-zero ex
echo "Making release for SpeechRecognition-$1"

python setup.py bdist_wheel
gpg2 --detach-sign -a dist/SpeechRecognition-$1-*.whl
gpg --detach-sign -a dist/SpeechRecognition-$1-*.whl
twine upload dist/SpeechRecognition-$1-*.whl dist/SpeechRecognition-$1-*.whl.asc
4 changes: 3 additions & 1 deletion reference/library-reference.rst
@@ -210,7 +210,7 @@ Returns the most likely transcription if ``show_all`` is false (the default). Ot

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if there are any issues with the Sphinx installation.

``recognizer_instance.recognize_google(audio_data: AudioData, key: Union[str, None] = None, language: str = "en-US", show_all: bool = False) -> Union[str, Dict[str, Any]]``
``recognizer_instance.recognize_google(audio_data: AudioData, key: Union[str, None] = None, language: str = "en-US", pfilter: Union[0, 1] = 0, show_all: bool = False) -> Union[str, Dict[str, Any]]``
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.
@@ -221,6 +221,8 @@ To obtain your own API key, simply follow the steps on the `API Keys <http://www

The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. A list of supported language tags can be found `here <http://stackoverflow.com/questions/14257598/what-are-language-codes-for-voice-recognition-languages-in-chromes-implementati>`__. Basically, language codes can be just the language (``en``), or a language with a dialect (``en-US``).

The profanity filter level can be adjusted with ``pfilter``: 0 applies no filter (the default), while 1 shows only the first character of each filtered word and replaces the rest with asterisks.

Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
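As a sketch of how these parameters reach the service (``build_recognize_url`` is a hypothetical helper; the endpoint and query keys mirror the library source shown in this commit, where ``pfilter`` is sent as ``pFilter``):

```python
from urllib.parse import urlencode

def build_recognize_url(key, language="en-US", pfilter=0):
    # Query string as assembled in Recognizer.recognize_google:
    # the profanity filter level travels as the "pFilter" key.
    return "http://www.google.com/speech-api/v2/recognize?{}".format(urlencode({
        "client": "chromium",
        "lang": language,
        "key": key,
        "pFilter": pfilter,
    }))

url = build_recognize_url("YOUR_API_KEY", language="en-GB", pfilter=1)
```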
12 changes: 6 additions & 6 deletions reference/pocketsphinx.rst
@@ -12,19 +12,19 @@ By default, SpeechRecognition's Sphinx functionality supports only US English. A

To install a language pack, download the ZIP archives and extract them directly into the module install directory (you can find the module install directory by running ``python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"``).

Here is a simple Bash script to install all of them:
Here is a simple Bash script to install all of them, assuming you've downloaded all three ZIP files into your current directory:

.. code:: bash
#!/usr/bin/env bash
SR_LIB=$(python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))")
sudo apt-get install --yes wget unzip
sudo wget https://db.tt/tVNcZXao -O "$SR_LIB/fr-FR.zip"
sudo unzip -o "$SR_LIB/fr-FR.zip" -d "$SR_LIB"
sudo apt-get install --yes unzip
sudo unzip -o fr-FR.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/fr-FR/"
sudo wget https://db.tt/2YQVXmEk -O "$SR_LIB/zh-CN.zip"
sudo unzip -o "$SR_LIB/zh-CN.zip" -d "$SR_LIB"
sudo unzip -o zh-CN.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/zh-CN/"
sudo unzip -o it-IT.zip -d "$SR_LIB"
sudo chmod --recursive a+r "$SR_LIB/it-IT/"
Once installed, you can simply specify the language using the ``language`` parameter of ``recognizer_instance.recognize_sphinx``. For example, French would be specified with ``"fr-FR"`` and Mandarin with ``"zh-CN"``.

82 changes: 70 additions & 12 deletions speech_recognition/__init__.py
@@ -183,6 +183,20 @@ def list_working_microphones():
audio.terminate()
return result

    def __enter__(self):
        assert self.stream is None, "This audio source is already inside a context manager"
        self.audio = self.pyaudio_module.PyAudio()
        try:
            self.stream = Microphone.MicrophoneStream(
                self.audio.open(
                    input_device_index=self.device_index, channels=1, format=self.format,
                    rate=self.SAMPLE_RATE, frames_per_buffer=self.CHUNK, input=True,
                )
            )
        except Exception:
            self.audio.terminate()
            raise  # release the PyAudio instance before propagating the open() failure
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        try:
            self.stream.close()
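The ``__enter__`` added here follows the acquire-or-clean-up pattern: if opening the stream fails, the freshly created PyAudio instance must be released before the error propagates. A stripped-down, self-contained analog of that pattern (all class names here are illustrative, not part of the library):

```python
class FakeAudio:
    # Stand-in for the PyAudio backend.
    def __init__(self, fail):
        self.fail = fail
        self.terminated = False

    def open(self):
        if self.fail:
            raise RuntimeError("no such device")
        return "stream"

    def terminate(self):
        self.terminated = True

class Source:
    # Analog of Microphone.__enter__: acquire the audio backend, then the
    # stream; on failure, release the backend before letting the error out.
    def __init__(self, fail=False):
        self.fail = fail
        self.stream = None

    def __enter__(self):
        self.audio = FakeAudio(self.fail)
        try:
            self.stream = self.audio.open()
        except Exception:
            self.audio.terminate()
            raise
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stream = None
        self.audio.terminate()
```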
@@ -590,10 +604,15 @@ def snowboy_wait_for_hot_word(self, snowboy_location, snowboy_hot_word_files, so
seconds_per_buffer = float(source.CHUNK) / source.SAMPLE_RATE
resampling_state = None

# buffers capable of holding 5 seconds of original and resampled audio
two_seconds_buffer_count = int(math.ceil(2 / seconds_per_buffer))
frames = collections.deque(maxlen=two_seconds_buffer_count)
resampled_frames = collections.deque(maxlen=two_seconds_buffer_count)
# buffers capable of holding 5 seconds of original audio
five_seconds_buffer_count = int(math.ceil(5 / seconds_per_buffer))
# buffers capable of holding 0.5 seconds of resampled audio
half_second_buffer_count = int(math.ceil(0.5 / seconds_per_buffer))
frames = collections.deque(maxlen=five_seconds_buffer_count)
resampled_frames = collections.deque(maxlen=half_second_buffer_count)
# snowboy check interval
check_interval = 0.05
last_check = time.time()
while True:
elapsed_time += seconds_per_buffer
if timeout and elapsed_time > timeout:
@@ -606,11 +625,13 @@
# resample audio to the required sample rate
resampled_buffer, resampling_state = audioop.ratecv(buffer, source.SAMPLE_WIDTH, 1, source.SAMPLE_RATE, snowboy_sample_rate, resampling_state)
resampled_frames.append(resampled_buffer)

# run Snowboy on the resampled audio
snowboy_result = detector.RunDetection(b"".join(resampled_frames))
assert snowboy_result != -1, "Error initializing streams or reading audio data"
if snowboy_result > 0: break # wake word found
if time.time() - last_check > check_interval:
# run Snowboy on the resampled audio
snowboy_result = detector.RunDetection(b"".join(resampled_frames))
assert snowboy_result != -1, "Error initializing streams or reading audio data"
if snowboy_result > 0: break # wake word found
resampled_frames.clear()
last_check = time.time()

return b"".join(frames), elapsed_time
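The change above throttles detection: audio keeps accumulating on every chunk, but Snowboy only runs once per ``check_interval`` and the resampled buffer is cleared after each check. A small deterministic sketch of that scheduling logic (timestamps injected in place of ``time.time()`` so it can be checked offline):

```python
def detector_call_indices(timestamps, check_interval=0.05):
    # Given the wall-clock time observed at each audio chunk, return the
    # chunk indices at which the loop above would invoke the detector.
    calls = []
    last_check = timestamps[0]  # the loop records a timestamp before starting
    for i, now in enumerate(timestamps):
        if now - last_check > check_interval:
            calls.append(i)
            last_check = now
    return calls
```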

@@ -868,7 +889,7 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g
if hypothesis is not None: return hypothesis.hypstr
raise UnknownValueError() # no transcriptions available

def recognize_google(self, audio_data, key=None, language="en-US", show_all=False):
def recognize_google(self, audio_data, key=None, language="en-US", pfilter=0, show_all=False):
"""
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.
@@ -877,7 +898,9 @@ def recognize_google(self, audio_data, key=None, language="en-US", show_all=Fals
To obtain your own API key, simply follow the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API".
The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language tags can be found in this `StackOverflow answer <http://stackoverflow.com/a/14302134>`__.
The profanity filter level can be adjusted with ``pfilter``: 0 applies no filter (the default), while 1 shows only the first character of each filtered word and replaces the rest with asterisks.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
@@ -895,6 +918,7 @@ def recognize_google(self, audio_data, key=None, language="en-US", show_all=Fals
"client": "chromium",
"lang": language,
"key": key,
"pFilter": pfilter
}))
request = Request(url, data=flac_data, headers={"Content-Type": "audio/x-flac; rate={}".format(audio_data.sample_rate)})

@@ -1025,7 +1049,7 @@ def recognize_wit(self, audio_data, key, show_all=False):
convert_rate=None if audio_data.sample_rate >= 8000 else 8000, # audio samples must be at least 8 kHz
convert_width=2 # audio samples should be 16-bit
)
url = "https://api.wit.ai/speech?v=20160526"
url = "https://api.wit.ai/speech?v=20170307"
request = Request(url, data=wav_data, headers={"Authorization": "Bearer {}".format(key), "Content-Type": "audio/wav"})
try:
response = urlopen(request, timeout=self.operation_timeout)
@@ -1134,6 +1158,40 @@ def recognize_bing(self, audio_data, key, language="en-US", show_all=False):
if "RecognitionStatus" not in result or result["RecognitionStatus"] != "Success" or "DisplayText" not in result: raise UnknownValueError()
return result["DisplayText"]

    def recognize_lex(self, audio_data, bot_name, bot_alias, user_id, content_type="audio/l16; rate=16000; channels=1", access_key_id=None, secret_access_key=None, region=None):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Amazon Lex API.

        If ``access_key_id`` or ``secret_access_key`` is not set, credentials are resolved in the order described at
        http://boto3.readthedocs.io/en/latest/guide/configuration.html#configuring-credentials
        """
        assert isinstance(audio_data, AudioData), "Data must be audio data"
        assert isinstance(bot_name, str), "``bot_name`` must be a string"
        assert isinstance(bot_alias, str), "``bot_alias`` must be a string"
        assert isinstance(user_id, str), "``user_id`` must be a string"
        assert isinstance(content_type, str), "``content_type`` must be a string"
        assert access_key_id is None or isinstance(access_key_id, str), "``access_key_id`` must be a string"
        assert secret_access_key is None or isinstance(secret_access_key, str), "``secret_access_key`` must be a string"
        assert region is None or isinstance(region, str), "``region`` must be a string"

        try:
            import boto3
        except ImportError:
            raise RequestError("missing boto3 module: ensure that boto3 is set up correctly.")

        client = boto3.client('lex-runtime', aws_access_key_id=access_key_id,
                              aws_secret_access_key=secret_access_key,
                              region_name=region)

        raw_data = audio_data.get_raw_data(
            convert_rate=16000, convert_width=2  # Lex expects 16 kHz, 16-bit mono PCM
        )

        accept = "text/plain; charset=utf-8"
        response = client.post_content(botName=bot_name, botAlias=bot_alias, userId=user_id, contentType=content_type, accept=accept, inputStream=raw_data)

        return response["inputTranscript"]

    def recognize_houndify(self, audio_data, client_id, client_key, show_all=False):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Houndify API.
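The default ``content_type`` in ``recognize_lex`` encodes the raw-PCM parameters matching the ``get_raw_data`` conversion (16 kHz, 16-bit mono). A tiny hypothetical builder for that string:

```python
def lex_content_type(rate=16000, channels=1):
    # Amazon Lex content type for uncompressed 16-bit linear PCM,
    # matching the default used by recognize_lex above.
    return "audio/l16; rate={}; channels={}".format(rate, channels)
```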
