Added Vosk API and updated README #513

Merged: 13 commits, Feb 9, 2022
README.rst: 15 additions, 0 deletions
@@ -34,6 +34,8 @@ Speech recognition engine/API support:
* `Houndify API <https://houndify.com/>`__
* `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__
* `Snowboy Hotword Detection <https://snowboy.kitt.ai/>`__ (works offline)
* `Tensorflow <https://www.tensorflow.org/>`__
* `Vosk API <https://github.com/alphacep/vosk-api/>`__ (works offline)

**Quickstart:** ``pip install SpeechRecognition``. See the "Installing" section for more details.

@@ -52,6 +54,8 @@ The `library reference <https://github.com/Uberi/speech_recognition/blob/master/

See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.

Using Vosk also requires a Vosk model; the available models are listed `here <https://alphacephei.com/vosk/models>`__. Unpack the model you download into a folder named ``model`` in the current working directory, which is where ``recognizer_instance.recognize_vosk`` looks for it.
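``recognize_vosk`` checks ``os.path.exists("model")``, a path resolved against the current working directory rather than the script's own folder, so running the same script from another directory will not find the model. A minimal sketch for checking that lookup ahead of time; ``vosk_model_path`` is a hypothetical helper, not part of the library:

```python
import os


def vosk_model_path(model_dir="model"):
    """Return the absolute path recognize_vosk would load, or None if missing.

    Hypothetical helper: it only mirrors the relative lookup of "model"
    done inside recognize_vosk; it does not validate the model contents.
    """
    # Relative paths are resolved against the current working directory
    path = os.path.abspath(model_dir)
    return path if os.path.isdir(path) else None


if __name__ == "__main__":
    found = vosk_model_path()
    if found is None:
        print("No 'model' folder in", os.getcwd())
    else:
        print("Vosk model found at", found)
```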

Examples
--------

@@ -86,6 +90,7 @@ To use all of the functionality of the library, you should have:
* **PocketSphinx** (required only if you need to use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
* **Google API Client Library for Python** (required only if you need to use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``)
* **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X)
* **Vosk** (required only if you need to use the Vosk API speech recognizer, ``recognizer_instance.recognize_vosk``)

The following requirements are optional, but can improve or extend functionality in some situations:

@@ -129,6 +134,16 @@ Note that the versions available in most package repositories are outdated and w

See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.

Vosk (for Vosk users)
~~~~~~~~~~~~~~~~~~~~~
Vosk API is **required if and only if you want to use the Vosk recognizer** (``recognizer_instance.recognize_vosk``).

You can install it with ``python3 -m pip install vosk``.
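Because ``recognize_vosk`` imports ``vosk`` only when it is called, a missing install surfaces as an ``ImportError`` at the first recognition attempt. A small sketch for checking the install up front with the standard library; ``is_vosk_installed`` is a hypothetical helper name:

```python
import importlib.util


def is_vosk_installed():
    """Return True if the ``vosk`` package can be found by this interpreter."""
    # find_spec locates a top-level package without importing it
    return importlib.util.find_spec("vosk") is not None


if __name__ == "__main__":
    if is_vosk_installed():
        print("vosk is installed")
    else:
        print("vosk is missing; run: python3 -m pip install vosk")
```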

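You also need to download a Vosk model. Models are available for download `here <https://alphacephei.com/vosk/models>`__.

Unpack the model into a folder named ``model`` in the current working directory (for example, ``your-project-folder/model`` if you run your script from ``your-project-folder``), since that is the path ``recognizer_instance.recognize_vosk`` loads.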
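The models are distributed as zip archives. The sketch below unpacks one and renames it to ``model``, assuming (not verified for every model) that each archive contains a single top-level folder such as ``vosk-model-small-en-us-0.15/``; ``unpack_vosk_model`` is a hypothetical helper, not part of the library:

```python
import os
import shutil
import tempfile
import zipfile


def unpack_vosk_model(zip_path, target="model"):
    """Extract a downloaded Vosk model archive into ``target``.

    Assumption: the archive holds exactly one top-level folder.
    """
    if os.path.exists(target):
        raise FileExistsError(target)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        entries = os.listdir(tmp)
        if len(entries) != 1 or not os.path.isdir(os.path.join(tmp, entries[0])):
            raise ValueError("expected a single top-level folder in " + zip_path)
        # Move the model folder out and give it the name recognize_vosk expects
        shutil.move(os.path.join(tmp, entries[0]), target)
    return os.path.abspath(target)
```

Run it from your project folder, e.g. ``unpack_vosk_model("vosk-model-small-en-us-0.15.zip")``; refusing to overwrite an existing ``model`` folder avoids silently mixing two models.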

Google Cloud Speech Library for Python (for Google Cloud Speech API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

speech_recognition/__init__.py: 18 additions, 1 deletion
@@ -1390,7 +1390,24 @@ def recognize_tensorflow(self, audio_data, tensor_graph='tensorflow-data/conv_ac
for node_id in top_k:
human_string = self.tflabels[node_id]
return human_string


    def recognize_vosk(self, audio_data, language='en'):
        # Imported lazily so Vosk is only needed when this recognizer is used
        from vosk import Model, KaldiRecognizer

        assert isinstance(audio_data, AudioData), "Data must be audio data"

        # Load the model once and cache it on the recognizer instance.
        # Note: ``language`` is currently unused; the model in "model" decides the language.
        if not hasattr(self, 'vosk_model'):
            if not os.path.exists("model"):
                raise RequestError("Please download the model from https://github.com/alphacep/vosk-api/blob/master/doc/models.md and unpack as 'model' in the current folder.")
            self.vosk_model = Model("model")

        # Feed the audio as 16 kHz, 16-bit (2-byte) PCM, matching the recognizer's sample rate
        rec = KaldiRecognizer(self.vosk_model, 16000)
        rec.AcceptWaveform(audio_data.get_raw_data(convert_rate=16000, convert_width=2))

        # FinalResult() returns a JSON string with a "text" field
        return rec.FinalResult()

def get_flac_converter():
    """Returns the absolute path of a FLAC converter executable, or raises an OSError if none can be found."""
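``recognize_vosk`` returns Vosk's raw ``FinalResult()`` output, a JSON string, rather than the plain transcript. A minimal sketch for extracting the text; ``vosk_text`` is a hypothetical helper, and treating a missing ``text`` key as an empty transcript is a choice of this sketch:

```python
import json


def vosk_text(final_result):
    """Extract the transcript from the JSON string returned by recognize_vosk."""
    # FinalResult() yields JSON such as {"text": "hello world"}; "text" may be empty
    return json.loads(final_result).get("text", "")


if __name__ == "__main__":
    print(vosk_text('{\n  "text" : "hello world"\n}'))
```

With a ``Recognizer`` instance ``r`` and an ``AudioData`` instance ``audio``, this would be called as ``vosk_text(r.recognize_vosk(audio))``.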