Commit
Merge pull request #513 from mytja-archive/master
Added Vosk API and updated README
Uberi authored Feb 9, 2022
2 parents e7877ca + 274f5eb commit 5215137
Showing 2 changed files with 33 additions and 1 deletion.
15 changes: 15 additions & 0 deletions README.rst
@@ -34,6 +34,8 @@ Speech recognition engine/API support:
* `Houndify API <https://houndify.com/>`__
* `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__
* `Snowboy Hotword Detection <https://snowboy.kitt.ai/>`__ (works offline)
* `Tensorflow <https://www.tensorflow.org/>`__
* `Vosk API <https://github.com/alphacep/vosk-api/>`__ (works offline)

**Quickstart:** ``pip install SpeechRecognition``. See the "Installing" section for more details.

@@ -52,6 +54,8 @@ The `library reference <https://github.com/Uberi/speech_recognition/blob/master/

See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.

To use Vosk, you also have to install a Vosk model. The available models are listed `here <https://alphacephei.com/vosk/models>`__. Place the model in the ``models`` folder of your project, e.g. ``your-project-folder/models/your-vosk-model``.

Examples
--------

@@ -86,6 +90,7 @@ To use all of the functionality of the library, you should have:
* **PocketSphinx** (required only if you need to use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
* **Google API Client Library for Python** (required only if you need to use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``)
* **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X)
* **Vosk** (required only if you need to use Vosk API speech recognition, ``recognizer_instance.recognize_vosk``)

The following requirements are optional, but can improve or extend functionality in some situations:

@@ -129,6 +134,16 @@ Note that the versions available in most package repositories are outdated and w

See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst>`__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``.

Vosk (for Vosk users)
~~~~~~~~~~~~~~~~~~~~~
Vosk API is **required if and only if you want to use the Vosk recognizer** (``recognizer_instance.recognize_vosk``).

You can install it with ``python3 -m pip install vosk``.

You also have to install a Vosk model:

Available models are listed `here <https://alphacephei.com/vosk/models>`__ for download. Place the model in the ``models`` folder of your project, e.g. ``your-project-folder/models/your-vosk-model``.
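The folder convention above can be checked programmatically before constructing a recognizer. A minimal sketch, assuming the convention described here; the helper name and the example model folder name are hypothetical, not part of the library:

```python
import os

def locate_vosk_model(project_dir, model_name):
    """Return the expected Vosk model path, or None if it has not been downloaded."""
    path = os.path.join(project_dir, "models", model_name)
    return path if os.path.isdir(path) else None

# Models unpack into folders such as "vosk-model-small-en-us-0.15" (example name),
# following the convention "your-project-folder/models/your-vosk-model":
print(locate_vosk_model(".", "vosk-model-small-en-us-0.15"))
```

Checking up front like this gives a clearer error message than letting ``Model(...)`` fail deep inside the recognizer.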

Google Cloud Speech Library for Python (for Google Cloud Speech API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

19 changes: 18 additions & 1 deletion speech_recognition/__init__.py
@@ -1390,7 +1390,24 @@ def recognize_tensorflow(self, audio_data, tensor_graph='tensorflow-data/conv_ac
for node_id in top_k:
    human_string = self.tflabels[node_id]
    return human_string


def recognize_vosk(self, audio_data, language='en'):
    from vosk import KaldiRecognizer, Model

    assert isinstance(audio_data, AudioData), "Data must be audio data"

    # Lazily load the model the first time this recognizer is used
    if not hasattr(self, 'vosk_model'):
        if not os.path.exists("model"):
            return "Please download the model from https://github.com/alphacep/vosk-api/blob/master/doc/models.md and unpack as 'model' in the current folder."
        self.vosk_model = Model("model")

    # Vosk expects 16-bit mono PCM at 16 kHz
    rec = KaldiRecognizer(self.vosk_model, 16000)
    rec.AcceptWaveform(audio_data.get_raw_data(convert_rate=16000, convert_width=2))

    # FinalResult() returns the transcription as a JSON string
    return rec.FinalResult()
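Because ``FinalResult()`` hands back a JSON string rather than plain text, callers typically parse it before use. A small illustration; the sample string below is a hypothetical result shape, not real recognizer output:

```python
import json

# Shape of a (hypothetical) Vosk final result: a JSON object whose "text"
# field holds the transcription.
sample_result = '{"text": "hello world"}'

def extract_text(final_result):
    """Pull the plain transcription out of a FinalResult()-style JSON string."""
    return json.loads(final_result).get("text", "")

print(extract_text(sample_result))  # -> hello world
```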

def get_flac_converter():
"""Returns the absolute path of a FLAC converter executable, or raises an OSError if none can be found."""
Expand Down
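Putting the commit together, an end-to-end sketch of calling the new recognizer. This is a hedged illustration: the file name is a placeholder, it assumes ``pip install SpeechRecognition vosk`` plus a model unpacked as ``model/`` in the working directory, and it deliberately returns None when anything is missing:

```python
def transcribe(wav_path):
    """Try to transcribe a WAV file with recognize_vosk; return None on any
    failure (missing dependencies, missing model folder, or missing file)."""
    try:
        import speech_recognition as sr
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
        return recognizer.recognize_vosk(audio)
    except Exception:
        return None

print(transcribe("speech.wav"))  # "speech.wav" is a placeholder path
```

Swallowing every exception is only appropriate for a smoke test like this; real callers would catch the specific errors they can handle.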
