Added Vosk API and updated README #513

Merged
merged 13 commits into from
Feb 9, 2022
3 changes: 3 additions & 0 deletions README.rst
@@ -34,6 +34,8 @@ Speech recognition engine/API support:
* `Houndify API <https://houndify.com/>`__
* `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__
* `Snowboy Hotword Detection <https://snowboy.kitt.ai/>`__ (works offline)
* `Tensorflow <https://www.tensorflow.org/>`__
* `Vosk API <https://github.com/alphacep/vosk-api/>`__ (works offline)

**Quickstart:** ``pip install SpeechRecognition``. See the "Installing" section for more details.

@@ -86,6 +88,7 @@ To use all of the functionality of the library, you should have:
* **PocketSphinx** (required only if you need to use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
* **Google API Client Library for Python** (required only if you need to use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``)
* **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X)
* **Vosk** (required only if you need to use Vosk API speech recognition ``recognizer_instance.recognize_vosk``)
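
Since Vosk is listed as required only for `recognize_vosk`, calling code may want to check for it before choosing that recognizer. A minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of SpeechRecognition:

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be imported.

    Hypothetical helper, not part of the SpeechRecognition API.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "vosk" is the import name of the Vosk Python package.
    if has_module("vosk"):
        print("Vosk is installed; recognize_vosk can be used.")
    else:
        print("Vosk is missing; install it with: pip install vosk")
```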

The following requirements are optional, but can improve or extend functionality in some situations:

33 changes: 33 additions & 0 deletions speech_recognition/__init__.py
@@ -25,6 +25,9 @@
__version__ = "3.8.1"
__license__ = "BSD"

# model for Vosk
modelVosk = Model("model")

try:  # attempt to use the Python 2 modules
    from urllib import urlencode
    from urllib2 import Request, urlopen, URLError, HTTPError
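
The module-level `Model("model")` call in this hunk runs at import time, with no `Model` import visible in the hunk, and whether or not the caller ever uses Vosk. One alternative is to load the model on first use and raise an exception when the directory is missing. The sketch below assumes that approach; `VoskModelCache` and `VoskModelNotFoundError` are hypothetical names, and only `vosk.Model(path)` comes from the Vosk API:

```python
import os


class VoskModelNotFoundError(OSError):
    """Raised when the Vosk model directory does not exist (hypothetical name)."""


class VoskModelCache:
    """Loads a Vosk model on first use and reuses it afterwards (hypothetical helper)."""

    def __init__(self, model_path="model"):
        self.model_path = model_path
        self._model = None

    def get(self):
        if self._model is None:
            # Check the path first so a missing model gives a clear error
            # instead of a failure inside the Vosk library.
            if not os.path.isdir(self.model_path):
                raise VoskModelNotFoundError(
                    "Vosk model not found at %r; download one from "
                    "https://github.com/alphacep/vosk-api/blob/master/doc/models.md "
                    "and unpack it there." % self.model_path
                )
            # Imported lazily so that importing this module never requires Vosk.
            from vosk import Model
            self._model = Model(self.model_path)
        return self._model
```

With this shape, importing the package stays cheap and a missing model surfaces as an exception rather than as a string returned in place of a transcript.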
@@ -1390,7 +1393,37 @@ def recognize_tensorflow(self, audio_data, tensor_graph='tensorflow-data/conv_ac
            for node_id in top_k:
                human_string = self.tflabels[node_id]
                return human_string

    def recognize_vosk(self, audio_data, language='en'):
        from vosk import Model, KaldiRecognizer

        if not os.path.exists("model"):
            return "Please download the model from https://github.com/alphacep/vosk-api/blob/master/doc/models.md and unpack as 'model' in the current folder."
            exit (1)

        import pyaudio

        rec = KaldiRecognizer(modelVosk, 16000)

        p = pyaudio.PyAudio()
        stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8000)
        stream.start_stream()

        while True:
            data = stream.read(4000)
            if len(data) == 0:
                break
            if rec.AcceptWaveform(data):
                #bottom lines are for debugging
                #print(rec.Result())
                break
            else:
                #bottom lines are for debugging
                #print(rec.PartialResult())
                break

        finalRecognition = rec.FinalResult()
        return finalRecognition
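
`recognize_vosk` as written in this diff opens a microphone stream through PyAudio and does not read `audio_data`; its loop also exits after the first chunk. A sketch of one alternative: feed already-captured 16-bit mono PCM to the recognizer in chunks and parse the JSON strings it returns. `transcribe_pcm` is a hypothetical helper, not library code; `AcceptWaveform`, `Result`, and `FinalResult` are Vosk recognizer methods, and `"text"` is the JSON field Vosk uses for the transcript:

```python
import json


def transcribe_pcm(recognizer, pcm_bytes, chunk_size=4000):
    """Feed raw PCM to a Vosk-style recognizer and return the transcript.

    `recognizer` is anything exposing AcceptWaveform(bytes), Result() and
    FinalResult(), such as vosk.KaldiRecognizer. Hypothetical helper.
    """
    parts = []
    for start in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[start:start + chunk_size]
        # AcceptWaveform returns True when an utterance has ended; its text
        # is then available from Result().
        if recognizer.AcceptWaveform(chunk):
            parts.append(json.loads(recognizer.Result()).get("text", ""))
    # FinalResult() flushes whatever audio has not been reported yet.
    parts.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in parts if p)
```

With SpeechRecognition, the bytes could come from `audio_data.get_raw_data(convert_rate=16000, convert_width=2)`, which matches the 16000 Hz rate passed to `KaldiRecognizer` in the diff.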

def get_flac_converter():
"""Returns the absolute path of a FLAC converter executable, or raises an OSError if none can be found."""