Modularized STT implementation #118

Merged 8 commits on Aug 10, 2014

astahlman
Contributor

Overview

This commit abstracts out the Speech To Text engine into a new module: stt.py. Users now have the option to specify a Google API key in their profile. If this key is present, Jasper will rely on the Google Speech API to transcribe audio during the active listen phase. The default behavior still uses the PocketSphinx engine for audio transcription.
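
As a rough illustration of the selection logic described above (not the actual contents of stt.py), the sketch below assumes a hypothetical profile key named `google_api_key` and a hypothetical helper `engine_for_active_listen`. The class names are stand-ins and the `transcribe` bodies are placeholders.

```python
# Illustrative sketch only: the class names, the profile key, and the helper
# below are assumptions for this example, not the code in this commit.


class PocketSphinxSTT(object):
    """Stand-in for the default, offline engine."""

    def transcribe(self, audio_path):
        raise NotImplementedError("placeholder: decode with PocketSphinx")


class GoogleSTT(object):
    """Stand-in for the engine backed by the Google Speech API."""

    def __init__(self, api_key):
        self.api_key = api_key

    def transcribe(self, audio_path):
        raise NotImplementedError("placeholder: send audio to Google")


def engine_for_active_listen(profile):
    """Use Google when the profile holds an API key, else PocketSphinx."""
    api_key = profile.get("google_api_key")  # hypothetical key name
    if api_key:
        return GoogleSTT(api_key)
    return PocketSphinxSTT()
```

Keeping both engines behind the same `transcribe` method means the calling code does not need to know which one was selected.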

Motivation

There is a stark difference between the performance of the Google Speech API and that of PocketSphinx: with the Google Speech API, I rarely need to repeat myself anymore.

Testing

  • Manual testing on both OS X and Raspberry Pi using the PocketSphinx engine as well as the Google Speech API
  • Updated tests; boot/test.py and client/test.py both pass.

Prerequisites

Using the Google Speech API with the new STT implementation requires a Google API key to be present in profile.yml.

To obtain an API key:

  1. Join the Chromium Dev group
  2. Create a project through the Google Developers console
  3. Select your project. In the sidebar, navigate to "APIs & Auth." Activate the Speech API.
  4. Under "APIs & Auth," navigate to "Credentials." Create a new key for public API access.
  5. Copy your API key and run
    cd client/; python populate.py
    When prompted, paste this key for access to the Speech API.

This implementation also requires that either the ffmpeg or avconv audio utility be present on your $PATH. To install on RPi, run:

sudo apt-get install libav-tools
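
To confirm that one of the two utilities is actually visible on $PATH, a quick check along these lines can help. This is a minimal sketch, not part of the commit: the function name `find_audio_converter` is hypothetical, and `shutil.which` requires Python 3.3 or later.

```python
import shutil


def find_audio_converter(candidates=("ffmpeg", "avconv")):
    """Return the path of the first converter found on $PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    converter = find_audio_converter()
    if converter is None:
        print("Neither ffmpeg nor avconv was found on $PATH")
    else:
        print("Using audio converter at " + converter)
```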

Acknowledgements

This was inspired by @fritz-fritz's fork.

@@ -1,6 +1,8 @@
import yaml
import sys
import speaker
import stt

Can we combine these into a single import? I prefer that practice, and all it requires is that line 32 use stt.PocketSphinxSTT().

@charliermarsh

This is great! I had this on my TODO list for the next few weeks, so thank you very much for sending over the pull request! I've gone through and added some comments, but once those are addressed, I don't think we'll be far away from merging this in.

As an aside: once this is merged in, can you send a pull request over to the docs site with amendments?

@astahlman
Contributor Author

Sure, all of these comments make sense. I'll try to get around to making these changes in the next few days and send an updated pull request. And yes, post-merge I'll send a pull request for the docs.

@willondubs

Just a suggestion regarding Google's STT v2 implementation: FLAC files are only required with v1; v2 additionally allows wav or mp3. You can even stream audio directly to the Google Speech API. (It works simply enough with nodejs, but I haven't been able to stream using Python just yet.) Google Speech API v2 works very similarly to Wit.Ai, so if you'll be enhancing your work, why not also include an implementation for Wit.Ai? Ref: https://www.npmjs.org/package/node-record-lpcm16

@astahlman
Contributor Author

@willondubs thanks for the suggestion - I didn't know that v2 accepted .wav files. This allowed me to eliminate the implicit dependency on either ffmpeg or avconv. wit.ai looks nice as well. I think it should be fairly easy for someone to integrate with their API after this change.
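
To illustrate why no converter is needed once .wav input is accepted, here is a minimal sketch of reading a recording and its sample rate with Python's standard `wave` module. The function names are hypothetical, and how the bytes and rate are then handed to the Speech API is deliberately left out, since that depends on the endpoint.

```python
import io
import wave


def read_wav(wav_file):
    """Return (sample_rate_hz, raw_pcm_bytes) for a WAV path or file object."""
    reader = wave.open(wav_file, "rb")
    try:
        sample_rate = reader.getframerate()
        frames = reader.readframes(reader.getnframes())
    finally:
        reader.close()
    return sample_rate, frames


def make_silent_wav(sample_rate=16000, n_frames=160):
    """Build a short mono 16-bit WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    writer = wave.open(buf, "wb")
    writer.setnchannels(1)
    writer.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
    writer.setframerate(sample_rate)
    writer.writeframes(b"\x00\x00" * n_frames)
    writer.close()
    buf.seek(0)
    return buf
```

For example, `read_wav(make_silent_wav())` yields the sample rate alongside the raw PCM bytes, with no FLAC conversion step in between.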

@crm416, I believe I've addressed your comments. Upon running populate.py, the user will now be prompted to choose their STT engine (or hit enter to default to 'sphinx'). If the user chooses 'google', he or she will then be prompted for an API key, which is added to profile["keys"]. We will default to PocketSphinx if no STT engine is specified in the profile.
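
The prompt flow described above can be sketched roughly as follows. This is not the actual populate.py code: the function names, the `stt_engine` key, and the `GOOGLE_SPEECH` entry name are hypothetical, and falling back to 'sphinx' on an unrecognized answer is an assumption of this sketch. Only `profile["keys"]` and the 'sphinx' default come from the description.

```python
def choose_stt_engine(profile, ask=input):
    """Record the STT engine choice in `profile` and return the profile.

    `ask` is injectable so the prompts can be exercised without a terminal.
    """
    answer = ask("STT engine (sphinx/google) [sphinx]: ").strip().lower()
    engine = answer or "sphinx"  # pressing enter keeps the default
    if engine not in ("sphinx", "google"):
        engine = "sphinx"  # assumption of this sketch: unknown input uses the default
    profile["stt_engine"] = engine  # hypothetical key name
    if engine == "google":
        key = ask("Google API key: ").strip()
        # hypothetical entry name inside profile["keys"]
        profile.setdefault("keys", {})["GOOGLE_SPEECH"] = key
    return profile


def engine_name(profile):
    """Fall back to PocketSphinx when no engine is named in the profile."""
    return profile.get("stt_engine", "sphinx")
```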

@shbhrsaha
Member

Thanks for the update, @astahlman. I'm going to run this through testing this weekend, and we'll merge it in if it looks good! Great work!

@Holzhaus
Member

Holzhaus commented Aug 9, 2014

Great!

shbhrsaha added a commit that referenced this pull request Aug 10, 2014
LGTM! Thanks for the fine work and revisions.
@shbhrsaha merged commit 8112caf into jasperproject:master on Aug 10, 2014
@shbhrsaha
Member

LGTM! Thanks for the fine work and revisions.

Holzhaus added a commit to Holzhaus/jasper-client that referenced this pull request Aug 10, 2014
@charliermarsh

@astahlman Can you send over a pull request to the docs repo outlining the changes here?

@bsinfo523

@willondubs @astahlman thanks for the great work - has anyone already integrated wit.ai as an STT engine in Jasper?

Holzhaus added a commit to Holzhaus/jasper-client that referenced this pull request Jan 2, 2015
@Holzhaus
Member

Holzhaus commented Jan 2, 2015

@bsinfo523 Check out Pull Request #273.

@himuura

himuura commented Aug 2, 2017

Any idea how to renew the Google API key automatically after it burns through the quota of 50 queries per day?
