This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Commit: Add streaming microphone sample for Speech (#87)

CallistoCF authored and JustinBeckwith committed Sep 10, 2018
1 parent f2c42b5 commit 5760658

Showing 3 changed files with 258 additions and 108 deletions.
207 changes: 99 additions & 108 deletions README.md
@@ -2,149 +2,140 @@
[//]: # "To regenerate it, use `npm run generate-scaffolding`."
<img src="https://avatars2.githubusercontent.com/u/2810941?v=3&s=96" alt="Google Cloud Platform logo" title="Google Cloud Platform" align="right" height="96" width="96"/>

# Google Cloud Speech API: Node.js Samples

[![Open in Cloud Shell][shell_img]][shell_link]

The [Cloud Speech API](https://cloud.google.com/speech/docs) enables easy integration of Google speech recognition technologies into developer applications. Send audio and receive a text transcription from the Cloud Speech API service.

## Table of Contents

* [Before you begin](#before-you-begin)
* [Samples](#samples)
  * [Speech Recognition](#speech-recognition)
  * [Speech Recognition v1p1beta1](#speech-recognition-v1p1beta1)

## Before you begin

Before running the samples, make sure you've followed the steps in the
[Before you begin section](../README.md#before-you-begin) of the client
library's README.

## Samples

### Speech Recognition

View the [source code][recognize_0_code].

[![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/nodejs-speech&page=editor&open_in_editor=samples/recognize.js,samples/README.md)

__Usage:__ `node recognize.js --help`
```
recognize.js <command>

Commands:
  recognize.js sync <filename>           Detects speech in a local audio file.
  recognize.js sync-gcs <gcsUri>         Detects speech in an audio file located in a Google Cloud Storage bucket.
  recognize.js sync-words <filename>     Detects speech in a local audio file with word time offset.
  recognize.js async <filename>          Creates a job to detect speech in a local audio file, and waits for the job
                                         to complete.
  recognize.js async-gcs <gcsUri>        Creates a job to detect speech in an audio file located in a Google Cloud
                                         Storage bucket, and waits for the job to complete.
  recognize.js async-gcs-words <gcsUri>  Creates a job to detect speech with word time offset in an audio file located
                                         in a Google Cloud Storage bucket, and waits for the job to complete.
  recognize.js stream <filename>         Detects speech in a local audio file by streaming it to the Speech API.
  recognize.js listen                    Detects speech in a microphone input stream. This command requires that you
                                         have SoX installed and available in your $PATH. See
                                         https://www.npmjs.com/package/node-record-lpcm16#dependencies

Options:
  --version              Show version number                                    [boolean]
  --encoding, -e                                           [string] [default: "LINEAR16"]
  --sampleRateHertz, -r                                        [number] [default: 16000]
  --languageCode, -l                                         [string] [default: "en-US"]
  --help                 Show help                                              [boolean]

Examples:
  node recognize.js sync ./resources/audio.raw -e LINEAR16 -r 16000
  node recognize.js async-gcs gs://gcs-test-data/vr.flac -e FLAC -r 16000
  node recognize.js stream ./resources/audio.raw -e LINEAR16 -r 16000
  node recognize.js listen

For more information, see https://cloud.google.com/speech/docs
```
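Before calling the API, the `sync` command assembles a request from the options above. The following is a hypothetical sketch (not the sample's literal code) of that request object: field names follow the Speech API's `RecognitionConfig`, the values mirror the defaults shown in the usage listing, and the base64 content is a stand-in string rather than real audio.

```javascript
// Build a recognize request the way `node recognize.js sync` would, using the
// documented defaults. The audio content is a fake placeholder for this sketch.
const audioBytes = Buffer.from('fake-pcm-bytes').toString('base64');

const request = {
  audio: {content: audioBytes},
  config: {
    encoding: 'LINEAR16', // matches the -e default above
    sampleRateHertz: 16000, // matches the -r default above
    languageCode: 'en-US', // matches the -l default above
  },
};

// The content field round-trips through base64 unchanged.
console.log(Buffer.from(request.audio.content, 'base64').toString()); // prints "fake-pcm-bytes"
```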

[recognize_0_docs]: https://cloud.google.com/speech/docs
[recognize_0_code]: recognize.js

### Speech Recognition v1p1beta1

View the [source code][recognize.v1p1beta1_1_code].

[![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/nodejs-speech&page=editor&open_in_editor=samples/recognize.v1p1beta1.js,samples/README.md)

__Usage:__ `node recognize.v1p1beta1.js --help`

```
recognize.v1p1beta1.js <command>

Commands:
  recognize.v1p1beta1.js sync-model <filename> <model>    Detects speech in a local audio file using provided model.
  recognize.v1p1beta1.js sync-model-gcs <gcsUri> <model>  Detects speech in an audio file located in a Google Cloud
                                                          Storage bucket using provided model.

Options:
  --version              Show version number                                    [boolean]
  --encoding, -e                                           [string] [default: "LINEAR16"]
  --sampleRateHertz, -r                                        [number] [default: 16000]
  --languageCode, -l                                         [string] [default: "en-US"]
  --help                 Show help                                              [boolean]

Examples:
  node recognize.v1p1beta1.js sync-model ./resources/Google_Gnome.wav video -e LINEAR16 -r 16000
  node recognize.v1p1beta1.js sync-model-gcs gs://gcs-test-data/Google_Gnome.wav phone_call -e FLAC -r 16000

For more information, see https://cloud.google.com/speech/docs
```
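The `sync-model` commands extend the recognition config with the v1p1beta1 `model` field; `'video'` and `'phone_call'` are the model names used in the examples above. The sketch below uses a hypothetical helper, `buildModelConfig`, which is not part of the sample, to show what that config looks like.

```javascript
// Hypothetical helper illustrating a v1p1beta1 recognition config with the
// `model` field set. Defaults mirror the CLI defaults in the usage listing.
function buildModelConfig(
  model,
  encoding = 'LINEAR16',
  sampleRateHertz = 16000,
  languageCode = 'en-US'
) {
  return {encoding, sampleRateHertz, languageCode, model};
}

const videoConfig = buildModelConfig('video');
console.log(videoConfig.model); // prints "video"
```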

[recognize.v1p1beta1_1_docs]: https://cloud.google.com/speech/docs
[recognize.v1p1beta1_1_code]: recognize.v1p1beta1.js

### betaFeatures v1p1beta1

View the [source code][betaFeatures_code].

[![Open in Cloud Shell][shell_img]](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/nodejs-speech&page=editor&open_in_editor=samples/betaFeatures.js,samples/README.md)

__Usage:__ `node betaFeatures.js --help`

```
betaFeatures.js <command>

Commands:
  betaFeatures.js sync-model <filename> <model>    Detects speech in a local audio file using provided model.
  betaFeatures.js sync-model-gcs <gcsUri> <model>  Detects speech in an audio file located in a Google Cloud
                                                   Storage bucket using provided model.

Options:
  --version              Show version number                                    [boolean]
  --encoding, -e                                           [string] [default: "LINEAR16"]
  --sampleRateHertz, -r                                        [number] [default: 16000]
  --languageCode, -l                                         [string] [default: "en-US"]
  --help                 Show help                                              [boolean]

Examples:
  node betaFeatures.js sync-model ./resources/Google_Gnome.wav video -e LINEAR16 -r 16000
  node betaFeatures.js sync-model-gcs gs://gcs-test-data/Google_Gnome.wav phone_call -e FLAC -r 16000

For more information, see https://cloud.google.com/speech/docs
```

[betaFeatures_docs]: https://cloud.google.com/speech/docs
[betaFeatures_code]: betaFeatures.js

[shell_img]: https://gstatic.com/cloudssh/images/open-btn.png
[shell_link]: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/googleapis/nodejs-speech&page=editor&open_in_editor=samples/README.md
129 changes: 129 additions & 0 deletions samples/MicrophoneStream.js
@@ -0,0 +1,129 @@
/**
* Copyright 2017, Google, Inc.
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/**
 * This application demonstrates how to perform basic recognize operations with
 * the Google Cloud Speech API.
*
* For more information, see the README.md under /speech and the documentation
* at https://cloud.google.com/speech/docs.
*/

'use strict';

/**
 * Note: Correct microphone settings are required: check the links below and
 * make sure the following conditions are met:
 * 1. SoX must be installed and available in your $PATH; it can be found at
 *    http://sox.sourceforge.net/
 * 2. Your microphone must be working.
 * 3. The encoding, sampleRateHertz, and number of channels must match the
 *    header of the audio you are recording.
 * 4. Install node-record-lpcm16: https://www.npmjs.com/package/node-record-lpcm16
 * More info: https://cloud.google.com/speech-to-text/docs/streaming-recognize
 */

// const encoding = 'LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'en-US';

function microphoneStream(encoding, sampleRateHertz, languageCode) {
  // [START micStreamRecognize]

  // node-record-lpcm16
  const record = require('node-record-lpcm16');

  // Imports the Google Cloud client library
  const speech = require('@google-cloud/speech');

  const config = {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  };

  const request = {
    config,
    interimResults: false, // set to true to receive interim (partial) results
  };

  // Creates a client
  const client = new speech.SpeechClient();

  // Create a recognize stream
  const recognizeStream = client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', data =>
      process.stdout.write(
        data.results[0] && data.results[0].alternatives[0]
          ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
          : `\n\nReached transcription time limit, press Ctrl+C\n`
      )
    );

  // Start recording and send the microphone input to the Speech API
  record
    .start({
      sampleRateHertz: sampleRateHertz,
      threshold: 0, // silence threshold
      recordProgram: 'rec', // try also "arecord" or "sox"
      silence: '5.0', // seconds of silence before ending the recording
    })
    .on('error', console.error)
    .pipe(recognizeStream);

  console.log('Listening, press Ctrl+C to stop.');
  // [END micStreamRecognize]
}

require(`yargs`)
  .demand(1)
  .command(
    `micStreamRecognize`,
    `Streams audio input from the microphone and transcribes it to text`,
    {},
    opts =>
      microphoneStream(opts.encoding, opts.sampleRateHertz, opts.languageCode)
  )
  .options({
    encoding: {
      alias: 'e',
      default: 'LINEAR16',
      global: true,
      requiresArg: true,
      type: 'string',
    },
    sampleRateHertz: {
      alias: 'r',
      default: 16000,
      global: true,
      requiresArg: true,
      type: 'number',
    },
    languageCode: {
      alias: 'l',
      default: 'en-US',
      global: true,
      requiresArg: true,
      type: 'string',
    },
  })
  .example(`node $0 micStreamRecognize`)
  .wrap(120)
  .recommendCommands()
  .epilogue(`For more information, see https://cloud.google.com/speech/docs`)
  .help()
  .strict().argv;
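The `'data'` handler in MicrophoneStream.js boils down to a single ternary: print a transcript line when results are present, otherwise print the time-limit notice. Extracting it into a pure function (a refactoring for illustration only, not how the sample is written) makes the two branches easy to exercise without a microphone or API credentials:

```javascript
// Pure version of the sample's 'data' handler logic: the same ternary over
// data.results[0] and its first alternative.
function formatStreamingData(data) {
  return data.results[0] && data.results[0].alternatives[0]
    ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
    : `\n\nReached transcription time limit, press Ctrl+C\n`;
}

// Branch 1: a result with an alternative yields a transcript line.
console.log(
  formatStreamingData({results: [{alternatives: [{transcript: 'hello world'}]}]})
); // prints "Transcription: hello world"
```

When the stream's time limit is reached, the API sends a response with no results, so `data.results[0]` is `undefined` and the function falls through to the Ctrl+C notice.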