Speech: encoding for speech to text ? #4360

amgsharma · 2017-11-08T16:37:01Z

API: Speech
OS: Mac OS X
Python: 3.5

I'm trying to set up a basic example for speech to text.
I used ffmpeg to extract the audio from an MP4 as MP3, then converted that MP3 to FLAC.

My code is as follows (based on the example in the Speech API documentation):

import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'data', 'mp4s', 'audio',
    '0BuayZmFrINBZHBG7uHMAI4U6xx4MkRC.flac')

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    # encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    encoding='FLAC',
    sample_rate_hertz=48000,
    language_code='en-US')
import pdb; pdb.set_trace()

# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))

The current error I'm trying to debug is as follows:
google.gax.errors.RetryError: RetryError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.INVALID_ARGUMENT, Invalid audio channel count)>)

I haven't found anything about this by searching, so pardon me if it's a repeat.

amgsharma · 2017-11-08T16:39:31Z

Answer: make sure the audio is converted to a single channel (mono), as described in this Stack Overflow answer:
https://stackoverflow.com/questions/39620198/google-cloud-speech-syncrecognize-invalid-argument
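
For anyone landing here later, a minimal sketch of what "ensure 1 channel when converting" could look like when driving ffmpeg from Python. The function names `build_mono_flac_command` and `convert_to_mono_flac` are made up for this illustration, and the sketch assumes `ffmpeg` is on the `PATH`; `-vn`, `-ac` and `-ar` are standard ffmpeg options for dropping video, setting the channel count and setting the sample rate. The 48000 default only mirrors the `sample_rate_hertz` in the snippet above and is not a recommendation.

```python
import subprocess


def build_mono_flac_command(src, dst, sample_rate=48000):
    """Return an ffmpeg argument list that writes `src` to `dst` as mono audio."""
    return [
        "ffmpeg",
        "-i", src,                # input file, e.g. the original mp4
        "-vn",                    # ignore any video stream
        "-ac", "1",               # downmix to a single audio channel (mono)
        "-ar", str(sample_rate),  # resample so the file matches sample_rate_hertz
        dst,                      # output path; a .flac extension selects FLAC
    ]


def convert_to_mono_flac(src, dst, sample_rate=48000):
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_mono_flac_command(src, dst, sample_rate), check=True)
```

Calling `convert_to_mono_flac('input.mp4', 'output.flac')` would then produce a file that can be passed to the script above in place of the stereo one (both file names here are placeholders).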

dariushazimi · 2018-04-18T16:26:09Z

@amgsharma Were you able to resolve the issue? Can you share the final version?

tseaver · 2018-04-18T17:47:34Z

@dariushazimi You need to ensure that the audio file to be converted is mono, not stereo.
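
To check whether a given FLAC file is mono or stereo before sending it, one option is to read the STREAMINFO block at the start of the file. The sketch below does this with only the standard library. The helper name `read_flac_streaminfo` is hypothetical, and the code assumes a plain FLAC stream that begins with the `fLaC` marker followed directly by STREAMINFO (for instance, it does not handle a file with an ID3 tag prepended).

```python
import struct


def read_flac_streaminfo(path):
    """Return (sample_rate_hz, channels, bits_per_sample) from a FLAC header."""
    with open(path, "rb") as f:
        # 4-byte marker + 4-byte metadata block header + 34-byte STREAMINFO
        header = f.read(42)
    if len(header) < 42 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC file: {}".format(path))
    # Low 7 bits of the block header's first byte are the block type (0 = STREAMINFO)
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # After 10 bytes of block/frame sizes come 64 packed bits:
    # 20 bits sample rate, 3 bits (channels - 1), 5 bits (bits per sample - 1),
    # 36 bits total sample count.
    packed = struct.unpack(">Q", header[18:26])[0]
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    return sample_rate, channels, bits_per_sample
```

If the second value returned is 2, the file is stereo and would presumably trigger the same "Invalid audio channel count" error; the first value can also be compared against the `sample_rate_hertz` passed in the config.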

chemelnucfin changed the title ~~encoding for speech to text ?~~ Speech: encoding for speech to text ? Nov 8, 2017

chemelnucfin added the api: speech Issues related to the Speech-to-Text API. label Nov 8, 2017

amgsharma closed this as completed Nov 8, 2017

JustinBeckwith assigned amgsharma Feb 1, 2021
