Bug Report PHP: Google Cloud Speech to Text API not Recognizing all Speakers #3411
Labels
api: speech
Issues related to the Speech-to-Text API.
type: question
Request for information or clarification. Not an issue.
Environment details
I am currently using version 1.2.1, according to the vendor\google\cloud\Speech\VERSION file.
The Speech API was installed via "composer require google/cloud" as part of the full cloud API.
Steps to reproduce
I suspect the problem is related to speakerTag always being zero: the ongoing code changes for differentiating multiple speakers' voice characteristics may be missing some code under certain scenarios.
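To make the "speakerTag is always zero" observation easy to confirm, here is a minimal sketch that tallies words per speaker tag. It does not call the Speech library: `countWordsBySpeakerTag()` and `allSpeakerTagsZero()` are hypothetical helpers, and the input shape (a list of `['word' => ..., 'speakerTag' => ...]` arrays copied out of the response) is an assumption for illustration. As far as I understand the API documentation, speaker tags are only populated when speaker diarization is enabled in the recognition config, so an all-zero tally may simply mean diarization was not requested.

```php
<?php
// Hypothetical helper: $words is assumed to be a list of
// ['word' => string, 'speakerTag' => int] arrays that the caller has
// already extracted from the API response (this shape is an assumption).
function countWordsBySpeakerTag(array $words): array
{
    $counts = [];
    foreach ($words as $w) {
        // Treat a missing tag the same as tag 0.
        $tag = (int) ($w['speakerTag'] ?? 0);
        $counts[$tag] = ($counts[$tag] ?? 0) + 1;
    }
    ksort($counts); // order by speaker tag for stable output
    return $counts;
}

// Returns true when no word carries a tag other than 0.
function allSpeakerTagsZero(array $words): bool
{
    $counts = countWordsBySpeakerTag($words);
    return $counts === [] || array_keys($counts) === [0];
}
```

If `allSpeakerTagsZero()` returns true for the whole transcript, the response carries no usable speaker information, which is a separate question from whether a given speaker's words were transcribed at all.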
My main concern is that not all of the people speaking in the audio are being recognized and transcribed.
For example, I have an audio wave file that has several people speaking.
The teacher is the first to speak, followed by Little Girl #1, then Little Girl #2, then Little Boy #2.
All voices were recognized and transcribed with the exception of Little Girl #1. In fact, throughout the entire video, Little Girl #1, who speaks very clearly, is never transcribed!
Here is a link to the video that I posted with closed captions that I created from the Google Speech API to test the API: https://vimeo.com/455662126/5610c6b265
In addition, several words were transcribed incorrectly, whereas the YouTube auto CC generator gets them right.
The audio wave file that I used as a source was extracted from the MP4 video using:
ffmpeg -i "03 Joining In Questions Comments.mp4" -ar 48000 -ac 1 "03 Joining In Questions Comments.wav"
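Since the ffmpeg command above requests 48000 Hz mono output, it may be worth confirming that the extracted file really has that format before comparing results. The following is a minimal sketch, assuming a canonical RIFF/WAVE header in which the "fmt " chunk immediately follows the RIFF header; `readWavFormat()` is a hypothetical helper, not part of any Google library.

```php
<?php
// Hypothetical helper: parses the first 36 bytes of a canonical WAV header.
// Assumes the "fmt " chunk starts at byte 12; files with other chunks
// placed before it are rejected rather than parsed.
function readWavFormat(string $header): array
{
    if (
        strlen($header) < 36
        || substr($header, 0, 4) !== 'RIFF'
        || substr($header, 8, 4) !== 'WAVE'
        || substr($header, 12, 4) !== 'fmt '
    ) {
        throw new InvalidArgumentException('Not a canonical RIFF/WAVE header');
    }
    // Little-endian fields of the fmt chunk body, starting at byte 20.
    $fmt = unpack(
        'vformat/vchannels/VsampleRate/VbyteRate/vblockAlign/vbitsPerSample',
        substr($header, 20, 16)
    );
    return [
        'channels'      => $fmt['channels'],
        'sampleRate'    => $fmt['sampleRate'],
        'bitsPerSample' => $fmt['bitsPerSample'],
    ];
}
```

For the file in this report, calling `readWavFormat(file_get_contents('03 Joining In Questions Comments.wav', false, null, 0, 36))` should report 1 channel at 48000 Hz if the extraction worked as intended; those values can then be compared with whatever sample rate and channel count the script passes to the API.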
SOURCE VIDEO (download to the same directory as the PHP script below)
Here is a download link to the original "03 Joining In Questions Comments.mp4": https://content.streamhoster.com/file/apsva/03_Joining_In_Questions_Comments.mp4?dl=1
Here are two versions of the audio source wave test files; both render the exact same text from the Google Speech API:
I also uploaded the video to YouTube, and these are its auto-generated closed captions: https://www.youtube.com/watch?v=_hyET4U2xcM
Running the PHP Instructions
At this point you should have the following files all in the same directory:
My_Script.php
03 Joining In Questions Comments.mp4 (downloaded)
03 Joining In Questions Comments.srt (created by running My_Script.php)
03 Joining In Questions Comments.wav (downloaded audio source)
If you don't have VLC you can download it here: https://www.videolan.org/vlc/download-windows.html
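For readers who want to check the .srt file listed above by hand, here is a minimal sketch of how a caption cue can be formatted in the SubRip layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, text, blank line). `srtTimestamp()` and `srtCue()` are hypothetical helpers written for illustration and are not taken from My_Script.php; the start and end times are assumed to be plain seconds.

```php
<?php
// Hypothetical helper: converts seconds to an SRT timestamp (HH:MM:SS,mmm).
function srtTimestamp(float $seconds): string
{
    $ms = (int) round($seconds * 1000);
    $h  = intdiv($ms, 3600000);
    $m  = intdiv($ms % 3600000, 60000);
    $s  = intdiv($ms % 60000, 1000);
    return sprintf('%02d:%02d:%02d,%03d', $h, $m, $s, $ms % 1000);
}

// Hypothetical helper: builds one SRT cue, terminated by a blank line.
function srtCue(int $index, float $start, float $end, string $text): string
{
    return $index . "\n"
        . srtTimestamp($start) . ' --> ' . srtTimestamp($end) . "\n"
        . $text . "\n\n";
}
```

Comparing the cue times in the generated file against the moments where Little Girl #1 speaks in the video should show whether her speech falls into a gap between cues or inside a cue attributed to someone else.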
Important Notice!!!
I have also used other audio source files, and some unexpected text is inserted into the results. This audio file does not contain the acronym "BFF" (meaning "best friends forever") anywhere, yet it appears in the results! I am going to open another ticket for this problem with better examples of text insertion coming from the server, maybe from a thesaurus database or something similar.
Code example
Making sure to follow these steps will guarantee the quickest resolution possible.
Thanks!