Speech API Operation Class doesn't return full results #444

Closed
adziuk opened this issue Apr 11, 2017 · 5 comments
Labels: api: speech, 🚨, triage me

Comments


adziuk commented Apr 11, 2017

In google-cloud-php/src/Speech/Operation.php:

public function results(array $options = [])
{
    $info = $this->info($options);
    return isset($info['response']['results'])
        ? $info['response']['results'][0]['alternatives']
        : [];
}

This function assumes that the response contains a single set of alternatives. Setting max_alternatives > 1, however, can cause more alternatives to be returned. Other optional RecognizeRequest settings can also change the contents of the results to include information beyond the alternatives, so I'm not sure how users are supposed to access that data.

adziuk changed the title from "Speech API Operation Class doesn't support multiple alternatives" to "Speech API Operation Class doesn't return full results" on Apr 11, 2017

adziuk commented Apr 11, 2017

Update: After reading more of this, I found that with sufficiently long audio the function simply returns the wrong result. Long audio produces multiple results, each corresponding to a sequential segment of the audio. The function returns all of the alternatives from the first segment and nothing for the subsequent segments.
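A minimal sketch of one way to avoid dropping segments, assuming the same `$info['response']['results']` shape that the quoted `results()` method reads. The function name `alternativesPerSegment` is hypothetical and is not part of google-cloud-php; this is an illustration of the problem, not the library's fix.

```php
<?php
// Hypothetical helper (not part of google-cloud-php): returns one list of
// alternatives per result, so every audio segment is represented instead
// of only the first one.
function alternativesPerSegment(array $info): array
{
    // Same path the quoted results() method reads; empty when absent.
    $results = $info['response']['results'] ?? [];

    $segments = [];
    foreach ($results as $result) {
        // Keep segments separate so callers can still tell them apart.
        $segments[] = $result['alternatives'] ?? [];
    }

    return $segments;
}
```

Returning a nested list keeps both dimensions visible: the outer index is the audio segment, the inner index is the alternative within that segment.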


adziuk commented Apr 11, 2017

link to relevant code

jdpedrie added the api: speech label on Apr 12, 2017
dwsupplee (Contributor) commented

Thanks for the report @adziuk, I'll get this fixed today.

Do you know under what conditions a response is broken into multiple result sets? I was originally under the impression the multiple result sets were for streaming calls.

danaharon commented

When you set maxAlternatives (from https://cloud.google.com/speech-whitelist/docs/reference/rest/v1/RecognitionConfig) to a value greater than 1, the API returns more than one alternative, regardless of whether you use recognize or LongRunningRecognize. The confidence scores for the alternatives beyond the first one are usually missing.
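Since the confidence score may be missing on lower-ranked alternatives, code that reads them probably should not assume the key exists. A small sketch, assuming each alternative is an array with `transcript` and an optional `confidence` key (the field names that appear in the response output in this thread); `describeAlternatives` is a hypothetical name, not a library function.

```php
<?php
// Hypothetical helper: normalizes the alternatives of a single result,
// using null where the API did not supply a confidence score.
function describeAlternatives(array $result): array
{
    $rows = [];
    foreach ($result['alternatives'] ?? [] as $index => $alternative) {
        $rows[] = [
            'rank'       => $index + 1,
            'transcript' => $alternative['transcript'] ?? '',
            // Often absent for alternatives after the first one.
            'confidence' => $alternative['confidence'] ?? null,
        ];
    }

    return $rows;
}
```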


adziuk commented Apr 12, 2017

Responses are generally broken into multiple result sets with longer audio. Audio around 60 seconds long looks like it's generally split into multiple "results". For example, from the attached file (LINEAR16, sample rate = 44100):
eninv_45.wav.zip

Sync Recognize response:
results {
 alternatives {
   transcript: "Pediatrics is my number one career choice. In many ways, it also reflects my second, third, and fourth career choices. Educated teach and Lead young people toward success."
   confidence: 0.9123565
 }
}
results {
 alternatives {
   transcript: " Legislators draft policies that improve processes for their constituents."
   confidence: 0.93561065
 }
}
results {
 alternatives {
   transcript: " Professional golfers commit themselves to extensive study and practice to master the skills of their profession."
   confidence: 0.96668166
 }
}
results {
 alternatives {
   transcript: " as a pediatrician, I see myself incorporating all three"
   confidence: 0.9412657
 }
}
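
For a response shaped like the one above, a full transcript can be assembled by taking the first alternative of every result in order. The sketch below assumes the results have been decoded into PHP arrays with `alternatives` and `transcript` keys; `fullTranscript` is a hypothetical name and not part of google-cloud-php.

```php
<?php
// Hypothetical helper: concatenates the top alternative of each result.
// Later segments in the sample response already begin with a space, so
// the pieces are joined without an extra separator.
function fullTranscript(array $results): string
{
    $parts = [];
    foreach ($results as $result) {
        $top = $result['alternatives'][0]['transcript'] ?? null;
        if ($top !== null) {
            $parts[] = $top;
        }
    }

    return implode('', $parts);
}

// Usage with two of the segments from the response in this thread.
$results = [
    ['alternatives' => [[
        'transcript' => ' Legislators draft policies that improve processes for their constituents.',
        'confidence' => 0.93561065,
    ]]],
    ['alternatives' => [[
        'transcript' => ' as a pediatrician, I see myself incorporating all three',
        'confidence' => 0.9412657,
    ]]],
];
echo trim(fullTranscript($results)), PHP_EOL;
```

Reading only `$results[0]`, as the quoted `results()` method does, would drop every segment after the first.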
