This repository has been archived by the owner on Jan 16, 2024. It is now read-only.

Speech Recognizer - initiator + profile configuration #54

Closed
julianobRibeiro opened this issue Jul 11, 2017 · 7 comments

@julianobRibeiro

Hi,

We are implementing Speech Recognizer with FAR_FIELD as the profile and WAKEWORD as the initiator.
According to the AVS API, if WAKEWORD is used as the initiator, the device must send the keyword indices to the cloud for the second stage of wake word verification.
I have a couple of questions about that:

  1. Is it mandatory to send the keyword indices to the cloud together with the command buffer when a wake word engine is in use?

  2. If it is not mandatory, how can I configure my AudioInputProcessor to use WAKEWORD without providing keyword indices to the cloud? I have tried sending AudioInputProcessor::INVALID_INDEX as the begin and end indices, but the cloud treats the voice buffer as a false detection almost every time: as soon as I send the Recognize event, I receive a StopCapture directive almost immediately.
    m_aip->recognize(audioProvider, Initiator::WAKEWORD, aipBegin, aipEnd, keyword);

  3. If it is mandatory, can we switch to the TAP initiator even though we have a wake word engine running? It looks like the only difference between TAP and WAKEWORD is the second-stage wake word verification.

Thank you
Juliano Ribeiro

@kencecka
Contributor

kencecka commented Jul 13, 2017

Hi Juliano,

The indices are optional, and are only required to enable cloud-based wakeword verification. Here is the relevant text from the SpeechRecognizer Documentation:

This object is only required for wake word enabled products that use cloud-based wake word verification.

If you are not able to provide accurate begin and end indices for your recognize call, then cloud-based wake word verification is not supported and will not be performed.

That said, other features in AVS may depend on the initiator being specified correctly in the future, so it is important to pick the correct one. While you may currently be able to specify TAP as the initiator for a wakeword device, doing so may cause incorrect behavior in the future.

Hope that helps,
Ken

@julianobRibeiro
Author

julianobRibeiro commented Jul 13, 2017

Hi Ken,

Thank you for the response. The one point that is still not clear to me is that the API documentation you quoted says the indices ARE REQUIRED when the selected initiator is WAKEWORD (this is stated in the table provided in the API documentation).

Given that, how can I use the WAKEWORD initiator without providing the indices? Is there a wildcard value that I need to send to the cloud for startIndexInSamples and endIndexInSamples?

Or can I simply omit those values?

Regards,
Juliano

@kencecka
Contributor

Hi Juliano,

The wording in the documentation is perhaps a little misleading. The indexes are required "for products that use cloud-based wake word verification". In other words, not providing the indexes means your product will not use cloud-based wake word verification.

The code in the AudioInputProcessor module is already set up to handle this. If you call recognize with the WAKEWORD initiator and do not provide indexes, it will still send the initiator as WAKEWORD, but will omit the wakeWordIndices payload.

Regarding the placeholder value if you do not have indexes, it is provided as a default in the AudioInputProcessor header:

        avsCommon::avs::AudioInputStream::Index begin = INVALID_INDEX,
        avsCommon::avs::AudioInputStream::Index keywordEnd = INVALID_INDEX,
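
The behavior described above (send the initiator type, omit wakeWordIndices when no indexes are given) could be sketched roughly as follows. This is only an illustration under stated assumptions: `buildInitiatorJson`, the `Index` alias, and this definition of `INVALID_INDEX` are hypothetical stand-ins and are not taken from the AudioInputProcessor source.

```cpp
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration; these are not the SDK's own types.
using Index = std::uint64_t;
constexpr Index INVALID_INDEX = std::numeric_limits<Index>::max();

// Builds the "initiator" object of a Recognize event.
// The wakeWordIndices payload is emitted only when both indices are valid.
std::string buildInitiatorJson(const std::string& type, Index begin, Index keywordEnd) {
    std::ostringstream json;
    json << "{\"type\":\"" << type << "\"";
    if (begin != INVALID_INDEX && keywordEnd != INVALID_INDEX) {
        json << ",\"payload\":{\"wakeWordIndices\":{"
             << "\"startIndexInSamples\":" << begin << ","
             << "\"endIndexInSamples\":" << keywordEnd << "}}";
    }
    json << "}";
    return json.str();
}
```

With both indices left at `INVALID_INDEX`, the sketch produces only the `type` field; with valid indices, it adds the `wakeWordIndices` object.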

Ken

@julianobRibeiro
Author

julianobRibeiro commented Jul 13, 2017

Ok got it. Thank you for the clarification.

Now let's move on to the implementation.
If I do not provide the indices, I also cannot provide the keyword (string) when calling recognize(). Below is the signature of recognize() from AudioInputProcessor:

std::future<bool> recognize(
        AudioProvider audioProvider,
        Initiator initiator,
        avsCommon::avs::AudioInputStream::Index begin = INVALID_INDEX,
        avsCommon::avs::AudioInputStream::Index keywordEnd = INVALID_INDEX,
        std::string keyword = "");

This is how I am calling it:
m_aip->recognize(audioProvider, Initiator::WAKEWORD);

The problem is that if I do not provide the keyword string, it defaults to empty and AIP fails on the Recognize event with:
AudioInputProcessor:executeRecognizeFailed:reason=emptyKeywordWithWakewordInitiator

On the other hand, if I pass INVALID_INDEX for both indices and also pass the keyword string "Alexa", it looks like the cloud treats all my Recognize events as false alarms, because I receive a StopCapture directive for each Recognize event and no other directive after that.
m_aip->recognize(audioProvider, Initiator::WAKEWORD, AudioInputProcessor::INVALID_INDEX, AudioInputProcessor::INVALID_INDEX, keyword);
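
The two cases above can be summarized with a small sketch of the local check implied by the log line. It is an assumption about the logic, not the SDK's code: `checkRecognizeArgs` is a hypothetical helper, and only the reason string is taken from the log output quoted above.

```cpp
#include <string>

// Mirrors the initiator names used in this thread; illustrative only.
enum class Initiator { PRESS_AND_HOLD, TAP, WAKEWORD };

// Returns an empty string when the arguments pass the local check,
// otherwise the failure reason. The indices are deliberately not checked:
// in the second case above the call is accepted locally and the
// rejection comes from the cloud.
std::string checkRecognizeArgs(Initiator initiator, const std::string& keyword) {
    if (initiator == Initiator::WAKEWORD && keyword.empty()) {
        return "emptyKeywordWithWakewordInitiator";
    }
    return "";
}
```

Under this reading, a WAKEWORD call always needs a non-empty keyword, whether or not indices are supplied.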

Can you please check the implementation against what the cloud expects to receive? How does the cloud know whether it needs to run cloud verification on the buffer that was sent?

Thank you in advance
Juliano Ribeiro

@kencecka kencecka self-assigned this Aug 8, 2017
@kencecka
Contributor

Hi Juliano,

I apologize for the long delay in responding to this. I have done some investigation into this and confirmed that AVS is currently not allowing a WAKEWORD initiator without the indices. I'm working with the AVS team to identify the correct workaround or solution, and will update this ticket as soon as I have a clear answer.
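
Since the indices are currently needed, one possible way to derive them is sketched below. It assumes a wake word detector that reports the keyword's start and end as millisecond offsets from the beginning of the same audio stream, and 16 kHz audio; `KeywordSpan`, `msToSamples`, and `toSampleIndices` are hypothetical names, not part of the SDK.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the SDK's stream index type.
using Index = std::uint64_t;

// Begin and end of the detected keyword, in samples.
struct KeywordSpan {
    Index begin;
    Index end;
};

// Converts a millisecond offset into a sample offset at the given rate.
Index msToSamples(std::uint64_t ms, std::uint32_t sampleRateHz) {
    return ms * sampleRateHz / 1000;
}

// Assumption: both offsets are measured from the start of the audio stream.
KeywordSpan toSampleIndices(std::uint64_t startMs, std::uint64_t endMs,
                            std::uint32_t sampleRateHz = 16000) {
    assert(sampleRateHz > 0 && startMs <= endMs);
    return {msToSamples(startMs, sampleRateHz), msToSamples(endMs, sampleRateHz)};
}
```

The resulting `begin` and `end` values would then be passed as the third and fourth arguments of recognize(), together with the keyword string.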

Ken

@dhpp
Contributor

dhpp commented Apr 18, 2018

Hi @julianobRibeiro, sorry that this ticket has not been updated for a while.

A client has to follow the AVS specification quite strictly, so I don't believe a workaround for the indices is possible if you wish to use the WAKEWORD initiator type.

I think the one thing I'm missing from your initial posts is why you need to do this. I believe switching to the TAP initiator will still allow the user to say "Alexa" in their spoken query. Did you try this, and was it sufficient for your needs?

@dhpp
Contributor

dhpp commented Apr 20, 2018

Closing for now. Please re-open if you wish to continue discussion.

@dhpp dhpp closed this as completed Apr 20, 2018