Support SpeechRecognition on an audio MediaStreamTrack #66
@Pehrsons Not entirely certain what is being proposed here relevant to Currently Chrome, Chromium records the user voice (without notice or permission) then sends that recording to a remote web service (#56; https://webwewant.fyi/wants/55/). The response is a transcript (text) of the input; depending on the input words, heavily censored. It is unclear what happens to the users' input (potential biometric data; their voice). For output from |
Is this issue to specify for |
I don't understand. getUserMedia cannot currently be used with SpeechRecognition, and MediaRecorder is not even remotely related. This proposal is about adding a MediaStreamTrack argument to SpeechRecognition's start method. Avoiding an online service for recognition is completely unrelated to this; please use a separate issue for that. |
Is the proposal that when the argument to Or is the idea that capturing the microphone input would be replaced entirely by the
AFAICT Mozilla does not implement There is no benefit in adding |
Actually |
Since this is obviously something you have thought about would only suggest when proceeding to not omit the opportunity to concurrently or consecutively add |
I read your question as "Should the MediaStreamTrack argument to Preferably required, since if it's optional we cannot get rid of any of the language that I claimed we can in the proposal.
Are you suggesting allowing SpeechRecognition to run on a buffer of data in non-realtime? That seems like an orthogonal proposal; please file your own issue if you want to argue for that. Giving SpeechRecognition the controls of MediaRecorder because the implementation happens to encode and send audio data to a server doesn't make sense. The server surely only allows specific settings for container and codec. It also locks out any future implementations that do not rely on a server, because there'd be no reason to support MediaRecorder configurations for them, yet they would have to.
This issue is about the spec, not Mozilla's implementation.
See my first post in this issue again, there are lots of benefits.
It's part of improving the spec, so you seem to have answered your own question. |
That is what occurs now at Chromium, Chrome. Reading the specification the term "real-time" does not appear at all in the document. The term would need to be clearly defined anyway, as "real-time" can have different meanings in different domains or be interpreted differently by different individuals.
Downloaded https://github.com/guest271314/mozilla-central/tree/libdeep yesterday. Will try to build and test. It should not be any issue setting Am particularly interested in how you intend to test what you are proposing to be specified? |
@Pehrsons Re use of static files for STT for TTS
|
Then they're not implementing the spec. Or are they, but they're using a buffer internally? Well, then you're conflating their implementation with the spec.
And it doesn't have to be if we use a MediaStreamTrack, since mediacapture-streams defines what we need.
Again, file a separate issue if you think that is the right way to go.
Give the API some input that you control, and observe that it gives you the expected output. |
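The testing approach described above (feed a controlled input, compare against the expected output) could be sketched roughly as follows in TypeScript. This is only an illustration under stated assumptions: start(track) is the signature proposed in this issue, and TrackLike, ResultEventLike, RecognizerLike, collectTranscripts and fakeRecognizer are hypothetical names, with the fake standing in for a real engine so the sketch runs outside a browser.

```typescript
// Hypothetical structural stand-ins for MediaStreamTrack and SpeechRecognition.
interface TrackLike {
  kind: string;
}
interface ResultEventLike {
  // Mirrors the shape of event.results: a list of results, each a list of alternatives.
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}
interface RecognizerLike {
  onresult: ((event: ResultEventLike) => void) | null;
  // ASSUMPTION: start() accepts a track, as proposed in this issue.
  start(track?: TrackLike): void;
}

// Start recognition on a known track and collect the top alternative of each result.
function collectTranscripts(recognizer: RecognizerLike, track: TrackLike): string[] {
  const transcripts: string[] = [];
  recognizer.onresult = (event) => {
    // The results list is cumulative, so rebuild the array on every event.
    transcripts.length = 0;
    for (let i = 0; i < event.results.length; i++) {
      transcripts.push(event.results[i][0].transcript);
    }
  };
  recognizer.start(track);
  return transcripts;
}

// Hypothetical fake: "recognizes" a fixed phrase as soon as it is started.
function fakeRecognizer(phrase: string): RecognizerLike {
  const recognizer: RecognizerLike = {
    onresult: null,
    start() {
      if (recognizer.onresult) {
        recognizer.onresult({ results: [[{ transcript: phrase }]] });
      }
    },
  };
  return recognizer;
}
```

With a real recognizer the result events arrive asynchronously, so a test would wait for the end of recognition before comparing the collected transcripts with the expected text.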
Are you referring to the following language? https://w3c.github.io/speech-api/#speechreco-methods
which does not necessarily exclusively mean a "real-time" input media stream.
The last time checked the audio was compiled into a single buffer (cannot recollect at the moment if conversion to
Filed. |
Not necessarily. Yes, the spec is bad and hand-wavy so it's hard to find definitions for the terms used. But it's fairly easy to understand the spec authors' intent. In this case it is that
I'm afraid it does. Otherwise, where do you pass in this non-realtime media stream?
Implementation detail and irrelevant to the spec.
|
Do not agree with that assessment. Have not noticed any new UI at Chromium. The specification as-is permits gleaning or amassing various plausible interpretations from the language, none of which are necessarily conclusive and binding, at least not unambiguous - thus the current state of the art is to attempt to "incubate" the specification. A user can schedule an audio file (or reading of a |
Since there is current interest in amending and adding to the specification the question must be asked why would |
IMO because that would make the spec very complicated. Unnecessarily so, since there are other specs already allowing conversions between the two. |
Hi There! I apologize for resurrecting this discussion almost five years later. I was wondering if a conclusion has been reached regarding whether the start() method should take an input. I am trying to allow my users to select the microphone they want to use for recognition within our app. Currently, I am forcing them to change their default device, but it would be much easier if we could let them decide in-app. Thank you in advance for your time! |
Hello! Chrome is planning on adding MediaStreamTrack as an optional parameter to the start() method. Does anyone have any objections to this change? If not, I'll work on sending out a PR with the proposed changes. |
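For the in-app microphone selection use case raised above, a minimal TypeScript sketch of how an application might use a start() that accepts a track. It assumes the signature under discussion here, which this thread does not confirm has shipped. TrackLike, StreamLike, RecognizerLike, constraintsForDevice and startOnStream are hypothetical names introduced for illustration; only the deviceId constraint shape passed to getUserMedia comes from mediacapture-streams.

```typescript
// Hypothetical structural stand-ins for MediaStreamTrack, MediaStream and
// SpeechRecognition, so the sketch type-checks and runs outside a browser.
interface TrackLike {
  kind: string;
}
interface StreamLike {
  getAudioTracks(): TrackLike[];
}
interface RecognizerLike {
  // ASSUMPTION: start() takes an optional track, as proposed in this issue.
  start(track?: TrackLike): void;
}

// Build getUserMedia constraints that require one specific microphone.
function constraintsForDevice(deviceId: string) {
  return { audio: { deviceId: { exact: deviceId } } };
}

// Pass the first audio track of the captured stream to the recognizer.
function startOnStream(recognizer: RecognizerLike, stream: StreamLike): TrackLike {
  const tracks = stream.getAudioTracks();
  if (tracks.length === 0) {
    throw new Error("stream has no audio track");
  }
  const track = tracks[0];
  recognizer.start(track);
  return track;
}
```

In a page this might be wired up by awaiting navigator.mediaDevices.getUserMedia(constraintsForDevice(id)), with id taken from enumerateDevices(), and handing the resulting stream to startOnStream together with a SpeechRecognition instance.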
There is an old issue in bugzilla but it doesn't discuss much.
We should revive this, to give the application control over the source of audio.
Not letting the application be in control has several issues:
Letting the application be in control has several advantages:
To support a MediaStreamTrack argument to start(), we need to:
What to throw and what to fire I leave unsaid for now.