Add Direct Line Speech support #2621

Merged
merged 47 commits into from
Dec 4, 2019
47 commits
2cc1d9e  Indicate a speech-related post activity (compulim, Nov 10, 2019)
beda217  Bump to [email protected] (compulim, Nov 14, 2019)
d61870b  Bump to [email protected] (compulim, Nov 19, 2019)
22ad6e0  Update ponyfill signature (compulim, Nov 19, 2019)
6cf4124  Fix ESLint (compulim, Nov 19, 2019)
b1027cc  Rename to speechSynthesisUtterance (compulim, Nov 19, 2019)
ee7f6c5  Add directlinespeech-sdk package (compulim, Nov 21, 2019)
c72ef0e  Fix ESLint (compulim, Nov 21, 2019)
94db857  Fix abort synthesis (compulim, Nov 22, 2019)
264d02f  Fix error thrown for non-speaking utterance (compulim, Nov 22, 2019)
f99ac05  Fix ESLint (compulim, Nov 22, 2019)
6abc47f  Clean up (compulim, Nov 25, 2019)
ac5cbce  Should not send user ID (compulim, Nov 25, 2019)
9c54ae3  Optional OAuth support (compulim, Dec 2, 2019)
ab8d776  Use fetchCredentials instead of token/key (compulim, Dec 2, 2019)
9fe16f5  Renew token (compulim, Dec 2, 2019)
2386085  Warning on telemetry option (compulim, Dec 2, 2019)
e313221  Update sample (compulim, Dec 2, 2019)
d6d9a2c  Add README.md (compulim, Dec 2, 2019)
1aa87e4  Remove cache part (compulim, Dec 2, 2019)
cab21bd  Improve error handling and concurrency (compulim, Dec 2, 2019)
a7eb859  Error handling and concurrency (compulim, Dec 2, 2019)
d33cf5b  Fix bad merge (compulim, Dec 3, 2019)
30dc281  Disable microphone button if recognition is not abortable (compulim, Dec 3, 2019)
df35bce  Add entry (compulim, Dec 3, 2019)
ac968f0  Add entry (compulim, Dec 3, 2019)
cf0bed7  Update sample (compulim, Dec 3, 2019)
6530c63  Fix build error (compulim, Dec 3, 2019)
c358f5f  Fix SHA on tarball (compulim, Dec 4, 2019)
a483654  Add Markdown (compulim, Dec 4, 2019)
ab2acd5  Applying PR changes (compulim, Dec 4, 2019)
f57ae76  Apply PR comments (compulim, Dec 4, 2019)
1cf9fdc  Apply PR comments (compulim, Dec 4, 2019)
11aec11  Fix ESLint (compulim, Dec 4, 2019)
6adf78e  Add math-random (compulim, Dec 4, 2019)
fe801a3  Apply suggestions from code review (compulim, Dec 4, 2019)
ee14753  Apply suggestions from code review (compulim, Dec 4, 2019)
2208e1b  Use constructor (compulim, Dec 4, 2019)
5efa368  Apply suggestions from code review (compulim, Dec 4, 2019)
415d550  Applying PR comment (compulim, Dec 4, 2019)
f7aaa71  Remove commented out part (compulim, Dec 4, 2019)
6c4a6d3  Add BYO ponyfill and audioContext (compulim, Dec 4, 2019)
85e69f7  Update props (compulim, Dec 4, 2019)
41092ce  Update token fetch (compulim, Dec 4, 2019)
11b3c85  Fix test (compulim, Dec 4, 2019)
50336ad  Fix test (compulim, Dec 4, 2019)
fb4ad40  Update entry (compulim, Dec 4, 2019)
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -48,6 +48,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Fixes [#2597](https://github.com/microsoft/BotFramework-WebChat/issues/2597). Modify `watch` script to `start` and add `tableflip` script for throwing `node_modules`, by [@corinagum](https://github.com/corinagum) in PR [#2598](https://github.com/microsoft/BotFramework-WebChat/pull/2598)
- Adds Arabic Language Support, by [@midineo](https://github.com/midineo), in PR [#2593](https://github.com/microsoft/BotFramework-WebChat/pull/2593)
- Adds `AdaptiveCardsComposer` and `AdaptiveCardsContext` for composability for Adaptive Cards, by [@compulim](https://github.com/compulim), in PR [#2648](https://github.com/microsoft/BotFramework-WebChat/pull/2648)
- Adds Direct Line Speech support, by [@compulim](https://github.com/compulim) in PR [#2621](https://github.com/microsoft/BotFramework-WebChat/pull/2621)

### Fixed

@@ -68,7 +69,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Changed

- Bumped all dependencies to latest version, by [@compulim](https://github.com/compulim), in PR [#2533](https://github.com/microsoft/BotFramework-WebChat/pull/2533)
- Bumped all dependencies to latest version, by [@compulim](https://github.com/compulim), in PR [#2533](https://github.com/microsoft/BotFramework-WebChat/pull/2533) and PR [#2621](https://github.com/microsoft/BotFramework-WebChat/pull/2621)
- Development dependencies
- Root package
- `@azure/[email protected]`
@@ -132,6 +133,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- `component`
- `[email protected]`
- `[email protected]`
- `[email protected]`
- `[email protected]`
- `[email protected]`
- `[email protected]`
228 changes: 228 additions & 0 deletions DIRECT_LINE_SPEECH.md
@@ -0,0 +1,228 @@
# Using Direct Line Speech

> For Cognitive Services Speech Services, please refer to [`SPEECH.md`](https://github.com/microsoft/BotFramework-WebChat/blob/master/SPEECH.md).

This guide is for integrating Direct Line Speech.

We assume you have already set up a bot and have Web Chat running on a page.

> Sample code in this article is optimized for modern browsers. You may need to use a [transpiler](https://en.wikipedia.org/wiki/Source-to-source_compiler) (e.g. [Babel](https://babeljs.io/)) to target a broader range of browsers.

## Support matrix

<table>
<thead>
<tr>
<th></th>
<th></th>
<th colspan="2">Chrome/Edge<br />and Firefox<br />on desktop</th>
<th colspan="2">Chrome<br />on Android</th>
<th colspan="2">Safari<br />on macOS/iOS</th>
<th colspan="2"><a href="https://developer.android.com/reference/android/webkit/WebView">Web View<br />on Android</a></th>
<th colspan="2"><a href="https://developer.apple.com/documentation/webkit/wkwebview">Web View<br />on iOS</a></th>
</tr>
</thead>
<tbody>
<tr>
<td>STT</td>
<td>Basic recognition</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td><a href="#custom-speech">Custom Speech</a></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td><a href="#text-normalization-options">Text normalization options</a></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td>Abort recognition</td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td>Interims</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td>Dynamic priming</td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td>Reference grammar ID</td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td>Select language at initialization</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td>Select language on-the-fly</td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td><a href="#using-input-hint">Input hint</a></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td>Select input device</td>
<th>❌</th><td><a href="#notes-3"><sup>*3</sup></a></td>
<th>❌</th><td><a href="#notes-3"><sup>*3</sup></a></td>
<th>❌</th><td><a href="#notes-3"><sup>*3</sup></a></td>
<th>❌</th><td><a href="#notes-3"><sup>*3</sup></a></td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td>Basic synthesis using text</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td><a href="#using-speech-synthesis-markup-language">Speech Synthesis Markup Language</a></td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td><a href="#custom-voice">Custom Voice</a></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td><a href="#selecting-voice">Selecting voice/pitch/rate/volume</a></td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td><a href="#text-to-speech-audio-format">Text-to-speech audio format</a></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td>Stripping text from Markdown</td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td>Override using "speak" property</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td>Adaptive Cards using "speak" property</td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td>Interrupt synthesis when clicking on microphone button (<a href="https://github.com/microsoft/BotFramework-WebChat/issues/2428">Bug</a>) (<a href="https://github.com/microsoft/BotFramework-WebChat/pull/2429">PR</a>)</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>✔</th><td>4.7</td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
<tr>
<td>TTS</td>
<td>Synthesize activity with multiple attachments</td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❌</th><td></td>
<th>❓</th><td><a href="#notes-2"><sup>*2</sup></a></td>
</tr>
</tbody>
</table>

### Notes

1. <a name="notes-1"></a>[Web View on iOS](https://developer.apple.com/documentation/webkit/wkwebview) is not a full browser. It does not have audio recording capabilities, which are required for Cognitive Services.
2. <a name="notes-2"></a>Because speech recognition does not work (see note 1), speech synthesis has not been tested.
3. <a name="notes-3"></a>Cognitive Services currently has a bug when selecting a different device for audio recording.
   - Currently blocked by https://github.com/microsoft/cognitive-services-speech-sdk-js/issues/96
   - Tracking bug at https://github.com/microsoft/BotFramework-WebChat/issues/2481
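
The "Abort recognition" row has a UI consequence: the commit list for this PR includes "Disable microphone button if recognition is not abortable". The sketch below shows one way such a check could look. It assumes a recognizer that follows the Web Speech API shape and simply lacks an `abort` function when aborting is unsupported. The helper names are hypothetical, and this is not Web Chat's actual implementation.

```javascript
// Hypothetical helpers, for illustration only.
// Assumption: a recognizer without an `abort` function cannot be cancelled once started.
function isAbortable(recognition) {
  return !!recognition && typeof recognition.abort === 'function';
}

// While recognition is in progress, the button stays enabled only if a click could abort it.
function shouldDisableMicrophoneButton(recognition, recognizing) {
  return recognizing && !isAbortable(recognition);
}
```

With this check, a recognizer that exposes only `start` would keep the button disabled for the duration of a recognition, while one that also exposes `abort` would leave it clickable.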

## Requirements

Direct Line Speech shares the same requirements as Cognitive Services Speech Services. Please refer to [`SPEECH.md`](https://github.com/microsoft/BotFramework-WebChat/blob/master/SPEECH.md#requirements).
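
As a rough illustration of what these requirements imply in practice, the sketch below checks for microphone capture and audio playback APIs before speech is enabled. It assumes the standard browser APIs `navigator.mediaDevices.getUserMedia` and `AudioContext` (or the prefixed `webkitAudioContext`). The helper name `detectSpeechCapabilities` is hypothetical and not part of Web Chat.

```javascript
// Hypothetical helper, not part of Web Chat: reports whether the given global
// object exposes the browser APIs needed to record and play audio.
function detectSpeechCapabilities(globalObject) {
  const { navigator } = globalObject;
  const mediaDevices = navigator && navigator.mediaDevices;

  return {
    // Microphone capture, needed for speech-to-text.
    canRecord: !!mediaDevices && typeof mediaDevices.getUserMedia === 'function',
    // Web Audio playback, needed for text-to-speech.
    canPlay:
      typeof globalObject.AudioContext === 'function' ||
      typeof globalObject.webkitAudioContext === 'function'
  };
}

// In a browser, pass `window`:
// const { canRecord, canPlay } = detectSpeechCapabilities(window);
```

Taking the global object as a parameter keeps the check testable outside a browser. A host such as the iOS Web View described in note 1 would be expected to report `canRecord: false`.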
1 change: 1 addition & 0 deletions README.md
@@ -194,6 +194,7 @@ There is a breaking change on behavior expectations regarding speech and input h
| [`06.e.cognitive-services-speech-services-with-lexical-result`](https://github.com/microsoft/BotFramework-WebChat/tree/master/samples/06.e.cognitive-services-speech-services-with-lexical-result) | Demonstrates how to use lexical result from Cognitive Services Speech Services API. | [Lexical Result Demo](https://microsoft.github.io/BotFramework-WebChat/06.e.cognitive-services-speech-services-with-lexical-result) |
| [`06.f.hybrid-speech`](https://github.com/microsoft/BotFramework-WebChat/tree/master/samples/06.f.hybrid-speech) | Demonstrates how to use both browser-based Web Speech API for speech-to-text, and Cognitive Services Speech Services API for text-to-speech. | [Hybrid Speech Demo](https://microsoft.github.io/BotFramework-WebChat/06.f.hybrid-speech) |
| [`06.g.select-voice`](https://github.com/microsoft/BotFramework-WebChat/tree/master/samples/06.g.select-voice) | Demonstrates how to select speech synthesis voice based on activity. | [Select Voice Demo](https://microsoft.github.io/BotFramework-WebChat/06.g.select-voice) |
| [`06.i.direct-line-speech`](https://github.com/microsoft/BotFramework-WebChat/tree/master/samples/06.i.direct-line-speech) | Demonstrates how to use Direct Line Speech channel in Web Chat. | [Direct Line Speech Demo](https://microsoft.github.io/BotFramework-WebChat/06.i.direct-line-speech) |
| [`07.a.customization-timestamp-grouping`](https://github.com/microsoft/BotFramework-WebChat/tree/master/samples/07.a.customization-timestamp-grouping) | Demonstrates how to customize timestamps by showing or hiding timestamps and changing the grouping of messages by time. | [Timestamp Grouping Demo](https://microsoft.github.io/BotFramework-WebChat/07.a.customization-timestamp-grouping) |
| [`07.b.customization-send-typing-indicator`](https://github.com/microsoft/BotFramework-WebChat/tree/master/samples/07.b.customization-send-typing-indicator) | Demonstrates how to send typing activity when the user starts typing in the send box. | [User Typing Indicator Demo](https://microsoft.github.io/BotFramework-WebChat/07.b.customization-send-typing-indicator) |
| [`08.customization-user-highlighting`](https://github.com/microsoft/BotFramework-WebChat/tree/master/samples/08.customization-user-highlighting) | Demonstrates how to customize the styling of activities based on whether the message is from the user or the bot. | [User Highlighting Demo](https://microsoft.github.io/BotFramework-WebChat/08.customization-user-highlighting) |
13 changes: 12 additions & 1 deletion SPEECH.md
@@ -1,5 +1,7 @@
# Using Cognitive Services Speech Services

> For Direct Line Speech, please refer to [DIRECT_LINE_SPEECH.md](https://github.com/microsoft/BotFramework-WebChat/blob/master/DIRECT_LINE_SPEECH.md).

This guide is for integrating speech-to-text and text-to-speech functionality of Azure Cognitive Services.

We assume you have already set up a bot and have Web Chat running on a page.
@@ -86,7 +88,16 @@ We assume you have already set up a bot and have Web Chat running on a page.
</tr>
<tr>
<td>STT</td>
<td>Select language</td>
<td>Select language at initialization</td>
<th>✔</th><td>4.2</td>
<th>✔</th><td>4.2</td>
<th>✔</th><td>4.2</td>
<th>✔</th><td>4.2</td>
<th>❌</th><td><a href="#notes-1"><sup>*1</sup></a></td>
</tr>
<tr>
<td>STT</td>
<td>Select language on-the-fly</td>
<th>✔</th><td>4.2</td>
<th>✔</th><td>4.2</td>
<th>✔</th><td>4.2</td>
11 changes: 6 additions & 5 deletions package.json
@@ -51,6 +51,7 @@
],
"testPathIgnorePatterns": [
"<rootDir>/__tests__/setup/",
"<rootDir>/packages/directlinespeech/__tests__/utilities/",
"<rootDir>/packages/playground/",
"<rootDir>/samples/"
],
@@ -69,15 +70,15 @@
},
"scripts": {
"bootstrap": "lerna bootstrap",
"build": "lerna run --scope=botframework-webchat* --scope=isomorphic* --stream build",
"build": "lerna run --scope=botframework-* --scope=isomorphic* --stream build",
"build:sample": "lerna run --scope=sample-* --stream build",
"clean": "lerna run --scope=botframework-webchat* --parallel --stream clean",
"clean": "lerna run --scope=botframework-* --parallel --stream clean",
"coveralls": "cat ./coverage/lcov.info | coveralls",
"eslint": "lerna run --scope=botframework-webchat* --scope=isomorphic* --parallel --stream eslint",
"eslint": "lerna run --scope=botframework-* --scope=isomorphic* --parallel --stream eslint",
"lerna-publish": "lerna publish",
"prepublishOnly": "lerna run --scope=botframework-webchat* --scope=isomorphic* --scope=playground --stream prepublishOnly",
"prepublishOnly": "lerna run --scope=botframework-* --scope=isomorphic* --scope=playground --stream prepublishOnly",
"prettier-readmes": "prettier --write **/**/*.md --tab-width 3 --single-quote true",
"start": "npm run build && lerna run --parallel --scope=botframework-webchat* --scope=isomorphic* --stream watch",
"start": "npm run build && lerna run --parallel --scope=botframework-* --scope=isomorphic* --stream watch",
"start:docker": "npm run build && docker-compose up --build",
"start:playground": "cd packages/playground && npm run start",
"tableflip": "npm run tableflip:start && lerna clean --yes --concurrency 2 && npx rimraf node_modules && npm ci && npm run bootstrap -- --concurrency 2 && npm run tableflip:end",
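
The scripts above widen the Lerna `--scope` glob from `botframework-webchat*` to `botframework-*`, presumably so that the new `botframework-directlinespeech-sdk` package is picked up by `build`, `clean`, `eslint`, `prepublishOnly`, and `start`. The sketch below illustrates the difference with a deliberately simplified matcher in which only `*` is a wildcard. It is not Lerna's implementation, and the package list is limited to names that appear in this PR.

```javascript
// Simplified glob matcher for illustration: only `*` is treated as a wildcard.
// Lerna's real matching is more general; this is not its implementation.
function matchesScope(pattern, packageName) {
  const escaped = pattern
    .split('*')
    .map(part => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');

  return new RegExp(`^${escaped}$`).test(packageName);
}

// Package names taken from packages/bundle/package.json in this PR.
const packageNames = [
  'botframework-directlinespeech-sdk',
  'botframework-webchat-component',
  'botframework-webchat-core'
];

const oldScope = packageNames.filter(name => matchesScope('botframework-webchat*', name));
const newScope = packageNames.filter(name => matchesScope('botframework-*', name));

console.log(oldScope); // only the two botframework-webchat-* packages
console.log(newScope); // all three packages, including the Direct Line Speech SDK
```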
6 changes: 3 additions & 3 deletions packages/bundle/package-lock.json

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion packages/bundle/package.json
@@ -35,6 +35,7 @@
"@babel/runtime": "^7.6.3",
"adaptivecards": "1.2.3",
"botframework-directlinejs": "^0.11.6",
"botframework-directlinespeech-sdk": "0.0.0-0",
"botframework-webchat-component": "0.0.0-0",
"botframework-webchat-core": "0.0.0-0",
"core-js": "^3.3.6",
@@ -46,7 +47,7 @@
"prop-types": "^15.7.2",
"sanitize-html": "^1.19.0",
"url-search-params-polyfill": "^7.0.0",
"web-speech-cognitive-services": "5.0.1",
"web-speech-cognitive-services": "^6.0.0",
"whatwg-fetch": "^3.0.0"
},
"devDependencies": {
5 changes: 5 additions & 0 deletions packages/bundle/src/createDirectLineSpeechAdapters.js
@@ -0,0 +1,5 @@
import { createAdapters } from 'botframework-directlinespeech-sdk';

export default function createDirectLineSpeechAdapters(...args) {
  return createAdapters(...args);
}

Review comment (Member) on `export default function createDirectLineSpeechAdapters(...args) {`: "what is the value of this re-export??"
2 changes: 2 additions & 0 deletions packages/bundle/src/index.ts
@@ -10,6 +10,7 @@ import createAdaptiveCardsAttachmentMiddleware from './adaptiveCards/createAdapt
import createCognitiveServicesBingSpeechPonyfillFactory from './createCognitiveServicesBingSpeechPonyfillFactory';
import createCognitiveServicesSpeechServicesPonyfillFactory from './createCognitiveServicesSpeechServicesPonyfillFactory';
import createStyleSet from './createFullStyleSet';
import createDirectLineSpeechAdapters from './createDirectLineSpeechAdapters';
import defaultCreateDirectLine from './createDirectLine';
import FullComposer from './FullComposer';
import ReactWebChat from './FullReactWebChat';
@@ -57,6 +58,7 @@ window['WebChat'] = {
createCognitiveServicesBingSpeechPonyfillFactory,
createCognitiveServicesSpeechServicesPonyfillFactory,
createDirectLine,
createDirectLineSpeechAdapters,
createStyleSet,
hooks: patchedHooks,
ReactWebChat,
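
With `createDirectLineSpeechAdapters` exposed on `window.WebChat`, a page could wire it up roughly as sketched below. This is a sketch under stated assumptions, not the documented API: the `fetchCredentials` option name comes from the commit "Use fetchCredentials instead of token/key", while the credential fields in the usage comment, the `renderWebChat` call, and the helper name `renderWithDirectLineSpeech` are assumptions. The `webChat` parameter stands in for `window.WebChat`.

```javascript
// Sketch only. `webChat` stands in for `window.WebChat`.
// Assumptions: the factory accepts `{ fetchCredentials }` and its result can be
// spread into the props passed to `renderWebChat`.
async function renderWithDirectLineSpeech(webChat, element, fetchCredentials) {
  // `await` works whether the factory returns the adapters directly or a promise.
  const adapters = await webChat.createDirectLineSpeechAdapters({ fetchCredentials });

  // Spread the adapter set into the props so individual adapter names are not assumed.
  webChat.renderWebChat({ ...adapters }, element);

  return adapters;
}

// Possible browser usage (placeholder values, hypothetical credential fields):
// renderWithDirectLineSpeech(window.WebChat, document.getElementById('webchat'), async () => ({
//   authorizationToken: '<token from your server>',
//   region: '<speech region>'
// }));
```

Passing a function rather than a fixed token lets the caller decide how and when credentials are obtained, which fits the "Renew token" commit in this PR.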