From 1097960da1960fb32460f169148525230af4e715 Mon Sep 17 00:00:00 2001 From: Vernon Marshall Date: Mon, 23 Dec 2024 20:04:52 -0800 Subject: [PATCH 1/6] Revamp speech configuration documentation; add detailed examples and best practices. --- fern/customization/speech-configuration.mdx | 272 ++++++++++++++++---- 1 file changed, 226 insertions(+), 46 deletions(-) diff --git a/fern/customization/speech-configuration.mdx b/fern/customization/speech-configuration.mdx index cdb2098..dfb56d2 100644 --- a/fern/customization/speech-configuration.mdx +++ b/fern/customization/speech-configuration.mdx @@ -4,81 +4,261 @@ subtitle: Timing control for assistant speech slug: customization/speech-configuration --- +### Introduction -The Speaking Plan and Stop Speaking Plan are essential configurations designed to optimize the timing of when the assistant begins and stops speaking during interactions with a customer. These plans ensure that the assistant does not interrupt the customer and also prevents awkward pauses that can occur if the assistant starts speaking too late. Adjusting these parameters helps tailor the assistant’s responsiveness to different conversational dynamics. +Conversation Analysis (CA) examines the structure and organization of human interactions, focusing on how participants manage conversations in real-time. We mimic this natural behavior in our API. -**Note**: At the moment these configurations can currently only be made via API. +Key concepts include: -## Start Speaking Plan -This plan defines the parameters for when the assistant begins speaking after the customer pauses or finishes. + + + + Conversations are structured into turns, where typically one person speaks at a time. Speakers use Turn Construction Units (TCUs)—such as words, phrases, or clauses—that listeners recognize, allowing them to anticipate when a turn will end and when it's appropriate to speak. Transition Relevance Places (TRPs) are points where a change of speaker can occur. Turn allocation follows specific rules: + - **Current speaker selects next**: The current speaker designates who speaks next. + - **Self-selection**: If not selected, another participant may self-select to speak. + - **Continuation**: If no one else speaks, the current speaker may continue. -- **Wait Time Before Speaking**: You can set how long the assistant waits before speaking after the customer finishes. The default is 0.4 seconds, but you can increase it if the assistant is speaking too soon, or decrease it if there’s too much delay. -**Example:** For tech support calls, set `waitSeconds` for the assistant to more than 1.0 seconds to give customers time to complete their thoughts, even if they have some pauses in between. + Silences are categorized as pauses (within a turn), gaps (between turns), or lapses (when no one speaks). + + + Conversations often involve sequences like adjacency pairs, where an initial utterance (e.g., a question) prompts a related response (e.g., an answer). These pairs can be expanded with pre-sequences (preparing for the main action), insert expansions (occurring between the initial and responsive actions), and post-expansions (following the main action). + + + Certain responses are socially preferred. For example, agreements or acceptances are typically delivered promptly and directly, while disagreements or refusals may be delayed or mitigated to maintain social harmony. + + + Participants address problems in speaking, hearing, or understanding through repair strategies. 
Self-repair (the speaker corrects themselves) is generally preferred over other-repair (another person corrects the speaker), helping to maintain conversational flow and mutual understanding.
+
+    Speakers perform actions (e.g., questioning, requesting, asserting) through their utterances. Understanding how these actions are constructed and interpreted is central to CA, as it reveals how participants achieve social objectives through conversation.
+
+    An adjacency pair is a fundamental unit of conversation consisting of two related utterances. The first part (e.g., a question) typically elicits a specific response (e.g., an answer). These pairs are essential for structuring conversations and ensuring coherence.
+
-- **Smart Endpointing**: This feature uses advanced processing to detect when the customer has truly finished speaking, especially if they pause mid-thought. It's off by default but can be turned on if needed. **Example:** In insurance claims, `smartendpointingEnabled` helps avoid interruptions while customers think through responses and as they formulate responses. Example: The assistant mentions "do you want a loan," triggering the system to check the customer's response. If the customer responds with "yes" (matching the CustomerRegex for "yes|no"), the system waits for 1.1 seconds before proceeding, allowing time for further clarification. For responses requiring number sequences like "What's your account number?", set longer timeouts like 5 seconds or more to accommodate pauses between digits.
+These foundational structures illustrate how individuals collaboratively produce and interpret talk in interaction, ensuring coherent and meaningful communication.
-- **Transcription-Based Detection**: Customize how the assistant determines that the customer has stopped speaking based on what they're saying. This offers more control over the timing. **Example:** When a customer says, "My account number is 123456789, I want to transfer $500."
- - The system detects the number "123456789" and waits for 0.5 seconds (`WaitSeconds`) to ensure the customer isn't still speaking.
- - If the customer were to finish with an additional line, "I want to transfer $500.", the system uses `onPunctuationSeconds` to confirm the end of the speech and then proceed with the request processing.
- - In a scenario where the customer has been silent for a long and has already finished speaking but the transcriber is not confident to punctuate the transcription, `onNoPunctuationSeconds` is used for 1.5 seconds.
+### Speech Configuration in VAPI
-Here's a code snippet for Start Speaking Plan -
+Speech configuration is a crucial aspect of designing a voice assistant that delivers a seamless and engaging user experience. By customizing the assistant's speech settings, you can optimize its responsiveness, naturalness, and timing during interactions with users.
+
+The speech configuration options in Vapi allow you to control the timing of when the assistant starts and stops speaking during interactions with a customer.
+
+These plans ensure that the assistant does not interrupt the customer and also prevent awkward pauses that can occur if the assistant starts speaking too late.
+
+Adjusting these parameters helps tailor the assistant's responsiveness to different conversational dynamics.
+
+      Specify the provider, language, and model for speech transcription.
+ + + [rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + + + + + + Choose the voice provider, voice ID, and other related parameters. + + + [rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + + + + + + + Set parameters like `firstMessage`, `silenceTimeoutSeconds`, and `maxDurationSeconds` to control the assistant's behavior during interactions. + + + [rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + + + + + + Configure the Speaking Plan and Stop Speaking Plan to optimize the assistant's speech timing. Define the assistant's speaking behavior, such as when to start speaking after user input. + + + + [endpoint](https://docs.vapi.ai/api-reference/assistants/create#request.body.timing) + + + + + + + + Privacy settings also affect the assistant's speech behavior. Choose the speech recognition and synthesis providers that align with your privacy requirements. + + + [rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + + + + + +### Transcriber Settings + +- **Transcription Provider**: Select a transcription service that aligns with your application's needs. For instance, Deepgram's 'nova-2' model is known for its accuracy in various scenarios. +- **Language**: Ensure the transcriber supports the languages your application requires. +- **Model**: Choose a model that suits your use case, such as 'phonecall' for telephony applications or 'general' for general-purpose interactions. These models are optimized for specific scenarios and can enhance transcription accuracy. + +To use one of these providers, you can specify the provider and model in the model parameter of the Assistant. + +You can find more details in the Custom LLMs section of the documentation. + +Example request body: ```json - "startSpeakingPlan": { - "waitSeconds": 0.4, - "smartEndpointingEnabled": false, - "customEndpointingRules": [ - { - "type": "both", - "assistantRegex": "customEndpointingRules", - "customerRegex": "customEndpointingRules", - "timeoutSeconds": 1.1 - } - ], - "transcriptionEndpointingPlan": { - "onPunctuationSeconds": 0.1, - "onNoPunctuationSeconds": 1.5, - "onNumberSeconds": 0.5 - } - } +{ + "model": { + "provider": "openai", + "model": "gpt-3.5-turbo", + "systemPrompt": "You're a versatile AI assistant named Vapi who is fun to talk with." + } +} ``` +## Voice Settings + + - **Voice Provider**: Options like ElevenLabs or Azure offer diverse voice selections. + - **Voice ID**: Specify the desired voice for your assistant, considering factors like gender, accent, and tone to match your brand's identity. + - **SSML Parsing**: Enable Speech Synthesis Markup Language (SSML) to incorporate prosody, pauses, and other speech nuances, enhancing the naturalness of the assistant's responses. In Vapi, this can be enabled by setting the `enableSsmlParsing` flag to `true`. + +## Privacy Settings + + - **Speech Recognition**: Choose the speech recognition provider and model that best suits your application's needs. Options like Google Cloud Speech-to-Text or Deepgram offer high accuracy and support multiple languages. + - **Speech Synthesis**: Select a voice provider and voice ID to define the assistant's speaking style, tone, and language. Consider factors like + +## Timing Settings + + - **Start Speaking Plan**: Define the timing for when the assistant begins speaking during interactions. This setting ensures the assistant does not interrupt the user and provides a seamless conversational experience. 
- **Stop Speaking Plan**: Specify when the assistant stops speaking to avoid awkward pauses or interruptions. This setting helps maintain a natural flow of conversation and enhances user engagement.

### Start Speaking Plan
- **Wait Time Before Speaking**: Adjust how long the assistant pauses after the user finishes speaking to prevent interruptions.

  You can set how long the assistant waits before speaking after the customer finishes. The default is 0.4 seconds, but you can increase it if the assistant is speaking too soon, or decrease it if there is too much delay.

    For tech support calls, set `waitSeconds` to more than 1.0 seconds to give customers time to complete their thoughts, even if they have some pauses in between.

  Another example is customer service calls, where the assistant should start speaking immediately after the customer finishes speaking. In this case, set `waitSeconds` to 0.0 seconds.

- **Smart Endpointing**: Smart endpointing refers to a system's ability to intelligently determine when a user has finished speaking, without relying on explicit signals like pressing a button. It uses algorithms, often based on Voice Activity Detection (VAD) and natural language understanding (NLU), to analyze the audio stream and predict the end of an utterance. This allows for a more natural and seamless conversational experience, and is especially useful for mid-thought pauses. The feature is disabled by default, but can be enabled by setting `smartEndpointingEnabled` to `true`.
```json
"startSpeakingPlan": {
  "smartEndpointingEnabled": false // Disabled by default
}
```
Here's an example of a `startSpeakingPlan` configuration:

```json
"startSpeakingPlan": {
  "waitSeconds": 0.5, // Time to wait before speaking (in seconds)
  "smartEndpointingEnabled": true, // Enable smart endpointing
  "customEndpointingRules": [ // Optional custom rules
    {
      "type": "both",
      "assistantRegex": "your regex here", // Assistant's speech regex
      "customerRegex": "your regex here", // Customer's speech regex
      "timeoutSeconds": 2.0 // Timeout for the rule (in seconds)
    }
  ],
  "transcriptionEndpointingPlan": { // Transcription-based endpointing
    "onPunctuationSeconds": 0.2, // Time after punctuation to end (seconds)
    "onNoPunctuationSeconds": 1.8, // Time without punctuation to end (seconds)
    "onNumberSeconds": 0.7 // Time after a number to end (seconds)
  }
}
```
This configuration tells the assistant to:

- Wait 0.5 seconds after the user stops speaking before starting.
- Use smart endpointing to detect the end of user utterances.
- Apply custom endpointing rules based on regular expressions (regex) matched against the assistant's and customer's speech.
- Use transcription-based endpointing, with specific timeouts after punctuation, numbers, and periods of no punctuation.

**Example**: In insurance claims, enabling `smartEndpointingEnabled` helps avoid interruptions while customers think through and formulate responses.

### STOP SPEAKING PLAN

- **Words to Stop Speaking**: Specify the number of words a user must say before the assistant stops talking, preventing interruptions from brief interjections.

- Voice Activity Detection: Set the duration of user speech required to trigger the assistant to stop speaking, minimizing overlaps.
- Pause Before Resuming: Control the delay before the assistant resumes speaking after being interrupted, ensuring a natural conversational flow.


The stopSpeakingPlan allows you to configure how the assistant stops speaking, preventing interruptions and ensuring a smooth conversation. Here's an example:

```json
"stopSpeakingPlan": {
  "numWords": 3, // Number of user words that stop the assistant
  "voiceSeconds": 0.8, // Duration of user speech that stops the assistant (seconds)
  "backoffSeconds": 0.3 // Pause before resuming after an interruption (seconds)
}
```
This configuration instructs the assistant to:

- Stop speaking if the user says 3 or more words.
- Stop speaking if the user's voice activity is detected for 0.8 seconds or longer.
- Pause for 0.3 seconds before resuming speech after being interrupted.

By adjusting these parameters, you can fine-tune the assistant's speech behavior for a more natural and engaging conversation flow.
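Putting the two plans together: below is a minimal sketch of a create-assistant request body that combines the model settings shown earlier with both plans. The values are illustrative defaults drawn from the examples above, not recommendations; see the API reference for the authoritative schema.

```json
{
  "model": {
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "systemPrompt": "You're a versatile AI assistant named Vapi who is fun to talk with."
  },
  "startSpeakingPlan": {
    "waitSeconds": 0.5, // Illustrative wait before the assistant speaks
    "smartEndpointingEnabled": true
  },
  "stopSpeakingPlan": {
    "numWords": 3, // Illustrative interruption threshold
    "voiceSeconds": 0.8,
    "backoffSeconds": 0.3
  }
}
```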
Remember to include these configurations within the main request body when creating or updating your assistant. Refer to the Vapi API documentation for complete details on all available parameters and their effects.

### Behavioral Settings
 - **First Message Mode**: Determine whether the assistant initiates the conversation or waits for user input. Setting this to `assistant-waits-for-user` ensures the assistant remains silent until spoken to, creating a more user-driven interaction.
 - **Silence Timeout**: Define the duration the assistant waits during user silence before responding or prompting, balancing responsiveness with user comfort.
 - **Max Duration**: Set limits on interaction lengths to manage session times effectively. This parameter helps prevent overly long interactions that may lead to user fatigue or disengagement.

## BEST PRACTICES
Best Practices

Adapt to User Style: Configure settings based on conversational dynamics, such as enabling smart endpointing for mid-thought pauses.

Minimize Noise Interference: Adjust parameters to handle noisy environments effectively.

Optimize Conversational Flow: Balance responsiveness and non-intrusiveness by testing different configurations.

Tailor for Use Cases: Customize settings for specific scenarios, such as tech support or healthcare applications.

Iterate and Improve: Continuously test configurations with real users and refine based on feedback. \ No newline at end of file

From 479ce53510d4a7807ebf4b0206de2ecede6d2463 Mon Sep 17 00:00:00 2001
From: Vernon Marshall
Date: Mon, 6 Jan 2025 17:37:13 -0800
Subject: [PATCH 2/6] moved to new doc, updated TOC

---
 .gitignore | 1 +
 .../customization/conversational-analysis.mdx | 42 +++++++++++++++++++
 2 files changed, 43 insertions(+)
 create mode 100644 fern/customization/conversational-analysis.mdx

diff --git a/.gitignore b/.gitignore
index fe38d49..1cb02d0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 **/.definition
 **/.preview/**
+.DS_Store

diff --git a/fern/customization/conversational-analysis.mdx b/fern/customization/conversational-analysis.mdx
new file mode 100644
index 0000000..b5069d0
--- /dev/null
+++ b/fern/customization/conversational-analysis.mdx
@@ -0,0 +1,42 @@
---
title: Conversational Analysis
subtitle: Understanding the Anatomy of Conversation as it relates to Speech Recognition
slug: customization/conversational-analysis
---

### Introduction

Conversation Analysis (CA) examines the structure and organization of human interactions, focusing on how participants manage conversations in real-time. We mimic this natural behavior in our API.

Key concepts include:

    Conversations are structured into turns, where typically one person speaks at a time. Speakers use Turn Construction Units (TCUs)—such as words, phrases, or clauses—that listeners recognize, allowing them to anticipate when a turn will end and when it's appropriate to speak. Transition Relevance Places (TRPs) are points where a change of speaker can occur.
Turn allocation follows specific rules: + + - **Current speaker selects next**: The current speaker designates who speaks next. + - **Self-selection**: If not selected, another participant may self-select to speak. + - **Continuation**: If no one else speaks, the current speaker may continue. + + Silences are categorized as pauses (within a turn), gaps (between turns), or lapses (when no one speaks). + + + Conversations often involve sequences like adjacency pairs, where an initial utterance (e.g., a question) prompts a related response (e.g., an answer). These pairs can be expanded with pre-sequences (preparing for the main action), insert expansions (occurring between the initial and responsive actions), and post-expansions (following the main action). + + + Certain responses are socially preferred. For example, agreements or acceptances are typically delivered promptly and directly, while disagreements or refusals may be delayed or mitigated to maintain social harmony. + + + Participants address problems in speaking, hearing, or understanding through repair strategies. Self-repair (the speaker corrects themselves) is generally preferred over other-repair (another person corrects the speaker), helping to maintain conversational flow and mutual understanding. + + + Speakers perform actions (e.g., questioning, requesting, asserting) through their utterances. Understanding how these actions are constructed and interpreted is central to CA, as it reveals how participants achieve social objectives through conversation. + + + An adjacency pair is a fundamental unit of conversation consisting of two related utterances. The first part (e.g., a question) typically elicits a specific response (e.g., an answer). These pairs are essential for structuring conversations and ensuring coherence. + + + +These foundational structures illustrate how individuals collaboratively produce and interpret talk in interaction, ensuring coherent and meaningful communication. \ No newline at end of file From e8ad253ad10db708c210e2cfb7aa370fd8653193 Mon Sep 17 00:00:00 2001 From: Vernon Marshall Date: Mon, 6 Jan 2025 20:26:57 -0800 Subject: [PATCH 3/6] formatting --- fern/customization/speech-configuration.mdx | 94 +++++++++------------ 1 file changed, 41 insertions(+), 53 deletions(-) diff --git a/fern/customization/speech-configuration.mdx b/fern/customization/speech-configuration.mdx index dfb56d2..b68b7bb 100644 --- a/fern/customization/speech-configuration.mdx +++ b/fern/customization/speech-configuration.mdx @@ -6,40 +6,6 @@ slug: customization/speech-configuration ### Introduction -Conversation Analysis (CA) examines the structure and organization of human interactions, focusing on how participants manage conversations in real-time. We mimic this natural behavior in our API. - -Key concepts include: - - - - - Conversations are structured into turns, where typically one person speaks at a time. Speakers use Turn Construction Units (TCUs)—such as words, phrases, or clauses—that listeners recognize, allowing them to anticipate when a turn will end and when it's appropriate to speak. Transition Relevance Places (TRPs) are points where a change of speaker can occur. Turn allocation follows specific rules: - - - **Current speaker selects next**: The current speaker designates who speaks next. - - **Self-selection**: If not selected, another participant may self-select to speak. - - **Continuation**: If no one else speaks, the current speaker may continue. 
- - Silences are categorized as pauses (within a turn), gaps (between turns), or lapses (when no one speaks). - - - Conversations often involve sequences like adjacency pairs, where an initial utterance (e.g., a question) prompts a related response (e.g., an answer). These pairs can be expanded with pre-sequences (preparing for the main action), insert expansions (occurring between the initial and responsive actions), and post-expansions (following the main action). - - - Certain responses are socially preferred. For example, agreements or acceptances are typically delivered promptly and directly, while disagreements or refusals may be delayed or mitigated to maintain social harmony. - - - Participants address problems in speaking, hearing, or understanding through repair strategies. Self-repair (the speaker corrects themselves) is generally preferred over other-repair (another person corrects the speaker), helping to maintain conversational flow and mutual understanding. - - - Speakers perform actions (e.g., questioning, requesting, asserting) through their utterances. Understanding how these actions are constructed and interpreted is central to CA, as it reveals how participants achieve social objectives through conversation. - - - An adjacency pair is a fundamental unit of conversation consisting of two related utterances. The first part (e.g., a question) typically elicits a specific response (e.g., an answer). These pairs are essential for structuring conversations and ensuring coherence. - - - -These foundational structures illustrate how individuals collaboratively produce and interpret talk in interaction, ensuring coherent and meaningful communication. - ### Speech Configuration in VAPI Speech configuration is a crucial aspect of designing a voice assistant that delivers a seamless and engaging user experience. By customizing the assistant's speech settings, you can optimize its responsiveness, naturalness, and timing during interactions with users. @@ -50,6 +16,8 @@ These plans ensure that the assistant does not interrupt the customer and also p Adjusting these parameters helps tailor the assistant's responsiveness to different conversational dynamics. +For more information on the anatomy of conversation and how it relates to speech recognition, see the [Conversational Analysis](/customization/conversational-analysis) guide. + Specify the provider, language, and model for speech transcription. - - [rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + [REST](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) @@ -74,7 +41,7 @@ Adjusting these parameters helps tailor the assistant's responsiveness to differ - [rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + [REST](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) @@ -88,7 +55,7 @@ Adjusting these parameters helps tailor the assistant's responsiveness to differ - [rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + [REST](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) @@ -117,9 +84,28 @@ Adjusting these parameters helps tailor the assistant's responsiveness to differ - [rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + [REST](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech) + + + + + Here are some best practices for configuring speech settings to enhance the conversational experience and optimize user engagement. 
+ + + + + +The custom endpointing rules in Vapi's speech configuration are particularly useful in several scenarios such as non-standard speech environments + + @@ -215,7 +201,7 @@ Use transcription-based endpointing, with specific timeouts after punctuation, n **Example**: In insurance claims, enabling `smartEndpointingEnabled` helps avoid interruptions while customers think through and formulate responses. -### STOP SPEAKING PLAN +### Stop Speaking Plan - **Words to Stop Speaking**: Specify the number of words a user must say before the assistant stops talking, preventing interruptions from brief interjections. @@ -223,7 +209,6 @@ Use transcription-based endpointing, with specific timeouts after punctuation, n - Pause Before Resuming: Control the delay before the assistant resumes speaking after being interrupted, ensuring a natural conversational flow. - The stopSpeakingPlan allows you to configure how the assistant stops speaking, preventing interruptions and ensuring a smooth conversation. Here's an example: ```json @@ -250,15 +235,18 @@ This enhanced explanation provides concrete examples and clear descriptions of t - **Silence Timeout**: Define the duration the assistant waits during user silence before responding or prompting, balancing responsiveness with user comfort. - **Max Duration**: Set limits on interaction lengths to manage session times effectively. This parameter helps prevent overly long interactions that may lead to user fatigue or disengagement. -## BEST PRACTICES -Best Practices - -Adapt to User Style: Configure settings based on conversational dynamics, such as enabling smart endpointing for mid-thought pauses. - -Minimize Noise Interference: Adjust parameters to handle noisy environments effectively. - -Optimize Conversational Flow: Balance responsiveness and non-intrusiveness by testing different configurations. - -Tailor for Use Cases: Customize settings for specific scenarios, such as tech support or healthcare applications. - -Iterate and Improve: Continuously test configurations with real users and refine based on feedback. \ No newline at end of file +### Custom Endpoints +- **Complex Conversations**: In situations where users might pause mid-thought or have varying speech patterns, the `BothCustomEndpointingRule` can help create a more natural flow. This is especially valuable in customer service or healthcare applications where conversations can be nuanced and unpredictable. +- **Technical Discussions**: For calls involving technical details or numbers, the `TranscriptionEndpointingPlan`'s `onNumberSeconds` parameter can be adjusted to allow more time after number sequences. This is useful in financial services, tech support, or any scenario where numerical information is frequently exchanged. +- **Multilingual Support**: The `AssistantCustomEndpointingRule` can be tailored to account for different speech patterns and pauses typical in various languages, improving the assistant's responsiveness in multilingual environments. +- **Emotional or Sensitive Conversations**: In counseling or mental health applications, the `CustomerCustomEndpointingRule` can be fine-tuned to allow for longer pauses, giving users more time to process and respond without interruption. +- **High-Noise Environments**: For calls from locations with significant background noise, like factories or busy streets, these rules can be adjusted to better distinguish between speech and ambient sounds, improving the overall conversation quality. 
- **Elderly or Speech-Impaired Users**: The endpointing rules can be customized to accommodate slower speech patterns or frequent pauses, ensuring the assistant doesn't interrupt prematurely.


### Best Practices
 - **Adapt to User Style**: Configure settings based on conversational dynamics, such as enabling smart endpointing for mid-thought pauses.
 - **Minimize Noise Interference**: Adjust parameters to handle noisy environments effectively.
 - **Optimize Conversational Flow**: Balance responsiveness and non-intrusiveness by testing different configurations.
 - **Tailor for Use Cases**: Customize settings for specific scenarios, such as tech support or healthcare applications.
 - **Iterate and Improve**: Continuously test configurations with real users and refine based on feedback. \ No newline at end of file

From 01cad9848a87a3a0378191ecad636dbdaa58f26b Mon Sep 17 00:00:00 2001
From: Vernon Marshall
Date: Tue, 7 Jan 2025 15:13:19 -0800
Subject: [PATCH 4/6] endpointing rules

---
 fern/customization/speech-configuration.mdx | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fern/customization/speech-configuration.mdx b/fern/customization/speech-configuration.mdx
index b68b7bb..4837639 100644
--- a/fern/customization/speech-configuration.mdx
+++ b/fern/customization/speech-configuration.mdx
@@ -243,6 +243,11 @@
 - **High-Noise Environments**: For calls from locations with significant background noise, like factories or busy streets, these rules can be adjusted to better distinguish between speech and ambient sounds, improving the overall conversation quality.
 - **Elderly or Speech-Impaired Users**: The endpointing rules can be customized to accommodate slower speech patterns or frequent pauses, ensuring the assistant doesn't interrupt prematurely.

+#### AssistantCustomEndpointing Rule
+#### CustomerCustomEndpointing Rule
+#### BothCustomEndpointing Rule
+#### TranscriptionEndpointing Plan
+
 ### Best Practices

From 7749f98f7ccc92ac9ea42d815d17d5fb297e8613 Mon Sep 17 00:00:00 2001
From: Vernon Marshall
Date: Tue, 7 Jan 2025 15:20:15 -0800
Subject: [PATCH 5/6] transport plans

---
 fern/customization/speech-configuration.mdx | 79 ++++++++++++++++++++-
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/fern/customization/speech-configuration.mdx b/fern/customization/speech-configuration.mdx
index 4837639..8ae17d5 100644
--- a/fern/customization/speech-configuration.mdx
+++ b/fern/customization/speech-configuration.mdx
@@ -243,10 +243,85 @@
 - **High-Noise Environments**: For calls from locations with significant background noise, like factories or busy streets, these rules can be adjusted to better distinguish between speech and ambient sounds, improving the overall conversation quality.
+ +AssistantCustomEndpointingRule +CustomerCustomEndpointingRule +BothCustomEndpointingRule +TranscriptionEndpointingPlan + #### AssistantCustomEndpointing Rule -#### CustomerCustom Endpointing Rule + +This rule allows customization of when the assistant should start speaking based on its own speech patterns. It's part of the startSpeakingPlan configuration. + +AssistantCustomEndpointingRule is a JSON object that defines a rule for setting an endpointing timeout based on the last assistant message before the customer starts speaking. Here's a breakdown of its properties: +- **type**: A string that must be "assistant". It indicates that the rule is based on the last assistant message. +- **regex**: A string representing a regular expression pattern to match against the assistant's message. +- **regexOptions**: An array of options for the regex match. Defaults to an empty array. +- **timeoutSeconds**: A number representing the endpointing timeout in seconds if the rule is matched. It must be between 0 and 15 seconds. + +##### Usage Flow +1. The assistant speaks. +2. The customer starts speaking. +3. The customer's transcription is received. +4. The rule is evaluated on the last assistant message. +5. If the message matches the regex, the endpointing timeout is set to `timeoutSeconds`. + +##### Example Use Cases +- For yes/no questions like "Are you interested in a loan?", you can set a shorter timeout. +- For questions where the customer may need to pause, like "What's my account number?", you can set a longer timeout. + +#### CustomerCustomEndpointing Rule + +This rule defines custom conditions for determining when the customer has finished speaking. It helps the assistant accurately detect the end of a customer's utterance. + +The `CustomerCustomEndpointingRule` is a JSON object that defines a rule for setting an endpointing timeout based on the current customer message as they are speaking. Here's a breakdown of its properties: + +- **type**: A string that must be "customer". It indicates that the rule is based on the current customer message. +- **regex**: A string representing a regular expression pattern to match against the customer's message. +- **regexOptions**: An array of options for the regex match. Defaults to an empty array. +- **timeoutSeconds**: A number representing the endpointing timeout in seconds if the rule is matched. It must be between 0 and 15 seconds. + +#### Usage Flow +1. The assistant speaks. +2. The customer starts speaking. +3. The customer's transcription is received. +4. The rule is evaluated on the current customer transcription. +5. If the message matches the `regex`, the endpointing timeout is set to `timeoutSeconds`. + +#### Example Use Case +- If you want to wait longer while the customer is speaking numbers, you can set a longer timeout. + +This rule allows for dynamic adjustment of the endpointing timeout based on the content of the customer's message, providing flexibility in handling different types of customer responses. + #### BothCustomEndpointing Rule -#### Transcription Endpointing Plan + +This rule combines both assistant and customer speech patterns to determine the optimal moment for the assistant to begin speaking. It aims to create a more natural conversational flow. + +The `BothCustomEndpointingRule` is a JSON object that defines a rule for setting an endpointing timeout based on both the last assistant message and the current customer message as they are speaking. 
#### BothCustomEndpointing Rule

This rule combines both assistant and customer speech patterns to determine the optimal moment for the assistant to begin speaking. It aims to create a more natural conversational flow.

The `BothCustomEndpointingRule` is a JSON object that defines a rule for setting an endpointing timeout based on both the last assistant message and the current customer message as they are speaking. Here's a breakdown of its properties:

- **type**: A string that must be "both". It indicates that the rule is based on both the last assistant message and the current customer message.
- **assistantRegex**: A string representing a regular expression pattern to match against the assistant's message.
- **assistantRegexOptions**: An array of options for the assistant's message regex match. Defaults to an empty array.
- **customerRegex**: A string representing a regular expression pattern to match against the customer's message.
- **customerRegexOptions**: An array of options for the customer's message regex match. Defaults to an empty array.
- **timeoutSeconds**: A number representing the endpointing timeout in seconds if the rule is matched. It must be between 0 and 15 seconds.

##### Usage Flow
1. The assistant speaks.
2. The customer starts speaking.
3. The customer's transcription is received.
4. The rule is evaluated on both the last assistant message and the current customer transcription.
5. If the assistant message matches `assistantRegex` AND the customer message matches `customerRegex`, the endpointing timeout is set to `timeoutSeconds`.

##### Example Use Case
- If you want to wait longer while the customer is speaking numbers, you can set a longer timeout.
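Here is a comparable sketch of a `both` rule for the account-number scenario described earlier; the regexes and timeout are illustrative assumptions:

```json
"startSpeakingPlan": {
  "customEndpointingRules": [
    {
      "type": "both",
      "assistantRegex": "account number", // Assumed trigger phrase in the assistant's question
      "customerRegex": "\\d+", // Matches while the customer is speaking digits (illustrative)
      "timeoutSeconds": 5.0 // Illustrative: give extra time between digits
    }
  ]
}
```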
#### TranscriptionEndpointing Plan

This plan provides detailed control over how the transcription affects the assistant's speaking behavior. It includes parameters such as:
- **onPunctuationSeconds**: Wait time after detecting punctuation in the transcription.
- **onNoPunctuationSeconds**: Wait time when no punctuation is detected.
- **onNumberSeconds**: Wait time after detecting a number in the transcription.

From dbabfe2f32790bf131db780691bf897d866aeb8 Mon Sep 17 00:00:00 2001
From: Vernon Marshall
Date: Tue, 7 Jan 2025 15:32:29 -0800
Subject: [PATCH 6/6] more instructions

---
 fern/customization/speech-configuration.mdx | 35 ++++++++++++++++-----
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/fern/customization/speech-configuration.mdx b/fern/customization/speech-configuration.mdx
index 8ae17d5..d6a1555 100644
--- a/fern/customization/speech-configuration.mdx
+++ b/fern/customization/speech-configuration.mdx
@@ -244,16 +244,10 @@
 - **Elderly or Speech-Impaired Users**: The endpointing rules can be customized to accommodate slower speech patterns or frequent pauses, ensuring the assistant doesn't interrupt prematurely.

 #### AssistantCustomEndpointing Rule
 This rule allows customization of when the assistant should start speaking based on its own speech patterns. It's part of the startSpeakingPlan configuration.
 AssistantCustomEndpointingRule is a JSON object that defines a rule for setting an endpointing timeout based on the last assistant message before the customer starts speaking. Here's a breakdown of its properties:
- **type**: A string that must be "assistant". It indicates that the rule is based on the last assistant message.
- **regex**: A string representing a regular expression pattern to match against the assistant's message.
- **regexOptions**: An array of options for the regex match. Defaults to an empty array.
- **timeoutSeconds**: A number representing the endpointing timeout in seconds if the rule is matched. It must be between 0 and 15 seconds.

@@ -318,6 +312,31 @@

 #### TranscriptionEndpointing Plan

```json
{
  "TranscriptionEndpointingPlan": {
    "type": "object",
    "properties": {
      "rules": {
        "type": "array",
        "items": {
          "oneOf": [
            { "$ref": "#/components/schemas/AssistantCustomEndpointingRule" },
            { "$ref": "#/components/schemas/CustomerCustomEndpointingRule" },
            { "$ref": "#/components/schemas/BothCustomEndpointingRule" }
          ]
        }
      }
    }
  }
}
```

##### Properties
- **rules**: An array of endpointing rules. Each rule can be one of the following types:
  - AssistantCustomEndpointingRule
  - CustomerCustomEndpointingRule
  - BothCustomEndpointingRule

This plan provides detailed control over how the transcription affects the assistant's speaking behavior. It includes parameters such as:
- **onPunctuationSeconds**: Wait time after detecting punctuation in the transcription.
- **onNoPunctuationSeconds**: Wait time when no punctuation is detected.
- **onNumberSeconds**: Wait time after detecting a number in the transcription.
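Putting the pieces together, here is an illustrative `startSpeakingPlan` that combines an assistant rule with the transcription-based timeouts above. The regex echoes the loan example earlier in this guide, and the timing values mirror those shown in previous snippets; treat them as assumptions to tune, not prescriptions:

```json
"startSpeakingPlan": {
  "customEndpointingRules": [
    {
      "type": "assistant",
      "regex": "do you want a loan", // Assumed phrase from the assistant's question
      "timeoutSeconds": 1.1 // Illustrative: brief pause for a yes/no answer
    }
  ],
  "transcriptionEndpointingPlan": {
    "onPunctuationSeconds": 0.1,
    "onNoPunctuationSeconds": 1.5,
    "onNumberSeconds": 0.5
  }
}
```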