# Migrating to 0.8.x

v0.8 is a major release of the framework, featuring significant reliability improvements to VoiceAssistant. This update includes a few breaking API changes that will impact the way you build your agents. We strive to minimize breaking changes and will stabilize the API as we approach version 1.0.

## Job and Worker API

### Specifying your entrypoint function

`entrypoint_fnc` is now a parameter in `WorkerOptions`. Previously, you were required to explicitly accept the job.

### Namespace has been removed

We've removed the namespace option in order to simplify the registration process. In future versions, it'll be possible to provide an explicit `agent_name` to spin up multiple kinds of agents for each room.

### Connecting to the room is explicit

You now need to call `ctx.connect()` to initiate the connection to the room. This allows pre-connect setup (such as callback registration) to happen without race conditions.

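For instance, a minimal sketch of the pre-connect pattern, registering a room event handler before connecting (the `track_subscribed` handler here is purely illustrative):

```python
from livekit.agents import JobContext


async def job_entrypoint(ctx: JobContext):
    # register handlers before connecting so early events aren't missed
    @ctx.room.on("track_subscribed")
    def on_track_subscribed(track, publication, participant):
        ...

    await ctx.connect()
```
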
### Example

The above changes are reflected in the following minimal example:

```python
from livekit.agents import JobContext, WorkerOptions, cli


async def job_entrypoint(ctx: JobContext):
    await ctx.connect()
    # your logic here
    ...


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(entrypoint_fnc=job_entrypoint)
    )
```

## VoiceAssistant

The VoiceAssistant API remains mostly unchanged, despite significant improvements to its functionality and internals. However, there have been changes to its configuration.

### Initialization args

- Removed
  - base_volume
  - debug
  - sentence_tokenizer, word_tokenizer, hyphenate_word
- Changed
  - transcription-related options are now grouped within the `transcription` param

```python
class VoiceAssistant(utils.EventEmitter[EventTypes]):
    def __init__(
        self,
        *,
        vad: vad.VAD,
        stt: stt.STT,
        llm: LLM,
        tts: tts.TTS,
        chat_ctx: ChatContext | None = None,
        fnc_ctx: FunctionContext | None = None,
        allow_interruptions: bool = True,
        interrupt_speech_duration: float = 0.6,
        interrupt_min_words: int = 0,
        preemptive_synthesis: bool = True,
        transcription: AssistantTranscriptionOptions = AssistantTranscriptionOptions(),
        will_synthesize_assistant_reply: WillSynthesizeAssistantReply = _default_will_synthesize_assistant_reply,
        plotting: bool = False,
        loop: asyncio.AbstractEventLoop | None = None,
    ) -> None:
        ...
```

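As a usage sketch of the new grouped parameter (the import paths and plugin constructors below are illustrative assumptions, not part of this changeset):

```python
from livekit.agents.voice_assistant import AssistantTranscriptionOptions, VoiceAssistant
from livekit.plugins import deepgram, openai, silero

assistant = VoiceAssistant(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=openai.TTS(),
    # transcription options now live in a single grouped parameter
    transcription=AssistantTranscriptionOptions(),
)
```
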
## LLM

The LLM class has been restructured to enhance ergonomics and improve function calling support.

### Function/tool calling

Function calling received a complete overhaul in v0.8.0. The primary breaking change is that function calls are NOT automatically invoked when iterating the LLM stream; `LLMStream.execute_functions` must be called instead. (VoiceAssistant handles this automatically.)

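A minimal sketch of the new flow, assuming a `fnc_ctx` with registered functions (the exact return shape of `execute_functions` may differ):

```python
stream = llm_plugin.chat(chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)
async for chunk in stream:
    ...  # text deltas stream as before; tool calls are only collected

# explicitly run whatever function calls the model requested
called_functions = stream.execute_functions()
```
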
### LLM.chat is no longer an async method

Previously, LLM.chat() was an async method that returned an LLMStream (which itself was an AsyncIterable).

We found it easier and less confusing for LLM.chat() to be synchronous, while still returning the same AsyncIterable LLMStream.

### LLM.chat `history` has been renamed to `chat_ctx`

The `history` parameter of LLM.chat has been renamed to `chat_ctx` to improve consistency and reduce confusion.

```python
chat_ctx = llm.ChatContext()
chat_ctx.append(role="user", text="user message")
# no await here: chat() is now synchronous and returns an AsyncIterable LLMStream
stream = llm_plugin.chat(chat_ctx=chat_ctx)
```

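Consuming the stream is unchanged; a hedged sketch, assuming OpenAI-style `choices`/`delta` fields on each chunk:

```python
async for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")
```
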
## STT

### SpeechStream.flush

Previously, to communicate to an STT provider that you had sent enough input to generate a response, you could call push_frame(None) to coax it into producing a transcription.

In v0.8.0, that API has been removed and replaced with flush().

### SpeechStream.end_input

`end_input` signals to the STT provider that the input is complete and no additional input will follow. Previously, this was done using `aclose(wait=True)`.

### SpeechStream.aclose

The `wait` arg of aclose has been removed in favor of `SpeechStream.end_input` (see above). Now, if you call `aclose()` without first calling `end_input()`, the in-flight request is cancelled.

```python
stt_stream = my_stt_instance.stream()
async for ev in audio_stream:
    stt_stream.push_frame(ev.frame)
    # optionally flush when enough frames have been pushed
    stt_stream.flush()

stt_stream.end_input()
await stt_stream.aclose()
```

## TTS

### SynthesizedAudio changed and SynthesisEvent removed

The SynthesizedAudio dataclass has gone through a major change:

```python
# New SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
    request_id: str
    """Request ID (one segment could be made up of multiple requests)"""
    segment_id: str
    """Segment ID, each segment is separated by a flush"""
    frame: rtc.AudioFrame
    """Synthesized audio frame"""
    delta_text: str = ""
    """Current segment of the synthesized audio"""


# Old SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
    text: str
    data: rtc.AudioFrame
```

The SynthesisEvent has been removed entirely. All occurrences of it have been replaced with SynthesizedAudio.

### SynthesizeStream.flush

Similar to the STT changes, this coaxes the TTS provider into generating a response. The SynthesizedAudio response will have a new segment_id after calls to flush().

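For example, a sketch of observing segment boundaries while consuming the stream (in practice, consumption runs concurrently with pushing text):

```python
async for audio in tts_stream:
    # audio is a SynthesizedAudio; segment_id changes after each flush()
    print(audio.segment_id, audio.delta_text)
```
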
### SynthesizeStream.end_input

Similar to the STT changes, this replaces aclose(wait=True).

### SynthesizeStream.aclose

Similar to the STT changes, the `wait` arg has been removed.

```python
tts_stream = my_tts_instance.stream()
tts_stream.push_text("This is the first sentence")
tts_stream.flush()
tts_stream.push_text("This is the second sentence")
tts_stream.end_input()
await tts_stream.aclose()
```

## VAD

The same changes made to STT and TTS have also been made to VAD.

```python
vad_stream = my_vad_instance.stream()
async for ev in audio_stream:
    vad_stream.push_frame(ev.frame)
    # optionally flush when enough frames have been pushed
    vad_stream.flush()

vad_stream.end_input()
await vad_stream.aclose()
```

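Consuming VAD events would look roughly like this sketch; the event type names here are assumptions:

```python
async for ev in vad_stream:
    # assumed enum values; check the vad module for the exact names
    if ev.type == vad.VADEventType.START_OF_SPEECH:
        print("speech started")
    elif ev.type == vad.VADEventType.END_OF_SPEECH:
        print("speech ended")
```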