Skip to content

Commit

Permalink
0.8 migration guide (livekit#572)
Browse files Browse the repository at this point in the history
  • Loading branch information
davidzhao authored Aug 3, 2024
1 parent 297db92 commit 8a4aed1
Show file tree
Hide file tree
Showing 2 changed files with 197 additions and 2 deletions.
186 changes: 186 additions & 0 deletions 0.8-migration-guide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,186 @@
# Migrating to 0.8.x

v0.8 is a major release of the framework, featuring significant reliability improvements to VoiceAssistant. This update includes a few breaking API changes that will impact the way you build your agents. We strive to minimize breaking changes and will stabilize the API as we approach version 1.0.

## Job and Worker API

### Specifying your entrypoint function

`entrypoint_fnc` is now a parameter in WorkerOptions. Previously, you were required to explicitly accept the job.

### Namespace had been removed

We've removed the namespace option in order to simplify the registration process. In future versions, it'll be possible to provide an explicit `agent_name` to spin up multiple kinds of agents for each room.

### Connecting to room is explicit

You now need to call ctx.connect() to initiate the connection to the room. This allows for pre-connect setup (such as callback registrations) to avoid race conditions.

### Example

The above changes are reflected in the following minimal example:

```python
from livekit.agents import JobContext, JobRequest, WorkerOptions, cli

async def job_entrypoint(ctx: JobContext):
await ctx.connect()
# your logic here
...

if __name__ == "__main__":
cli.run_app(
WorkerOptions(entrypoint_fnc=job_entrypoint)
)
```

## VoiceAssistant

VoiceAssistant API remains mostly unchanged, despite significant improvements to functionality and internals. However, there have been changes to the configuration.

### Initialization args

- Removed
- base_volume
- debug
- sentence_tokenizer, word_tokenizer, hyphenate_word
- Changed
- transcription related options are grouped within `transcription` param

```python
class VoiceAssistant(utils.EventEmitter[EventTypes]):
def __init__(
self,
*,
vad: vad.VAD,
stt: stt.STT,
llm: LLM,
tts: tts.TTS,
chat_ctx: ChatContext | None = None,
fnc_ctx: FunctionContext | None = None,
allow_interruptions: bool = True,
interrupt_speech_duration: float = 0.6,
interrupt_min_words: int = 0,
preemptive_synthesis: bool = True,
transcription: AssistantTranscriptionOptions = AssistantTranscriptionOptions(),
will_synthesize_assistant_reply: WillSynthesizeAssistantReply = _default_will_synthesize_assistant_reply,
plotting: bool = False,
loop: asyncio.AbstractEventLoop | None = None,
) -> None:
...
```

## LLM

The LLM class has been restructured to enhance ergonomics and improve the function calling support.

### Function/tool calling

Function calling has gotten a complete overhaul in v0.8.0. The primary breaking change is that function calls are now NOT automatically invoked when iterating the LLM stream. `LLMStream.execute_functions` needs to be called instead. (VoiceAssistant handles this automatically)

### LLM.chat is no longer an async method

Previously, LLM.chat() was an async method that returned an LLMStream (which itself was an AsyncIterable).

We found it easier and less-confusing for LLM.chat() to be synchronous, while still returning the same AsyncIterable LLMStream.

### LLM.chat `history` has been renamed to `chat_ctx`

In order to improve consistency and reduce confusion.

```python
chat_ctx = llm.ChatContext()
chat_ctx.append(role="user", text="user message")
stream = llm_plugin.chat(chat_ctx=chat_ctx)
```

## STT

### SpeechStream.flush

Previously, to communicate to a STT provider that you have sent enough input to generate a response - you could push_frame(None) to coax the TTS into synthesizing a response.

In v0.8.0 that API has been removed and replaced with flush()

### SpeechStream.end_input

`end_input` signals to the STT provider that the input is complete and no additional input will follow. Previously, this was done using aclose(wait=True).

### SpeechStream.aclose

The `wait` arg of aclose has been removed in favor of SpeechStream.end_input (see above). Now, if you call `TTS.aclose()` without first calling STT.end_input, the behavior will be that the request is cancelled.

```python
stt_stream = my_stt_instance.stream()
async for ev in audio_stream:
stt_stream.push_frame(ev.frame)
# optionally flush when enough frames have been pushed
stt_stream.flush()

stt_stream.end_input()
await stt_stream.aclose()
```

## TTS

### SynthesizedAudio changed and SynthesisEvent removed

SynthesizedAudio dataclass had gone through a major change

```python
# New SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
request_id: str
"""Request ID (one segment could be made up of multiple requests)"""
segment_id: str
"""Segment ID, each segment is separated by a flush"""
frame: rtc.AudioFrame
"""Synthesized audio frame"""
delta_text: str = ""
"""Current segment of the synthesized audio"""

#Old SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
text: str
data: rtc.AudioFrame
```

The SynthesisEvent has been removed entirely. All occurrences of it have been replaced with SynthesizedAudio

### SynthesizeStream.flush

Similar to the STT changes, this coaxes the TTS provider into generating a response. The SynthesizedAudio response will have a new segment_id after calls to flush().

### SynthesizeStream.end_input

Similar to the STT changes, this replaces aclose(wait=True).

### SynthesizeStream.aclose

Similar to the STT changes, the wait arg has been removed.

```python
tts_stream = my_tts_instance.stream()
tts_stream.push_text("This is the first sentence")
tts_stream.flush()
tts_stream.push_text("This is the second sentence")
tts_stream.end_input()
await tts_stream.aclose()
```

## VAD

The same changes made to STT and TTS have also been made to VAD

```python
vad_stream = my_vad_instance.stream()
async for ev in audio_stream:
vad_stream.push_frame(ev.frame)
# optionally flush when enough frames have been pushed
vad_stream.flush()

vad_stream.end_input()
await vad_stream.aclose()
```
13 changes: 11 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ audio, video, and data streams.

The framework includes plugins for common workflows, such as voice activity detection and speech-to-text.

Agents integrates seamlessly with [LiveKit server](https://github.com/livekit/livekit), offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.
Agents integrates seamlessly with Cloud or self-hosted [LiveKit](https://livekit.io/) server, offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.

> This SDK is currently in Developer Preview. During this period, you may encounter bugs and the APIs may change.
>
Expand All @@ -33,6 +33,9 @@ Agents integrates seamlessly with [LiveKit server](https://github.com/livekit/li
- [Working with plugins](https://docs.livekit.io/agents/plugins)
- [Deploying agents](https://docs.livekit.io/agents/deployment)

> [!NOTE]
> There are breaking API changes between versions 0.7.x and 0.8.x. Please refer to the [0.8 migration guide](0.8-migration-guide.md) for a detailed overview of the changes.
## Examples

- [Voice assistant](https://github.com/livekit/agents/tree/main/examples/voice-assistant): A voice assistant with STT, LLM, and TTS. [Demo](https://kitt.livekit.io)
Expand Down Expand Up @@ -81,14 +84,20 @@ The framework exposes a CLI interface to run your agent. To get started, you'll
- LIVEKIT_API_KEY
- LIVEKIT_API_SECRET

### Running the worker
### Starting the worker

This will start the worker and wait for users to connect to your LiveKit server:

```bash
python my_agent.py start
```

To run the worker in dev-mode (with hot code reloading), you can use the dev command:

```bash
python my_agent.py dev
```

### Using playground for your agent UI

To ease the process of building and testing an agent, we've developed a versatile web frontend called "playground". You can use or modify this app to suit your specific requirements. It can also serve as a starting point for a completely custom agent application.
Expand Down

0 comments on commit 8a4aed1

Please sign in to comment.