Introduce easy api for starting tasks for remote participants #679

Merged
15 commits merged on Aug 28, 2024


keepingitneil
Contributor

No description provided.


changeset-bot bot commented Aug 27, 2024

🦋 Changeset detected

Latest commit: a95cd86

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
livekit-agents Patch


Member

davidzhao left a comment


I like the ergonomics of ctx.add_participant_task

@@ -0,0 +1,30 @@
# Participant Task Example

This example shows how to do things when a participant joins. For example, a common use case is to fetch some external data based on the participant's attributes.
Member


woohoo!
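The README above describes fetching external data based on a participant's attributes. A minimal sketch of that idea, runnable without a LiveKit server: `FakeParticipant`, `fetch_customer`, and the `customer_id` attribute are hypothetical, and it assumes the real participant object exposes a string-to-string `attributes` mapping.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeParticipant:
    # Hypothetical stand-in for rtc.RemoteParticipant: just an identity
    # and a string-to-string attributes mapping.
    identity: str
    attributes: Dict[str, str] = field(default_factory=dict)


# Hypothetical external data source keyed by a "customer_id" attribute.
_CUSTOMERS: Dict[str, Dict[str, str]] = {"c-42": {"name": "Ada", "plan": "pro"}}


async def fetch_customer(customer_id: str) -> Dict[str, str]:
    # Placeholder for a real network call (HTTP, database, ...).
    await asyncio.sleep(0)
    return _CUSTOMERS.get(customer_id, {})


async def participant_task(p: FakeParticipant) -> Dict[str, str]:
    # Look up external data using an attribute the client set on join.
    customer_id = p.attributes.get("customer_id")
    if customer_id is None:
        return {}
    return await fetch_customer(customer_id)


if __name__ == "__main__":
    alice = FakeParticipant("alice", {"customer_id": "c-42"})
    print(asyncio.run(participant_task(alice)))
```

Participants without the attribute get an empty result rather than an error, so the task can be registered for everyone and simply do nothing when there is nothing to look up.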

livekit-agents/livekit/agents/job.py (outdated, resolved)
livekit-agents/livekit/agents/job.py (outdated, resolved)
examples/participant/participant_task.py (outdated, resolved)
for filter, coro in self._participant_coro_lookup.items():
    if filter(p):
        task = asyncio.create_task(coro(p))
        self._participant_tasks[p.identity] = task
Member


this confused me a bit in terms of sequencing (i.e. is the intention to run them serially, or to override?). Does it make sense to enforce serial dispatch for two unrelated callbacks?

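imagine this use case:

from livekit import rtc

async def task1(p: rtc.RemoteParticipant):
    # p.audio_stream is illustrative: any long-running per-participant loop
    async for frame in p.audio_stream:
        pass

async def task2(p: rtc.RemoteParticipant):
    # p.video_stream is illustrative, as above
    async for frame in p.video_stream:
        pass

ctx.add_participant_task(task1)
ctx.add_participant_task(task2)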
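To make the sequencing concern concrete, here is a minimal sketch that mirrors the shape of the loop under review (filter/coroutine pairs, tasks stored in a dict keyed by identity). `FakeParticipant`, `dispatch`, and the task names are hypothetical, not the PR's actual code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class FakeParticipant:
    # Hypothetical stand-in for rtc.RemoteParticipant.
    def __init__(self, identity: str) -> None:
        self.identity = identity


Filter = Callable[[FakeParticipant], bool]
Coro = Callable[[FakeParticipant], Awaitable[None]]


def dispatch(
    lookup: List[Tuple[Filter, Coro]],
    p: FakeParticipant,
    tasks: Dict[str, asyncio.Task],
) -> None:
    # Same shape as the reviewed loop: every matching callback gets a task,
    # but the dict is keyed by identity only, so a later task replaces an earlier one.
    for filter_fn, coro in lookup:
        if filter_fn(p):
            # The task name is only here so the demo can tell the tasks apart.
            tasks[p.identity] = asyncio.create_task(coro(p), name=coro.__name__)


async def main() -> Tuple[int, str, List[str]]:
    started: List[str] = []

    async def task1(p: FakeParticipant) -> None:
        started.append("task1")

    async def task2(p: FakeParticipant) -> None:
        started.append("task2")

    tasks: Dict[str, asyncio.Task] = {}
    dispatch([(lambda p: True, task1), (lambda p: True, task2)], FakeParticipant("alice"), tasks)
    await asyncio.sleep(0.01)  # give both tasks time to run
    return len(tasks), tasks["alice"].get_name(), started
```

Both callbacks start (neither override nor serial execution happens), but only the second task is still tracked. Because the asyncio event loop keeps only weak references to tasks, an untracked long-running task like task1 could also be garbage-collected before it finishes.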

Contributor Author


Hrmm, I suppose we could key off the (identity, callback) tuple instead: different callbacks run in parallel, but only one instance of each callback runs per participant at a time.
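A minimal sketch of the (identity, callback) keying proposed here, assuming a duplicate start of the same callback for the same participant is skipped while the earlier one is still running. `ParticipantTaskRegistry` and `FakeParticipant` are hypothetical names for illustration only.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple


class FakeParticipant:
    # Hypothetical stand-in for rtc.RemoteParticipant.
    def __init__(self, identity: str) -> None:
        self.identity = identity


Coro = Callable[[FakeParticipant], Awaitable[None]]


class ParticipantTaskRegistry:
    """Hypothetical helper: at most one running task per (identity, callback)."""

    def __init__(self) -> None:
        self._tasks: Dict[Tuple[str, Coro], asyncio.Task] = {}

    def start(self, p: FakeParticipant, coro: Coro) -> bool:
        key = (p.identity, coro)
        existing = self._tasks.get(key)
        if existing is not None and not existing.done():
            # Same callback still running for this participant: don't start another.
            return False
        self._tasks[key] = asyncio.create_task(coro(p))
        return True

    def running(self) -> int:
        return sum(1 for t in self._tasks.values() if not t.done())


async def main() -> Tuple[bool, bool, bool, int]:
    release = asyncio.Event()

    async def audio_task(p: FakeParticipant) -> None:
        await release.wait()

    async def video_task(p: FakeParticipant) -> None:
        await release.wait()

    reg = ParticipantTaskRegistry()
    alice = FakeParticipant("alice")
    a = reg.start(alice, audio_task)    # new key: starts
    v = reg.start(alice, video_task)    # different callback: runs in parallel
    dup = reg.start(alice, audio_task)  # same key still running: skipped
    running = reg.running()
    release.set()
    await asyncio.sleep(0)  # let both tasks finish
    return a, v, dup, running
```

The coroutine is only created after the duplicate check, so a skipped start never leaves an un-awaited coroutine behind.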

@davidzhao
Member

a couple of other thoughts/questions:

  1. Should this be in rtc? It doesn't look like any job-specific functions are used, i.e. why not room.add_participant_task?
  2. Lifecycle-wise, how does it end when either the local or the remote participant disconnects? Is there an easy way for the user to tell? Or maybe the answer is just to listen to participant_disconnected.
  3. Is there a way to limit it to only the first participant?

@keepingitneil
Contributor Author

  1. I leaned toward agents because it overlaps with the participant_(dis)connected callbacks but behaves differently. I worry it could be mistaken for two different ways to do the same thing (even though they are different). A very rough symmetry argument could be made with other client libs: this feels like a convenience on top of the core client lib, similar to the hooks you'd find in components-js.

  2. It's just up to the user to end the task. For example, a task iterating over audio frames would simply exit when the participant disconnects.

  3. Hrmm, no, not really. Do you think it's needed? The user would probably know, and be able to filter out participants they don't want. They could also use a closure-scoped variable to keep track of it if they really only wanted the first participant, without caring about any other filtering criteria.
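A minimal sketch of answers 2 and 3 above, runnable without LiveKit. The closure-based `first_participant_only` filter and the `fake_frames` generator are hypothetical; the frame source assumes a stream that simply ends when the participant disconnects, which is what lets the task exit on its own.

```python
import asyncio
from typing import AsyncIterator, Callable, Optional


class FakeParticipant:
    # Hypothetical stand-in for rtc.RemoteParticipant.
    def __init__(self, identity: str) -> None:
        self.identity = identity


def first_participant_only() -> Callable[[FakeParticipant], bool]:
    # Answer 3: closure-scoped state remembers the first identity seen.
    first_identity: Optional[str] = None

    def _filter(p: FakeParticipant) -> bool:
        nonlocal first_identity
        if first_identity is None:
            first_identity = p.identity
        return p.identity == first_identity

    return _filter


async def fake_frames(n: int) -> AsyncIterator[int]:
    # Hypothetical frame source that ends after n frames, standing in for a
    # stream that closes once the participant disconnects.
    for i in range(n):
        await asyncio.sleep(0)
        yield i


async def count_frames(p: FakeParticipant, frames: AsyncIterator[int]) -> int:
    # Answer 2: no explicit disconnect handling; the task returns
    # as soon as its stream is exhausted.
    count = 0
    async for _ in frames:
        count += 1
    return count
```

As written, the filter accepts the first participant again if they reconnect with the same identity; a boolean flag instead of a stored identity would make it strictly one-shot.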

keepingitneil marked this pull request as ready for review August 28, 2024 20:38
logger.warning(
    f"a participant has joined before a prior participant task matching the same identity has finished: '{p.identity}'"
)
task = asyncio.create_task(coro(p))
Member


nit: add a task name, like "participant_entrypoint"
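A minimal sketch of the suggested nit combined with the reviewed warning, assuming the warning is guarded by a check that the previous task for that identity is still running. `start_participant_task` and `FakeParticipant` are hypothetical; the `name=` argument to `asyncio.create_task` requires Python 3.8 or later.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict

logger = logging.getLogger("participant-tasks")


class FakeParticipant:
    # Hypothetical stand-in for rtc.RemoteParticipant.
    def __init__(self, identity: str) -> None:
        self.identity = identity


def start_participant_task(
    tasks: Dict[str, asyncio.Task],
    p: FakeParticipant,
    coro: Callable[[FakeParticipant], Awaitable[None]],
) -> asyncio.Task:
    prev = tasks.get(p.identity)
    if prev is not None and not prev.done():
        # As in the reviewed snippet: warn, then start the new task anyway.
        logger.warning(
            "a participant has joined before a prior participant task matching "
            f"the same identity has finished: '{p.identity}'"
        )
    # The name appears in repr(task) and Task.get_name(), which helps debugging.
    task = asyncio.create_task(coro(p), name="participant_entrypoint")
    tasks[p.identity] = task
    return task


async def main() -> str:
    async def noop(p: FakeParticipant) -> None:
        await asyncio.sleep(0)

    tasks: Dict[str, asyncio.Task] = {}
    task = start_participant_task(tasks, FakeParticipant("alice"), noop)
    await task
    return task.get_name()
```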


def add_participant_entrypoint(
    self,
    *,
Member

theomonnom Aug 28, 2024


let's allow passing the callback directly (positionally) instead of making it keyword-only with *?
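The difference being discussed is plain Python parameter syntax: everything after a bare `*` must be passed by keyword. A minimal sketch with two hypothetical classes (neither is the real JobContext, and `task_name` is a made-up option) showing the keyword-only form next to the suggested positional-callback form:

```python
from typing import Awaitable, Callable, List

Entrypoint = Callable[..., Awaitable[None]]


class KeywordOnlyContext:
    # Hypothetical: the bare `*` forces the callback to be passed by name.
    def __init__(self) -> None:
        self.entrypoints: List[Entrypoint] = []

    def add_participant_entrypoint(self, *, entrypoint_fnc: Entrypoint) -> None:
        self.entrypoints.append(entrypoint_fnc)


class PositionalContext:
    # Hypothetical: the callback comes first and may be passed positionally;
    # extra options (here a made-up task_name) stay keyword-only after `*`.
    def __init__(self) -> None:
        self.entrypoints: List[Entrypoint] = []
        self.task_names: List[str] = []

    def add_participant_entrypoint(
        self, entrypoint_fnc: Entrypoint, *, task_name: str = "participant_entrypoint"
    ) -> None:
        self.entrypoints.append(entrypoint_fnc)
        self.task_names.append(task_name)


async def entry(*args: object) -> None:
    # Placeholder participant entrypoint; never awaited in this demo.
    pass


def positional_call_works(ctx: object) -> bool:
    # True if the callback can be passed without a keyword.
    try:
        ctx.add_participant_entrypoint(entry)  # type: ignore[attr-defined]
    except TypeError:
        return False
    return True
```

Keeping the `*` after the callback preserves the benefit of keyword-only options (call sites stay readable, and new options can be added without breaking positional callers) while letting the common case read as `ctx.add_participant_entrypoint(entry)`.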

Member

davidzhao left a comment


nice!

keepingitneil merged commit 186dd9d into main Aug 28, 2024
8 checks passed
keepingitneil deleted the neil/part branch August 28, 2024 21:50
This was referenced Aug 28, 2024
donnyyung added a commit to okolabs/livekit-agents that referenced this pull request Sep 26, 2024
* Fix deepgram English check (livekit#625)

* Cartesia bump to 0.4.0 (livekit#624)

* Introduce manual package release (livekit#626)

* Use the correct working directory in the manual publish job (livekit#627)

* Modified RAG plugin (livekit#629)

Co-authored-by: Théo Monnom

* Revert "nltk: fix broken punkt download" (livekit#630)

* Expose WorkerType explicitly (livekit#632)

* openai: allow sending user IDs (livekit#633)

* silero: fix vad padding & choppy audio  (livekit#631)

* ipc: use our own duplex instead of mp.Queue (livekit#634)

* llm: fix optional arguments & non-hashable list (livekit#637)

* Add agent_name to WorkerOptions (livekit#636)

* Support OpenAI Assistants API (livekit#601)

* voiceassistant: fix will_synthesize_assistant_reply race (livekit#638)

* silero: adjust vad activation threshold (livekit#639)

* Version Packages (livekit#615)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* voiceassistant: fix llm not having the full chat context on bad interruption timing (livekit#640)

* livekit-plugins-browser: handle mouse/keyboard inputs on devmode  (livekit#644)

* nltk: fix another semver break (livekit#647)

* livekit-plugins-browser: python API (livekit#645)

* Delete test.py (livekit#652)

* livekit-plugins-browser: prepare for release (livekit#653)

* Version Packages (livekit#641)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Revert "Version Packages" (livekit#659)

* fix release workflow (livekit#661)

* Version Packages (livekit#660)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ServerMessage.termination handler (livekit#635)

Co-authored-by: Théo Monnom

* Introduce anthropic plugin (livekit#655)

* fix uninitialized SpeechHandle error on interruption  (livekit#665)

* voiceassistant: avoid stacking assistant replies when allow_interruptions=False (livekit#667)

* fix: disconnect event may now have some arguments  (livekit#668)

* Anthropic requires the first message to be a non empty 'user' role (livekit#669)

* support clova speech (livekit#439)

* Updated readme with LLM options (livekit#671)

* Update README.md (livekit#666)

* plugins: add docstrings explaining API keys (livekit#672)

* Disable anthropic test due to 429s (livekit#675)

* Remove duplicate entry from plugin table (livekit#673)

* Version Packages (livekit#662)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* deepgram: switch the default model to phonecall (livekit#676)

* update livekit to 0.14.0 and await tracksubscribed (livekit#678)

* Fix Google STT exception when no valid speech is recognized (livekit#680)

* Introduce easy api for starting tasks for remote participants (livekit#679)

* examples: document how to log chats (livekit#685)

* Version Packages (livekit#677)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* voiceassistant: keep punctuations when sending agent transcription (livekit#648)

* Pass context into participant entrypoint (livekit#694)

* Version Packages (livekit#693)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update examples to use participant_entrypoint (livekit#695)

* voiceassistant: add VoiceAssistantState (livekit#654)

Co-authored-by: Théo Monnom

* Fix anthropic package publishing (livekit#701)

* fix non pickleable log (livekit#691)

* Revert "Update examples to use participant_entrypoint" (livekit#702)

* google-tts: ignore wav header (livekit#703)

* fix examples (livekit#704)

* skip processing of choice.delta when it is None (livekit#705)

* delete duplicate code (livekit#707)

* voiceassistant: skip speech initialization if interrupted  (livekit#715)

* Ensure room.name is available before connection (livekit#716)

* Add deepseek LLMs at OpenAI plugin (livekit#714)

* add threaded job runners (livekit#684)

* voiceassistant: add before_tts_cb callback (livekit#706)

* voiceassistant: fix mark_audio_segment_end with no audio data (livekit#719)

* add JobContext.wait_for_participant (livekit#712)

* Enable Google TTS with application default credentials (livekit#721)

* improve gracefully_cancel logic (livekit#720)

* bump required livekit version to 0.15.2 (livekit#722)

* elevenlabs: expose enable_ssml_parsing (livekit#723)

* Version Packages (livekit#697)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* release anthropic (livekit#724)

* Version Packages (livekit#725)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update examples to use wait_for_participant (livekit#726)

Co-authored-by: Théo Monnom

* Introduce function calling to OpenAI Assistants (livekit#710)

Co-authored-by: Théo Monnom

* tts_forwarder: don't raise inside mark_{audio,text}_segment_end when nothing was pushed (livekit#730)

* Add Cerebras to OpenAI Plugin (livekit#731)

* Fixes to Anthropic Function Calling (livekit#708)

* ci: don't run tests on forks (livekit#739)

* Only send actual audio to Deepgram (livekit#738)

* Add support for cartesia voice control (livekit#740)

Co-authored-by: Théo Monnom

* Version Packages (livekit#727)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Allow setting LLM temperature with VoiceAssistant (livekit#741)

* Update STT sample README (livekit#709)

* avoid returning tiny frames from TTS (livekit#747)

* run tests on main (and make skipping clearer) (livekit#748)

* voiceassistant: avoid tiny frames on playout (livekit#750)

* limit concurrent process init to 1 (livekit#751)

* windows: default to threaded executor & fix dev mode  (livekit#755)

* improve graceful shutdown  (livekit#756)

* better dev defaults (livekit#762)

* 11labs: send phoneme in one entire xml chunk (livekit#766)

* ipc: fix process not starting if num_idle_processes is zero (livekit#763)

* limit noisy logs & keep the root logger info (livekit#768)

* use os.exit to exit forcefully  (livekit#770)

* Fix Assistant API Vision Capabilities (livekit#771)

* voiceassistant: allow to cancel llm generation inside before_llm_cb (livekit#753)

* Remove useless logs (livekit#773)

* voiceassistant: expose min_endpointing_delay (livekit#752)

* Add typing-extensions as a dependency (livekit#778)

* rename voice_assistant.state to agent.state (livekit#772)

Co-authored-by: aoife cassidy

* bump rtc (livekit#782)

* Version Packages (livekit#744)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* added livekit-plugins-playht text-to-speech (livekit#735)

* Fix function for OpenAI Assistants (livekit#784)

* fix the problem of infinite loop when agent speech is interrupted (livekit#790)

---------

Co-authored-by: David Zhao
Co-authored-by: Neil Dwyer
Co-authored-by: Alejandro Figar Gutierrez
Co-authored-by: Théo Monnom
Co-authored-by: aoife cassidy
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: josephkieu
Co-authored-by: Mehadi Hasan Menon
Co-authored-by: lukasIO
Co-authored-by: xsg22
Co-authored-by: Yuan He
Co-authored-by: Ryan Sinnet
Co-authored-by: Henry Tu
Co-authored-by: Ben Cherry
Co-authored-by: Jaydev
Co-authored-by: Jax
SuJingnan pushed a commit to SuJingnan/agents that referenced this pull request Nov 26, 2024

3 participants