feat: add e2e gen ai app starter pack multimodal live api pattern
eliasecchig committed Jan 8, 2025
1 parent 3d9976a commit c521f2a
Showing 49 changed files with 27,921 additions and 0 deletions.
10 changes: 10 additions & 0 deletions .github/actions/spelling/allow.txt
@@ -353,6 +353,7 @@ PYINK
Pakeman
Paquete
Parmar
Pastra
Pengyu
Persero
Phaidon
@@ -452,6 +453,7 @@ TPUs
TSLA
TSMC
TSNE
TTFB
TTFT
TTH
TTT
@@ -720,6 +722,7 @@ etils
eur
evals
evse
evt
expl
faiss
fastapi
@@ -857,6 +860,7 @@ kenleejr
keras
keychain
kfp
khz
kickstart
konnte
kotlin
@@ -922,6 +926,7 @@ mrag
mrr
mrtydi
msmarco
msr
multitool
mvn
mvnw
@@ -974,6 +979,7 @@ onesie
onesies
openai
openfda
opsz
osm
osx
outdir
@@ -1152,6 +1158,7 @@ timechart
tion
titlebar
tobytes
toolcall
toself
toset
tqdm
@@ -1203,12 +1210,14 @@ webcam
webclient
webpage
webpages
webfonts
webrtc
websites
weightage
welcom
werden
whatsapp
wght
wiffle
wikipedia
wil
@@ -1218,6 +1227,7 @@ wip
wishlist
womens
workarounds
worklets
wparam
wscore
wscores
174 changes: 174 additions & 0 deletions gemini/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/README.md
@@ -0,0 +1,174 @@
# Multimodal Live Agent

This pattern showcases a real-time conversational RAG agent powered by Google Gemini. The agent handles audio, video, and text interactions while leveraging tool calling with a vector DB for grounded responses.

![live_api_diagram](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_diagram.png)

**Key components:**

- **Python Backend** (in `app/` folder): A production-ready server built with [FastAPI](https://fastapi.tiangolo.com/) and [google-genai](https://googleapis.github.io/python-genai/) that features:

- **Real-time bidirectional communication** via WebSockets between the frontend and Gemini model
- **Integrated tool calling** with vector database support for contextual document retrieval
- **Production-grade reliability** with retry logic and automatic reconnection capabilities
- **Deployment flexibility** supporting both AI Studio and Vertex AI endpoints
- **Feedback logging endpoint** for collecting user interactions

- **React Frontend** (in `frontend/` folder): Extends the [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console), with added features like **custom URLs** and **feedback collection**.

![live api demo](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_pattern_demo.gif)
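
For illustration, a minimal Python client for the backend's WebSocket interface might look like the sketch below. The `/ws` endpoint path and the JSON message shape are assumptions, not the documented API; check `app/server.py` for the actual route and message schema.

```python
# Minimal WebSocket client sketch. The /ws path and the JSON message shape
# are assumptions for illustration; see app/server.py for the actual interface.
import asyncio
import json

import websockets  # third-party: pip install websockets


async def main() -> None:
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        # Send a text turn to the agent (hypothetical message schema).
        await ws.send(json.dumps({"type": "text", "text": "Define governance in MLOps."}))
        # Print whatever the server streams back.
        async for message in ws:
            print(json.loads(message))


asyncio.run(main())
```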

## Usage

You can use this pattern in two ways:

1. As a standalone template for rapid prototyping (⚡ 1 minute setup!)
2. As part of the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack) for production deployment with Terraform and CI/CD. The pattern comes with comprehensive unit and integration tests.

### Standalone Usage

#### Prerequisites

Before you begin, ensure you have the following installed: [Python 3.10+](https://www.python.org/downloads/), [Poetry](https://python-poetry.org/docs/#installation), [Node.js](https://nodejs.org/) (including npm), [Google Cloud SDK](https://cloud.google.com/sdk/docs/install)

#### Download the Pattern

Download the Multimodal Live Agent pattern using the `gsutil` CLI:

```bash
gsutil cp gs://e2e-gen-ai-app-starter-pack/multimodal-live-agent.zip . && unzip multimodal-live-agent.zip && cd multimodal-live-agent
```

#### Backend Setup

1. **Set your default Google Cloud project and region:**

```bash
export PROJECT_ID="your-gcp-project"

gcloud auth login --update-adc
gcloud config set project $PROJECT_ID
gcloud auth application-default set-quota-project $PROJECT_ID
```

<details>
<summary><b>For AI Studio setup:</b></summary>

```bash
export VERTEXAI=false
export GOOGLE_API_KEY=your-google-api-key
```

</details>

2. **Install Dependencies:**

Install the required Python packages using Poetry:

```bash
poetry install
```

3. **Run the Backend Server:**

Start the FastAPI server:

```bash
poetry run uvicorn app.server:app --host 0.0.0.0 --port 8000 --reload
```

#### Frontend Setup

1. **Install Dependencies:**

In a separate terminal, install the required Node.js packages for the frontend:

```bash
npm --prefix frontend install
```

2. **Start the Frontend:**

Launch the React development server:

```bash
npm --prefix frontend start
```

This command starts the frontend application, accessible at `http://localhost:3000`.

#### Interact with the Agent

Once both the backend and frontend are running, click the play button in the frontend UI to establish a connection with the backend. You can now interact with the Multimodal Live Agent! You can try asking questions such as "Using the tool you have, define Governance in the context of MLOps" to have the agent use the [documentation](https://cloud.google.com/architecture/deploy-operate-generative-ai-applications) it was provided with.
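
The backend also exposes a feedback logging endpoint (see the key components above). A minimal sketch of posting feedback follows, assuming a `/feedback` route and a simple JSON payload; check `app/server.py` for the actual route and expected schema.

```python
# Posts a feedback entry to the backend. The /feedback path and payload
# fields are assumptions for illustration; see app/server.py for the
# actual route and schema.
import requests  # third-party: pip install requests

feedback = {"score": 5, "text": "Great answer!", "run_id": "example-run-id"}
response = requests.post("http://localhost:8000/feedback", json=feedback)
response.raise_for_status()
```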

#### Remote deployment in Cloud Run

You can quickly test the application in [Cloud Run](https://cloud.google.com/run). Ensure your service account has the `roles/aiplatform.user` role to access Gemini.

1. **Deploy:**

```bash
export REGION="your-gcp-region"

gcloud run deploy genai-app-sample \
--source . \
--project $PROJECT_ID \
--memory "4Gi" \
--region $REGION
```

2. **Access:** Use [Cloud Run proxy](https://cloud.google.com/sdk/gcloud/reference/run/services/proxy) for local access. The backend will be accessible at `http://localhost:8000`:

```bash
gcloud run services proxy genai-app-sample --port 8000 --project $PROJECT_ID --region $REGION
```

You can then use the same frontend setup described above to interact with your Cloud Run deployment.
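
Because the frontend supports custom URLs, you can point it at the proxied backend (`http://localhost:8000`) to exercise the Cloud Run deployment end to end.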

### Integrating with the Starter Pack

This pattern is designed for seamless integration with the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack). The starter pack offers a streamlined approach to setting up and deploying multimodal live agents, complete with robust infrastructure and CI/CD pipelines.

### Getting Started

1. **Download the Starter Pack:**

Obtain the starter pack using the following command:

```bash
gsutil cp gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip . && unzip app-starter-pack.zip && cd app-starter-pack
```

2. **Prepare the Pattern:**

Run the provided script to prepare the multimodal live agent pattern:

```bash
python app/patterns/multimodal_live_agent/utils/prepare_pattern.py
```

The script reorganizes the project structure for you. This README will then be available in the root folder as `PATTERN_README.md`.

3. **Set up CI/CD:**

Refer to the instructions in `deployment/readme.md` for detailed guidance on configuring the CI/CD pipelines.

#### Current Limitations and Future Enhancements

We are actively developing and improving this pattern. Currently, the following limitations are known:

- **Observability:** Comprehensive observability features are not yet fully implemented.
- **Load Testing:** Load testing capabilities are not included in this version.

## Your Feedback Matters

We highly value your feedback and encourage you to share your thoughts and suggestions. Your input helps us prioritize new features and enhancements. Please reach out to us at [[email protected]](mailto:[email protected]) to let us know what features you'd like to see implemented or any other feedback you may have.

## Additional Resources for Multimodal Live API

Explore these resources to learn more about the Multimodal Live API and see examples of its usage:

- [Project Pastra](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide/tree/main): a comprehensive developer guide for the Gemini Multimodal Live API.
- [Google Cloud Multimodal Live API demos and samples](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/multimodal-live-api): Collection of code samples and demo applications leveraging the Multimodal Live API in Vertex AI.
- [Gemini 2 Cookbook](https://github.com/google-gemini/cookbook/tree/main/gemini-2): Practical examples and tutorials for working with Gemini 2.
- [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console): Interactive React-based web interface for testing and experimenting with Gemini Multimodal Live API.
@@ -0,0 +1,82 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Dict

import google.auth
import vertexai
from google import genai
from google.genai.types import LiveConnectConfig, Content, FunctionDeclaration, Tool
from langchain_google_vertexai import VertexAIEmbeddings

from app.templates import SYSTEM_INSTRUCTION, FORMAT_DOCS
from app.vector_store import get_vector_store

# Constants
VERTEXAI = os.getenv("VERTEXAI", "true").lower() == "true"
LOCATION = "us-central1"
EMBEDDING_MODEL = "text-embedding-004"
MODEL_ID = "gemini-2.0-flash-exp"
URLS = [
"https://cloud.google.com/architecture/deploy-operate-generative-ai-applications"
]

# Initialize Google Cloud clients
credentials, project_id = google.auth.default()
vertexai.init(project=project_id, location=LOCATION)


if VERTEXAI:
    genai_client = genai.Client(project=project_id, location=LOCATION, vertexai=True)
else:
    # API key should be set using the GOOGLE_API_KEY environment variable
    genai_client = genai.Client(http_options={"api_version": "v1alpha"})

# Initialize vector store and retriever
embedding = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)
vector_store = get_vector_store(embedding=embedding, urls=URLS)
retriever = vector_store.as_retriever()


def retrieve_docs(query: str) -> Dict[str, str]:
    """
    Retrieves pre-formatted documents about MLOps (Machine Learning Operations),
    the Gen AI lifecycle, and production deployment best practices.

    Args:
        query: Search query string related to MLOps, Gen AI, or production deployment.

    Returns:
        A dictionary with an "output" key containing the formatted documents.
    """
    docs = retriever.invoke(query)
    formatted_docs = FORMAT_DOCS.format(docs=docs)
    return {"output": formatted_docs}


# Configure tools and live connection
retrieve_docs_tool = Tool(
    function_declarations=[
        FunctionDeclaration.from_function(client=genai_client, func=retrieve_docs)
    ]
)

tool_functions = {"retrieve_docs": retrieve_docs}

live_connect_config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    tools=[retrieve_docs_tool],
    system_instruction=Content(parts=[{"text": SYSTEM_INSTRUCTION}]),
)
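

# Illustrative sketch: how a caller such as app/server.py might drive a live
# session with this config using google-genai's async live API. The exact
# send/receive interface can vary across SDK versions, and tool calls from
# the model would be dispatched through tool_functions.
async def example_live_session() -> None:
    async with genai_client.aio.live.connect(
        model=MODEL_ID, config=live_connect_config
    ) as session:
        await session.send(input="Define governance in MLOps.", end_of_turn=True)
        async for response in session.receive():
            print(response)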