feat: add e2e gen ai app starter pack multimodal live api pattern

Commit c521f2a (parent 3d9976a): 49 changed files with 27,921 additions and 0 deletions.
`...e-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/README.md` (174 additions, 0 deletions)
# Multimodal Live Agent

This pattern showcases a real-time conversational RAG agent powered by Google Gemini. The agent handles audio, video, and text interactions while leveraging tool calling with a vector DB for grounded responses.

![live_api_diagram](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_diagram.png)

**Key components:**

- **Python Backend** (in `app/` folder): A production-ready server built with [FastAPI](https://fastapi.tiangolo.com/) and [google-genai](https://googleapis.github.io/python-genai/) that features:

  - **Real-time bidirectional communication** via WebSockets between the frontend and Gemini model
  - **Integrated tool calling** with vector database support for contextual document retrieval
  - **Production-grade reliability** with retry logic and automatic reconnection capabilities
  - **Deployment flexibility** supporting both AI Studio and Vertex AI endpoints
  - **Feedback logging endpoint** for collecting user interactions

- **React Frontend** (in `frontend/` folder): Extends the [Multimodal live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console), with added features like **custom URLs** and **feedback collection**.
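The "production-grade reliability" bullet above can be illustrated with a generic retry-with-exponential-backoff wrapper. This is a minimal sketch, not the pattern's actual implementation; the function name, delay constants, and the `ConnectionError` trigger are illustrative assumptions:

```python
import random
import time


def with_retry(fn, max_attempts=4, base_delay=0.5, _sleep=time.sleep):
    """Call fn(), retrying on ConnectionError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Backoff schedule: 0.5s, 1s, 2s, ... plus a little random jitter.
            _sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


# Demo: an operation that fails twice with a transient error, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "connected"

print(with_retry(flaky, _sleep=lambda s: None))  # → connected
```

The same shape applies to reconnecting a dropped WebSocket or model session: wrap the connect call, back off between attempts, and re-raise once the attempt budget is exhausted.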
![live api demo](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_pattern_demo.gif)

## Usage

You can use this pattern in two ways:

1. As a standalone template for rapid prototyping (⚡ 1 minute setup!)
2. As part of the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack) for production deployment with Terraform and CI/CD. The pattern comes with comprehensive unit and integration tests.

### Standalone Usage

#### Prerequisites

Before you begin, ensure you have the following installed: [Python 3.10+](https://www.python.org/downloads/), [Poetry](https://python-poetry.org/docs/#installation), [Node.js](https://nodejs.org/) (including npm), and the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install).
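Before proceeding, you can sanity-check that these tools are on your `PATH`. The helper below is an illustrative convenience, not part of the pattern; the binary names assume default installations:

```python
import shutil

# Prerequisite CLIs assumed by this pattern's setup steps.
REQUIRED = ["python3", "poetry", "node", "npm", "gcloud"]

def check_prereqs(tools=REQUIRED):
    """Map each tool name to whether it resolves on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

for tool, found in check_prereqs().items():
    print(f"{'found' if found else 'MISSING'}: {tool}")
```

Any tool reported `MISSING` should be installed from the links above before continuing.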
#### Download the Pattern

Download the Multimodal Live Agent pattern using the `gsutil` CLI:

```bash
gsutil cp gs://e2e-gen-ai-app-starter-pack/multimodal-live-agent.zip . && unzip multimodal-live-agent.zip && cd multimodal-live-agent
```

#### Backend Setup

1. **Set your default Google Cloud project and region:**

   ```bash
   export PROJECT_ID="your-gcp-project"

   gcloud auth login --update-adc
   gcloud config set project $PROJECT_ID
   gcloud auth application-default set-quota-project $PROJECT_ID
   ```

   <details>
   <summary><b>For AI Studio setup:</b></summary>

   ```bash
   export VERTEXAI=false
   export GOOGLE_API_KEY=your-google-api-key
   ```

   </details>
2. **Install Dependencies:**

   Install the required Python packages using Poetry:

   ```bash
   poetry install
   ```

3. **Run the Backend Server:**

   Start the FastAPI server:

   ```bash
   poetry run uvicorn app.server:app --host 0.0.0.0 --port 8000 --reload
   ```

#### Frontend Setup

1. **Install Dependencies:**

   In a separate terminal, install the required Node.js packages for the frontend:

   ```bash
   npm --prefix frontend install
   ```

2. **Start the Frontend:**

   Launch the React development server:

   ```bash
   npm --prefix frontend start
   ```

   This command starts the frontend application, accessible at `http://localhost:3000`.
#### Interact with the Agent

Once both the backend and frontend are running, click the play button in the frontend UI to establish a connection with the backend. You can now interact with the Multimodal Live Agent! Try asking a question such as "Using the tool you have, define Governance in the context of MLOps" so the agent grounds its answer in the [documentation](https://cloud.google.com/architecture/deploy-operate-generative-ai-applications) it was provided with.
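Behind such a question, the agent's retrieval tool ranks indexed documents by vector similarity to the query. A toy sketch of that grounding step (the real pattern uses Vertex AI embeddings and a vector store; the hand-made vectors and document texts below are purely illustrative):

```python
import math

# Toy corpus: (embedding, text) pairs standing in for a real vector store.
DOCS = {
    "governance": ([0.9, 0.1], "Governance in MLOps covers policies and controls."),
    "deployment": ([0.1, 0.9], "Deployment moves models into production."),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the k document texts most similar to the (pre-embedded) query."""
    ranked = sorted(DOCS.values(), key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve([0.8, 0.2]))  # → ['Governance in MLOps covers policies and controls.']
```

The retrieved text is then formatted into the model's context, which is what lets the agent answer from the provided documentation rather than from memory alone.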
#### Remote deployment in Cloud Run

You can quickly test the application in [Cloud Run](https://cloud.google.com/run). Ensure your service account has the `roles/aiplatform.user` role to access Gemini.

1. **Deploy:**

   ```bash
   export REGION="your-gcp-region"

   gcloud run deploy genai-app-sample \
     --source . \
     --project $PROJECT_ID \
     --memory "4Gi" \
     --region $REGION
   ```

2. **Access:** Use the [Cloud Run proxy](https://cloud.google.com/sdk/gcloud/reference/run/services/proxy) for local access. The backend will be accessible at `http://localhost:8000`:

   ```bash
   gcloud run services proxy genai-app-sample --port 8000 --project $PROJECT_ID --region $REGION
   ```

   You can then use the same frontend setup described above to interact with your Cloud Run deployment.
### Integrating with the Starter Pack

This pattern is designed for seamless integration with the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack). The starter pack offers a streamlined approach to setting up and deploying multimodal live agents, complete with robust infrastructure and CI/CD pipelines.

### Getting Started

1. **Download the Starter Pack:**

   Obtain the starter pack using the following command:

   ```bash
   gsutil cp gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip . && unzip app-starter-pack.zip && cd app-starter-pack
   ```

2. **Prepare the Pattern:**

   Run the provided script to prepare the multimodal live agent pattern:

   ```bash
   python app/patterns/multimodal_live_agent/utils/prepare_pattern.py
   ```

   The script will organize the project structure for you. This README will then be available in the root folder as `PATTERN_README.md`.

3. **Set up CI/CD:**

   Refer to the instructions in `deployment/readme.md` for detailed guidance on configuring the CI/CD pipelines.
#### Current Limitations and Future Enhancements

We are actively developing and improving this pattern. Currently, the following limitations are known:

- **Observability:** Comprehensive observability features are not yet fully implemented.
- **Load Testing:** Load testing capabilities are not included in this version.

## Your Feedback Matters

We highly value your feedback and encourage you to share your thoughts and suggestions. Your input helps us prioritize new features and enhancements. Please reach out to us at <a href="mailto:[email protected]">[email protected]</a> to let us know what features you'd like to see implemented or any other feedback you may have.

## Additional Resources for Multimodal Live API

Explore these resources to learn more about the Multimodal Live API and see examples of its usage:

- [Project Pastra](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide/tree/main): A comprehensive developer guide for the Gemini Multimodal Live API.
- [Google Cloud Multimodal Live API demos and samples](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/multimodal-live-api): A collection of code samples and demo applications leveraging the Multimodal Live API in Vertex AI.
- [Gemini 2 Cookbook](https://github.com/google-gemini/cookbook/tree/main/gemini-2): Practical examples and tutorials for working with Gemini 2.
- [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console): An interactive React-based web interface for testing and experimenting with the Gemini Multimodal Live API.
`...i/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/agent.py` (82 additions, 0 deletions)
```python
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Dict

import google.auth
import vertexai
from google import genai
from google.genai.types import LiveConnectConfig, Content, FunctionDeclaration, Tool
from langchain_google_vertexai import VertexAIEmbeddings

from app.templates import SYSTEM_INSTRUCTION, FORMAT_DOCS
from app.vector_store import get_vector_store

# Constants
VERTEXAI = os.getenv("VERTEXAI", "true").lower() == "true"
LOCATION = "us-central1"
EMBEDDING_MODEL = "text-embedding-004"
MODEL_ID = "gemini-2.0-flash-exp"
URLS = [
    "https://cloud.google.com/architecture/deploy-operate-generative-ai-applications"
]

# Initialize Google Cloud clients
credentials, project_id = google.auth.default()
vertexai.init(project=project_id, location=LOCATION)


if VERTEXAI:
    genai_client = genai.Client(project=project_id, location=LOCATION, vertexai=True)
else:
    # API key should be set using the GOOGLE_API_KEY environment variable
    genai_client = genai.Client(http_options={"api_version": "v1alpha"})

# Initialize vector store and retriever
embedding = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)
vector_store = get_vector_store(embedding=embedding, urls=URLS)
retriever = vector_store.as_retriever()


def retrieve_docs(query: str) -> Dict[str, str]:
    """Retrieves pre-formatted documents about MLOps (Machine Learning Operations),
    Gen AI lifecycle, and production deployment best practices.

    Args:
        query: Search query string related to MLOps, Gen AI, or production deployment.

    Returns:
        A dictionary with the relevant, pre-formatted documents under the "output" key.
    """
    docs = retriever.invoke(query)
    formatted_docs = FORMAT_DOCS.format(docs=docs)
    return {"output": formatted_docs}


# Configure tools and live connection
retrieve_docs_tool = Tool(
    function_declarations=[
        FunctionDeclaration.from_function(client=genai_client, func=retrieve_docs)
    ]
)

tool_functions = {"retrieve_docs": retrieve_docs}

live_connect_config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    tools=[retrieve_docs_tool],
    system_instruction=Content(parts=[{"text": SYSTEM_INSTRUCTION}]),
)
```
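When the model emits a function call during a live session, the server can resolve it against a `tool_functions`-style registry like the one defined above. The sketch below shows that dispatch step in isolation; the stub retriever and the simplified call shape (name plus argument dict) are illustrative assumptions, not the actual google-genai message types:

```python
from typing import Any, Callable, Dict

def dispatch_tool_call(name: str, args: Dict[str, Any],
                       registry: Dict[str, Callable[..., Dict[str, Any]]]) -> Dict[str, Any]:
    """Look up a tool by name and invoke it with the model-supplied arguments."""
    if name not in registry:
        # Surface unknown tools as data rather than crashing the session.
        return {"error": f"unknown tool: {name}"}
    return registry[name](**args)


# Stub registry mirroring tool_functions, with a fake retriever for illustration.
def fake_retrieve_docs(query: str) -> Dict[str, str]:
    return {"output": f"docs for: {query}"}

registry = {"retrieve_docs": fake_retrieve_docs}
print(dispatch_tool_call("retrieve_docs", {"query": "MLOps governance"}, registry))
# → {'output': 'docs for: MLOps governance'}
```

The tool's returned dict is then sent back to the model as the function response, which is what grounds the spoken answer in the retrieved documents.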