feat: add e2e gen ai app starter pack multimodal live api pattern

Commit c521f2a (parent 3d9976a): 49 changed files with 27,921 additions and 0 deletions.
`...e-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/README.md` (174 additions, 0 deletions)
# Multimodal Live Agent

This pattern showcases a real-time conversational RAG agent powered by Google Gemini. The agent handles audio, video, and text interactions while leveraging tool calling with a vector DB for grounded responses.

![live_api_diagram](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_diagram.png)

**Key components:**

- **Python Backend** (in `app/` folder): A production-ready server built with [FastAPI](https://fastapi.tiangolo.com/) and [google-genai](https://googleapis.github.io/python-genai/) that features:

  - **Real-time bidirectional communication** via WebSockets between the frontend and Gemini model
  - **Integrated tool calling** with vector database support for contextual document retrieval
  - **Production-grade reliability** with retry logic and automatic reconnection capabilities
  - **Deployment flexibility** supporting both AI Studio and Vertex AI endpoints
  - **Feedback logging endpoint** for collecting user interactions

- **React Frontend** (in `frontend/` folder): Extends the [Multimodal live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console), with added features like **custom URLs** and **feedback collection**.
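The "production-grade reliability" bullet above can be illustrated with a generic retry-with-exponential-backoff wrapper. This is a minimal sketch, not the pattern's actual implementation; the function name, delay constants, and the `ConnectionError` trigger are illustrative assumptions:

```python
import random
import time


def with_retry(fn, max_attempts=4, base_delay=0.5, _sleep=time.sleep):
    """Call fn(), retrying on ConnectionError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Backoff schedule: 0.5s, 1s, 2s, ... plus a little random jitter.
            _sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


# Demo: an operation that fails twice with a transient error, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "connected"

print(with_retry(flaky, _sleep=lambda s: None))  # → connected
```

The same shape applies to reconnecting a dropped WebSocket or model session: wrap the connect call, back off between attempts, and re-raise once the attempt budget is exhausted.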
![live api demo](https://storage.googleapis.com/github-repo/generative-ai/sample-apps/e2e-gen-ai-app-starter-pack/live_api_pattern_demo.gif)

## Usage

You can use this pattern in two ways:

1. As a standalone template for rapid prototyping (⚡ 1 minute setup!)
2. As part of the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack) for production deployment with Terraform and CI/CD. The pattern comes with comprehensive unit and integration tests.

### Standalone Usage

#### Prerequisites

Before you begin, ensure you have the following installed: [Python 3.10+](https://www.python.org/downloads/), [Poetry](https://python-poetry.org/docs/#installation), [Node.js](https://nodejs.org/) (including npm), and the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install).
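Before proceeding, you can sanity-check that these tools are on your `PATH`. The helper below is an illustrative convenience, not part of the pattern; the binary names assume default installations:

```python
import shutil

# Prerequisite CLIs assumed by this pattern's setup steps.
REQUIRED = ["python3", "poetry", "node", "npm", "gcloud"]

def check_prereqs(tools=REQUIRED):
    """Map each tool name to whether it resolves on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

for tool, found in check_prereqs().items():
    print(f"{'found' if found else 'MISSING'}: {tool}")
```

Any tool reported `MISSING` should be installed from the links above before continuing.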
#### Download the Pattern

Download the Multimodal Live Agent pattern using the `gsutil` CLI:

```bash
gsutil cp gs://e2e-gen-ai-app-starter-pack/multimodal-live-agent.zip . && unzip multimodal-live-agent.zip && cd multimodal-live-agent
```

#### Backend Setup

1. **Set your default Google Cloud project and region:**

   ```bash
   export PROJECT_ID="your-gcp-project"

   gcloud auth login --update-adc
   gcloud config set project $PROJECT_ID
   gcloud auth application-default set-quota-project $PROJECT_ID
   ```

   <details>
   <summary><b>For AI Studio setup:</b></summary>

   ```bash
   export VERTEXAI=false
   export GOOGLE_API_KEY=your-google-api-key
   ```

   </details>
2. **Install Dependencies:**

   Install the required Python packages using Poetry:

   ```bash
   poetry install
   ```

3. **Run the Backend Server:**

   Start the FastAPI server:

   ```bash
   poetry run uvicorn app.server:app --host 0.0.0.0 --port 8000 --reload
   ```

#### Frontend Setup

1. **Install Dependencies:**

   In a separate terminal, install the required Node.js packages for the frontend:

   ```bash
   npm --prefix frontend install
   ```

2. **Start the Frontend:**

   Launch the React development server:

   ```bash
   npm --prefix frontend start
   ```

   This command starts the frontend application, accessible at `http://localhost:3000`.
#### Interact with the Agent

Once both the backend and frontend are running, click the play button in the frontend UI to establish a connection with the backend. You can now interact with the Multimodal Live Agent! Try asking a question such as "Using the tool you have, define Governance in the context of MLOps" so the agent grounds its answer in the [documentation](https://cloud.google.com/architecture/deploy-operate-generative-ai-applications) it was provided with.
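Behind such a question, the agent's retrieval tool ranks indexed documents by vector similarity to the query. A toy sketch of that grounding step (the real pattern uses Vertex AI embeddings and a vector store; the hand-made vectors and document texts below are purely illustrative):

```python
import math

# Toy corpus: (embedding, text) pairs standing in for a real vector store.
DOCS = {
    "governance": ([0.9, 0.1], "Governance in MLOps covers policies and controls."),
    "deployment": ([0.1, 0.9], "Deployment moves models into production."),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the k document texts most similar to the (pre-embedded) query."""
    ranked = sorted(DOCS.values(), key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve([0.8, 0.2]))  # → ['Governance in MLOps covers policies and controls.']
```

The retrieved text is then formatted into the model's context, which is what lets the agent answer from the provided documentation rather than from memory alone.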
#### Remote deployment in Cloud Run

You can quickly test the application in [Cloud Run](https://cloud.google.com/run). Ensure your service account has the `roles/aiplatform.user` role to access Gemini.

1. **Deploy:**

   ```bash
   export REGION="your-gcp-region"

   gcloud run deploy genai-app-sample \
     --source . \
     --project $PROJECT_ID \
     --memory "4Gi" \
     --region $REGION
   ```

2. **Access:** Use the [Cloud Run proxy](https://cloud.google.com/sdk/gcloud/reference/run/services/proxy) for local access. The backend will be accessible at `http://localhost:8000`:

   ```bash
   gcloud run services proxy genai-app-sample --port 8000 --project $PROJECT_ID --region $REGION
   ```

   You can then use the same frontend setup described above to interact with your Cloud Run deployment.
### Integrating with the Starter Pack

This pattern is designed for seamless integration with the [starter pack](https://goo.gle/e2e-gen-ai-app-starter-pack). The starter pack offers a streamlined approach to setting up and deploying multimodal live agents, complete with robust infrastructure and CI/CD pipelines.

### Getting Started

1. **Download the Starter Pack:**

   Obtain the starter pack using the following command:

   ```bash
   gsutil cp gs://e2e-gen-ai-app-starter-pack/app-starter-pack.zip . && unzip app-starter-pack.zip && cd app-starter-pack
   ```

2. **Prepare the Pattern:**

   Run the provided script to prepare the multimodal live agent pattern:

   ```bash
   python app/patterns/multimodal_live_agent/utils/prepare_pattern.py
   ```

   The script will organize the project structure for you. This README will then be available in the root folder as `PATTERN_README.md`.

3. **Set up CI/CD:**

   Refer to the instructions in `deployment/readme.md` for detailed guidance on configuring the CI/CD pipelines.
#### Current Limitations and Future Enhancements

We are actively developing and improving this pattern. Currently, the following limitations are known:

- **Observability:** Comprehensive observability features are not yet fully implemented.
- **Load Testing:** Load testing capabilities are not included in this version.

## Your Feedback Matters

We highly value your feedback and encourage you to share your thoughts and suggestions. Your input helps us prioritize new features and enhancements. Please reach out to us at <a href="mailto:[email protected]">[email protected]</a> to let us know what features you'd like to see implemented or any other feedback you may have.

## Additional Resources for Multimodal Live API

Explore these resources to learn more about the Multimodal Live API and see examples of its usage:

- [Project Pastra](https://github.com/heiko-hotz/gemini-multimodal-live-dev-guide/tree/main): A comprehensive developer guide for the Gemini Multimodal Live API.
- [Google Cloud Multimodal Live API demos and samples](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/multimodal-live-api): A collection of code samples and demo applications leveraging the Multimodal Live API in Vertex AI.
- [Gemini 2 Cookbook](https://github.com/google-gemini/cookbook/tree/main/gemini-2): Practical examples and tutorials for working with Gemini 2.
- [Multimodal Live API Web Console](https://github.com/google-gemini/multimodal-live-api-web-console): An interactive React-based web interface for testing and experimenting with the Gemini Multimodal Live API.
`...i/sample-apps/e2e-gen-ai-app-starter-pack/app/patterns/multimodal_live_agent/app/agent.py` (82 additions, 0 deletions)
```python
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import Dict

import google.auth
import vertexai
from google import genai
from google.genai.types import LiveConnectConfig, Content, FunctionDeclaration, Tool
from langchain_google_vertexai import VertexAIEmbeddings

from app.templates import SYSTEM_INSTRUCTION, FORMAT_DOCS
from app.vector_store import get_vector_store

# Constants
VERTEXAI = os.getenv("VERTEXAI", "true").lower() == "true"
LOCATION = "us-central1"
EMBEDDING_MODEL = "text-embedding-004"
MODEL_ID = "gemini-2.0-flash-exp"
URLS = [
    "https://cloud.google.com/architecture/deploy-operate-generative-ai-applications"
]

# Initialize Google Cloud clients
credentials, project_id = google.auth.default()
vertexai.init(project=project_id, location=LOCATION)


if VERTEXAI:
    genai_client = genai.Client(project=project_id, location=LOCATION, vertexai=True)
else:
    # API key should be set using the GOOGLE_API_KEY environment variable
    genai_client = genai.Client(http_options={"api_version": "v1alpha"})

# Initialize vector store and retriever
embedding = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)
vector_store = get_vector_store(embedding=embedding, urls=URLS)
retriever = vector_store.as_retriever()


def retrieve_docs(query: str) -> Dict[str, str]:
    """Retrieves pre-formatted documents about MLOps (Machine Learning Operations),
    Gen AI lifecycle, and production deployment best practices.

    Args:
        query: Search query string related to MLOps, Gen AI, or production deployment.

    Returns:
        A dictionary with the relevant, pre-formatted documents under the "output" key.
    """
    docs = retriever.invoke(query)
    formatted_docs = FORMAT_DOCS.format(docs=docs)
    return {"output": formatted_docs}


# Configure tools and live connection
retrieve_docs_tool = Tool(
    function_declarations=[
        FunctionDeclaration.from_function(client=genai_client, func=retrieve_docs)
    ]
)

tool_functions = {"retrieve_docs": retrieve_docs}

live_connect_config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    tools=[retrieve_docs_tool],
    system_instruction=Content(parts=[{"text": SYSTEM_INSTRUCTION}]),
)
```
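When the model emits a function call during a live session, the server can resolve it against a `tool_functions`-style registry like the one defined above. The sketch below shows that dispatch step in isolation; the stub retriever and the simplified call shape (name plus argument dict) are illustrative assumptions, not the actual google-genai message types:

```python
from typing import Any, Callable, Dict

def dispatch_tool_call(name: str, args: Dict[str, Any],
                       registry: Dict[str, Callable[..., Dict[str, Any]]]) -> Dict[str, Any]:
    """Look up a tool by name and invoke it with the model-supplied arguments."""
    if name not in registry:
        # Surface unknown tools as data rather than crashing the session.
        return {"error": f"unknown tool: {name}"}
    return registry[name](**args)


# Stub registry mirroring tool_functions, with a fake retriever for illustration.
def fake_retrieve_docs(query: str) -> Dict[str, str]:
    return {"output": f"docs for: {query}"}

registry = {"retrieve_docs": fake_retrieve_docs}
print(dispatch_tool_call("retrieve_docs", {"query": "MLOps governance"}, registry))
# → {'output': 'docs for: MLOps governance'}
```

The tool's returned dict is then sent back to the model as the function response, which is what grounds the spoken answer in the retrieved documents.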