From 391dd42c92b7539d49826f52919023aebc243ba6 Mon Sep 17 00:00:00 2001 From: Deepak moonat Date: Thu, 12 Dec 2024 10:41:43 +0530 Subject: [PATCH] chore: add documentation links and what's next block (#1515) # Description - [x] Follow the [`CONTRIBUTING` Guide](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/CONTRIBUTING.md). - [x] You are listed as the author in your notebook or README file. - [x] Your account is listed in [`CODEOWNERS`](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/.github/CODEOWNERS) for the file(s). - [x] Make your Pull Request title in the specification. - [x] Ensure the tests and linter pass (Run `nox -s format` from the repository root to format). - [x] Appropriate docs were updated (if necessary) --- .../real_time_rag_retail_gemini_2_0.ipynb | 3627 +++++++++-------- 1 file changed, 1823 insertions(+), 1804 deletions(-) diff --git a/gemini/multimodal-live-api/real_time_rag_retail_gemini_2_0.ipynb b/gemini/multimodal-live-api/real_time_rag_retail_gemini_2_0.ipynb index 9bd4620434a..24d34c6b56f 100644 --- a/gemini/multimodal-live-api/real_time_rag_retail_gemini_2_0.ipynb +++ b/gemini/multimodal-live-api/real_time_rag_retail_gemini_2_0.ipynb @@ -1,1806 +1,1825 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2024 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Real-time Retrieval Augmented Generation (RAG) using the Multimodal Live API with Gemini 2.0\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Google
Open in Colab\n", - "
\n", - "
\n", - " \n", - " \"Google
Open in Colab Enterprise\n", - "
\n", - "
\n", - " \n", - " \"Vertex
Open in Vertex AI Workbench\n", - "
\n", - "
\n", - " \n", - " \"GitHub
View on GitHub\n", - "
\n", - "
\n", - "\n", - "
\n", - "\n", - "
\n", - "\n", - "
\n", - "
\n", - "Share to:\n", - "\n", - "\n", - " \"LinkedIn\n", - "\n", - "\n", - "\n", - " \"Bluesky\n", - "\n", - "\n", - "\n", - " \"X\n", - "\n", - "\n", - "\n", - " \"Reddit\n", - "\n", - "\n", - "\n", - " \"Facebook\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "84f0f73a0f76" - }, - "source": [ - "| | |\n", - "|-|-|\n", - "| Author(s) | [Deepak Moonat](https://github.com/dmoonat/) |" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-MDW_A-nBksi" - }, - "source": [ - "
\n", - "\n", - "⚠️ Gemini 2.0 Flash (Model ID: gemini-2.0-flash-exp) and the Google Gen AI SDK are currently experimental and output can vary ⚠️\n", - "
\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook provides a comprehensive demonstration of the Vertex AI Gemini and Multimodal Live APIs, showcasing text and audio generation capabilities. Users will learn to develop a real-time Retrieval Augmented Generation (RAG) system leveraging the Multimodal Live API for a retail use-case. This system will generate audio and text responses grounded in provided documents. The tutorial covers the following:\n", - "\n", - "- **Gemini API:** Text output generation.\n", - "- **Multimodal Live API:** Text and audio output generation.\n", - "- **Retrieval Augmented Generation (RAG):** Text and audio output generation grounded in provided documents for a retail use-case." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "xKVzRJhgJ4EZ" - }, - "source": [ - "### Gemini 2.0\n", - "\n", - "[Gemini 2.0 Flash](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2) is a new multimodal generative ai model from the Gemini family developed by [Google DeepMind](https://deepmind.google/). It now available as an experimental preview release through the Gemini API in Vertex AI and Vertex AI Studio. The model introduces new features and enhanced core capabilities:\n", - "\n", - "- Multimodal Live API: This new API helps you create real-time vision and audio streaming applications with tool use.\n", - "- Speed and performance: Gemini 2.0 Flash is the fastest model in the industry, with a 3x improvement in time to first token (TTFT) over 1.5 Flash.\n", - "- Quality: The model maintains quality comparable to larger models like Gemini 1.5 Pro and GPT-4o.\n", - "- Improved agentic experiences: Gemini 2.0 delivers improvements to multimodal understanding, coding, complex instruction following, and function calling.\n", - "- New Modalities: Gemini 2.0 introduces native image generation and controllable text-to-speech capabilities, enabling image editing, localized artwork creation, and expressive storytelling.\n", - "- To support the new model, we're also shipping an all new SDK that supports simple migration between the Gemini Developer API and the Gemini API in Vertex AI.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "61RBz8LLbxCR" - }, - "source": [ - "## Get started" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "No17Cw5hgx12" - }, - "source": [ - "### Install Dependencies\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ue_G9ZU80ON0" - }, - "source": [ - "- `google-genai`: Google Gen AI python library\n", - "- `PyPDF2`: To read PDFs" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "tFy3H3aPgx12" - }, - "outputs": [], - "source": [ - "%%capture\n", - "\n", - "%pip install --upgrade --quiet google-genai PyPDF2" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "R5Xep4W9lq-Z" - }, - "source": [ - "### Restart runtime\n", - "\n", - "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n", - "\n", - "The restart might take a minute or longer. After it's restarted, continue to the next step." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "XRvKdaPDTznN" - }, - "outputs": [], - "source": [ - "import IPython\n", - "\n", - "app = IPython.Application.instance()\n", - "app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "SbmM4z7FOBpM" - }, - "source": [ - "
\n", - "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n", - "
\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dmWOrTJ3gx13" - }, - "source": [ - "### Authenticate your notebook environment (Colab only)\n", - "\n", - "If you're running this notebook on Google Colab, run the cell below to authenticate your environment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NyKGtVQjgx13" - }, - "outputs": [], - "source": [ - "import sys\n", - "\n", - "if \"google.colab\" in sys.modules:\n", - " from google.colab import auth\n", - "\n", - " auth.authenticate_user()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DF4l8DTdWgPY" - }, - "source": [ - "### Set Google Cloud project information\n", - "\n", - "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", - "\n", - "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Nqwi-5ufWp_B" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", - "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", - " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n", - "\n", - "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5303c05f7aa6" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6fc324893334" - }, - "outputs": [], - "source": [ - "# For asynchronous operations\n", - "import asyncio\n", - "\n", - "# For data processing\n", - "import glob\n", - "from typing import Any\n", - "\n", - "from IPython.display import Audio, Markdown, display\n", - "import PyPDF2\n", - "\n", - "# For GenerativeAI\n", - "from google import genai\n", - "from google.genai import types\n", - "from google.genai.types import LiveConnectConfig\n", - "import numpy as np\n", - "import pandas as pd\n", - "\n", - "# For similarity score\n", - "from sklearn.metrics.pairwise import cosine_similarity\n", - "\n", - "# For retry mechanism\n", - "from tenacity import retry, stop_after_attempt, wait_random_exponential" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "OV5bFDTVE3oX" - }, - "source": [ - "#### Initialize Gen AI client" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "3pjBP_V7JqhD" - }, - "source": [ - "- Client for calling the Gemini API in Vertex AI\n", - "- `vertexai=True`, indicates the client should communicate with the Vertex AI API endpoints." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bEhq_4GBEW2a" - }, - "outputs": [], - "source": [ - "# Vertex AI API\n", - "client = genai.Client(\n", - " vertexai=True,\n", - " project=PROJECT_ID,\n", - " location=LOCATION,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e43229f3ad4f" - }, - "source": [ - "### Initialize model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf93d5f0ce00" - }, - "outputs": [], - "source": [ - "MODEL_ID = \"gemini-2.0-flash-exp\" # @param {type:\"string\", isTemplate: true}\n", - "MODEL = (\n", - " f\"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}\"\n", - ")\n", - "\n", - "text_embedding_model = \"text-embedding-004\" # @param {type:\"string\", isTemplate: true}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "H4TDOc3aqwuz" - }, - "source": [ - "## Sample Use Case - Retail Customer Support Assistance" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "cH6zJeecq6SU" - }, - "source": [ - "Let's imagine a bicycle shop called `Cymbal Bikes` that offers various services like brake repair, chain replacement, and more. Our goal is to create a straightforward support system that can answer customer questions based on the shop's policies and service offerings." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "uA3X24j86uE7" - }, - "source": [ - "Having a customer support assistance offers numerous advantages for businesses, ultimately leading to improved customer satisfaction and loyalty, as well as increased profitability. Here are some key benefits:\n", - "\n", - "- Faster Resolution of Issues: Users can quickly find answers to their questions without having to search through store's website.\n", - "- Improved Efficiency: The assistant can handle simple, repetitive questions, freeing up human agents to focus on more complex or strategic tasks.\n", - "- 24/7 Availability: Unlike human colleagues, the assistant is available around the clock, providing immediate support regardless of time zones or working hours.\n", - "- Consistent Information: The assistant provides standardized answers, ensuring consistency and accuracy." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mZZLuCecsp0e" - }, - "source": [ - "#### Context Documents" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nWrK7HHjssqB" - }, - "source": [ - "- Download the documents from Google Cloud Storage bucket\n", - "- These documents are specific to `Cymbal Bikes` store\n", - " - [`Cymbal Bikes Return Policy`](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/documents/CymbalBikesReturnPolicy.pdf): Contains information about return policy\n", - " - [`Cymbal Bikes Services`](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/documents/CymbalBikesServices.pdf): Contains information about services provided by Cymbal Bikes" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "iLhNfYfYspnC" - }, - "outputs": [], - "source": [ - "!gsutil cp \"gs://github-repo/generative-ai/gemini2/use-cases/retail_rag/documents/CymbalBikesReturnPolicy.pdf\" \"documents/CymbalBikesReturnPolicy.pdf\"\n", - "!gsutil cp \"https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/documents/CymbalBikesServices.pdf\" \"documents/CymbalBikesServices.pdf\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "GOFNGNGjjEzD" - }, - "source": [ - "### Text" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "QlcEVrUtP9TI" - }, - "source": [ - "- Let's check a specific query to our retail use-case" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "eLqbaZjoCzng" - }, - "outputs": [], - "source": [ - "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", - "\n", - "response = client.models.generate_content(\n", - " model=MODEL_ID,\n", - " contents=query,\n", - ")\n", - "\n", - "display(Markdown(response.text))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-D6q7KUDuH-E" - }, - "source": [ - "> The correct answer to the query is `A basic tune-up costs $100.`\n", - "\n", - "![BasicTuneUp](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/images/BasicTuneUp.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "uoigEKWkQjwi" - }, - "source": [ - "- You can see, the model is unable to answer it correctly, as it's very specific to our hypothetical use-case. However, it does provide some details to get the answer from the internet.\n", - "\n", - "- Without the necessary context, the model's response is essentially a guess and may not align with the desired information.\n", - "\n", - "- LLM is trained on vast amount of data, which leads to hallucinations. To overcome this challenge, in coming sections we'll look into how to ground the answers using Retrieval Augmented Generation (RAG)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nhzKqZdunwYJ" - }, - "source": [ - "## Grounding" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "kzNcDkRevJi3" - }, - "source": [ - "Grounding is crucial in this scenario because the model needs to access and process relevant information from external sources (the \"Cymbal Bikes Return Policy\" and \"Cymbal Bikes Services\" documents) to answer specific queries accurately. 
Without grounding, the model relies solely on its pre-trained knowledge, which may not contain the specific details about the bike store's policies.\n", - "\n", - "In the example, the question about the return policy for bike helmets at Cymbal Bikes cannot be answered correctly without accessing the provided documents. The model's general knowledge of return policies is insufficient. Grounding allows the model to:\n", - "\n", - "1. **Retrieve relevant information:** The system must first locate the pertinent sections within the provided documents that address the user's question about bike helmet returns.\n", - "\n", - "2. **Process and synthesize information:** After retrieving relevant passages, the model must then understand and synthesize the information to construct an accurate answer.\n", - "\n", - "3. **Generate a grounded response:** Finally, the response needs to be directly derived from the factual content of the documents. This ensures accuracy and avoids hallucinations – generating incorrect or nonsensical information not present in the source documents.\n", - "\n", - "Without grounding, the model is forced to guess or extrapolate from its general knowledge, which can lead to inaccurate or misleading responses. The grounding process makes the model's responses more reliable and trustworthy, especially for domain-specific knowledge like store policies or procedures.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-SyokS1pUR9O" - }, - "source": [ - "## Multimodal Live API" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "pwZeOc5-UXKD" - }, - "source": [ - "The multimodal live API enables you to build low-latency, multi-modal applications. It currently supports text as input and text & audio as output.\n", - "\n", - "- Low Latency, where audio output is required, where the Text-to-Speech step can be skipped\n", - "- Provides a more interactive user experience.\n", - "- Suitable for applications requiring immediate audio feedback" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aS1zTjSMcij2" - }, - "source": [ - "#### Asynchronous (async) operation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "iH9CBOpncnK8" - }, - "source": [ - "When to use async calls:\n", - "1. **I/O-bound operations**: When your code spends a significant amount of time waiting for external resources\n", - " (e.g., network requests, file operations, database queries). Async allows other tasks to run while waiting. \n", - " This is especially beneficial for real-time applications or when dealing with multiple concurrent requests.\n", - " \n", - " Example:\n", - " - Fetching data from a remote server.\n", - "\n", - "2. **Parallel tasks**: When you have independent tasks that can run concurrently without blocking each other. Async\n", - " allows you to efficiently utilize multiple CPU cores or network connections.\n", - " \n", - " Example:\n", - " - Processing a large number of prompts and generating audio for each.\n", - "\n", - "\n", - "3. **User interfaces**: In applications with graphical user interfaces (GUIs), async operations prevent the UI from\n", - " freezing while performing long-running tasks. 
Users can interact with the interface even when background\n", - " operations are active.\n", - " \n", - " Example: \n", - " - A chatbot interacting in real time, where an audio response is generated in the background.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aB4U6s1-UlFw" - }, - "source": [ - "### Text" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "YvUJzbgPM26m" - }, - "source": [ - "For text generation, you need to set the `response_modalities` to `TEXT`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "YQOurRs5UU9p" - }, - "outputs": [], - "source": [ - "async def generate_content(query: str) -> str:\n", - " \"\"\"Function to generate text content using Gemini live API.\n", - "\n", - " Args:\n", - " query: The query to generate content for.\n", - "\n", - " Returns:\n", - " The generated content.\n", - " \"\"\"\n", - " config = LiveConnectConfig(response_modalities=[\"TEXT\"])\n", - "\n", - " async with client.aio.live.connect(model=MODEL, config=config) as session:\n", - "\n", - " await session.send(input=query, end_of_turn=True)\n", - "\n", - " response = []\n", - " async for message in session.receive():\n", - " try:\n", - " response.append(message.server_content.model_turn.parts[0].text)\n", - " except AttributeError:\n", - " pass\n", - "\n", - " if message.server_content.turn_complete:\n", - " response = \"\".join(str(x) for x in response)\n", - " return response" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ye1TwWVaVSxF" - }, - "source": [ - "- Try a specific query" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "gGqsp6nFDNsG" - }, - "outputs": [], - "source": [ - "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", - "\n", - "response = await generate_content(query)\n", - "display(Markdown(response))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "roXuCp_cXE9q" - }, - "source": [ - "### Audio" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lBnz34QaakVM" - }, - "source": [ - "- For audio generation, you need to set the `response_modalities` to `AUDIO`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "BmLuvxnFbC4Z" - }, - "outputs": [], - "source": [ - "async def generate_audio_content(query: str):\n", - " \"\"\"Function to generate audio response for provided query using Gemini Multimodal Live API.\n", - "\n", - " Args:\n", - " query: The query to generate audio response for.\n", - "\n", - " Returns:\n", - " The audio response.\n", - " \"\"\"\n", - " config = LiveConnectConfig(response_modalities=[\"AUDIO\"])\n", - " async with client.aio.live.connect(model=MODEL, config=config) as session:\n", - "\n", - " await session.send(input=query, end_of_turn=True)\n", - "\n", - " audio_parts = []\n", - " async for message in session.receive():\n", - " if message.server_content.model_turn:\n", - " for part in message.server_content.model_turn.parts:\n", - " audio_parts.append(\n", - " np.frombuffer(part.inline_data.data, dtype=np.int16)\n", - " )\n", - "\n", - " if message.server_content.turn_complete:\n", - " if audio_parts:\n", - " audio_data = np.concatenate(audio_parts, axis=0)\n", - " await asyncio.sleep(0.4)\n", - " display(Audio(audio_data, rate=24000, autoplay=True))\n", - " break" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "xKQ_l6wiLH_w" - }, - "source": [ - "In this example, you send a text prompt and request the model response 
in audio." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rXJRoxUAcFVB" - }, - "source": [ - "- Let's check the same query as before" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "CfZy_XZeDUtS" - }, - "outputs": [], - "source": [ - "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", - "\n", - "await generate_audio_content(query)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "clfXp2PZmxDZ" - }, - "source": [ - "- Model is unable to answer the query, but with the Multimodal Live API, it doesn't hallucinate, which is pretty good!!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "wT2oB1BOqDYP" - }, - "source": [ - "### Continuous Audio Interaction (Not multiturn)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "T4iAJCstqR5s" - }, - "source": [ - " - Below function generates audio output based on the provided text prompt.\n", - " - The generated audio is displayed using `IPython.display.Audio`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bZntNTPiYLA8" - }, - "source": [ - "- Input your prompts (type `q` or `quit` or `exit` to exit).\n", - "- Example prompts:\n", - " - Hello\n", - " - Who are you?\n", - " - What's the largest planet in our solar system?\n", - " - Tell me 3 fun facts about the universe?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "7M0zkHNrOBQf" - }, - "outputs": [], - "source": [ - "async def continuous_audio_generation():\n", - " \"\"\"Continuously generates audio responses for the asked queries.\"\"\"\n", - " while True:\n", - " query = input(\"Your query > \")\n", - " if any(query.lower() in s for s in [\"q\", \"quit\", \"exit\"]):\n", - " break\n", - " await generate_audio_content(query)\n", - "\n", - "\n", - "await continuous_audio_generation()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "QX9k92TlJ864" - }, - "source": [ - "## Enhancing LLM Accuracy with RAG" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "oOJ-Wx18hpju" - }, - "source": [ - "We'll be showcasing the design pattern for how to implement Real-time Retrieval Augmented Generation (RAG) using Gemini 2.0 multimodal live API.\n", - "\n", - "- Multimodal live API uses websockets to communicate over the internet\n", - "- It maintains a continuous connection\n", - "- Ideal for real-time applications which require persistent communication\n", - "\n", - "\n", - "> Note: Replicating real-life scenarios with Python can be challenging within the constraints of a Colab environment.\n", - "\n", - "\n", - "However, the flow shown in this section can be modified for streaming audio input and output.\n", - "\n", - "
\n", - "\n", - "We'll build the RAG pipeline from scratch to help you understand each and every components of the pipeline.\n", - "\n", - "There are other ways to build the RAG pipeline using open source tools such as [LangChain](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb), [LlamaIndex](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/llamaindex_rag.ipynb) etc." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "u5CXTtsPEyJ0" - }, - "source": [ - "### Context Documents" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "vvdcw1AOg4se" - }, - "source": [ - "- Documents are the building blocks of any RAG pipeline, as it provides the relevant context needed to ground the LLM responses\n", - "- We'll be using the documents already downloaded at the start of the notebook\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "M22BSDb2Xxpb" - }, - "outputs": [], - "source": [ - "documents = glob.glob(\"documents/*\")\n", - "documents" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zNpUL7t0e054" - }, - "source": [ - "### Retrieval Augmented Generation Architecture" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "vV5Et4YHbqqE" - }, - "source": [ - "In general, RAG architecture consists of the following components\n", - "\n", - "**Data Preparation**\n", - "1. Chunking: Dividing the document into smaller, manageable pieces for processing.\n", - "2. Embedding: Transforming text chunks into numerical vectors representing semantic meaning.\n", - "3. Indexing: Organizing embeddings for efficient similarity search." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "563756fa3b7f" - }, - "source": [ - "![RAGArchitecture](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/images/RAGArchitecture.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "pf4sXzYUby57" - }, - "source": [ - "**Inference**\n", - "1. Retrieval: Finding the most relevant chunks based on the query embedding.\n", - "2. Query Augmentation: Enhancing the query with retrieved context for improved generation.\n", - "3. Generation: Synthesizing a coherent and informative answer based on the augmented query." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1a30b41b63f1" - }, - "source": [ - "![LiveAPI](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/images/LiveAPI.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "M-0zlJ3_FRfa" - }, - "source": [ - "#### Document Embedding and Indexing" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0fY3xLaFKBIS" - }, - "source": [ - "Following blocks of code shows how to process unstructured data(PDFs), extract text, and divide them into smaller chunks for efficient embedding and retrieval." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JTTOQ35Ia-V2" - }, - "source": [ - "- Embeddings:\n", - " - Numerical representations of text\n", - " - It capture the semantic meaning and context of the text\n", - " - We'll use Vertex AI's text embedding model to generate embeddings\n", - " - Error handling (like the retry mechanism) during embedding generation due to potential API quota limits.\n", - "\n", - "- Indexing:\n", - " - Build a searchable index from embeddings, enabling efficient similarity search.\n", - " - For example, the index is like a detailed table of contents for a massive reference book.\n", - "\n", - "\n", - "Check out the Google Cloud Platform [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings) for detailed understanding and example use-cases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Vun69x23FWiw" - }, - "outputs": [], - "source": [ - "@retry(wait=wait_random_exponential(multiplier=1, max=120), stop=stop_after_attempt(4))\n", - "def get_embeddings(\n", - " embedding_client: Any, embedding_model: str, text: str, output_dim: int = 768\n", - ") -> list[float]:\n", - " \"\"\"\n", - " Generate embeddings for text with retry logic for API quota management.\n", - "\n", - " Args:\n", - " embedding_client: The client object used to generate embeddings.\n", - " embedding_model: The name of the embedding model to use.\n", - " text: The text for which to generate embeddings.\n", - " output_dim: The desired dimensionality of the output embeddings (default is 768).\n", - "\n", - " Returns:\n", - " A list of floats representing the generated embeddings. Returns None if a \"RESOURCE_EXHAUSTED\" error occurs.\n", - "\n", - " Raises:\n", - " Exception: Any exception encountered during embedding generation, excluding \"RESOURCE_EXHAUSTED\" errors.\n", - " \"\"\"\n", - " try:\n", - " response = embedding_client.models.embed_content(\n", - " model=embedding_model,\n", - " contents=[text],\n", - " config=types.EmbedContentConfig(output_dimensionality=output_dim),\n", - " )\n", - " return [response.embeddings[0].values]\n", - " except Exception as e:\n", - " if \"RESOURCE_EXHAUSTED\" in str(e):\n", - " return None\n", - " print(f\"Error generating embeddings: {str(e)}\")\n", - " raise" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2csDY5NsswwJ" - }, - "source": [ - "- The code block executes the following steps:\n", - "\n", - " - Extracts text from PDF documents and segments it into smaller chunks for processing.\n", - " - Employs a Vertex AI model to transform each text chunk into a numerical embedding vector, facilitating semantic representation and search.\n", - " - Constructs a Pandas DataFrame to store the embeddings, enriched with metadata such as document name and page number, effectively creating a searchable index for efficient retrieval.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9TJlvdIsRfmX" - }, - "outputs": [], - "source": [ - "def build_index(\n", - " document_paths: list[str],\n", - " embedding_client: Any,\n", - " embedding_model: str,\n", - " chunk_size: int = 512,\n", - ") -> pd.DataFrame:\n", - " \"\"\"\n", - " Build searchable index from a list of PDF documents with page-wise processing.\n", - "\n", - " Args:\n", - " document_paths: A list of file paths to PDF documents.\n", - " embedding_client: The client object used to generate embeddings.\n", - " embedding_model: The name of the embedding model to use.\n", 
- " chunk_size: The maximum size (in characters) of each text chunk. Defaults to 512.\n", - "\n", - " Returns:\n", - " A Pandas DataFrame where each row represents a text chunk. The DataFrame includes columns for:\n", - " - 'document_name': The path to the source PDF document.\n", - " - 'page_number': The page number within the document.\n", - " - 'page_text': The full text of the page.\n", - " - 'chunk_number': The chunk number within the page.\n", - " - 'chunk_text': The text content of the chunk.\n", - " - 'embeddings': The embedding vector for the chunk.\n", - "\n", - " Raises:\n", - " ValueError: If no chunks are created from the input documents.\n", - " Exception: Any exceptions encountered during file processing are printed to the console and the function continues to the next document.\n", - " \"\"\"\n", - " all_chunks = []\n", - "\n", - " for doc_path in document_paths:\n", - " try:\n", - " with open(doc_path, \"rb\") as file:\n", - " pdf_reader = PyPDF2.PdfReader(file)\n", - "\n", - " for page_num in range(len(pdf_reader.pages)):\n", - " page = pdf_reader.pages[page_num]\n", - " page_text = page.extract_text()\n", - "\n", - " chunks = [\n", - " page_text[i : i + chunk_size]\n", - " for i in range(0, len(page_text), chunk_size)\n", - " ]\n", - "\n", - " for chunk_num, chunk_text in enumerate(chunks):\n", - " embeddings = get_embeddings(\n", - " embedding_client, embedding_model, chunk_text\n", - " )\n", - "\n", - " if embeddings is None:\n", - " print(\n", - " f\"Warning: Could not generate embeddings for chunk {chunk_num} on page {page_num + 1}\"\n", - " )\n", - " continue\n", - "\n", - " chunk_info = {\n", - " \"document_name\": doc_path,\n", - " \"page_number\": page_num + 1,\n", - " \"page_text\": page_text,\n", - " \"chunk_number\": chunk_num,\n", - " \"chunk_text\": chunk_text,\n", - " \"embeddings\": embeddings,\n", - " }\n", - " all_chunks.append(chunk_info)\n", - "\n", - " except Exception as e:\n", - " print(f\"Error processing document {doc_path}: {str(e)}\")\n", - " continue\n", - "\n", - " if not all_chunks:\n", - " raise ValueError(\"No chunks were created from the documents\")\n", - "\n", - " return pd.DataFrame(all_chunks)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "yFGsl-Zvlej6" - }, - "source": [ - "Let's create embeddings and an index using the provided documents" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "hjl5FDQckDcO" - }, - "outputs": [], - "source": [ - "vector_db_mini_vertex = build_index(\n", - " documents, embedding_client=client, embedding_model=text_embedding_model\n", - ")\n", - "vector_db_mini_vertex" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pZLX5ozMlxTX" - }, - "outputs": [], - "source": [ - "# Index size\n", - "vector_db_mini_vertex.shape" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cvNVn3kT9FiB" - }, - "outputs": [], - "source": [ - "# Example of how a chunk looks like\n", - "vector_db_mini_vertex.loc[0, \"chunk_text\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Hul4bjAkBkg0" - }, - "source": [ - "To enhance the performance of retrieval systems, consider the following:\n", - "\n", - "- Optimize chunk size selection to balance granularity and context.\n", - "- Evaluate various chunking strategies to identify the most effective approach for your data.\n", - "- Explore managed services and scalable indexing solutions, such as [Vertex AI 
Search](https://cloud.google.com/generative-ai-app-builder/docs/create-datastore-ingest), to enhance performance and efficiency." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "43txjyVlHT6v" - }, - "source": [ - "#### Retrieval" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "y92jM-v8KBfV" - }, - "source": [ - "The below code demonstrates how to query the index and uses a cosine similarity measure for comparing query vectors against the index. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bI1YsFoKtyxY" - }, - "source": [ - "* **Input:** Accepts a query string and parameters like the number of relevant chunks to return.\n", - "* **Embedding Generation:** Generates an embedding for the input query using the same model used to embed the document chunks.\n", - "* **Similarity Search:** Compares the query embedding to the embeddings of all indexed document chunks, using cosine similarity. Could use other distance metrics as well.\n", - "* **Ranking:** Ranks the chunks based on their similarity scores to the query.\n", - "* **Top-k Retrieval:** Returns the top *k* most similar chunks, where *k* is specified by the input parameters. This could be configurable.\n", - "* **Output:** Returns a list of relevant chunks, potentially including the original chunk text, similarity score, document source (filename, page number), and chunk metadata.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "88ndL_2wJ5ZD" - }, - "outputs": [], - "source": [ - "def get_relevant_chunks(\n", - " query: str,\n", - " vector_db: pd.DataFrame,\n", - " embedding_client: Any,\n", - " embedding_model: str,\n", - " top_k: int = 3,\n", - ") -> str:\n", - " \"\"\"\n", - " Retrieve the most relevant document chunks for a query using similarity search.\n", - "\n", - " Args:\n", - " query: The search query string.\n", - " vector_db: A pandas DataFrame containing the vectorized document chunks.\n", - " It must contain columns named 'embeddings', 'document_name',\n", - " 'page_number', and 'chunk_text'.\n", - " The 'embeddings' column should contain lists or numpy arrays\n", - " representing the embeddings.\n", - " embedding_client: The client object used to generate embeddings.\n", - " embedding_model: The name of the embedding model to use.\n", - " top_k: The number of most similar chunks to retrieve. Defaults to 3.\n", - "\n", - " Returns:\n", - " A formatted string containing the top_k most relevant chunks. Each chunk is\n", - " presented with its page number and chunk number. 
Returns an error message if\n", - " the query processing fails or if an error occurs during chunk retrieval.\n", - "\n", - " Raises:\n", - " Exception: If any error occurs during the process (e.g., issues with the embedding client,\n", - " data format problems in the vector database).\n", - " The specific error is printed to the console.\n", - " \"\"\"\n", - " try:\n", - " query_embedding = get_embeddings(embedding_client, embedding_model, query)\n", - "\n", - " if query_embedding is None:\n", - " return \"Could not process query due to quota issues\"\n", - "\n", - " similarities = [\n", - " cosine_similarity(query_embedding, chunk_emb)[0][0]\n", - " for chunk_emb in vector_db[\"embeddings\"]\n", - " ]\n", - "\n", - " top_indices = np.argsort(similarities)[-top_k:]\n", - " relevant_chunks = vector_db.iloc[top_indices]\n", - "\n", - " context = []\n", - " for _, row in relevant_chunks.iterrows():\n", - " context.append(\n", - " {\n", - " \"document_name\": row[\"document_name\"],\n", - " \"page_number\": row[\"page_number\"],\n", - " \"chunk_number\": row[\"chunk_number\"],\n", - " \"chunk_text\": row[\"chunk_text\"],\n", - " }\n", - " )\n", - "\n", - " return \"\\n\\n\".join(\n", - " [\n", - " f\"[Page {chunk['page_number']}, Chunk {chunk['chunk_number']}]: {chunk['chunk_text']}\"\n", - " for chunk in context\n", - " ]\n", - " )\n", - "\n", - " except Exception as e:\n", - " print(f\"Error getting relevant chunks: {str(e)}\")\n", - " return \"Error retrieving relevant chunks\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "3hxyLlTjsstI" - }, - "source": [ - "Let's test out our retrieval component" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Ek4aF0Esck2H" - }, - "source": [ - "- Let's try the same query for which the model was not able to answer earlier, due to lack of context" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "lSd8ZeH6D7m4" - }, - "outputs": [], - "source": [ - "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", - "relevant_context = get_relevant_chunks(\n", - " query, vector_db_mini_vertex, client, text_embedding_model, top_k=3\n", - ")\n", - "relevant_context" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "YBxnXReUn8Iy" - }, - "source": [ - "- You can see, with the help of the relevant context we can derive the answer as it contains the chunks specific to the asked query.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "25eb6422c9cf" - }, - "source": [ - "![Context](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/images/Context.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "kHzw7_UwzutC" - }, - "source": [ - "For optimal performance, consider these points:\n", - "\n", - "* **Context Window:** Considers a context window around the retrieved chunks to provide more comprehensive context. This could involve returning neighboring chunks or a specified window size.\n", - "* **Filtering:** Option to filter retrieved chunks based on criteria like minimum similarity score or source document.\n", - "* **Efficiency:** Designed for efficient retrieval, especially for large indexes, potentially using optimized search algorithms or data structures." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ZEfJkwSqJ5KR" - }, - "source": [ - "### Generation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "b7OZpv33KBx_" - }, - "source": [ - "* **Contextual Answer Synthesis:** The core function of the generation component is to synthesize a coherent and informative answer based on the retrieved context. It takes the user's query and the relevant document chunks as input.\n", - "* **Large Language Model (LLM) Integration:** It leverages a large language model (LLM) to generate the final answer. The LLM processes both the query and the retrieved context to produce a response. The quality of the answer heavily relies on the capabilities of the chosen LLM.\n", - "* **Coherence and Relevance:** A good generation function ensures the generated answer is coherent, factually accurate, and directly addresses the user's query, using only the provided context. It avoids hallucinations (generating information not present in the context).\n", - "* **Prompt Engineering:** The effectiveness of the LLM is heavily influenced by the prompt. The generation function likely incorporates prompt engineering techniques to guide the LLM towards generating the desired output. This may involve carefully crafting instructions for the LLM or providing examples.\n", - "\n", - "For more details on prompt engineering, check out the [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/prompt-design-strategies)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0xs-AQmqm03l" - }, - "source": [ - "Let's see two use-cases, `Text-In-Text-Out` and `Text-In-Audio-Out`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "xp7doymTJ7Iu" - }, - "outputs": [], - "source": [ - "@retry(wait=wait_random_exponential(multiplier=1, max=120), stop=stop_after_attempt(4))\n", - "async def generate_answer(\n", - " query: str, context: str, llm_client: Any, modality: str = \"text\"\n", - ") -> str:\n", - " \"\"\"\n", - " Generate answer using LLM with retry logic for API quota management.\n", - "\n", - " Args:\n", - " query: User query.\n", - " context: Relevant text providing context for the query.\n", - " llm_client: Client for accessing LLM API.\n", - " modality: Output modality (text or audio).\n", - "\n", - " Returns:\n", - " Generated answer.\n", - "\n", - " Raises:\n", - " Exception: If an unexpected error occurs during the LLM call (after retry attempts are exhausted).\n", - " \"\"\"\n", - " try:\n", - " # If context indicates earlier quota issues, return early\n", - " if context in [\n", - " \"Could not process query due to quota issues\",\n", - " \"Error retrieving relevant chunks\",\n", - " ]:\n", - " return \"Can't Process, Quota Issues\"\n", - "\n", - " prompt = f\"\"\"Based on the following context, please answer the question.\n", - "\n", - " Context:\n", - " {context}\n", - "\n", - " Question: {query}\n", - "\n", - " Answer:\"\"\"\n", - "\n", - " if modality == \"text\":\n", - " # Generate text answer using LLM\n", - " response = await generate_content(prompt)\n", - " return response\n", - "\n", - " elif modality == \"audio\":\n", - " # Generate audio answer using LLM\n", - " await generate_audio_content(prompt)\n", - "\n", - " except Exception as e:\n", - " if \"RESOURCE_EXHAUSTED\" in str(e):\n", - " return \"Can't Process, Quota Issues\"\n", - " print(f\"Error generating answer: {str(e)}\")\n", - " return \"Error generating answer\"" - ] - }, - { - "cell_type": 
"markdown", - "metadata": { - "id": "11q0Sf0oJ7wL" - }, - "source": [ - "Let's test our `Generation` component" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "S-iesR2BEHnI" - }, - "outputs": [], - "source": [ - "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", - "\n", - "generated_answer = await generate_answer(\n", - " query, relevant_context, client, modality=\"text\"\n", - ")\n", - "display(Markdown(generated_answer))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "W7EHYeP-EMpN" - }, - "outputs": [], - "source": [ - "await generate_answer(query, relevant_context, client, modality=\"audio\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "CbQB5PbMrrsB" - }, - "source": [ - "> And the answer is... CORRECT !! 🎉" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1gnr-j-ocxlx" - }, - "source": [ - "- The accuracy of the generated answer is attributed to the provision of relevant context to the Large Language Model (LLM), enabling it to effectively comprehend the query and produce an appropriate response." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2MNlAoAHR0Do" - }, - "source": [ - "### Pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8LemsW6WrOfm" - }, - "source": [ - "Let's put `Retrieval` and `Generation` components together in a pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "yoOeqxETR2G_" - }, - "outputs": [], - "source": [ - "async def rag(\n", - " question: str,\n", - " vector_db: pd.DataFrame,\n", - " embedding_client: Any,\n", - " embedding_model: str,\n", - " llm_client: Any,\n", - " top_k: int,\n", - " llm_model: str,\n", - " modality: str = \"text\",\n", - ") -> str | None:\n", - " \"\"\"\n", - " RAG Pipeline.\n", - "\n", - " Args:\n", - " question: User query.\n", - " vector_db: DataFrame containing document chunks and embeddings.\n", - " embedding_client: Client for accessing embedding API.\n", - " embedding_model: Name of the embedding model.\n", - " llm_client: Client for accessing LLM API.\n", - " top_k: The number of top relevant chunks to retrieve from the vector database.\n", - " llm_model: Name of the LLM model.\n", - " modality: Output modality (text or audio).\n", - "\n", - " Returns:\n", - " For text modality, generated answer.\n", - " For audio modality, audio playback widget.\n", - "\n", - " Raises:\n", - " Exception: Catches and prints any exceptions during processing. 
Returns an error message.\n", - " \"\"\"\n", - "\n", - " try:\n", - " # Get relevant context for question\n", - " relevant_context = get_relevant_chunks(\n", - " question, vector_db, embedding_client, embedding_model, top_k=top_k\n", - " )\n", - "\n", - " if modality == \"text\":\n", - " # Generate text answer using LLM\n", - " generated_answer = await generate_answer(\n", - " question,\n", - " relevant_context,\n", - " llm_client,\n", - " )\n", - " return generated_answer\n", - "\n", - " elif modality == \"audio\":\n", - " # Generate audio answer using LLM\n", - " await generate_answer(\n", - " question, relevant_context, llm_client, modality=modality\n", - " )\n", - " return\n", - "\n", - " except Exception as e:\n", - " print(f\"Error processing question '{question}': {str(e)}\")\n", - " return {\"question\": question, \"generated_answer\": \"Error processing question\"}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Q8bNzUvbVJcx" - }, - "source": [ - "Our Retrieval Augmented Generation (RAG) architecture allows for flexible output modality(text and audio) selection. By modifying only the generation component, we can produce both text and audio output while maintaining the same retrieval mechanism. This highlights the adaptability of RAG in catering to diverse content presentation needs." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Pkn75-1cFW1J" - }, - "source": [ - "### Inference" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "QMGtlPWcVXT0" - }, - "source": [ - "Let's test our simple RAG pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0vwfQbodn89Y" - }, - "source": [ - "#### Sample Queries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Zx_GwXESk9aP" - }, - "outputs": [], - "source": [ - "question_set = [\n", - " {\n", - " \"question\": \"What is the price of a basic tune-up at Cymbal Bikes?\",\n", - " \"answer\": \"A basic tune-up costs $100.\",\n", - " },\n", - " {\n", - " \"question\": \"How much does it cost to replace a tire at Cymbal Bikes?\",\n", - " \"answer\": \"Replacing a tire at Cymbal Bikes costs $50 per tire.\",\n", - " },\n", - " {\n", - " \"question\": \"What does gear repair at Cymbal Bikes include?\",\n", - " \"answer\": \"Gear repair includes inspection and repair of the gears, including replacement of chainrings, cogs, and cables as needed.\",\n", - " },\n", - " {\n", - " \"question\": \"What is the cost of replacing a tube at Cymbal Bikes?\",\n", - " \"answer\": \"Replacing a tube at Cymbal Bikes costs $20.\",\n", - " },\n", - " {\n", - " \"question\": \"Can I return clothing items to Cymbal Bikes?\",\n", - " \"answer\": \"Clothing can only be returned if it is unworn and in the original packaging.\",\n", - " },\n", - " {\n", - " \"question\": \"What is the time frame for returning items to Cymbal Bikes?\",\n", - " \"answer\": \"Cymbal Bikes offers a 30-day return policy on all items.\",\n", - " },\n", - " {\n", - " \"question\": \"Can I return edible items like energy gels?\",\n", - " \"answer\": \"No, edible items are not returnable.\",\n", - " },\n", - " {\n", - " \"question\": \"How can I return an item purchased online from Cymbal Bikes?\",\n", - " \"answer\": \"Items purchased online can be returned to any Cymbal Bikes store or mailed back.\",\n", - " },\n", - " {\n", - " \"question\": \"What should I include when returning an item to Cymbal Bikes?\",\n", - " \"answer\": \"Please include the original receipt and a copy of your shipping 
confirmation when returning an item.\",\n", - " },\n", - " {\n", - " \"question\": \"Does Cymbal Bikes offer refunds for shipping charges?\",\n", - " \"answer\": \"Cymbal Bikes does not offer refunds for shipping charges, except for defective items.\",\n", - " },\n", - " {\n", - " \"question\": \"How do I process a return for a defective item at Cymbal Bikes?\",\n", - " \"answer\": \"To process a return for a defective item, please contact Cymbal Bikes first.\",\n", - " },\n", - "]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ZUo_fcNzoAp3" - }, - "source": [ - "#### Text" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "y1RC5-djV0-r" - }, - "source": [ - "First we will try, `modality='text'`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "dmyN-h18EZdT" - }, - "outputs": [], - "source": [ - "question_set[0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "-f3hsHqBEbwc" - }, - "outputs": [], - "source": [ - "response = await rag(\n", - " question=question_set[0][\"question\"],\n", - " vector_db=vector_db_mini_vertex,\n", - " embedding_client=client, # For embedding generation\n", - " embedding_model=text_embedding_model, # For embedding model\n", - " llm_client=client, # For answer generation,\n", - " top_k=3,\n", - " llm_model=MODEL,\n", - " modality=\"text\",\n", - ")\n", - "display(Markdown(response))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Nb3VytmIyo-1" - }, - "source": [ - "#### Audio" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "kEl80N8VV_6E" - }, - "source": [ - "Now, let's try `modality='audio'` to get audio response." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "STdO_EtxEhFA" - }, - "outputs": [], - "source": [ - "await rag(\n", - " question=question_set[0][\"question\"],\n", - " vector_db=vector_db_mini_vertex,\n", - " embedding_client=client, # For embedding generation\n", - " embedding_model=text_embedding_model, # For embedding model\n", - " llm_client=client, # For answer generation,\n", - " top_k=3,\n", - " llm_model=MODEL,\n", - " modality=\"audio\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "l9NMyJm-_0lM" - }, - "source": [ - "Evaluating Retrieval Augmented Generation (RAG) applications before production is crucial for identifying areas for improvement and ensuring optimal performance.\n", - "Check out the Vertex AI [Gen AI evaluation service](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Erp1ImX9Lu1Y" - }, - "source": [ - "## Conclusion" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "W2A4xXWP1EB4" - }, - "source": [ - "Congratulations on making it through this notebook!" 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Uyc3uq1uYHEN" - }, - "source": [ - "- We have seen how to use the Gemini API in Vertex AI to generate text and Multimodal Live API to generate text and audio output.\n", - "- Developed a fully functional Retrieval Augmented Generation (RAG) pipeline capable of answering questions based on provided documents.\n", - "- Demonstrated the versatility of the RAG architecture by enabling both text and audio output modalities.\n", - "- Ensured the adaptability of the RAG pipeline to various use cases by enabling seamless integration of different context documents.\n", - "- Established a foundation for building more advanced RAG systems leveraging larger document sets and sophisticated indexing/retrieval services like Vertex AI Datastore/Agent Builder and Vertex AI Multimodal Live API." - ] - } - ], - "metadata": { - "colab": { - "name": "real_time_rag_retail_gemini_2_0.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2024 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Real-time Retrieval Augmented Generation (RAG) using the Multimodal Live API with Gemini 2.0\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Google
Open in Colab\n", + "
\n", + "
\n", + " \n", + " \"Google
Open in Colab Enterprise\n", + "
\n", + "
\n", + " \n", + " \"Vertex
Open in Vertex AI Workbench\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
\n", + "\n", + "
\n", + "\n", + "
\n", + "\n", + "
\n", + "
\n", + "Share to:\n", + "\n", + "\n", + " \"LinkedIn\n", + "\n", + "\n", + "\n", + " \"Bluesky\n", + "\n", + "\n", + "\n", + " \"X\n", + "\n", + "\n", + "\n", + " \"Reddit\n", + "\n", + "\n", + "\n", + " \"Facebook\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "84f0f73a0f76" + }, + "source": [ + "| | |\n", + "|-|-|\n", + "| Author(s) | [Deepak Moonat](https://github.com/dmoonat/) |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-MDW_A-nBksi" + }, + "source": [ + "
\n", + "\n", + "⚠️ Gemini 2.0 Flash (Model ID: gemini-2.0-flash-exp) and the Google Gen AI SDK are currently experimental and output can vary ⚠️\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook provides a comprehensive demonstration of the Vertex AI Gemini and Multimodal Live APIs, showcasing text and audio generation capabilities. Users will learn to develop a real-time Retrieval Augmented Generation (RAG) system leveraging the Multimodal Live API for a retail use-case. This system will generate audio and text responses grounded in provided documents. The tutorial covers the following:\n", + "\n", + "- **Gemini API:** Text output generation.\n", + "- **Multimodal Live API:** Text and audio output generation.\n", + "- **Retrieval Augmented Generation (RAG):** Text and audio output generation grounded in provided documents for a retail use-case." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xKVzRJhgJ4EZ" + }, + "source": [ + "### Gemini 2.0\n", + "\n", + "[Gemini 2.0 Flash](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2) is a new multimodal generative ai model from the Gemini family developed by [Google DeepMind](https://deepmind.google/). It now available as an experimental preview release through the Gemini API in Vertex AI and Vertex AI Studio. The model introduces new features and enhanced core capabilities:\n", + "\n", + "- Multimodal Live API: This new API helps you create real-time vision and audio streaming applications with tool use.\n", + "- Speed and performance: Gemini 2.0 Flash is the fastest model in the industry, with a 3x improvement in time to first token (TTFT) over 1.5 Flash.\n", + "- Quality: The model maintains quality comparable to larger models like Gemini 1.5 Pro and GPT-4o.\n", + "- Improved agentic experiences: Gemini 2.0 delivers improvements to multimodal understanding, coding, complex instruction following, and function calling.\n", + "- New Modalities: Gemini 2.0 introduces native image generation and controllable text-to-speech capabilities, enabling image editing, localized artwork creation, and expressive storytelling.\n", + "- To support the new model, we're also shipping an all new SDK that supports simple migration between the Gemini Developer API and the Gemini API in Vertex AI.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "61RBz8LLbxCR" + }, + "source": [ + "## Get started" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "No17Cw5hgx12" + }, + "source": [ + "### Install Dependencies\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ue_G9ZU80ON0" + }, + "source": [ + "- `google-genai`: Google Gen AI python library\n", + "- `PyPDF2`: To read PDFs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tFy3H3aPgx12" + }, + "outputs": [], + "source": [ + "%%capture\n", + "\n", + "%pip install --upgrade --quiet google-genai PyPDF2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R5Xep4W9lq-Z" + }, + "source": [ + "### Restart runtime\n", + "\n", + "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n", + "\n", + "The restart might take a minute or longer. After it's restarted, continue to the next step." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XRvKdaPDTznN" + }, + "outputs": [], + "source": [ + "import IPython\n", + "\n", + "app = IPython.Application.instance()\n", + "app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SbmM4z7FOBpM" + }, + "source": [ + "
\n", + "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dmWOrTJ3gx13" + }, + "source": [ + "### Authenticate your notebook environment (Colab only)\n", + "\n", + "If you're running this notebook on Google Colab, run the cell below to authenticate your environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NyKGtVQjgx13" + }, + "outputs": [], + "source": [ + "import sys\n", + "\n", + "if \"google.colab\" in sys.modules:\n", + " from google.colab import auth\n", + "\n", + " auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DF4l8DTdWgPY" + }, + "source": [ + "### Set Google Cloud project information\n", + "\n", + "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", + "\n", + "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Nqwi-5ufWp_B" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", + "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", + " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n", + "\n", + "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5303c05f7aa6" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6fc324893334" + }, + "outputs": [], + "source": [ + "# For asynchronous operations\n", + "import asyncio\n", + "\n", + "# For data processing\n", + "import glob\n", + "from typing import Any\n", + "\n", + "from IPython.display import Audio, Markdown, display\n", + "import PyPDF2\n", + "\n", + "# For GenerativeAI\n", + "from google import genai\n", + "from google.genai import types\n", + "from google.genai.types import LiveConnectConfig\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "# For similarity score\n", + "from sklearn.metrics.pairwise import cosine_similarity\n", + "\n", + "# For retry mechanism\n", + "from tenacity import retry, stop_after_attempt, wait_random_exponential" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OV5bFDTVE3oX" + }, + "source": [ + "#### Initialize Gen AI client" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3pjBP_V7JqhD" + }, + "source": [ + "- Client for calling the Gemini API in Vertex AI\n", + "- `vertexai=True`, indicates the client should communicate with the Vertex AI API endpoints." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bEhq_4GBEW2a" + }, + "outputs": [], + "source": [ + "# Vertex AI API\n", + "client = genai.Client(\n", + " vertexai=True,\n", + " project=PROJECT_ID,\n", + " location=LOCATION,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e43229f3ad4f" + }, + "source": [ + "### Initialize model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf93d5f0ce00" + }, + "outputs": [], + "source": [ + "MODEL_ID = \"gemini-2.0-flash-exp\" # @param {type:\"string\", isTemplate: true}\n", + "MODEL = (\n", + " f\"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}\"\n", + ")\n", + "\n", + "text_embedding_model = \"text-embedding-004\" # @param {type:\"string\", isTemplate: true}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H4TDOc3aqwuz" + }, + "source": [ + "## Sample Use Case - Retail Customer Support Assistance" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cH6zJeecq6SU" + }, + "source": [ + "Let's imagine a bicycle shop called `Cymbal Bikes` that offers various services like brake repair, chain replacement, and more. Our goal is to create a straightforward support system that can answer customer questions based on the shop's policies and service offerings." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uA3X24j86uE7" + }, + "source": [ + "Having a customer support assistant offers numerous advantages for businesses, ultimately leading to improved customer satisfaction and loyalty, as well as increased profitability. Here are some key benefits:\n", + "\n", + "- Faster Resolution of Issues: Users can quickly find answers to their questions without having to search through the store's website.\n", + "- Improved Efficiency: The assistant can handle simple, repetitive questions, freeing up human agents to focus on more complex or strategic tasks.\n", + "- 24/7 Availability: Unlike human colleagues, the assistant is available around the clock, providing immediate support regardless of time zones or working hours.\n", + "- Consistent Information: The assistant provides standardized answers, ensuring consistency and accuracy."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mZZLuCecsp0e" + }, + "source": [ + "#### Context Documents" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nWrK7HHjssqB" + }, + "source": [ + "- Download the documents from the Google Cloud Storage bucket\n", + "- These documents are specific to the `Cymbal Bikes` store\n", + " - [`Cymbal Bikes Return Policy`](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/documents/CymbalBikesReturnPolicy.pdf): Contains information about the return policy\n", + " - [`Cymbal Bikes Services`](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/documents/CymbalBikesServices.pdf): Contains information about services provided by Cymbal Bikes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iLhNfYfYspnC" + }, + "outputs": [], + "source": [ + "!gsutil cp \"gs://github-repo/generative-ai/gemini2/use-cases/retail_rag/documents/CymbalBikesReturnPolicy.pdf\" \"documents/CymbalBikesReturnPolicy.pdf\"\n", + "!gsutil cp \"gs://github-repo/generative-ai/gemini2/use-cases/retail_rag/documents/CymbalBikesServices.pdf\" \"documents/CymbalBikesServices.pdf\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GOFNGNGjjEzD" + }, + "source": [ + "### Text" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QlcEVrUtP9TI" + }, + "source": [ + "- Let's check a query specific to our retail use-case" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eLqbaZjoCzng" + }, + "outputs": [], + "source": [ + "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", + "\n", + "response = client.models.generate_content(\n", + " model=MODEL_ID,\n", + " contents=query,\n", + ")\n", + "\n", + "display(Markdown(response.text))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-D6q7KUDuH-E" + }, + "source": [ + "> The correct answer to the query is `A basic tune-up costs $100.`\n", + "\n", + "![BasicTuneUp](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/images/BasicTuneUp.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uoigEKWkQjwi" + }, + "source": [ + "- You can see that the model is unable to answer correctly, as the question is very specific to our hypothetical use-case. However, it does suggest how to find the answer on the internet.\n", + "\n", + "- Without the necessary context, the model's response is essentially a guess and may not align with the desired information.\n", + "\n", + "- LLMs are trained on vast amounts of data, which can lead to hallucinations. To overcome this challenge, the coming sections show how to ground the answers using Retrieval Augmented Generation (RAG)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nhzKqZdunwYJ" + }, + "source": [ + "## Grounding" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kzNcDkRevJi3" + }, + "source": [ + "Grounding is crucial in this scenario because the model needs to access and process relevant information from external sources (the \"Cymbal Bikes Return Policy\" and \"Cymbal Bikes Services\" documents) to answer specific queries accurately. 
Without grounding, the model relies solely on its pre-trained knowledge, which may not contain the specific details about the bike store's policies.\n", + "\n", + "In the example, the question about the return policy for bike helmets at Cymbal Bikes cannot be answered correctly without accessing the provided documents. The model's general knowledge of return policies is insufficient. Grounding allows the model to:\n", + "\n", + "1. **Retrieve relevant information:** The system must first locate the pertinent sections within the provided documents that address the user's question about bike helmet returns.\n", + "\n", + "2. **Process and synthesize information:** After retrieving relevant passages, the model must then understand and synthesize the information to construct an accurate answer.\n", + "\n", + "3. **Generate a grounded response:** Finally, the response needs to be directly derived from the factual content of the documents. This ensures accuracy and avoids hallucinations – generating incorrect or nonsensical information not present in the source documents.\n", + "\n", + "Without grounding, the model is forced to guess or extrapolate from its general knowledge, which can lead to inaccurate or misleading responses. The grounding process makes the model's responses more reliable and trustworthy, especially for domain-specific knowledge like store policies or procedures.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-SyokS1pUR9O" + }, + "source": [ + "## Multimodal Live API" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pwZeOc5-UXKD" + }, + "source": [ + "The multimodal live API enables you to build low-latency, multi-modal applications. It currently supports text as input and text & audio as output.\n", + "\n", + "- Low Latency, where audio output is required, where the Text-to-Speech step can be skipped\n", + "- Provides a more interactive user experience.\n", + "- Suitable for applications requiring immediate audio feedback" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "See the [Multimodal Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live) page for more details." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aS1zTjSMcij2" + }, + "source": [ + "#### Asynchronous (async) operation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iH9CBOpncnK8" + }, + "source": [ + "When to use async calls:\n", + "1. **I/O-bound operations**: When your code spends a significant amount of time waiting for external resources\n", + " (e.g., network requests, file operations, database queries). Async allows other tasks to run while waiting. \n", + " This is especially beneficial for real-time applications or when dealing with multiple concurrent requests.\n", + " \n", + " Example:\n", + " - Fetching data from a remote server.\n", + "\n", + "2. **Parallel tasks**: When you have independent tasks that can run concurrently without blocking each other. Async\n", + " allows you to efficiently utilize multiple CPU cores or network connections.\n", + " \n", + " Example:\n", + " - Processing a large number of prompts and generating audio for each.\n", + "\n", + "\n", + "3. **User interfaces**: In applications with graphical user interfaces (GUIs), async operations prevent the UI from\n", + " freezing while performing long-running tasks. 
Users can interact with the interface even when background\n", + " operations are active.\n", + " \n", + " Example: \n", + " - A chatbot interacting in real time, where an audio response is generated in the background.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aB4U6s1-UlFw" + }, + "source": [ + "### Text" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YvUJzbgPM26m" + }, + "source": [ + "For text generation, you need to set the `response_modalities` to `TEXT`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YQOurRs5UU9p" + }, + "outputs": [], + "source": [ + "async def generate_content(query: str) -> str:\n", + " \"\"\"Function to generate text content using Gemini live API.\n", + "\n", + " Args:\n", + " query: The query to generate content for.\n", + "\n", + " Returns:\n", + " The generated content.\n", + " \"\"\"\n", + " config = LiveConnectConfig(response_modalities=[\"TEXT\"])\n", + "\n", + " async with client.aio.live.connect(model=MODEL, config=config) as session:\n", + "\n", + " await session.send(input=query, end_of_turn=True)\n", + "\n", + " response = []\n", + " async for message in session.receive():\n", + " try:\n", + " response.append(message.server_content.model_turn.parts[0].text)\n", + " except AttributeError:\n", + " pass\n", + "\n", + " if message.server_content.turn_complete:\n", + " response = \"\".join(str(x) for x in response)\n", + " return response" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ye1TwWVaVSxF" + }, + "source": [ + "- Try a specific query" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gGqsp6nFDNsG" + }, + "outputs": [], + "source": [ + "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", + "\n", + "response = await generate_content(query)\n", + "display(Markdown(response))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "roXuCp_cXE9q" + }, + "source": [ + "### Audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lBnz34QaakVM" + }, + "source": [ + "- For audio generation, you need to set the `response_modalities` to `AUDIO`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BmLuvxnFbC4Z" + }, + "outputs": [], + "source": [ + "async def generate_audio_content(query: str):\n", + " \"\"\"Function to generate audio response for provided query using Gemini Multimodal Live API.\n", + "\n", + " Args:\n", + " query: The query to generate audio response for.\n", + "\n", + " Returns:\n", + " The audio response.\n", + " \"\"\"\n", + " config = LiveConnectConfig(response_modalities=[\"AUDIO\"])\n", + " async with client.aio.live.connect(model=MODEL, config=config) as session:\n", + "\n", + " await session.send(input=query, end_of_turn=True)\n", + "\n", + " audio_parts = []\n", + " async for message in session.receive():\n", + " if message.server_content.model_turn:\n", + " for part in message.server_content.model_turn.parts:\n", + " audio_parts.append(\n", + " np.frombuffer(part.inline_data.data, dtype=np.int16)\n", + " )\n", + "\n", + " if message.server_content.turn_complete:\n", + " if audio_parts:\n", + " audio_data = np.concatenate(audio_parts, axis=0)\n", + " await asyncio.sleep(0.4)\n", + " display(Audio(audio_data, rate=24000, autoplay=True))\n", + " break" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xKQ_l6wiLH_w" + }, + "source": [ + "In this example, you send a text prompt and request the model response 
in audio." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rXJRoxUAcFVB" + }, + "source": [ + "- Let's check the same query as before" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CfZy_XZeDUtS" + }, + "outputs": [], + "source": [ + "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", + "\n", + "await generate_audio_content(query)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "clfXp2PZmxDZ" + }, + "source": [ + "- Model is unable to answer the query, but with the Multimodal Live API, it doesn't hallucinate, which is pretty good!!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wT2oB1BOqDYP" + }, + "source": [ + "### Continuous Audio Interaction (Not multiturn)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T4iAJCstqR5s" + }, + "source": [ + " - Below function generates audio output based on the provided text prompt.\n", + " - The generated audio is displayed using `IPython.display.Audio`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bZntNTPiYLA8" + }, + "source": [ + "- Input your prompts (type `q` or `quit` or `exit` to exit).\n", + "- Example prompts:\n", + " - Hello\n", + " - Who are you?\n", + " - What's the largest planet in our solar system?\n", + " - Tell me 3 fun facts about the universe?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7M0zkHNrOBQf" + }, + "outputs": [], + "source": [ + "async def continuous_audio_generation():\n", + " \"\"\"Continuously generates audio responses for the asked queries.\"\"\"\n", + " while True:\n", + " query = input(\"Your query > \")\n", + " if any(query.lower() in s for s in [\"q\", \"quit\", \"exit\"]):\n", + " break\n", + " await generate_audio_content(query)\n", + "\n", + "\n", + "await continuous_audio_generation()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QX9k92TlJ864" + }, + "source": [ + "## Enhancing LLM Accuracy with RAG" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oOJ-Wx18hpju" + }, + "source": [ + "We'll be showcasing the design pattern for how to implement Real-time Retrieval Augmented Generation (RAG) using Gemini 2.0 multimodal live API.\n", + "\n", + "- Multimodal live API uses websockets to communicate over the internet\n", + "- It maintains a continuous connection\n", + "- Ideal for real-time applications which require persistent communication\n", + "\n", + "\n", + "> Note: Replicating real-life scenarios with Python can be challenging within the constraints of a Colab environment.\n", + "\n", + "\n", + "However, the flow shown in this section can be modified for streaming audio input and output.\n", + "\n", + "
\n", + "\n", + "We'll build the RAG pipeline from scratch to help you understand each and every components of the pipeline.\n", + "\n", + "There are other ways to build the RAG pipeline using open source tools such as [LangChain](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb), [LlamaIndex](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/llamaindex_rag.ipynb) etc." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u5CXTtsPEyJ0" + }, + "source": [ + "### Context Documents" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vvdcw1AOg4se" + }, + "source": [ + "- Documents are the building blocks of any RAG pipeline, as it provides the relevant context needed to ground the LLM responses\n", + "- We'll be using the documents already downloaded at the start of the notebook\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "M22BSDb2Xxpb" + }, + "outputs": [], + "source": [ + "documents = glob.glob(\"documents/*\")\n", + "documents" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zNpUL7t0e054" + }, + "source": [ + "### Retrieval Augmented Generation Architecture" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vV5Et4YHbqqE" + }, + "source": [ + "In general, RAG architecture consists of the following components\n", + "\n", + "**Data Preparation**\n", + "1. Chunking: Dividing the document into smaller, manageable pieces for processing.\n", + "2. Embedding: Transforming text chunks into numerical vectors representing semantic meaning.\n", + "3. Indexing: Organizing embeddings for efficient similarity search." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "563756fa3b7f" + }, + "source": [ + "![RAGArchitecture](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/images/RAGArchitecture.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pf4sXzYUby57" + }, + "source": [ + "**Inference**\n", + "1. Retrieval: Finding the most relevant chunks based on the query embedding.\n", + "2. Query Augmentation: Enhancing the query with retrieved context for improved generation.\n", + "3. Generation: Synthesizing a coherent and informative answer based on the augmented query." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1a30b41b63f1" + }, + "source": [ + "![LiveAPI](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/images/LiveAPI.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M-0zlJ3_FRfa" + }, + "source": [ + "#### Document Embedding and Indexing" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0fY3xLaFKBIS" + }, + "source": [ + "Following blocks of code shows how to process unstructured data(PDFs), extract text, and divide them into smaller chunks for efficient embedding and retrieval." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JTTOQ35Ia-V2" + }, + "source": [ + "- Embeddings:\n", + " - Numerical representations of text\n", + " - It capture the semantic meaning and context of the text\n", + " - We'll use Vertex AI's text embedding model to generate embeddings\n", + " - Error handling (like the retry mechanism) during embedding generation due to potential API quota limits.\n", + "\n", + "- Indexing:\n", + " - Build a searchable index from embeddings, enabling efficient similarity search.\n", + " - For example, the index is like a detailed table of contents for a massive reference book.\n", + "\n", + "\n", + "Check out the Google Cloud Platform [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings) for detailed understanding and example use-cases." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Vun69x23FWiw" + }, + "outputs": [], + "source": [ + "@retry(wait=wait_random_exponential(multiplier=1, max=120), stop=stop_after_attempt(4))\n", + "def get_embeddings(\n", + " embedding_client: Any, embedding_model: str, text: str, output_dim: int = 768\n", + ") -> list[float]:\n", + " \"\"\"\n", + " Generate embeddings for text with retry logic for API quota management.\n", + "\n", + " Args:\n", + " embedding_client: The client object used to generate embeddings.\n", + " embedding_model: The name of the embedding model to use.\n", + " text: The text for which to generate embeddings.\n", + " output_dim: The desired dimensionality of the output embeddings (default is 768).\n", + "\n", + " Returns:\n", + " A list of floats representing the generated embeddings. Returns None if a \"RESOURCE_EXHAUSTED\" error occurs.\n", + "\n", + " Raises:\n", + " Exception: Any exception encountered during embedding generation, excluding \"RESOURCE_EXHAUSTED\" errors.\n", + " \"\"\"\n", + " try:\n", + " response = embedding_client.models.embed_content(\n", + " model=embedding_model,\n", + " contents=[text],\n", + " config=types.EmbedContentConfig(output_dimensionality=output_dim),\n", + " )\n", + " return [response.embeddings[0].values]\n", + " except Exception as e:\n", + " if \"RESOURCE_EXHAUSTED\" in str(e):\n", + " return None\n", + " print(f\"Error generating embeddings: {str(e)}\")\n", + " raise" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2csDY5NsswwJ" + }, + "source": [ + "- The code block executes the following steps:\n", + "\n", + " - Extracts text from PDF documents and segments it into smaller chunks for processing.\n", + " - Employs a Vertex AI model to transform each text chunk into a numerical embedding vector, facilitating semantic representation and search.\n", + " - Constructs a Pandas DataFrame to store the embeddings, enriched with metadata such as document name and page number, effectively creating a searchable index for efficient retrieval.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9TJlvdIsRfmX" + }, + "outputs": [], + "source": [ + "def build_index(\n", + " document_paths: list[str],\n", + " embedding_client: Any,\n", + " embedding_model: str,\n", + " chunk_size: int = 512,\n", + ") -> pd.DataFrame:\n", + " \"\"\"\n", + " Build searchable index from a list of PDF documents with page-wise processing.\n", + "\n", + " Args:\n", + " document_paths: A list of file paths to PDF documents.\n", + " embedding_client: The client object used to generate embeddings.\n", + " embedding_model: The name of the embedding model to use.\n", 
+ " chunk_size: The maximum size (in characters) of each text chunk. Defaults to 512.\n", + "\n", + " Returns:\n", + " A Pandas DataFrame where each row represents a text chunk. The DataFrame includes columns for:\n", + " - 'document_name': The path to the source PDF document.\n", + " - 'page_number': The page number within the document.\n", + " - 'page_text': The full text of the page.\n", + " - 'chunk_number': The chunk number within the page.\n", + " - 'chunk_text': The text content of the chunk.\n", + " - 'embeddings': The embedding vector for the chunk.\n", + "\n", + " Raises:\n", + " ValueError: If no chunks are created from the input documents.\n", + " Exception: Any exceptions encountered during file processing are printed to the console and the function continues to the next document.\n", + " \"\"\"\n", + " all_chunks = []\n", + "\n", + " for doc_path in document_paths:\n", + " try:\n", + " with open(doc_path, \"rb\") as file:\n", + " pdf_reader = PyPDF2.PdfReader(file)\n", + "\n", + " for page_num in range(len(pdf_reader.pages)):\n", + " page = pdf_reader.pages[page_num]\n", + " page_text = page.extract_text()\n", + "\n", + " chunks = [\n", + " page_text[i : i + chunk_size]\n", + " for i in range(0, len(page_text), chunk_size)\n", + " ]\n", + "\n", + " for chunk_num, chunk_text in enumerate(chunks):\n", + " embeddings = get_embeddings(\n", + " embedding_client, embedding_model, chunk_text\n", + " )\n", + "\n", + " if embeddings is None:\n", + " print(\n", + " f\"Warning: Could not generate embeddings for chunk {chunk_num} on page {page_num + 1}\"\n", + " )\n", + " continue\n", + "\n", + " chunk_info = {\n", + " \"document_name\": doc_path,\n", + " \"page_number\": page_num + 1,\n", + " \"page_text\": page_text,\n", + " \"chunk_number\": chunk_num,\n", + " \"chunk_text\": chunk_text,\n", + " \"embeddings\": embeddings,\n", + " }\n", + " all_chunks.append(chunk_info)\n", + "\n", + " except Exception as e:\n", + " print(f\"Error processing document {doc_path}: {str(e)}\")\n", + " continue\n", + "\n", + " if not all_chunks:\n", + " raise ValueError(\"No chunks were created from the documents\")\n", + "\n", + " return pd.DataFrame(all_chunks)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yFGsl-Zvlej6" + }, + "source": [ + "Let's create embeddings and an index using the provided documents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hjl5FDQckDcO" + }, + "outputs": [], + "source": [ + "vector_db_mini_vertex = build_index(\n", + " documents, embedding_client=client, embedding_model=text_embedding_model\n", + ")\n", + "vector_db_mini_vertex" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pZLX5ozMlxTX" + }, + "outputs": [], + "source": [ + "# Index size\n", + "vector_db_mini_vertex.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cvNVn3kT9FiB" + }, + "outputs": [], + "source": [ + "# Example of how a chunk looks like\n", + "vector_db_mini_vertex.loc[0, \"chunk_text\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Hul4bjAkBkg0" + }, + "source": [ + "To enhance the performance of retrieval systems, consider the following:\n", + "\n", + "- Optimize chunk size selection to balance granularity and context.\n", + "- Evaluate various chunking strategies to identify the most effective approach for your data.\n", + "- Explore managed services and scalable indexing solutions, such as [Vertex AI 
Search](https://cloud.google.com/generative-ai-app-builder/docs/create-datastore-ingest), to enhance performance and efficiency." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "43txjyVlHT6v" + }, + "source": [ + "#### Retrieval" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y92jM-v8KBfV" + }, + "source": [ + "The below code demonstrates how to query the index and uses a cosine similarity measure for comparing query vectors against the index. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bI1YsFoKtyxY" + }, + "source": [ + "* **Input:** Accepts a query string and parameters like the number of relevant chunks to return.\n", + "* **Embedding Generation:** Generates an embedding for the input query using the same model used to embed the document chunks.\n", + "* **Similarity Search:** Compares the query embedding to the embeddings of all indexed document chunks, using cosine similarity. Could use other distance metrics as well.\n", + "* **Ranking:** Ranks the chunks based on their similarity scores to the query.\n", + "* **Top-k Retrieval:** Returns the top *k* most similar chunks, where *k* is specified by the input parameters. This could be configurable.\n", + "* **Output:** Returns a list of relevant chunks, potentially including the original chunk text, similarity score, document source (filename, page number), and chunk metadata.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "88ndL_2wJ5ZD" + }, + "outputs": [], + "source": [ + "def get_relevant_chunks(\n", + " query: str,\n", + " vector_db: pd.DataFrame,\n", + " embedding_client: Any,\n", + " embedding_model: str,\n", + " top_k: int = 3,\n", + ") -> str:\n", + " \"\"\"\n", + " Retrieve the most relevant document chunks for a query using similarity search.\n", + "\n", + " Args:\n", + " query: The search query string.\n", + " vector_db: A pandas DataFrame containing the vectorized document chunks.\n", + " It must contain columns named 'embeddings', 'document_name',\n", + " 'page_number', and 'chunk_text'.\n", + " The 'embeddings' column should contain lists or numpy arrays\n", + " representing the embeddings.\n", + " embedding_client: The client object used to generate embeddings.\n", + " embedding_model: The name of the embedding model to use.\n", + " top_k: The number of most similar chunks to retrieve. Defaults to 3.\n", + "\n", + " Returns:\n", + " A formatted string containing the top_k most relevant chunks. Each chunk is\n", + " presented with its page number and chunk number. 
Returns an error message if\n", + " the query processing fails or if an error occurs during chunk retrieval.\n", + "\n", + " Raises:\n", + " Exception: If any error occurs during the process (e.g., issues with the embedding client,\n", + " data format problems in the vector database).\n", + " The specific error is printed to the console.\n", + " \"\"\"\n", + " try:\n", + " query_embedding = get_embeddings(embedding_client, embedding_model, query)\n", + "\n", + " if query_embedding is None:\n", + " return \"Could not process query due to quota issues\"\n", + "\n", + " similarities = [\n", + " cosine_similarity(query_embedding, chunk_emb)[0][0]\n", + " for chunk_emb in vector_db[\"embeddings\"]\n", + " ]\n", + "\n", + " top_indices = np.argsort(similarities)[-top_k:]\n", + " relevant_chunks = vector_db.iloc[top_indices]\n", + "\n", + " context = []\n", + " for _, row in relevant_chunks.iterrows():\n", + " context.append(\n", + " {\n", + " \"document_name\": row[\"document_name\"],\n", + " \"page_number\": row[\"page_number\"],\n", + " \"chunk_number\": row[\"chunk_number\"],\n", + " \"chunk_text\": row[\"chunk_text\"],\n", + " }\n", + " )\n", + "\n", + " return \"\\n\\n\".join(\n", + " [\n", + " f\"[Page {chunk['page_number']}, Chunk {chunk['chunk_number']}]: {chunk['chunk_text']}\"\n", + " for chunk in context\n", + " ]\n", + " )\n", + "\n", + " except Exception as e:\n", + " print(f\"Error getting relevant chunks: {str(e)}\")\n", + " return \"Error retrieving relevant chunks\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3hxyLlTjsstI" + }, + "source": [ + "Let's test out our retrieval component" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ek4aF0Esck2H" + }, + "source": [ + "- Let's try the same query for which the model was not able to answer earlier, due to lack of context" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lSd8ZeH6D7m4" + }, + "outputs": [], + "source": [ + "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", + "relevant_context = get_relevant_chunks(\n", + " query, vector_db_mini_vertex, client, text_embedding_model, top_k=3\n", + ")\n", + "relevant_context" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YBxnXReUn8Iy" + }, + "source": [ + "- You can see, with the help of the relevant context we can derive the answer as it contains the chunks specific to the asked query.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "25eb6422c9cf" + }, + "source": [ + "![Context](https://storage.googleapis.com/github-repo/generative-ai/gemini2/use-cases/retail_rag/images/Context.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kHzw7_UwzutC" + }, + "source": [ + "For optimal performance, consider these points:\n", + "\n", + "* **Context Window:** Considers a context window around the retrieved chunks to provide more comprehensive context. This could involve returning neighboring chunks or a specified window size.\n", + "* **Filtering:** Option to filter retrieved chunks based on criteria like minimum similarity score or source document.\n", + "* **Efficiency:** Designed for efficient retrieval, especially for large indexes, potentially using optimized search algorithms or data structures." 
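+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The cell below is a minimal sketch of the filtering and efficiency ideas above, run against the small index built earlier: it scores every chunk in one vectorized pass and keeps only chunks above a minimum similarity score. The `0.5` threshold and the top-3 cutoff are illustrative assumptions, not tuned values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Sketch: vectorized scoring of the whole index plus a minimum-score filter.\n", + "# Assumes the earlier cells succeeded; get_embeddings may return None on quota errors.\n", + "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", + "query_embedding = np.array(get_embeddings(client, text_embedding_model, query))\n", + "\n", + "chunk_matrix = np.vstack(\n", + "    [np.array(emb).reshape(-1) for emb in vector_db_mini_vertex[\"embeddings\"]]\n", + ")\n", + "scores = cosine_similarity(query_embedding, chunk_matrix)[0]\n", + "\n", + "MIN_SCORE = 0.5  # illustrative cutoff\n", + "ranked = [i for i in np.argsort(scores)[::-1] if scores[i] >= MIN_SCORE][:3]\n", + "\n", + "for i in ranked:\n", + "    row = vector_db_mini_vertex.iloc[i]\n", + "    print(\n", + "        f\"{scores[i]:.3f} | {row['document_name']} \"\n", + "        f\"(page {row['page_number']}, chunk {row['chunk_number']})\"\n", + "    )"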
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZEfJkwSqJ5KR" + }, + "source": [ + "### Generation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b7OZpv33KBx_" + }, + "source": [ + "* **Contextual Answer Synthesis:** The core function of the generation component is to synthesize a coherent and informative answer based on the retrieved context. It takes the user's query and the relevant document chunks as input.\n", + "* **Large Language Model (LLM) Integration:** It leverages a large language model (LLM) to generate the final answer. The LLM processes both the query and the retrieved context to produce a response. The quality of the answer heavily relies on the capabilities of the chosen LLM.\n", + "* **Coherence and Relevance:** A good generation function ensures the generated answer is coherent, factually accurate, and directly addresses the user's query, using only the provided context. It avoids hallucinations (generating information not present in the context).\n", + "* **Prompt Engineering:** The effectiveness of the LLM is heavily influenced by the prompt. The generation function likely incorporates prompt engineering techniques to guide the LLM towards generating the desired output. This may involve carefully crafting instructions for the LLM or providing examples.\n", + "\n", + "For more details on prompt engineering, check out the [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/prompt-design-strategies)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0xs-AQmqm03l" + }, + "source": [ + "Let's see two use-cases, `Text-In-Text-Out` and `Text-In-Audio-Out`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xp7doymTJ7Iu" + }, + "outputs": [], + "source": [ + "@retry(wait=wait_random_exponential(multiplier=1, max=120), stop=stop_after_attempt(4))\n", + "async def generate_answer(\n", + " query: str, context: str, llm_client: Any, modality: str = \"text\"\n", + ") -> str:\n", + " \"\"\"\n", + " Generate answer using LLM with retry logic for API quota management.\n", + "\n", + " Args:\n", + " query: User query.\n", + " context: Relevant text providing context for the query.\n", + " llm_client: Client for accessing LLM API.\n", + " modality: Output modality (text or audio).\n", + "\n", + " Returns:\n", + " Generated answer.\n", + "\n", + " Raises:\n", + " Exception: If an unexpected error occurs during the LLM call (after retry attempts are exhausted).\n", + " \"\"\"\n", + " try:\n", + " # If context indicates earlier quota issues, return early\n", + " if context in [\n", + " \"Could not process query due to quota issues\",\n", + " \"Error retrieving relevant chunks\",\n", + " ]:\n", + " return \"Can't Process, Quota Issues\"\n", + "\n", + " prompt = f\"\"\"Based on the following context, please answer the question.\n", + "\n", + " Context:\n", + " {context}\n", + "\n", + " Question: {query}\n", + "\n", + " Answer:\"\"\"\n", + "\n", + " if modality == \"text\":\n", + " # Generate text answer using LLM\n", + " response = await generate_content(prompt)\n", + " return response\n", + "\n", + " elif modality == \"audio\":\n", + " # Generate audio answer using LLM\n", + " await generate_audio_content(prompt)\n", + "\n", + " except Exception as e:\n", + " if \"RESOURCE_EXHAUSTED\" in str(e):\n", + " return \"Can't Process, Quota Issues\"\n", + " print(f\"Error generating answer: {str(e)}\")\n", + " return \"Error generating answer\"" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "id": "11q0Sf0oJ7wL" + }, + "source": [ + "Let's test our `Generation` component" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "S-iesR2BEHnI" + }, + "outputs": [], + "source": [ + "query = \"What is the price of a basic tune-up at Cymbal Bikes?\"\n", + "\n", + "generated_answer = await generate_answer(\n", + " query, relevant_context, client, modality=\"text\"\n", + ")\n", + "display(Markdown(generated_answer))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "W7EHYeP-EMpN" + }, + "outputs": [], + "source": [ + "await generate_answer(query, relevant_context, client, modality=\"audio\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CbQB5PbMrrsB" + }, + "source": [ + "> And the answer is... CORRECT !! 🎉" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1gnr-j-ocxlx" + }, + "source": [ + "- The accuracy of the generated answer is attributed to the provision of relevant context to the Large Language Model (LLM), enabling it to effectively comprehend the query and produce an appropriate response." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2MNlAoAHR0Do" + }, + "source": [ + "### Pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8LemsW6WrOfm" + }, + "source": [ + "Let's put `Retrieval` and `Generation` components together in a pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yoOeqxETR2G_" + }, + "outputs": [], + "source": [ + "async def rag(\n", + " question: str,\n", + " vector_db: pd.DataFrame,\n", + " embedding_client: Any,\n", + " embedding_model: str,\n", + " llm_client: Any,\n", + " top_k: int,\n", + " llm_model: str,\n", + " modality: str = \"text\",\n", + ") -> str | None:\n", + " \"\"\"\n", + " RAG Pipeline.\n", + "\n", + " Args:\n", + " question: User query.\n", + " vector_db: DataFrame containing document chunks and embeddings.\n", + " embedding_client: Client for accessing embedding API.\n", + " embedding_model: Name of the embedding model.\n", + " llm_client: Client for accessing LLM API.\n", + " top_k: The number of top relevant chunks to retrieve from the vector database.\n", + " llm_model: Name of the LLM model.\n", + " modality: Output modality (text or audio).\n", + "\n", + " Returns:\n", + " For text modality, generated answer.\n", + " For audio modality, audio playback widget.\n", + "\n", + " Raises:\n", + " Exception: Catches and prints any exceptions during processing. 
Returns an error message.\n", + " \"\"\"\n", + "\n", + " try:\n", + " # Get relevant context for question\n", + " relevant_context = get_relevant_chunks(\n", + " question, vector_db, embedding_client, embedding_model, top_k=top_k\n", + " )\n", + "\n", + " if modality == \"text\":\n", + " # Generate text answer using LLM\n", + " generated_answer = await generate_answer(\n", + " question,\n", + " relevant_context,\n", + " llm_client,\n", + " )\n", + " return generated_answer\n", + "\n", + " elif modality == \"audio\":\n", + " # Generate audio answer using LLM\n", + " await generate_answer(\n", + " question, relevant_context, llm_client, modality=modality\n", + " )\n", + " return\n", + "\n", + " except Exception as e:\n", + " print(f\"Error processing question '{question}': {str(e)}\")\n", + " return {\"question\": question, \"generated_answer\": \"Error processing question\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q8bNzUvbVJcx" + }, + "source": [ + "Our Retrieval Augmented Generation (RAG) architecture allows for flexible output modality(text and audio) selection. By modifying only the generation component, we can produce both text and audio output while maintaining the same retrieval mechanism. This highlights the adaptability of RAG in catering to diverse content presentation needs." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Pkn75-1cFW1J" + }, + "source": [ + "### Inference" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QMGtlPWcVXT0" + }, + "source": [ + "Let's test our simple RAG pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0vwfQbodn89Y" + }, + "source": [ + "#### Sample Queries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Zx_GwXESk9aP" + }, + "outputs": [], + "source": [ + "question_set = [\n", + " {\n", + " \"question\": \"What is the price of a basic tune-up at Cymbal Bikes?\",\n", + " \"answer\": \"A basic tune-up costs $100.\",\n", + " },\n", + " {\n", + " \"question\": \"How much does it cost to replace a tire at Cymbal Bikes?\",\n", + " \"answer\": \"Replacing a tire at Cymbal Bikes costs $50 per tire.\",\n", + " },\n", + " {\n", + " \"question\": \"What does gear repair at Cymbal Bikes include?\",\n", + " \"answer\": \"Gear repair includes inspection and repair of the gears, including replacement of chainrings, cogs, and cables as needed.\",\n", + " },\n", + " {\n", + " \"question\": \"What is the cost of replacing a tube at Cymbal Bikes?\",\n", + " \"answer\": \"Replacing a tube at Cymbal Bikes costs $20.\",\n", + " },\n", + " {\n", + " \"question\": \"Can I return clothing items to Cymbal Bikes?\",\n", + " \"answer\": \"Clothing can only be returned if it is unworn and in the original packaging.\",\n", + " },\n", + " {\n", + " \"question\": \"What is the time frame for returning items to Cymbal Bikes?\",\n", + " \"answer\": \"Cymbal Bikes offers a 30-day return policy on all items.\",\n", + " },\n", + " {\n", + " \"question\": \"Can I return edible items like energy gels?\",\n", + " \"answer\": \"No, edible items are not returnable.\",\n", + " },\n", + " {\n", + " \"question\": \"How can I return an item purchased online from Cymbal Bikes?\",\n", + " \"answer\": \"Items purchased online can be returned to any Cymbal Bikes store or mailed back.\",\n", + " },\n", + " {\n", + " \"question\": \"What should I include when returning an item to Cymbal Bikes?\",\n", + " \"answer\": \"Please include the original receipt and a copy of your shipping 
confirmation when returning an item.\",\n", + " },\n", + " {\n", + " \"question\": \"Does Cymbal Bikes offer refunds for shipping charges?\",\n", + " \"answer\": \"Cymbal Bikes does not offer refunds for shipping charges, except for defective items.\",\n", + " },\n", + " {\n", + " \"question\": \"How do I process a return for a defective item at Cymbal Bikes?\",\n", + " \"answer\": \"To process a return for a defective item, please contact Cymbal Bikes first.\",\n", + " },\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZUo_fcNzoAp3" + }, + "source": [ + "#### Text" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y1RC5-djV0-r" + }, + "source": [ + "First we will try, `modality='text'`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "dmyN-h18EZdT" + }, + "outputs": [], + "source": [ + "question_set[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-f3hsHqBEbwc" + }, + "outputs": [], + "source": [ + "response = await rag(\n", + " question=question_set[0][\"question\"],\n", + " vector_db=vector_db_mini_vertex,\n", + " embedding_client=client, # For embedding generation\n", + " embedding_model=text_embedding_model, # For embedding model\n", + " llm_client=client, # For answer generation,\n", + " top_k=3,\n", + " llm_model=MODEL,\n", + " modality=\"text\",\n", + ")\n", + "display(Markdown(response))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Nb3VytmIyo-1" + }, + "source": [ + "#### Audio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kEl80N8VV_6E" + }, + "source": [ + "Now, let's try `modality='audio'` to get audio response." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "STdO_EtxEhFA" + }, + "outputs": [], + "source": [ + "await rag(\n", + " question=question_set[0][\"question\"],\n", + " vector_db=vector_db_mini_vertex,\n", + " embedding_client=client, # For embedding generation\n", + " embedding_model=text_embedding_model, # For embedding model\n", + " llm_client=client, # For answer generation,\n", + " top_k=3,\n", + " llm_model=MODEL,\n", + " modality=\"audio\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l9NMyJm-_0lM" + }, + "source": [ + "Evaluating Retrieval Augmented Generation (RAG) applications before production is crucial for identifying areas for improvement and ensuring optimal performance.\n", + "Check out the Vertex AI [Gen AI evaluation service](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Erp1ImX9Lu1Y" + }, + "source": [ + "## Conclusion" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W2A4xXWP1EB4" + }, + "source": [ + "Congratulations on making it through this notebook!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Uyc3uq1uYHEN" + }, + "source": [ + "- We have seen how to use the Gemini API in Vertex AI to generate text and Multimodal Live API to generate text and audio output.\n", + "- Developed a fully functional Retrieval Augmented Generation (RAG) pipeline capable of answering questions based on provided documents.\n", + "- Demonstrated the versatility of the RAG architecture by enabling both text and audio output modalities.\n", + "- Ensured the adaptability of the RAG pipeline to various use cases by enabling seamless integration of different context documents.\n", + "- Established a foundation for building more advanced RAG systems leveraging larger document sets and sophisticated indexing/retrieval services like Vertex AI Datastore/Agent Builder and Vertex AI Multimodal Live API." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What's next\n", + "\n", + "- Learn how to [build a web application that enables you to use your voice and camera to talk to Gemini 2.0 through the Multimodal Live API.](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/multimodal-live-api/websocket-demo-app)\n", + "- See the [Multimodal Live API reference docs](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live).\n", + "- See the [Google Gen AI SDK reference docs](https://googleapis.github.io/python-genai/).\n", + "- Explore other notebooks in the [Google Cloud Generative AI GitHub repository](https://github.com/GoogleCloudPlatform/generative-ai)." + ] + } + ], + "metadata": { + "colab": { + "name": "real_time_rag_retail_gemini_2_0.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 }