diff --git a/ai-ml/gemini-multimodal/vertex_ai_practical_multimodal_use_cases_with_gemini.ipynb b/ai-ml/gemini-multimodal/vertex_ai_practical_multimodal_use_cases_with_gemini.ipynb new file mode 100644 index 0000000..d4dd50f --- /dev/null +++ b/ai-ml/gemini-multimodal/vertex_ai_practical_multimodal_use_cases_with_gemini.ipynb @@ -0,0 +1,1055 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "VEqbX8OhE8y9", + "tags": [] + }, + "source": [ + "# Practical multimodal use cases with Gemini on Vertex AI\n", + "\n", + "\n", + "\n", + " \n", + "
\n", + " \n", + " \"Google
Run in Colab\n", + "
\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "| | |\n", + "|-|-|\n", + "|Author(s) | [Eric Dong](https://github.com/gericdong)|" + ], + "metadata": { + "id": "8j-NqQ1IX06A" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VK1Q5ZYdVL4Y" + }, + "source": [ + "## Overview\n", + "\n", + "What are the applications of multimodality with Gemini? This session covers a variety of multimodal use cases for text, images, and video, and provides some ideas on how to apply multimodality to practical business scenarios.\n", + "\n", + "\n", + "In this session, you will learn how to use the Python SDK to work with Gemini 1.5's native multimodality and long context window capabilities. You'll explore:\n", + "\n", + "- **Single modality**: Working with text, PDF, image, audio and video inputs individually.\n", + "- **Multimodality**: Combining different input types for more complex interactions.\n", + "- **Real-world use case**: A practical e-commerce example to demonstrate Gemini's capabilities.\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "9naL8y_ie4oG" + } + }, + { + "cell_type": "markdown", + "source": [ + "🧡 The Gemini API is available on two platforms:\n", + "\n", + "- **Google AI for Developers**: Experiment, prototype, and deploy small projects.\n", + "\n", + "- **Vertex AI**: Build enterprise-ready projects on Google Cloud ✅\n", + "\n", + "This notebook uses Vertex AI to explore multimodal use cases with Gemini."
+ ], + "metadata": { + "id": "41y3sqIJW79s" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QDU0XJ1xRDlL" + }, + "source": [ + "## Getting Started\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N5afkyDMSBW5" + }, + "source": [ + "### Install Vertex AI SDK and other required packages\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kc4WxYmLSBW5", + "tags": [] + }, + "outputs": [], + "source": [ + "%%capture\n", + "\n", + "!pip install google-cloud-aiplatform" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Authentication" + ], + "metadata": { + "id": "aCFVWORUbuxx" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JNDLW6CNA7dY" + }, + "outputs": [], + "source": [ + "from google.colab import auth\n", + "\n", + "\n", + "# Authenticate Colab user to Google Cloud\n", + "auth.authenticate_user()\n", + "\n", + "# Define Google Cloud project information\n", + "PROJECT_ID = \"\" # @param {type:\"string\"}\n", + "LOCATION = \"us-central1\" # @param {type:\"string\"}\n", + "\n", + "# Initialize Vertex AI\n", + "import vertexai\n", + "\n", + "vertexai.init(project=PROJECT_ID, location=LOCATION)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Import libraries" + ], + "metadata": { + "id": "CIJcdXkveAOZ" + } + }, + { + "cell_type": "code", + "source": [ + "import vertexai.generative_models as genai\n", + "\n", + "import PIL.Image\n", + "from IPython.display import display, Markdown, Latex, Image, Audio, Video\n", + "from IPython.core.interactiveshell import InteractiveShell\n", + "\n", + "InteractiveShell.ast_node_interactivity = \"all\"" + ], + "metadata": { + "id": "tczKqvnjMSyA" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Gemini models\n", + "\n", + "The Gemini family of models are the most general and capable AI models we've ever built.\n", + "\n", + "- Ultra: Our
largest model for highly complex tasks.\n", + "- Pro: Our best model for general performance across a wide range of tasks.\n", + "- Flash: Our lightweight model, optimized for speed and efficiency.\n", + "\n" + ], + "metadata": { + "id": "k6FDFTSMEyeU" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Gemini 1.5 Pro and Gemini 1.5 Flash\n", + "\n", + "- [Gemini 1.5 Pro](https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-pro): Optimized for complex reasoning tasks such as code and text generation, text editing, problem solving, and data extraction and generation.\n", + "- [Gemini 1.5 Flash](https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-flash): Fast and versatile performance across a wide variety of tasks.\n", + "\n", + "If you are not sure which model to use, try `gemini-1.5-flash`, which is optimized for multimodal use cases where speed and cost are important.\n" + ], + "metadata": { + "id": "FB6fVsHHvA3_" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Use the Gemini API\n", + "\n", + "The following examples demonstrate how to prompt a Gemini 1.5 model using the Gemini API." + ], + "metadata": { + "id": "CuG-TrB4_I2W" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N7rZuTClfNs0" + }, + "source": [ + "#### Gemini 1.5 models\n", + "\n", + "Gemini 1.5 Pro and Gemini 1.5 Flash are multimodal models that support multimodal prompts.
Use `GenerativeModel` to load a model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2998506fe6d1", + "tags": [] + }, + "outputs": [], + "source": [ + "GEMINI_FLASH_MODEL_ID = \"gemini-1.5-flash\" # @param {type:\"string\"}\n", + "\n", + "model = genai.GenerativeModel(GEMINI_FLASH_MODEL_ID)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### Generate content\n", + "\n", + "The `generate_content` method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports." + ], + "metadata": { + "id": "XtTD01rIqqLN" + } + }, + { + "cell_type": "code", + "source": [ + "response = model.generate_content(\"What is the meaning of life?\")" + ], + "metadata": { + "id": "zqMCBUdyqTVh" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### Print model output\n", + "\n", + "In simple cases, the `response.text` accessor is all you need." + ], + "metadata": { + "id": "wbMbHCOyr8nv" + } + }, + { + "cell_type": "code", + "source": [ + "print(response.text)" + ], + "metadata": { + "id": "Jg5rDI6FsBbH" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "You can use `Markdown` to display formatted text." + ], + "metadata": { + "id": "P45hc5CrtUsC" + } + }, + { + "cell_type": "code", + "source": [ + "from IPython.display import Markdown\n", + "\n", + "Markdown(response.text)" + ], + "metadata": { + "id": "TYrE09q6tMrD" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### Streaming\n", + "\n", + "By default, the model returns a response after completing the entire generation process. You can also stream the response as it is being generated, and the model will return chunks of the response as soon as they are generated." 
+ ], + "metadata": { + "id": "ahqKlqo0SNon" + } + }, + { + "cell_type": "code", + "source": [ + "response = model.generate_content(\"What is the meaning of life?\", stream=True)" + ], + "metadata": { + "id": "6JCEHzwwabRc" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "for chunk in response:\n", + "    print(chunk.text)\n", + "    print(\"_\" * 80)" + ], + "metadata": { + "id": "sYxFBoTBad0R" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Configure model parameters\n", + "\n", + "In this section, you learn how to write structured instructions in a prompt and configure the Gemini API with the following options:\n", + "\n", + "- [System instructions](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/system-instructions)\n", + "- [Generation parameters](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#parameters)\n", + "- [Safety settings](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes)\n", + "\n", + "In this example, you ask the model to play the role of a travel assistant that helps build itineraries based on a location." + ], + "metadata": { + "id": "oFOMzxuH_Ee8" + } + }, + { + "cell_type": "code", + "source": [ + "from vertexai.generative_models import (\n", + "    GenerationConfig,\n", + "    HarmCategory,\n", + "    HarmBlockThreshold,\n", + "    Part\n", + ")\n", + "\n", + "instruction = (\"\"\"\n", + "You are a seasoned travel blogger and guide with a knack for unearthing hidden gems\n", + "and creating unforgettable travel itineraries.\n", + "\n", + "Your task focuses on trip inspiration, detailed planning, and seamless logistics based\n", + "on the location the customer is interested in.
Document a potential user journey for\n", + "finding, curating, and utilizing a travel itinerary designed for this specific location.\n", + "\n", + "Format these itinerary into a table with columns Day, Location, Experiences,\n", + "Things to know and The How. The How column describes in detail how to accomplish the\n", + "plan for the experience recommended.\n", + "\"\"\")\n", + "\n", + "# Load a model with system instruction\n", + "model = genai.GenerativeModel(\n", + " model_name=GEMINI_FLASH_MODEL_ID,\n", + " system_instruction=instruction,\n", + ")\n", + "\n", + "# Set generation parameters\n", + "generation_config = GenerationConfig(\n", + " temperature=0.7,\n", + " top_p=1.0,\n", + " top_k=32,\n", + " candidate_count=1,\n", + " max_output_tokens=8192,\n", + ")\n", + "\n", + "# Set safety settings\n", + "safety_settings = {\n", + " HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n", + " HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n", + " HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n", + " HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n", + "}\n" + ], + "metadata": { + "id": "yCnr-ext_W2l" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "prompt = \"\"\"\n", + "Location: Beijing\n", + "Itinerary:\n", + "\"\"\"\n", + "\n", + "# Set contents to send to the model\n", + "contents = [prompt]\n", + "\n", + "# Prompt the model to generate content\n", + "response = model.generate_content(contents,\n", + " generation_config=generation_config,\n", + " safety_settings=safety_settings)\n", + "\n", + "# Print formatted markdown text\n", + "Markdown(response.text)" + ], + "metadata": { + "id": "W8fJsu2Mx3xf" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Multimodal examples\n" + ], + "metadata": { + "id": "FPltGY11r-aW" + } + 
}, + { + "cell_type": "markdown", + "metadata": { + "id": "0GTyrWHugKFi" + }, + "source": [ + "### 🖼️ Image understanding\n", + "\n", + "This example uses the Gemini API to analyze a product sketch (in this case, a drawing of a Jet Backpack) and suggest marketing ideas for it.\n", + "\n", + "First, you download the image and load it with PIL:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JgbFtil0gLNf" + }, + "outputs": [], + "source": [ + "productSketchUrl = \"https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg\"\n", + "!curl -o jetpack.jpg {productSketchUrl}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0rcYDbcDga8s" + }, + "outputs": [], + "source": [ + "img = PIL.Image.open('jetpack.jpg')\n", + "display(Image('jetpack.jpg', width=500))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RJyRsfQi0tp6" + }, + "source": [ + "Then you can include the image in the prompt by passing a list of items to `generate_content`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UTgRAmEHOaAz" + }, + "outputs": [], + "source": [ + "prompt = \"\"\"This image contains a sketch of a potential product along with some notes.\n", + "Given the product sketch, describe the product as thoroughly as possible based on what you\n", + "see in the image, making sure to note all of the product features.
Return output in json format:\n", + "{description: description, features: [feature1, feature2, feature3, etc]}\"\"\"\n", + "\n", + "image_file = Part.from_uri(\n", + "    \"gs://generativeai-downloads/images/jetpack.jpg\", \"image/jpeg\"\n", + ")\n", + "\n", + "response = model.generate_content([image_file, prompt])\n", + "print(response.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e873a42810be" + }, + "source": [ + "#### Generate marketing ideas\n", + "\n", + "Using the same image, you can ask the Gemini API to generate marketing name ideas:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GioGOu5xspug" + }, + "outputs": [], + "source": [ + "import ast\n", + "\n", + "prompt = \"\"\"You are a marketing whiz and writer trying to come up with a name for the\n", + "product shown in the image. Come up with ten varied, interesting possible names in Chinese. Return the result\n", + "in array format, like this: ['name 1', 'name 2', ...]. Pay careful attention\n", + "to return a valid array in the format described above, and no other text.\n", + "The most important thing is that you stick to the array format.\"\"\"\n", + "\n", + "response = model.generate_content([image_file, prompt])\n", + "\n", + "# Parse the returned list literal without executing arbitrary code (unlike eval)\n", + "names = ast.literal_eval(response.text.strip())\n", + "print(names)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### 👏 👏 👏 Try it yourself 👏 👏 👏\n", + "\n", + "**Example images**\n", + "- https://storage.googleapis.com/cloud-samples-data/generative-ai/image/mooncake.png\n", + "- https://storage.googleapis.com/cloud-samples-data/generative-ai/image/a-man-and-a-dog.png\n", + "- https://storage.googleapis.com/cloud-samples-data/generative-ai/image/hurricane-ida.jpeg\n", + "\n", + "**Example prompts**\n", + "- Describe the image in detail in Chinese\n", + "- What is in the image? Tell me some fun facts about it.\n", + "- What is in the image? Write a poem in Shakespearean language in Chinese."
+ ], + "metadata": { + "id": "zqVa1AUsvycZ" + } + }, + { + "cell_type": "code", + "source": [ + "import mimetypes\n", + "from urllib.parse import urlparse\n", + "\n", + "def convert_to_gs(url):\n", + "    parsed = urlparse(url)\n", + "    if parsed.netloc == \"storage.googleapis.com\":\n", + "        path = parsed.path.lstrip(\"/\") # Remove leading slash from path\n", + "        return f\"gs://{path}\"\n", + "    else:\n", + "        return url # Return original URL if not a storage.googleapis.com URL\n", + "\n", + "\n", + "your_image_url = \"\" # @param {type:\"string\"}\n", + "\n", + "!curl -o your_image.jpg {your_image_url}\n", + "your_image = PIL.Image.open('your_image.jpg')\n", + "display(Image('your_image.jpg', width=500))\n", + "\n", + "your_prompt = \"\" # @param {type:\"string\"}\n", + "\n", + "# Guess the MIME type from the file extension; fall back to JPEG if unknown\n", + "image_mime_type = mimetypes.guess_type(your_image_url)[0] or \"image/jpeg\"\n", + "\n", + "image_file = Part.from_uri(\n", + "    convert_to_gs(your_image_url), image_mime_type\n", + ")\n", + "\n", + "response = model.generate_content([your_prompt, image_file])\n", + "Markdown(response.text)" + ], + "metadata": { + "cellView": "form", + "id": "sTmK8odSwZrU" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M9IkzwizO6h7" + }, + "source": [ + "### 📄 PDF and Document Summarization\n", + "\n", + "Gemini can process PDF documents: it can analyze their content, retain information, and answer questions about them.\n", + "\n", + "The PDF document example used here is the Gemini 1.5 paper (https://arxiv.org/pdf/2403.05530.pdf).\n", + "\n", + "![image.png](https://storage.googleapis.com/cloud-samples-data/generative-ai/image/gemini1.5-paper-2403.05530.png)" + ] + }, + { + "cell_type": "code", + "source": [ + "file_ref = Part.from_uri(uri=\"gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf\",\n", + "                         mime_type=\"application/pdf\")\n", + "\n", + "model.count_tokens([file_ref, 'Can you summarize this file as a bulleted list?'])" + ], + "metadata": { + "id": "6RfaobT8XJR0" + }, + "execution_count": null, + "outputs": [] +
}, + { + "cell_type": "code", + "source": [ + "response = model.generate_content(\n", + "    [file_ref, 'Can you summarize this file as a bulleted list?']\n", + ")\n", + "\n", + "Markdown(response.text)" + ], + "metadata": { + "id": "76ClvKCkXUbE" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Identify and locate an image in the document" + ], + "metadata": { + "id": "1AcMJSH4a8a4" + } + }, + { + "cell_type": "code", + "source": [ + "!curl -o drawing1.png https://storage.googleapis.com/cloud-samples-data/generative-ai/image/drawing1.png\n", + "drawing = PIL.Image.open('drawing1.png')\n", + "display(Image('drawing1.png', width=200))" + ], + "metadata": { + "id": "a4_S24ujaOzs" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# The drawing is a PNG file, so declare it as image/png\n", + "image_file = Part.from_uri(\n", + "    \"gs://cloud-samples-data/generative-ai/image/drawing1.png\", \"image/png\"\n", + ")\n", + "\n", + "response = model.generate_content(\n", + "    [file_ref, image_file, 'Find the drawing in the document and explain why the drawing appears there.']\n", + ")\n", + "\n", + "Markdown(response.text)" + ], + "metadata": { + "id": "mmNDbrA_XnW_" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## 🔊 Audio" + ], + "metadata": { + "id": "f6ViwXLbblzN" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "h3olx5jGpXCn" + }, + "source": [ + "#### Audio summarization" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "t1gnz_4TpD_f" + }, + "outputs": [], + "source": [ + "audio_file_url = \"https://storage.googleapis.com/cloud-samples-data/generative-ai/audio/pixel.mp3\"\n", + "\n", + "!curl -o audio.mp3 {audio_file_url}\n", + "\n", + "Audio(audio_file_url)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fv__QJHjpcKP" + }, + "outputs": [], + "source": [ + "prompt = \"\"\"\n", + "
Please provide a short summary and title for the audio.\n", + " Provide chapter titles, be concise and short, no need to provide chapter summaries.\n", + " Provide each of the chapter titles in a numbered list.\n", + " Do not make up any information that is not part of the audio and do not be verbose.\n", + "\"\"\"\n", + "\n", + "audio_file = Part.from_uri(\"gs://cloud-samples-data/generative-ai/audio/pixel.mp3\",\n", + "                           mime_type=\"audio/mpeg\")\n", + "\n", + "response = model.generate_content(\n", + "    [audio_file, prompt]\n", + ")\n", + "\n", + "Markdown(response.text)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### 👏 👏 👏 Try it yourself 👏 👏 👏\n", + "\n", + "**Example audio**\n", + "- https://storage.googleapis.com/cloud-samples-data/generative-ai/audio/audio_summary_clean_energy.mp3\n", + "\n", + "**Example prompts**\n", + "- Transcribe this audio\n", + "- Transcribe this audio and translate it to Chinese" + ], + "metadata": { + "id": "a1h0CzbhdpIy" + } + }, + { + "cell_type": "code", + "source": [ + "your_audio_url = \"\" # @param {type:\"string\"}\n", + "\n", + "!curl -o test.mp3 {your_audio_url}\n", + "Audio(your_audio_url)\n", + "\n", + "your_prompt = \"\" # @param {type:\"string\"}\n", + "\n", + "audio_file = Part.from_uri(convert_to_gs(your_audio_url),\n", + "                           mime_type=\"audio/mpeg\")\n", + "\n", + "response = model.generate_content(\n", + "    [audio_file, your_prompt]\n", + ")\n", + "\n", + "Markdown(response.text)" + ], + "metadata": { + "cellView": "form", + "id": "pqTGKykodFbo" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_U36v4TmswAG" + }, + "source": [ + "## 🎬 Video\n", + "\n", + "Gemini's native multimodal and long context capabilities also apply to video with interleaved audio."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EDswcPI0tSRk" + }, + "outputs": [], + "source": [ + "video_file_url = \"https://storage.googleapis.com/cloud-samples-data/generative-ai/video/pixel8.mp4\"\n", + "\n", + "!curl -o video.mp4 {video_file_url}\n", + "Video(video_file_url, width=450)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "R9isZfjzCYxw" + }, + "outputs": [], + "source": [ + "prompt = \"\"\"\n", + "Look through each frame in the video carefully and answer the question.\n", + "Only base your answers strictly on what information is available in the video attached.\n", + "Do not make up any information that is not part of the video and do not be too verbose.\n", + "\n", + "Questions:\n", + "- When does a red lantern first appear and what is written in the lantern? Provide a timestamp.\n", + "- What language is the person speaking and what does the person say at that time?\n", + "\"\"\"\n", + "\n", + "video_file = Part.from_uri(uri = \"gs://cloud-samples-data/generative-ai/video/pixel8.mp4\",\n", + " mime_type=\"video/mp4\")\n", + "\n", + "response = model.generate_content([video_file, prompt])\n", + "\n", + "print(response.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sAO09UUcvL_L" + }, + "source": [ + "## Use Case: retail / e-commerce\n", + "\n", + "The customer shows you their living room:\n", + "\n", + "|Customer photo |\n", + "|:-----:|\n", + "| |\n", + "\n", + "\n", + "\n", + "Below are four wall art options that the customer is trying to decide between:\n", + "\n", + "|Art 1| Art 2 | Art 3 | Art 4 |\n", + "|:-----:|:----:|:-----:|:----:|\n", + "| ||||\n", + "\n", + "\n", + "How can you use Gemini 1.5, a multimodal model, to help the customer choose the best option?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4joag7vswfYw" + }, + "source": [ + "### Generating open recommendations\n", + "\n", + "Using the customer's room image, you can ask the model to recommend a piece of furniture that would make sense in the space.\n", + "\n", + "Note that the model can choose any furniture in this case, and can do so only from its built-in knowledge." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Z82JDCTiwoxB" + }, + "outputs": [], + "source": [ + "# URL of the room image\n", + "room_image_url = \"https://storage.googleapis.com/cloud-samples-data/generative-ai/image/living-room.png\"\n", + "\n", + "# Download an image for local display and return it as a Part that references the Cloud Storage copy\n", + "def load_image_from_url(url):\n", + "    file_name = url.split('/')[-1]\n", + "    !curl -o {file_name} {url}\n", + "    # The sample images used in this section are PNG files\n", + "    return Part.from_uri(convert_to_gs(url), \"image/png\")\n", + "\n", + "room_image = load_image_from_url(room_image_url)\n", + "display(Image(room_image_url.split('/')[-1], width=300))\n", + "\n", + "prompt = \"Describe this room\"\n", + "response = model.generate_content([prompt, room_image])\n", + "Markdown(response.text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nnSwtNKbxizu" + }, + "outputs": [], + "source": [ + "prompt1 = \"Recommend a new piece of furniture for this room\"\n", + "prompt2 = \"Explain the reason in detail\"\n", + "contents = [prompt1, room_image, prompt2]\n", + "\n", + "response = model.generate_content(contents)\n", + "Markdown(response.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M8XU_0L_xB-_" + }, + "source": [ + "### Generating recommendations based on provided images\n", + "\n", + "Instead of keeping the recommendation open, you can also provide a list of items for the model to choose from. Here, you will download a few art images that the Gemini model can recommend.
This is particularly useful for retail companies who want to provide product recommendations to users based on their current setup." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Dd05VvNHxGs1" + }, + "outputs": [], + "source": [ + "# Download and display sample artwork\n", + "art_image_urls = [\n", + " \"https://storage.googleapis.com/cloud-samples-data/generative-ai/image/room-art-1.png\",\n", + " \"https://storage.googleapis.com/cloud-samples-data/generative-ai/image/room-art-2.png\",\n", + " \"https://storage.googleapis.com/cloud-samples-data/generative-ai/image/room-art-3.png\",\n", + " \"https://storage.googleapis.com/cloud-samples-data/generative-ai/image/room-art-4.png\",\n", + "]\n", + "\n", + "\n", + "# Load wall art images as Image Objects\n", + "art_images = [load_image_from_url(url) for url in art_image_urls]" + ] + }, + { + "cell_type": "code", + "source": [ + "# To recommend an item from a selection, you will need to label the item number within the prompt.\n", + "# That way you are providing the model with a way to reference each image as you pose a question.\n", + "# Labeling images within your prompt also helps reduce hallucinations and produce better results.\n", + "prompt = \"\"\"\n", + " You are an interior designer.\n", + " For each piece of wall art, explain whether it would be appropriate for the style of the room.\n", + " Rank each piece according to how well it would be compatible in the room.\n", + "\"\"\"\n", + "contents = [\n", + " \"Consider the following art pieces:\",\n", + " \"art 1:\",\n", + " art_images[0],\n", + " \"art 2:\",\n", + " art_images[1],\n", + " \"art 3:\",\n", + " art_images[2],\n", + " \"art 4:\",\n", + " art_images[3],\n", + " \"room:\",\n", + " room_image,\n", + " prompt,\n", + "]\n", + "\n", + "display(Image(room_image_url.split('/')[-1], width=300))\n", + "print(\"\\n------Art1:-------\")\n", + "display(Image(art_image_urls[0].split('/')[-1], width=300))\n", + 
"print(\"\\n------Art2:-------\")\n", + "display(Image(art_image_urls[1].split('/')[-1], width=300))\n", + "print(\"\\n------Art3:-------\")\n", + "display(Image(art_image_urls[2].split('/')[-1], width=300))\n", + "print(\"\\n------Art4:-------\")\n", + "display(Image(art_image_urls[3].split('/')[-1], width=300))\n", + "\n", + "response = model.generate_content(contents)\n", + "Markdown(response.text)" + ], + "metadata": { + "id": "6uZYJkLPnWNk" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(\"\\nArt2: most appropriate!\")\n", + "display(Image(art_image_urls[1].split('/')[-1], width=300))" + ], + "metadata": { + "id": "S5k7ccJXQCTi" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Conclusions\n", + "\n", + "In this session, you've learned how to use Gemini 1.5's native multimodality and long context window capabilities to explore:\n", + "\n", + "- **Single modality**: Working with text, PDF, image, audio and video inputs individually.\n", + "- **Multimodality**: Combining different input types for more complex interactions.\n", + "- **Real-world use case**: A practical e-commerce example to demonstrate Gemini's capabilities.\n", + "\n", + "Next, you can continue to explore more examples on:\n", + "\n", + "- [Gemini API Cookbook](https://github.com/google-gemini/cookbook/)\n", + "- [Vertex AI Generative AI notebook samples](https://github.com/GoogleCloudPlatform/generative-ai)" + ], + "metadata": { + "id": "JZV7sMJ2n9PU" + } + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "Ase31I9QncRp" + }, + "execution_count": null, + "outputs": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "environment": { + "kernel": "python3", + "name": "common-cpu.m113", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m113" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": 
"python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.13" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file