fix: Fix PDF summarization prompt in Gemini 1.5 Pro Notebook (#1151)

# Description

Edited the PDF summarization prompt to resolve an issue where the PDF was not read. This appears to be a model quirk; it doesn't affect 1.5 Flash.

Fixes #754 🦕
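
For reviewers, here is a minimal sketch of what the notebook cell does after this commit, based on the final hunk of the diff. The calls `Part.from_uri` and `model.generate_content`, the PDF URI, and the prompt text come from the notebook itself. The helper names `build_contents` and `summarize_pdf` and the model ID `"gemini-1.5-pro"` are assumptions made for illustration. The sketch also assumes `vertexai.init(...)` has already been called with a valid project and location.

```python
PDF_URI = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"

PROMPT = """
  You are a very professional document summarization specialist.
  Summarize the given document.
"""


def build_contents(prompt, pdf_part):
    """Return the request contents with the prompt ahead of the PDF part.

    Hypothetical helper: the notebook builds this list inline.
    """
    # Before this commit the order was [pdf_file, prompt].
    return [prompt, pdf_part]


def summarize_pdf(model_name="gemini-1.5-pro", pdf_uri=PDF_URI, prompt=PROMPT):
    """Summarize a PDF stored in Cloud Storage (hypothetical wrapper).

    Assumes vertexai.init(...) was called beforehand; the model ID is an
    assumption, not taken from the diff.
    """
    # Imported lazily so build_contents works without the SDK installed.
    from vertexai.generative_models import GenerativeModel, Part

    model = GenerativeModel(model_name)
    pdf_file = Part.from_uri(pdf_uri, mime_type="application/pdf")
    response = model.generate_content(build_contents(prompt, pdf_file))
    return response.text
```

Calling `summarize_pdf()` requires Google Cloud credentials and network access. Whether ordering matters may vary by model version: the description above reports the problem for 1.5 Pro only, not for 1.5 Flash.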
GoogleCloudPlatform · Sep 23, 2024 · commit ee24cb8
1 parent 94f8200
Showing 1 changed file with 30 additions and 31 deletions.
diff --git a/gemini/getting-started/intro_gemini_1_5_pro.ipynb b/gemini/getting-started/intro_gemini_1_5_pro.ipynb
@@ -29,7 +29,7 @@
         "id": "7yVV6txOmNMn"
       },
       "source": [
-        "# Getting started with the Vertex AI Gemini 1.5 Pro\n",
+        "# Getting started with Vertex AI Gemini 1.5 Pro\n",
         "\n",
         "\n",
         "<table align=\"left\">\n",
@@ -105,7 +105,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 21,
+      "execution_count": null,
       "metadata": {
         "id": "tFy3H3aPgx12"
       },
@@ -195,7 +195,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 26,
+      "execution_count": 1,
       "metadata": {
         "id": "Nqwi-5ufWp_B"
       },
@@ -220,7 +220,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 27,
+      "execution_count": 2,
       "metadata": {
         "id": "lslYAvw37JGQ"
       },
@@ -253,7 +253,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 28,
+      "execution_count": 3,
       "metadata": {
         "id": "U7ExWmuLBdIA"
       },
@@ -277,7 +277,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 29,
+      "execution_count": 4,
       "metadata": {
         "id": "FhFxrtfdSwOP"
       },
@@ -286,41 +286,41 @@
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "total_tokens: 14\n",
-            "total_billable_characters: 29\n",
+            "total_tokens: 32\n",
+            "total_billable_characters: 108\n",
             "\n",
             "\n",
             "Answer:\n",
             "J'aime les bagels. \n",
             "\n",
             "\n",
             "Usage metadata:\n",
-            "{'prompt_token_count': 14, 'candidates_token_count': 8, 'total_token_count': 22}\n",
+            "{'prompt_token_count': 32, 'candidates_token_count': 8, 'total_token_count': 40}\n",
             "\n",
             "Finish reason:\n",
             "1\n",
             "\n",
             "Safety settings:\n",
             "[category: HARM_CATEGORY_HATE_SPEECH\n",
             "probability: NEGLIGIBLE\n",
-            "probability_score: 0.15077754855155945\n",
+            "probability_score: 0.155273438\n",
             "severity: HARM_SEVERITY_NEGLIGIBLE\n",
-            "severity_score: 0.07821886986494064\n",
+            "severity_score: 0.0737304688\n",
             ", category: HARM_CATEGORY_DANGEROUS_CONTENT\n",
             "probability: NEGLIGIBLE\n",
-            "probability_score: 0.06730107963085175\n",
+            "probability_score: 0.0727539062\n",
             "severity: HARM_SEVERITY_NEGLIGIBLE\n",
-            "severity_score: 0.09089674800634384\n",
+            "severity_score: 0.0913085938\n",
             ", category: HARM_CATEGORY_HARASSMENT\n",
             "probability: NEGLIGIBLE\n",
-            "probability_score: 0.1252792477607727\n",
+            "probability_score: 0.134765625\n",
             "severity: HARM_SEVERITY_NEGLIGIBLE\n",
-            "severity_score: 0.08525123447179794\n",
+            "severity_score: 0.0815429688\n",
             ", category: HARM_CATEGORY_SEXUALLY_EXPLICIT\n",
             "probability: NEGLIGIBLE\n",
-            "probability_score: 0.21060390770435333\n",
+            "probability_score: 0.232421875\n",
             "severity: HARM_SEVERITY_NEGLIGIBLE\n",
-            "severity_score: 0.11260009557008743\n",
+            "severity_score: 0.125\n",
             "]\n"
           ]
         }
@@ -606,7 +606,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 30,
+      "execution_count": 8,
       "metadata": {
         "id": "JgKDIZUstYwV"
       },
@@ -615,19 +615,18 @@
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "## Summary of \"Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context\"\n",
-            "\n",
-            "**Gemini 1.5 Pro** is a new large language model (LLM) from Google DeepMind capable of processing and understanding extremely long sequences of information across various modalities like text, code, images, audio, and video. It utilizes a mixture-of-experts architecture and achieves state-of-the-art performance on many tasks while being significantly more efficient than previous models. \n",
+            "This is a technical report introducing Gemini 1.5 Pro, Google's latest multi-modal model. The model is built upon the mixture-of-experts (MoE) architecture and exhibits impressive performance on reasoning, multi-modality, and long context understanding. Gemini 1.5 Pro distinguishes itself by expanding the context window size to several million tokens, a significant leap beyond the 200k tokens offered by its predecessor, Claude 2.1. This expanded capacity allows for processing nearly five days of audio, entire books, or extensive code repositories. \n",
             "\n",
-            "**Key advancements and findings:**\n",
+            "The report highlights the model's abilities through: \n",
+            "* **Qualitative examples:** Showcasing impressive feats such as pinpointing specific code within the complete JAX codebase, learning to translate a new language from a single grammar book and dictionary, and identifying a scene from Les Misérables based on a hand-drawn sketch. \n",
+            "* **Quantitative evaluations:** \n",
+            "    * **Diagnostic:** demonstrating near-perfect recall in \"needle-in-a-haystack\" tasks across text, video, and audio, even maintaining high recall with context lengths extending to 10 million tokens. \n",
+            "    * **Realistic:** excelling in long-document QA using Les Misérables as context, outperforming competitors on long-video QA tasks, and showing significant progress in long-context automatic speech recognition. \n",
+            "    * **Core Capabilities:** Surpassing the performance of its predecessor (Gemini 1.0) and rivaling or exceeding the performance of a state-of-the-art model, Gemini 1.0 Ultra, on core benchmarks related to coding, math, science, reasoning, and instruction following. \n",
             "\n",
-            "* **Unprecedented context length:** Gemini 1.5 Pro can handle up to 10 million tokens of context, enabling it to process information like entire books, days-long audio recordings, and hours of video. This opens up new possibilities for applications like analyzing large datasets, summarizing documents, and understanding complex video content.\n",
-            "* **Improved performance across modalities:** The model surpasses its predecessors and even matches or exceeds the performance of state-of-the-art models like Gemini 1.0 Ultra on various benchmarks across text (e.g., reasoning, math, coding), vision, and audio understanding.\n",
-            "* **In-context learning:** Gemini 1.5 Pro showcases the ability to learn new skills like translating languages (e.g., English to Kalamang) with very limited data by providing the necessary reference materials directly in the context. This has implications for supporting low-resource languages and facilitating cross-lingual communication.\n",
-            "* **Diagnostic and realistic evaluations:** The researchers developed new benchmarks and evaluation methodologies to assess the long-context capabilities of the model, including \"needle-in-a-haystack\" tasks for different modalities and question answering from long documents and videos.\n",
-            "* **Responsible AI practices:** Google DeepMind emphasizes its commitment to responsible deployment by conducting impact assessments, implementing model safety mitigations, and evaluating potential risks and biases. \n",
+            "The report also delves into the responsible development and deployment of the model, emphasizing their approach to impact assessment, model mitigations, and ongoing safety evaluations. \n",
             "\n",
-            "**Overall, Gemini 1.5 Pro represents a significant leap forward in LLM research, demonstrating the potential of long-context understanding and multimodal capabilities for various applications while emphasizing the importance of responsible development and deployment.** \n",
+            "In conclusion, Gemini 1.5 Pro represents a significant advancement in AI, showcasing unprecedented capabilities in long-context understanding across multiple modalities. The report emphasizes the need for novel evaluation methods to better assess the potential of such models and suggests promising avenues for future research. \n",
             "\n"
           ]
         }
@@ -636,12 +635,12 @@
         "pdf_file_uri = \"gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf\"\n",
         "\n",
         "prompt = \"\"\"\n",
-        "  Your are a very professional document summarization specialist.\n",
-        "  Please summarize the given document.\n",
+        "  You are a very professional document summarization specialist.\n",
+        "  Summarize the given document.\n",
         "\"\"\"\n",
         "\n",
         "pdf_file = Part.from_uri(pdf_file_uri, mime_type=\"application/pdf\")\n",
-        "contents = [pdf_file, prompt]\n",
+        "contents = [prompt, pdf_file]\n",
         "\n",
         "response = model.generate_content(contents)\n",
         "print(response.text)"