
[Feat]: how to do multi-turn and add few-shots when chatting with a file #924

Closed
1 task done
NielsRogge opened this issue Aug 1, 2024 · 1 comment
NielsRogge commented Aug 1, 2024

Is your feature request related to a problem? Please describe.

I'm working on two use cases with Gemini: a chat-with-PDF use case and a PDF-to-JSON use case.

  • for the chat-with-PDF use case, it would be great for the model to maintain the context of the conversation, as this would allow it to answer follow-up questions. It's unclear how to do multi-turn conversations using the API when a file is involved.
  • for the PDF to JSON use case, adding few-shot examples in the prompt would improve the results.

Describe the solution you'd like

For both problems, there is currently no documentation or tutorials. It would be great to add them here: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/document-understanding.

Describe alternatives you've considered

Multi-turn

I have tried the following to perform multi-turn with a file:

from dataclasses import dataclass
import enum
from typing import List

from vertexai.generative_models import Content, GenerativeModel, Part

GCS_BUCKET_NAME = "your-gcs-bucket"
MODEL_NAME = "gemini-1.5-pro-001"

chat_model = GenerativeModel(MODEL_NAME)


class Role(enum.StrEnum):
    """
    The role of a given message.
    `assistant` are messages generated by the model.
    `user` are messages sent by the end user.
    `system` are hidden messages used for prompt engineering.
    """

    USER = enum.auto()
    ASSISTANT = enum.auto()
    SYSTEM = enum.auto()


@dataclass
class Message:
    """A single message in a conversation."""

    content: str
    role: Role


def create_history_stateful(
    messages: List[Message], pdf_part: Part
) -> List[Content]:
    """
    Create a history of chat messages, excluding the last message because
    that will be the new query.

    Importantly, we also attach the PDF part to the first user message.
    """
    history = []

    for idx, message in enumerate(messages[:-1]):
        text_part = Part.from_text(message.content)
        if message.role == Role.USER and idx == 0:
            # attach the PDF part to the first user message
            gemini_message = Content(
                role=message.role, parts=[pdf_part, text_part]
            )
        else:
            gemini_message = Content(role=message.role, parts=[text_part])
        history.append(gemini_message)

    return history


def chat_with_pdf_stateful(messages: List[Message], filename: str) -> Message:
    # step 1: reference the PDF stored in Cloud Storage
    gcs_path = f"gs://{GCS_BUCKET_NAME}/{filename}"
    pdf_part = Part.from_uri(gcs_path, mime_type="application/pdf")
    history = create_history_stateful(messages, pdf_part)

    chat = chat_model.start_chat(history=history)

    # step 2: wrap the latest user message as a text part
    text_part = Part.from_text(messages[-1].content)

    # step 3: generate
    response = chat.send_message(text_part).text

    new_message = Message(content=response, role=Role.ASSISTANT)
    return new_message

This doesn't seem to work: when I ask the model to repeat the first user question, it fails.
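One thing I suspect (not confirmed, just a guess from the error behavior): Vertex AI's `Content.role` only accepts `"user"` and `"model"`, while `Role.ASSISTANT` stringifies to `"assistant"`. A minimal, hypothetical mapping helper (the function name is my own) that could be applied before building each `Content`:

```python
def to_gemini_role(role: str) -> str:
    """Map a chat-style role string to the role name Gemini expects.

    Gemini/Vertex AI names the assistant role "model"; "system" messages
    are not valid history entries and are rejected here.
    """
    mapping = {
        "user": "user",
        "assistant": "model",  # Gemini calls the assistant role "model"
    }
    try:
        return mapping[role]
    except KeyError:
        raise ValueError(f"Unsupported role for Gemini history: {role!r}")
```

If this is the cause, the history construction would become `Content(role=to_gemini_role(message.role), parts=[text_part])`.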

Adding few-shots

It's also unclear how to add few-shot examples when interacting with a file. Should we leverage generate_content and just send a list of alternating parts (a file part followed by a text part, followed by another file part, etc.)? Because the docs showcase sending the file_part and text_part together in one Content.

I found the ManyICL repo by Stanford which does few-shot learning with Gemini: https://github.com/stanfordmlgroup/ManyICL/blob/89c03be019d3c1bc94860012310a74f31f76e771/ManyICL/LMM.py#L206. They leverage generate_content and send a list of alternating strings and images to the model.
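Following that pattern, one approach I'm considering (a sketch under my own assumptions, not something the docs confirm) is to flatten (file, prompt, answer) example triples into a single alternating list of parts and hand the whole list to generate_content. The helper below only builds the interleaved list; in real use the elements would be `Part.from_uri` / `Part.from_text` objects:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def interleave_few_shot(
    examples: Sequence[Tuple[T, T, T]], query_file: T, query_prompt: T
) -> List[T]:
    """
    Flatten (file_part, prompt_part, answer_part) few-shot triples into one
    alternating list of parts, ending with the actual query's file and prompt.
    """
    parts: List[T] = []
    for file_part, prompt_part, answer_part in examples:
        parts.extend([file_part, prompt_part, answer_part])
    parts.extend([query_file, query_prompt])
    return parts
```

With real parts this would be called as `model.generate_content(interleave_few_shot(examples, pdf_part, text_part))`, assuming generate_content accepts a flat list of parts the way the ManyICL code suggests.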

Additional context

/

Code of Conduct

  • I agree to follow this project's Code of Conduct