
[Feat]: how to do multi-turn and add few-shots when chatting with a file #924

Closed
1 task done
NielsRogge opened this issue Aug 1, 2024 · 1 comment
NielsRogge commented Aug 1, 2024

Is your feature request related to a problem? Please describe.

I'm working on two use cases with Gemini: a chat-with-PDF use case and a PDF-to-JSON use case.

  • for the chat-with-PDF use case, it would be great for the model to maintain the context of the conversation, as this would allow it to answer follow-up questions. It's unclear how to do multi-turn conversations using the API when a file is involved.
  • for the PDF to JSON use case, adding few-shot examples in the prompt would improve the results.

Describe the solution you'd like

For both problems, there is currently no documentation or tutorials. It would be great to add them here: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/document-understanding.

Describe alternatives you've considered

Multi-turn

I have tried the following to perform multi-turn with a file:

from dataclasses import dataclass
import enum
from typing import List

from vertexai.generative_models import Content, GenerativeModel, Part

GCS_BUCKET_NAME = "your-gcs-bucket"
MODEL_NAME = "gemini-1.5-pro-001"

chat_model = GenerativeModel(MODEL_NAME)


class Role(enum.StrEnum):
    """
    The role of a given message.
    `assistant` are messages generated by the model.
    `user` are messages sent by the end user.
    `system` are hidden messages used for prompt engineering.
    """

    USER = enum.auto()
    ASSISTANT = enum.auto()
    SYSTEM = enum.auto()


@dataclass
class Message:
    """A single message in a conversation."""

    content: str
    role: Role


def create_history_stateful(
    messages: List[Message], pdf_part: Part
) -> List[Content]:
    """
    Create a history of chat messages, excluding the last message because
    that will be the new query.

    Importantly, we also attach the PDF part to the first user message.
    """
    history = []

    for idx, message in enumerate(messages[:-1]):
        text_part = Part.from_text(message.content)
        if message.role == Role.USER and idx == 0:
            # attach the PDF part to the first user message
            gemini_message = Content(
                role=message.role, parts=[pdf_part, text_part]
            )
        else:
            gemini_message = Content(role=message.role, parts=[text_part])
        history.append(gemini_message)

    return history


def chat_with_pdf_stateful(messages: List[Message], filename: str) -> Message:
    # step 1: reference the PDF stored in Cloud Storage
    gcs_path = f"gs://{GCS_BUCKET_NAME}/{filename}"
    pdf_part = Part.from_uri(gcs_path, mime_type="application/pdf")
    history = create_history_stateful(messages, pdf_part)

    chat = chat_model.start_chat(history=history)

    # step 2: wrap the latest user message as a text part
    text_part = Part.from_text(messages[-1].content)

    # step 3: generate
    response = chat.send_message(text_part).text

    new_message = Message(content=response, role=Role.ASSISTANT)
    return new_message

This doesn't seem to work: when I ask the model to repeat the first user question, it fails.
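One thing I suspect (not confirmed, just a guess from the error behavior): Vertex AI's `Content.role` only accepts `"user"` and `"model"`, while `Role.ASSISTANT` stringifies to `"assistant"`. A minimal, hypothetical mapping helper (the function name is my own) that could be applied before building each `Content`:

```python
def to_gemini_role(role: str) -> str:
    """Map a chat-style role string to the role name Gemini expects.

    Gemini/Vertex AI names the assistant role "model"; "system" messages
    are not valid history entries and are rejected here.
    """
    mapping = {
        "user": "user",
        "assistant": "model",  # Gemini calls the assistant role "model"
    }
    try:
        return mapping[role]
    except KeyError:
        raise ValueError(f"Unsupported role for Gemini history: {role!r}")
```

If this is the cause, the history construction would become `Content(role=to_gemini_role(message.role), parts=[text_part])`.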

Adding few-shots

It's also unclear how to add few-shot examples when interacting with a file. Should we leverage generate_content and just send a list of alternating parts (a file part followed by a text part, followed by another file part, etc.)? Because the docs showcase sending the file_part and text_part together in one Content.

I found the ManyICL repo by Stanford which does few-shot learning with Gemini: https://github.com/stanfordmlgroup/ManyICL/blob/89c03be019d3c1bc94860012310a74f31f76e771/ManyICL/LMM.py#L206. They leverage generate_content and send a list of alternating strings and images to the model.
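Following that pattern, one approach I'm considering (a sketch under my own assumptions, not something the docs confirm) is to flatten (file, prompt, answer) example triples into a single alternating list of parts and hand the whole list to generate_content. The helper below only builds the interleaved list; in real use the elements would be `Part.from_uri` / `Part.from_text` objects:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def interleave_few_shot(
    examples: Sequence[Tuple[T, T, T]], query_file: T, query_prompt: T
) -> List[T]:
    """
    Flatten (file_part, prompt_part, answer_part) few-shot triples into one
    alternating list of parts, ending with the actual query's file and prompt.
    """
    parts: List[T] = []
    for file_part, prompt_part, answer_part in examples:
        parts.extend([file_part, prompt_part, answer_part])
    parts.extend([query_file, query_prompt])
    return parts
```

With real parts this would be called as `model.generate_content(interleave_few_shot(examples, pdf_part, text_part))`, assuming generate_content accepts a flat list of parts the way the ManyICL code suggests.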

Additional context

/

Code of Conduct

  • I agree to follow this project's Code of Conduct