Is your feature request related to a problem? Please describe.
I'm working on two use cases with Gemini: a chat-with-PDF use case and a PDF-to-JSON use case.
For the chat-with-PDF use case, it would be great for the model to maintain the context of the conversation, as this would allow it to answer follow-up questions. It's unclear how to do multi-turn conversations using the API when a file is involved.
For the PDF-to-JSON use case, adding few-shot examples to the prompt would improve the results.
I have tried the following to perform multi-turn with a file:
```python
import enum
from dataclasses import dataclass
from typing import List

from vertexai.generative_models import Content, GenerativeModel, Part

GCS_BUCKET_NAME = "your-gcs-bucket"
MODEL_NAME = "gemini-1.5-pro-001"

chat_model = GenerativeModel(MODEL_NAME)


class Role(enum.StrEnum):
    """The role of a given message.

    `assistant` are messages generated by the model. `user` are messages
    sent by the end user. `system` are hidden messages to perform prompt
    engineering.
    """

    USER = enum.auto()
    ASSISTANT = enum.auto()
    SYSTEM = enum.auto()


@dataclass
class Message:
    """A single message in a conversation."""

    content: str
    role: Role


def create_history_stateful(
    messages: List[Message], pdf_part: Part
) -> list[Content]:
    """Create a history of chat messages, excluding the last message
    because it will be the new query. Importantly, we also add the PDF
    part to the first user message.
    """
    history = []
    for idx, message in enumerate(messages[:-1]):
        text_part = Part.from_text(message.content)
        # The Vertex AI API expects the roles "user" and "model" (not "assistant").
        api_role = "model" if message.role == Role.ASSISTANT else "user"
        if message.role == Role.USER and idx == 0:
            # Attach the PDF part ahead of the first user message.
            gemini_message = Content(role=api_role, parts=[pdf_part, text_part])
        else:
            gemini_message = Content(role=api_role, parts=[text_part])
        history.append(gemini_message)
    return history


def chat_with_pdf_stateful(messages: List[Message], filename: str) -> Message:
    gcs_path = f"gs://{GCS_BUCKET_NAME}/{filename}"
    pdf_part = Part.from_uri(gcs_path, mime_type="application/pdf")
    history = create_history_stateful(messages, pdf_part)
    chat = chat_model.start_chat(history=history)
    # Send the latest user message as the new query and generate a reply.
    text_part = Part.from_text(messages[-1].content)
    response = chat.send_message(text_part).text
    new_message = Message(content=response, role=Role.ASSISTANT)
    return new_message
```
It doesn't look like this works: when I ask the model to repeat the first user question, it fails.
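The alternative I would try next is a stateless call: rebuild the full conversation on every turn and pass it to `generate_content` in one list. A minimal sketch of how I would assemble that payload (the helper name and dict layout are mine; with the SDK, each dict would be built as a `Content` with `Part` objects instead):

```python
from typing import List, Tuple


def build_multiturn_contents(
    history: List[Tuple[str, str]], pdf_uri: str, new_question: str
) -> list[dict]:
    """Assemble the whole conversation for a single stateless call.

    history: (role, text) tuples, with roles "user" or "model".
    The PDF is attached only to the first user turn; later turns are
    text-only, so the file is not repeated on every message.
    """
    contents = []
    pdf_attached = False
    for role, text in history:
        parts = [{"text": text}]
        if role == "user" and not pdf_attached:
            # Attach the file once, ahead of the first user text part.
            parts = [{"file_uri": pdf_uri, "mime_type": "application/pdf"}] + parts
            pdf_attached = True
        contents.append({"role": role, "parts": parts})
    # The new query is the final user turn.
    contents.append({"role": "user", "parts": [{"text": new_question}]})
    return contents
```

With `vertexai.generative_models`, each entry would become `Content(role=..., parts=[Part.from_uri(...), Part.from_text(...)])` and the list would be passed to `model.generate_content(...)`; I have not verified whether this preserves context any better than `start_chat` does.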
Adding few-shots
It's also unclear how to add few-shot examples when interacting with a file. Should we leverage `generate_content` and just send a list of alternating Parts (a file part followed by a text part, followed by another file part, etc.)? The docs only showcase sending the file part and the text part together in one `Content`.
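To make the question concrete, this is the alternating layout I have in mind, as a payload sketch (the helper and field names are mine, and this is exactly the pattern I would like the docs to confirm or correct):

```python
from typing import List, Tuple


def build_few_shot_parts(
    examples: List[Tuple[str, str]], query_pdf_uri: str, instruction: str
) -> list[dict]:
    """Lay out few-shot pairs as alternating file and text parts.

    examples: (pdf_uri, expected_json) pairs; each example PDF is
    immediately followed by the JSON the model should produce for it.
    The query document and the instruction come last.
    """
    parts = []
    for pdf_uri, expected_json in examples:
        parts.append({"file_uri": pdf_uri, "mime_type": "application/pdf"})
        parts.append({"text": expected_json})
    parts.append({"file_uri": query_pdf_uri, "mime_type": "application/pdf"})
    parts.append({"text": instruction})
    return parts
```

Whether all of these parts should go into a single user `Content`, or be split into alternating `Content` turns, is the open question.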
Describe the solution you'd like
For both use cases, there is currently no documentation or tutorial. It would be great to add some here: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/document-understanding, clarifying whether to use `start_chat` and `send_message` when there's a file involved, or whether we should leverage `generate_content` instead.
Describe alternatives you've considered
I found the ManyICL repo by Stanford, which does few-shot learning with Gemini: https://github.com/stanfordmlgroup/ManyICL/blob/89c03be019d3c1bc94860012310a74f31f76e771/ManyICL/LMM.py#L206. They leverage `generate_content` and send a list of alternating strings and images to the model.
Additional context
/