The following code failed with the error "FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG":
# Extract text and image metadata from the PDF document
text_metadata_df, image_metadata_df = get_document_metadata(
    multimodal_model,  # we are passing gemini 1.0 pro vision model
    pdf_folder_path,
    image_save_dir="images",
    image_description_prompt=image_description_prompt,
    embedding_size=1408,
)
print("\n\n --- Completed processing. ---")
:
Processing page: 1
Processing page: 2
Processing page: 3
Processing page: 4
:
FzErrorArgument Traceback (most recent call last)
<ipython-input-8-96bfa690e8cb> in <cell line: 14>()
12
13 # Extract text and image metadata from the PDF document
---> 14 text_metadata_df, image_metadata_df = get_document_metadata(
15 multimodal_model, # we are passing gemini 1.0 pro vision model
16 pdf_folder_path,
4 frames
~/.local/lib/python3.10/site-packages/pymupdf/mupdf.py in fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
47578 Write a pixmap as a JPEG.
47579 """
-> 47580 return _mupdf.fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
47581
47582 def fz_write_pixmap_as_jpx(out, pix, quality):
FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG
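The message indicates that the JPEG writer only accepts grayscale, RGB, or CMYK pixmaps, so an embedded image in any other colorspace would need converting before it is saved. Below is a minimal sketch of one possible workaround, not the notebook's actual code. It assumes that those three colorspaces correspond to 1, 3, and 4 color components, and that PyMuPDF's `fitz.Pixmap(fitz.csRGB, pix)` and `fitz.Pixmap(pix, 0)` constructors are available for colorspace conversion and alpha removal. The helper names `is_jpeg_compatible` and `to_jpeg_safe_pixmap` are hypothetical.

```python
def is_jpeg_compatible(color_components: int, has_alpha: bool) -> bool:
    """Return True if a pixmap with this layout can be written as JPEG.

    Assumption: grayscale, RGB, and CMYK correspond to 1, 3, and 4 color
    components, and JPEG cannot store an alpha channel.
    """
    return color_components in (1, 3, 4) and not has_alpha


def to_jpeg_safe_pixmap(pix):
    """Return a pixmap that can be saved as JPEG (hypothetical helper)."""
    # PyMuPDF is imported lazily so the pure check above has no dependency.
    import fitz

    if pix.colorspace is None:
        # Stencil masks carry no colorspace and cannot be converted this way.
        raise ValueError("pixmap has no colorspace; cannot convert to RGB")

    color_components = pix.n - pix.alpha
    if is_jpeg_compatible(color_components, bool(pix.alpha)):
        return pix

    if color_components not in (1, 3, 4):
        # Convert any other colorspace to RGB.
        pix = fitz.Pixmap(fitz.csRGB, pix)
    if pix.alpha:
        # Copy the pixmap without its alpha channel.
        pix = fitz.Pixmap(pix, 0)
    return pix
```

Under these assumptions, `get_image_for_gemini()` could call `pix = to_jpeg_safe_pixmap(pix)` right after `pix = fitz.Pixmap(doc, xref)` and before `pix.save(image_name)`. Which colorspace triggered the failure on page 4 is not shown in the log, so this has not been confirmed against the PDF in question.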
Relevant log output
I think the get_image_for_gemini() function in gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py should be modified as follows:
import fitz
import os
from PIL import Image


def get_image_for_gemini(
    doc: fitz.Document,
    image: tuple,
    image_no: int,
    image_save_dir: str,
    file_name: str,
    page_num: int,
) -> Tuple[Image, str]:
    """
    Extracts an image from a PDF document, converts it to JPEG format, saves it to a specified directory,
    and loads it as a PIL Image Object.

    Parameters:
    - doc (fitz.Document): The PDF document from which the image is extracted.
    - image (tuple): A tuple containing image information.
    - image_no (int): The image number for naming purposes.
    - image_save_dir (str): The directory where the image will be saved.
    - file_name (str): The base name for the image file.
    - page_num (int): The page number from which the image is extracted.

    Returns:
    - Tuple[Image.Image, str]: A tuple containing the Gemini Image object and the image filename.
    """

    # Extract the image from the document
    xref = image[0]
    pix = fitz.Pixmap(doc, xref)

    # Convert the image to JPEG format
    pix.tobytes("jpeg")

    # Create the image file name
    image_name = f"{image_save_dir}/{file_name}_image_{page_num}_{image_no}_{xref}.jpeg"

    # Create the image save directory if it doesn't exist
    os.makedirs(image_save_dir, exist_ok=True)

    # Save the image to the specified location
    pix.save(image_name)

    # Load the saved image as a Gemini Image Object
    image_for_gemini = Image.load_from_file(image_name)

    return image_for_gemini, image_name
Code of Conduct
I agree to follow this project's Code of Conduct
Hey all, I tried the code above because I was facing the same issue, but it gives the following error:
Unexpected item type: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=270x184 at 0x7C60195D9CC0>. Only types that represent a single Content or a single Part are supported here.
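This second error suggests that a PIL image object reached the Gemini call. That could happen if `from PIL import Image` shadows the SDK's own `Image` class. The sketch below shows one way to guard against it. It assumes the notebook's utilities expect `vertexai.generative_models.Image`, which is not confirmed in this thread. The helper names `is_pil_image` and `as_gemini_image` are hypothetical.

```python
import io


def is_pil_image(obj) -> bool:
    """Detect PIL images by their module name, without importing PIL."""
    return type(obj).__module__.split(".")[0] == "PIL"


def as_gemini_image(obj):
    """Convert a PIL image to a Vertex AI SDK Image; pass other objects through."""
    if not is_pil_image(obj):
        return obj

    # Lazy import: assumes the Vertex AI SDK is installed.
    from vertexai.generative_models import Image as GeminiImage

    # Re-encode the PIL image as JPEG bytes and wrap them in the SDK type.
    buffer = io.BytesIO()
    obj.convert("RGB").save(buffer, format="JPEG")
    return GeminiImage.from_bytes(buffer.getvalue())
```

A simpler alternative, under the same assumption, is to avoid the name clash altogether by importing the SDK class under an alias (for example `from vertexai.generative_models import Image as GeminiImage`) and loading the saved file with `GeminiImage.load_from_file(image_name)`.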
File Name
https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb