[Bug]: FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG #676

Open
1 task done
myoshimu opened this issue May 13, 2024 · 2 comments

@myoshimu
Member

myoshimu commented May 13, 2024

File Name

https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb

What happened?

The following code failed with the "FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG" error:

# Extract text and image metadata from the PDF document
text_metadata_df, image_metadata_df = get_document_metadata(
    multimodal_model,  # we are passing gemini 1.0 pro vision model
    pdf_folder_path,
    image_save_dir="images",
    image_description_prompt=image_description_prompt,
    embedding_size=1408,
)

print("\n\n --- Completed processing. ---")



Processing page: 1
Processing page: 2
Processing page: 3
Processing page: 4


FzErrorArgument                           Traceback (most recent call last)
<ipython-input-8-96bfa690e8cb> in <cell line: 14>()
     12 
     13 # Extract text and image metadata from the PDF document
---> 14 text_metadata_df, image_metadata_df = get_document_metadata(
     15     multimodal_model,  # we are passing gemini 1.0 pro vision model
     16     pdf_folder_path,

4 frames
~/.local/lib/python3.10/site-packages/pymupdf/mupdf.py in fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47578         Write a pixmap as a JPEG.
  47579     """
> 47580     return _mupdf.fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47581 
  47582 def fz_write_pixmap_as_jpx(out, pix, quality):

FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG

Relevant log output

I think the get_image_for_gemini() function in
gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py should be modified as follows:

import os
from typing import Tuple

import fitz
from PIL import Image


def get_image_for_gemini(
    doc: fitz.Document,
    image: tuple,
    image_no: int,
    image_save_dir: str,
    file_name: str,
    page_num: int,
) -> Tuple[Image, str]:
    """
    Extracts an image from a PDF document, converts it to JPEG format, saves it to a specified directory,
    and loads it as a PIL Image Object.

    Parameters:
    - doc (fitz.Document): The PDF document from which the image is extracted.
    - image (tuple): A tuple containing image information.
    - image_no (int): The image number for naming purposes.
    - image_save_dir (str): The directory where the image will be saved.
    - file_name (str): The base name for the image file.
    - page_num (int): The page number from which the image is extracted.

    Returns:
    - Tuple[Image.Image, str]: A tuple containing the Gemini Image object and the image filename.
    """

    # Extract the image from the document
    xref = image[0]
    pix = fitz.Pixmap(doc, xref)

    # Convert the image to JPEG format
    pix.tobytes("jpeg")

    # Create the image file name
    image_name = f"{image_save_dir}/{file_name}_image_{page_num}_{image_no}_{xref}.jpeg"

    # Create the image save directory if it doesn't exist
    os.makedirs(image_save_dir, exist_ok=True)

    # Save the image to the specified location
    pix.save(image_name)

    # Load the saved image as a Gemini Image Object
    image_for_gemini = Image.load_from_file(image_name)

    return image_for_gemini, image_name
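The error itself comes from MuPDF's JPEG writer. According to the message, and to PyMuPDF's table of supported output formats, JPEG output accepts only Gray, RGB, or CMYK pixmaps and no alpha channel. In the function above, `pix.tobytes("jpeg")` returns bytes that are discarded and does not modify `pix`, so `pix.save(image_name)` still receives the pixmap in its original colorspace. One way around this is to normalize the pixmap before saving. The sketch below is a hypothetical helper (`plan_jpeg_conversion` is not part of the repository's utilities) that only encodes that decision using the standard library. It assumes the inputs are `pix.colorspace.name` and `pix.alpha`, and it deliberately also converts CMYK to RGB, which MuPDF does not strictly require, on the assumption that plain RGB is the safest input for the model.

```python
from typing import Optional, Tuple

# Colorspace names (as reported by Pixmap.colorspace.name in PyMuPDF) that this
# sketch treats as safe to write directly as JPEG. Assumption: everything else,
# including CMYK, ICC-based, and Separation spaces, is normalized to RGB.
JPEG_READY_COLORSPACES = frozenset({"DeviceGray", "DeviceRGB"})


def plan_jpeg_conversion(
    colorspace_name: Optional[str], has_alpha: bool
) -> Optional[Tuple[bool, bool]]:
    """Decide how to prepare a pixmap before saving it as JPEG.

    Returns (convert_to_rgb, drop_alpha), or None when the pixmap has no
    colorspace (e.g. an alpha-only mask) and should be skipped instead.
    """
    if colorspace_name is None:
        # A pixmap without a colorspace cannot be converted to RGB.
        return None
    convert_to_rgb = colorspace_name not in JPEG_READY_COLORSPACES
    # JPEG has no alpha channel, so any transparency has to be removed.
    drop_alpha = bool(has_alpha)
    return convert_to_rgb, drop_alpha
```

In PyMuPDF terms, `convert_to_rgb` corresponds to `pix = fitz.Pixmap(fitz.csRGB, pix)` and `drop_alpha` to `pix = fitz.Pixmap(pix, 0)`, both applied before `pix.save(image_name)`.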

Code of Conduct

  • I agree to follow this project's Code of Conduct
@krupalsmart97

Hey all, I tried the above code because I was facing the same issue, but it gives the following error:

Unexpected item type: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=270x184 at 0x7C60195D9CC0>.Only types that represent a single Content or a single Part are supported here.

Not sure if I am doing something wrong.

@rocpoc
Contributor

rocpoc commented May 16, 2024

@holtskinner +1, I am seeing this issue too.

I've also been hitting numerous quota issues despite adding:

add_sleep_after_page = True
sleep_time_after_page = 5
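A fixed `sleep_time_after_page` pause does not adapt when the per-minute quota is exhausted anyway. A common alternative (a swapped-in technique, not something the notebook's utilities provide) is to retry each model call with exponential backoff and jitter. Below is a minimal sketch in which `QuotaExceeded` and `call_with_backoff` are hypothetical names; in practice you would pass the exception your client raises on HTTP 429, which for the Vertex AI SDK is typically `google.api_core.exceptions.ResourceExhausted`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for the client's HTTP 429 / quota error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (QuotaExceeded,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff and full jitter on quota errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last quota error
            # The delay ceiling doubles each attempt (2, 4, 8, ... s, capped at
            # max_delay); sleeping a random fraction of it spreads out retries.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

For example, `call_with_backoff(lambda: multimodal_model.generate_content([prompt, image]), retry_on=(ResourceExhausted,))`, where `prompt` and `image` are placeholders and `ResourceExhausted` is assumed to be imported from `google.api_core.exceptions`.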

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants