bug/partition-pdf-with-infer_table_structure #3252

Closed
DeepKariaX opened this issue Jun 19, 2024 · 13 comments
Labels: awaiting-response, bug (Something isn't working), pdf

Comments

@DeepKariaX

Describe the bug
`partition_pdf` raises `ValueError: max() arg is an empty sequence` when I pass `infer_table_structure=True`. After removing this parameter it works perfectly.

File where the error is raised
unstructured_inference/models/tables.py", line 667, in fill_cells
table_rows_no = max({row for cell in cells for row in cell["row_nums"]})

Expected behavior
Even with `infer_table_structure=True`, `partition_pdf` should be able to partition the PDF without errors. (Maybe add error handling for the case where the value is empty or None.)
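
For illustration, the failure can be reproduced in isolation: `max()` raises `ValueError` when its argument is empty, which is what the line quoted above would hit if `cells` were empty. The sketch below shows one possible guard using `max(..., default=...)`. The helper name `max_row_index` is hypothetical, and this is not necessarily how the library handles the case.

```python
def max_row_index(cells):
    """Return the largest row index referenced by any cell, or None if there are none."""
    rows = {row for cell in cells for row in cell["row_nums"]}
    # max() on an empty set raises "ValueError: max() arg is an empty sequence";
    # passing default= returns the fallback value instead of raising.
    return max(rows, default=None)


print(max_row_index([{"row_nums": [0, 1]}, {"row_nums": [2]}]))
print(max_row_index([]))
```

Returning `None` leaves it to the caller to decide what an empty table means, which seemed safer here than guessing a row count.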

DeepKariaX added the bug label Jun 19, 2024
@vav1lo

vav1lo commented Jun 19, 2024

We got the same issue too. Is there any solution?

@DeepKariaX
Author

@vav1lo For now, I have switched to another reader. Could you attach the PDF you are testing with? Mine is too confidential to share, and a sample PDF would make it easier for the maintainers to diagnose the error.

@christinestraub
Collaborator

Hi @vav1lo, can you please attach the PDF that you are testing?

@hackpointt

hackpointt commented Jun 20, 2024

uber_10q_march_2022.pdf
I get the same problem with this file:

```python
import os
from unstructured.partition.pdf import partition_pdf
from unstructured.staging.base import elements_to_json

filename = "uber_10q_march_2022.pdf"

# hi_res strategy with table structure inference enabled triggers the error
elements = partition_pdf(
    filename=filename,
    strategy="hi_res",
    infer_table_structure=True,
    model_name="yolox",
)
```
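
Since the original report says partitioning works once `infer_table_structure` is removed, a temporary workaround could be to retry without it when the call fails. The following is only a sketch under that assumption: `partition_with_fallback` is a hypothetical helper, not part of `unstructured`, and it catches any `ValueError`, so it may also hide unrelated problems.

```python
def partition_with_fallback(partition_fn, **kwargs):
    """Call partition_fn(**kwargs); on ValueError, retry with table inference disabled."""
    try:
        return partition_fn(**kwargs)
    except ValueError:
        if not kwargs.get("infer_table_structure"):
            # Table inference was not enabled, so retrying would not change anything.
            raise
        retry_kwargs = {**kwargs, "infer_table_structure": False}
        return partition_fn(**retry_kwargs)
```

With the snippet above, it would be called as `partition_with_fallback(partition_pdf, filename=filename, strategy="hi_res", infer_table_structure=True)`. The retried result will not contain inferred table structure.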

@hackpointt

@christinestraub

@vav1lo

vav1lo commented Jun 20, 2024

@christinestraub Here is the PDF that I am testing:
1b4c03d6-f6f5-462d-8bd6-0b9e411bc33d.pdf

@Nidhi2497

I am also getting an error while partitioning a PDF, specifically with the argument infer_table_structure=True:

      9 import torch
     10 import transformers
---> 11 from cv2.typing import MatLike
     12 from PIL.Image import Image
     13 from transformers import DonutProcessor, VisionEncoderDecoderModel

ModuleNotFoundError: No module named 'cv2.typing'; 'cv2' is not a package

@vav1lo

vav1lo commented Jun 20, 2024

> I am also getting an error while partitioning a PDF, specifically with the argument infer_table_structure=True:
>
>       9 import torch
>      10 import transformers
> ---> 11 from cv2.typing import MatLike
>      12 from PIL.Image import Image
>      13 from transformers import DonutProcessor, VisionEncoderDecoderModel
>
> ModuleNotFoundError: No module named 'cv2.typing'; 'cv2' is not a package

I think this has to do with the OpenCV installation.

@nikklavzar

> We got the same issue too. Is there any solution?

This started happening to me when I upgraded from 0.12.6 to 0.14.6.

@Nidhi2497

> > I am also getting an error while partitioning a PDF, specifically with the argument infer_table_structure=True:
> >
> >       9 import torch
> >      10 import transformers
> > ---> 11 from cv2.typing import MatLike
> >      12 from PIL.Image import Image
> >      13 from transformers import DonutProcessor, VisionEncoderDecoderModel
> >
> > ModuleNotFoundError: No module named 'cv2.typing'; 'cv2' is not a package
>
> I think this has to do with the OpenCV installation.

I installed it as well, but what is being imported there actually needs to change.
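
The message `'cv2' is not a package` means Python resolved `cv2` to a plain module rather than a package, so `cv2.typing` cannot be imported from it. As a rough diagnostic (a sketch, not an official check), the standard library can report where `cv2` would be loaded from without importing it. `describe_module` is a hypothetical helper name. The possible causes mentioned in the comments, such as a local file named `cv2.py`, are assumptions and not something confirmed in this thread.

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module would be imported from and whether it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # Only packages have submodule search locations; a plain module
        # (for example a stray cv2.py, as an assumed cause) has none.
        "is_package": spec.submodule_search_locations is not None,
    }


print(describe_module("cv2"))
```

If `is_package` is false, the `origin` path shows which file or extension module is being picked up as `cv2`.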

@christinestraub
Collaborator

christinestraub commented Jun 24, 2024

Hi @DeepKariaX, @vav1lo, @hackpointt, @Nidhi2497, @nikklavzar

Addressed in Unstructured-IO/unstructured-inference#359. You'll need to upgrade unstructured-inference to 0.7.36. I tested your code with the provided PDF documents and it worked as expected.
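
To confirm that the environment actually has the version named above, the installed distribution version can be read with the standard library. This is a minimal sketch: `parse_version` and `has_fix` are hypothetical helpers, and the parsing assumes a plain numeric `X.Y.Z` version string with no pre-release suffix.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (0, 7, 36)  # version named in the comment above


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release handling)."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_fix(installed):
    """True if the installed version is at least FIXED_VERSION."""
    return parse_version(installed) >= FIXED_VERSION


try:
    installed = version("unstructured-inference")
    print(installed, "has fix:", has_fix(installed))
except PackageNotFoundError:
    print("unstructured-inference is not installed")
except ValueError:
    print("could not parse the installed version string")
```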

@christinestraub
Collaborator

Closing this since it's assumed to be resolved, but feel free to reopen if you're still having this issue.

@DeepKariaX
Author

@christinestraub This is resolved, thanks!

6 participants