bug/error with 'fake' file and assigned name in partition_pdf function #1644

Closed
Guiforge opened this issue Oct 4, 2023 · 2 comments
Labels
bug Something isn't working

Comments


Guiforge commented Oct 4, 2023

Describe the bug
When a file-like object that does not exist on disk (for example, an io.BytesIO) has a name attribute, partition_pdf fails with a "No such file or directory" error.
This can happen, for example, when you read files from a zip archive with ZipFile and open them.

To Reproduce

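import io

from unstructured.partition.pdf import partition_pdf

# Create a 'fake' in-memory file that has a name but does not exist on disk
file = io.BytesIO()
file.name = 'tmp.pdf'

# Call partition_pdf with the named in-memory file
result = partition_pdf(file=file)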

Expected behavior
Same behavior as with a SpooledTemporaryFile that has no name attribute.

Screenshots
...

Environment Info
...

Additional context

  • This issue occurs when using a file object with an assigned name.
  • The bug can be reproduced when extracting a file from a ZIP archive, where the file has a name but does not exist in the filesystem.
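
The ZIP scenario above can be sketched with the standard library alone. This is a minimal illustration, not the fix that was eventually merged: the helper names (make_zip_in_memory, read_member_without_name) and the member file name are hypothetical. It shows that a member opened through ZipFile.open carries a name that does not exist on disk, and that copying the bytes into a fresh io.BytesIO is one possible way to hand over a stream with no name attribute.

```python
import io
import os
import zipfile

MEMBER = "tmp_issue_1644_example.pdf"  # hypothetical member name


def make_zip_in_memory(member_name: str, payload: bytes) -> io.BytesIO:
    """Build a small ZIP archive in memory (illustrative helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(member_name, payload)
    buf.seek(0)
    return buf


def read_member_without_name(archive: io.BytesIO, member_name: str) -> io.BytesIO:
    """Copy a ZIP member into a plain BytesIO, which has no name attribute."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member_name) as member:
            return io.BytesIO(member.read())


archive = make_zip_in_memory(MEMBER, b"%PDF-1.4 placeholder bytes")

# The object returned by ZipFile.open has a name, but no such file exists on disk.
with zipfile.ZipFile(archive) as zf:
    with zf.open(MEMBER) as member:
        member_name = member.name
        exists_on_disk = os.path.exists(member_name)

# Possible workaround: pass a nameless in-memory copy instead.
clean = read_member_without_name(archive, MEMBER)
```

The resulting clean object could then be passed as file=clean, on the assumption that the failure is triggered only by the name attribute.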

KristianMischke commented May 22, 2024

I think this is the same issue as Unstructured-IO/unstructured-inference#303. Unless you have other evidence, this is a Windows-only issue.

The inference library has a process_data_with_model function, which creates a temp file and calls process_file_with_model.

Similarly, process_data_with_ocr creates a temp file and calls process_file_with_ocr.

As in Unstructured-IO/unstructured-inference#323, Python 3.12 seemingly has a one-liner solution.
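
For context, the "write to a temp file, then call the file-based function" pattern described above can be sketched roughly as follows. This is not the inference library's actual code: process_data_via_temp_file and count_bytes are hypothetical stand-ins. The Windows angle is that a NamedTemporaryFile that is still open generally cannot be opened a second time by name there, so this sketch closes the file before handing its path on and deletes it manually. Python 3.12 added a delete_on_close parameter to NamedTemporaryFile for this situation. That this is the one-liner meant above is an assumption.

```python
import io
import os
import tempfile
from typing import BinaryIO, Callable


def process_data_via_temp_file(data: BinaryIO, process_file: Callable[[str], int]) -> int:
    """Hypothetical stand-in: spill a stream to a temp file, then process it by path."""
    # delete=False so the file survives close(); it is removed manually below.
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".pdf")
    try:
        tmp.write(data.read())
        tmp.close()  # close first so the path can be reopened, including on Windows
        return process_file(tmp.name)
    finally:
        tmp.close()  # harmless if already closed
        os.unlink(tmp.name)


def count_bytes(path: str) -> int:
    """Hypothetical file-based function: report the file size in bytes."""
    with open(path, "rb") as f:
        return len(f.read())


size = process_data_via_temp_file(io.BytesIO(b"%PDF-"), count_bytes)
```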

Edit: I noticed this is focused on the name aspect, so this is probably a separate issue

Collaborator

scanny commented May 22, 2024

I believe this was fixed by #2617, which was a fix for #2308.

Closing as fixed. Feel free to reopen if I've misunderstood the bug.

scanny closed this as completed May 22, 2024