
bug/windows reopen temp file (pdf hi_res) #3076

Closed
KristianMischke opened this issue May 22, 2024 · 2 comments
Labels
bug Something isn't working

@KristianMischke

Describe the bug
This is the same issue as Unstructured-IO/unstructured-inference#303; I couldn't find an equivalent ticket in this project. On Windows, temp files fail when they are reopened by name while still inside the scope of NamedTemporaryFile().

In this line:

```python
with tempfile.NamedTemporaryFile() as tmp_file:
```

a temp file is created and its name is passed as `filename` to `process_file_with_ocr`, which calls `pdf2image.convert_from_path`. That in turn invokes `pdfinfo` on the temp file, yielding an error like:

```
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file <temp file path here>: No error.
```
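A common workaround is to close the temp file before handing its path to anything else, then delete it manually afterward. The sketch below assumes the consumer only needs a file path (as `pdf2image.convert_from_path` does); `with_closed_temp_file` and its `consume` callback are hypothetical names for illustration, not part of unstructured and not the project's actual fix.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_closed_temp_file(data: bytes, consume: Callable[[str], T], suffix: str = "") -> T:
    """Hypothetical helper: write `data` to a named temp file, close it,
    then pass its path to `consume` (e.g. code that shells out to pdfinfo)."""
    # delete=False keeps the file on disk after the handle is closed.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        # Exiting this block closes the handle without deleting the file.
        with tmp:
            tmp.write(data)
        # No handle is open now, so reopening by name also works on Windows.
        return consume(tmp.name)
    finally:
        # Automatic deletion was disabled, so remove the file ourselves.
        os.remove(tmp.name)


def read_text(path: str) -> str:
    with open(path, "r") as f:
        return f.read()


print(with_closed_temp_file(b"Hello, world!", read_text))  # prints: Hello, world!
```

The trade-off is that cleanup becomes the caller's responsibility, which is why the `os.remove` sits in a `finally` block.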

To Reproduce
On Windows:

Note: the issue described in Unstructured-IO/unstructured-inference#303 will occur first. Once that is fixed (e.g. by applying Unstructured-IO/unstructured-inference#323), the OCR code fails as described above.

```python
import os
import tempfile

# Print the operating system name ("nt" on Windows)
print(os.name)

# Create a named temporary file (deleted automatically when closed)
with tempfile.NamedTemporaryFile() as tmp_file:
    # Write some data and flush it so it is on disk
    tmp_file.write(b'Hello, world!')
    tmp_file.flush()

    # Get the path of the temp file
    file_name = tmp_file.name

    # Reopen the file by name while tmp_file is still open. This works on
    # POSIX systems, but on Windows the file cannot be opened a second time
    # while the NamedTemporaryFile handle is open, so this raises an error.
    with open(file_name, 'r') as file:
        content = file.read()
        print("Content of the temp file:", content)
```

Expected behavior
The code should not raise an error, and temp files should work on Windows.
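Another portable pattern, sketched here as an assumption rather than the project's fix, is to create a temporary directory and write an ordinary file inside it with `open()`. Because that file is closed before anything reopens it, Windows has no conflicting handle, and `tempfile.TemporaryDirectory` removes the directory and its contents on exit. The helper name `roundtrip_via_temp_dir` and its default `filename` are hypothetical.

```python
import os
import tempfile


def roundtrip_via_temp_dir(data: bytes, filename: str = "input.bin") -> bytes:
    """Hypothetical helper: write `data` to a file inside a fresh temp
    directory, close it, then reopen it by name and return its contents."""
    # The directory and everything in it is deleted when this block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        # Write and close the file before anything else opens it.
        with open(path, "wb") as f:
            f.write(data)
        # Reopening by name succeeds on Windows because no handle is still open.
        with open(path, "rb") as f:
            return f.read()


print(roundtrip_via_temp_dir(b"Hello, world!"))  # prints: b'Hello, world!'
```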

KristianMischke added the bug label May 22, 2024
@MthwRobinson
Contributor

Thanks @KristianMischke! We'll take a look at this.

@scanny
Collaborator

scanny commented Dec 16, 2024

Fixed by #3395.

scanny closed this as completed Dec 16, 2024