Describe the bug
Same issue as Unstructured-IO/unstructured-inference#303; I couldn't find an equivalent ticket on this project. Temp files run into an issue on Windows when they are opened/closed within the scope of NamedTemporaryFile().
In the OCR code, a temp file is created and passed as filename to process_file_with_ocr -> pdf2image.convert_from_path, which then invokes pdfinfo on the temp file, yielding an error like:
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file <temp file path here>: No error.
To Reproduce
On Windows:
```python
import os
import tempfile

# Print operating system name
print(os.name)

# Create a temporary file
with tempfile.NamedTemporaryFile() as tmp_file:
    # Write some data to the file
    tmp_file.write(b'Hello, world!')
    tmp_file.flush()  # Flush the buffer to make sure data is written

    # Get the name of the file
    file_name = tmp_file.name

    # Reopen the file by name while the temp file is still open;
    # this second open is what fails on Windows
    with open(file_name, 'r') as file:
        # Read the data from the file
        content = file.read()
        print("Content of the temp file:", content)
```
Expected behavior
Expected not to error, and to be able to support tempfiles on Windows
Note: the temp file in question is created at line 79 of unstructured/unstructured/partition/pdf_image/ocr.py (commit d3a404c). On Windows, the issue outlined in Unstructured-IO/unstructured-inference#303 occurs first; once that is fixed (e.g. by applying Unstructured-IO/unstructured-inference#323), the OCR code errors as described above.
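For illustration, here is a minimal sketch of one common way to avoid this class of error. It is not necessarily the fix adopted in this project or in the linked PR, and the helper name `write_temp_then_reopen` is hypothetical. The idea is to create the file with `delete=False`, close the handle before anything else opens the path, and remove the file manually afterwards.

```python
import os
import tempfile


def write_temp_then_reopen(data: bytes) -> bytes:
    """Write data to a temp file, close it, then reopen it by name.

    Hypothetical helper for illustration only; not part of unstructured.
    """
    # delete=False keeps the file on disk after close(), so the path can be
    # reopened (or handed to an external tool) without a handle still held.
    tmp_file = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp_file.write(data)
        tmp_file.close()  # release the handle before reopening by name
        with open(tmp_file.name, "rb") as reopened:
            return reopened.read()
    finally:
        tmp_file.close()  # no-op if the file is already closed
        os.remove(tmp_file.name)  # manual cleanup, since delete=False


content = write_temp_then_reopen(b"Hello, world!")
print("Content of the temp file:", content)
```

In the OCR path, the reopen step would correspond to passing the closed file's name on to the code that calls pdf2image. Using `tempfile.TemporaryDirectory()` and writing a regular file inside it is another option with similar behavior, and Python 3.12 adds a `delete_on_close` parameter to `NamedTemporaryFile` for this situation.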