A tmp file is created and its name is passed to process_file_with_model -> DocumentLayout.from_file -> load_pdf -> extract_pages (pdfminer).
extract_pages then tries to open the file a second time, while the first handle is still open, using with open_filename(pdf_file, "rb") as fp:.
On Windows this raises PermissionError: [Errno 13] Permission denied: 'C:\\Users\\...\\AppData\\Local\\Temp\\tmpf9flca30'.
import os
import tempfile

# Print the operating system name ('nt' on Windows)
print(os.name)

# Create a temporary file
with tempfile.NamedTemporaryFile() as tmp_file:
    # Write some data to the file
    tmp_file.write(b'Hello, world!')
    tmp_file.flush()  # Flush the buffer to make sure data is written

    # Get the name of the file
    file_name = tmp_file.name

    # The temp file is still open here; opening it again by name is what
    # raises PermissionError on Windows
    with open(file_name, 'r') as file:
        # Read the data from the file
        content = file.read()
        print("Content of the temp file:", content)
import os
import tempfile
from typing import BinaryIO, Optional


def process_data_with_model(
    data: BinaryIO,
    model_name: Optional[str],
    **kwargs,
) -> DocumentLayout:
    """Processes pdf file in the form of a file handler (supporting a read method) into a
    DocumentLayout by using a model identified by model_name."""
    with tempfile.TemporaryDirectory() as td:
        f_name = os.path.join(td, "tmp_file")
        # Binary mode, because data.read() returns bytes
        with open(f_name, "wb") as tmp_file:
            tmp_file.write(data.read())
        # The file is closed (and flushed) here, so it can be reopened by name
        layout = process_file_with_model(
            f_name,
            model_name,
            **kwargs,
        )
    return layout
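The same idea as a sketch that runs without the library: the bytes go into a file inside a TemporaryDirectory, the handle is closed, and only then is the path handed on. The names run_on_temp_copy and read_back are hypothetical, used here only for illustration; read_back stands in for process_file_with_model.

```python
import io
import os
import tempfile
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def run_on_temp_copy(data: BinaryIO, process: Callable[[str], T]) -> T:
    """Copy `data` into a closed temp file and call `process` with its path.

    Hypothetical helper for illustration, not part of unstructured-inference.
    """
    with tempfile.TemporaryDirectory() as td:
        f_name = os.path.join(td, "tmp_file")
        with open(f_name, "wb") as tmp_file:
            tmp_file.write(data.read())
        # The write handle is closed here, so reopening by name is allowed
        return process(f_name)
    # The directory and the file inside it are removed when the block exits


def read_back(path: str) -> bytes:
    """Stand-in for a consumer that opens the file by name."""
    with open(path, "rb") as fp:
        return fp.read()


result = run_on_temp_copy(io.BytesIO(b"Hello, world!"), read_back)
print(result)
```

Cleanup is handled by TemporaryDirectory, so no explicit removal or error handling is needed in the caller.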
Or another solution, suggested by GPT:
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
    tmp_file.write(b'Hello, world!')
    # Get the name of the file before closing
    file_name = tmp_file.name

# Now the file is closed, you can open it again
with open(file_name, 'r') as file:
    content = file.read()
    print("Content of the temp file:", content)

# Optionally, delete the file if you don't need it anymore
os.remove(file_name)
Not sure which is better.
The latter probably needs a try/finally around the processing step, so that the file is always removed and any error still propagates to the caller.
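A minimal sketch of the delete=False variant with that cleanup, assuming the consumer only needs a path. The names run_on_named_temp and read_text are hypothetical and not part of the library; read_text stands in for process_file_with_model.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def run_on_named_temp(payload: bytes, process: Callable[[str], T]) -> T:
    """Write `payload` to a named temp file, close it, then call `process`.

    Hypothetical helper for illustration only.
    """
    tmp_file = tempfile.NamedTemporaryFile(delete=False)
    try:
        with tmp_file:
            tmp_file.write(payload)
        # The handle is closed but the file stays on disk (delete=False)
        return process(tmp_file.name)
    finally:
        # Runs on success and on failure; an exception raised by `process`
        # keeps propagating after the file has been removed
        os.remove(tmp_file.name)


def read_text(path: str) -> str:
    """Stand-in for a consumer that opens the file by name."""
    with open(path, "r") as fp:
        return fp.read()


content = run_on_named_temp(b"Hello, world!", read_text)
print("Content of the temp file:", content)
```

A plain try/finally is enough here: no except clause or explicit re-raise is needed, because finally does not swallow the exception.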
Describe the bug
The tmp file is created in unstructured-inference/unstructured_inference/inference/layout.py, line 362 in d4785df.
Same error here:
https://github.com/Unstructured-IO/unstructured/blob/d3a404cfb541dae8e16956f096bac99fc05c985b/unstructured/partition/pdf_image/ocr.py#L79
To Reproduce
Expected behavior
I expect it not to crash :)
Additional context
Possible solution taken from here: https://stackoverflow.com/questions/39983886/python-writing-and-reading-from-a-temporary-file