Describe the bug
I just upgraded to 0.10.15 and have noticed that PDFs that previously worked for me no longer partition successfully.
Instead, partitioning fails with ValueError: Invalid coordinates.
To Reproduce
A PDF I commonly use to test new versions is below.
Expected behavior
To return a list of elements.
Screenshots
If applicable, add screenshots to help explain your problem.
Environment Info
Please run python scripts/collect_env.py and paste the output here.
This will help us understand more about the environment in which the bug occurred.
OS version: macOS-13.4-arm64-arm-64bit
Python version: 3.11.4
unstructured version: 0.10.15
unstructured-inference version: 0.5.28
pytesseract version: 0.3.10
Torch version: 2.0.1
Detectron2 is not installed
[notice] A new release of pip is available: 23.1.2 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
PaddleOCR is not installed
Libmagic version: file-5.41
magic file from /usr/share/file/magic
Traceback (most recent call last):
File "/Users/charlespierse/Documents/tactic/genie/../../unstructured/scripts/collect_env.py", line 251, in <module>
main()
File "/Users/charlespierse/Documents/tactic/genie/../../unstructured/scripts/collect_env.py", line 243, in main
libreoffice_version = get_libreoffice_version()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/charlespierse/Documents/tactic/genie/../../unstructured/scripts/collect_env.py", line 171, in get_libreoffice_version
result = subprocess.run(
^^^^^^^^^^^^^^^
File "/Users/charlespierse/.pyenv/versions/3.11.4/lib/python3.11/subprocess.py", line 548, in run
with Popen(*popenargs, **kwargs) as process:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/charlespierse/.pyenv/versions/3.11.4/lib/python3.11/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Users/charlespierse/.pyenv/versions/3.11.4/lib/python3.11/subprocess.py", line 1950, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'libreoffice'
I don't have libreoffice installed and can't figure out how to install it, but I don't think that's the cause of this anyway.
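The traceback above comes from calling subprocess.run on an executable that is not on the PATH, which is unrelated to the PDF failure. As a minimal sketch (get_tool_version is a hypothetical helper, not the code in scripts/collect_env.py), a version check can look for the executable first and report it as missing instead of crashing:

```python
import shutil
import subprocess
from typing import Optional


def get_tool_version(executable: str, flag: str = "--version") -> Optional[str]:
    """Return the tool's version output, or None if the tool is not installed.

    Hypothetical helper for illustration; the flag is an assumption and
    may differ per tool.
    """
    # shutil.which returns None when the executable is not on the PATH,
    # which is the case that makes subprocess.run raise FileNotFoundError.
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, flag],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = get_tool_version("libreoffice")
    print(version if version is not None else "LibreOffice is not installed")
```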
Additional context
Add any other context about the problem here.
Addresses: #1460
We were raising an error on invalid coordinates, which prevented us
from returning the element and continuing to parse the PDF. Now,
instead of raising the error, we return early.
To test:
```
from unstructured.partition.auto import partition
elements = partition(url='https://www.apple.com/environment/pdf/Apple_Environmental_Progress_Report_2022.pdf', strategy="fast")
```
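The difference between raising and returning early can be sketched as follows. This is an illustration only, not the actual patch: the function names, the dict-based element, and the four-corner validity rule are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def points_are_valid(points: Optional[Sequence[Point]]) -> bool:
    """Hypothetical validity rule: exactly four corner points are required."""
    return points is not None and len(points) == 4


def attach_coordinates_strict(element: dict, points: Optional[Sequence[Point]]) -> dict:
    """Old behavior (sketch): one bad element aborts the whole partition."""
    if not points_are_valid(points):
        raise ValueError("Invalid coordinates.")
    element["coordinates"] = tuple(points)
    return element


def attach_coordinates_lenient(element: dict, points: Optional[Sequence[Point]]) -> dict:
    """New behavior (sketch): return early and keep the element as-is."""
    if not points_are_valid(points):
        return element  # no coordinates attached, but parsing continues
    element["coordinates"] = tuple(points)
    return element


if __name__ == "__main__":
    extracted = [
        ({"text": "Title"}, [(0, 0), (0, 10), (50, 10), (50, 0)]),
        ({"text": "Footer"}, None),  # element with unusable coordinates
    ]
    elements = [attach_coordinates_lenient(dict(el), pts) for el, pts in extracted]
    print(len(elements))
```

With the lenient variant, every element is still returned and only the one with unusable coordinates lacks the coordinates field. The strict variant would stop at the second element.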
---------
Co-authored-by: cragwolfe <[email protected]>