bug/ Invalid coordinates since upgrading to 0.10.15 #1460

Closed
cdpierse opened this issue Sep 19, 2023 · 1 comment
Labels
bug Something isn't working

cdpierse commented Sep 19, 2023

Describe the bug
I just upgraded to 0.10.15 and noticed that PDFs that previously worked for me no longer partition successfully. Instead, partitioning fails with ValueError: Invalid coordinates.

To Reproduce
The snippet below uses a PDF I commonly use to test new versions.

from unstructured.partition.auto import partition

elements = partition(
    url="https://www.apple.com/environment/pdf/Apple_Environmental_Progress_Report_2022.pdf",
    strategy="fast",
)
narrative_texts = [e.text for e in elements if e.category == "NarrativeText"]

Expected behavior
The call to partition should return a list of elements.

Environment Info
Please run python scripts/collect_env.py and paste the output here.
This will help us understand more about the environment in which the bug occurred.

OS version:  macOS-13.4-arm64-arm-64bit
Python version:  3.11.4
unstructured version:  0.10.15
unstructured-inference version:  0.5.28
pytesseract version:  0.3.10
Torch version:  2.0.1
Detectron2 is not installed

[notice] A new release of pip is available: 23.1.2 -> 23.2.1
[notice] To update, run: pip install --upgrade pip

PaddleOCR is not installed
Libmagic version: file-5.41
magic file from /usr/share/file/magic
Traceback (most recent call last):
  File "/Users/charlespierse/Documents/tactic/genie/../../unstructured/scripts/collect_env.py", line 251, in <module>
    main()
  File "/Users/charlespierse/Documents/tactic/genie/../../unstructured/scripts/collect_env.py", line 243, in main
    libreoffice_version = get_libreoffice_version()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charlespierse/Documents/tactic/genie/../../unstructured/scripts/collect_env.py", line 171, in get_libreoffice_version
    result = subprocess.run(
             ^^^^^^^^^^^^^^^
  File "/Users/charlespierse/.pyenv/versions/3.11.4/lib/python3.11/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charlespierse/.pyenv/versions/3.11.4/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/charlespierse/.pyenv/versions/3.11.4/lib/python3.11/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'libreoffice'

I don't have LibreOffice installed and can't figure out how to install it, but I don't think that's the cause of this bug anyway.
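The traceback above ends in subprocess.run being handed an executable that is not on PATH. As a minimal sketch of how such a version lookup could be guarded, the snippet below checks for the binary first and returns None when it is missing. The helper name safe_tool_version and the 30-second timeout are assumptions made for illustration; this is not the code in scripts/collect_env.py.

```python
import shutil
import subprocess
from typing import Optional


def safe_tool_version(executable: str, flag: str = "--version") -> Optional[str]:
    """Return the first line printed by `executable flag`, or None if unavailable.

    Hypothetical helper for illustration; not part of unstructured.
    """
    # shutil.which returns None when the executable is not on PATH, so we can
    # skip subprocess.run instead of letting it raise FileNotFoundError.
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, flag],
            capture_output=True,
            text=True,
            timeout=30,  # assumed limit so a hung tool cannot block the report
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

With a helper like this, a call such as safe_tool_version("libreoffice") would yield None on a machine without LibreOffice, and the environment report could print "not installed" instead of crashing.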


cdpierse added the bug label Sep 19, 2023
amanda103 added a commit that referenced this issue Sep 20, 2023
Addresses: #1460

We were raising an error on invalid coordinates, which prevented us
from returning the element and continuing to parse the PDF. Now,
instead of raising the error, we return early.

to test:
```
from unstructured.partition.auto import partition

elements = partition(url='https://www.apple.com/environment/pdf/Apple_Environmental_Progress_Report_2022.pdf', strategy="fast")
```
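For readers unfamiliar with the pattern, here is a rough sketch of "return early instead of raising" applied to coordinate handling. Everything in it is an assumption made for illustration: the Element class, the validity rule in has_valid_points, the choice of sorting as the coordinate-dependent step, and both function names are hypothetical and do not reproduce the actual change in unstructured.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned element with a bounding box."""

    text: str
    points: Optional[Sequence[Point]] = None


def has_valid_points(points: Optional[Sequence[Point]]) -> bool:
    """Assumed validity rule: four (x, y) corners, all non-negative numbers."""
    if points is None or len(points) != 4:
        return False
    return all(
        len(p) == 2 and all(isinstance(v, (int, float)) and v >= 0 for v in p)
        for p in points
    )


def _top_left(element: Element) -> Tuple[float, float]:
    """Sort key: smallest y first, then smallest x."""
    xs = [p[0] for p in element.points]
    ys = [p[1] for p in element.points]
    return (min(ys), min(xs))


def sort_strict(elements: List[Element]) -> List[Element]:
    """Models the described old behavior: one bad box aborts the whole call."""
    for element in elements:
        if not has_valid_points(element.points):
            raise ValueError("Invalid coordinates.")
    return sorted(elements, key=_top_left)


def sort_lenient(elements: List[Element]) -> List[Element]:
    """Models the described new behavior: return early, input order unchanged."""
    if not all(has_valid_points(e.points) for e in elements):
        return list(elements)
    return sorted(elements, key=_top_left)
```

The trade-off in this sketch is that the lenient variant gives up the coordinate-based ordering for the affected input, but the caller still receives every element rather than an exception.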

---------

Co-authored-by: cragwolfe <[email protected]>

cdpierse commented

@amanda103 thanks for the speedy fix 🙏
