bug/PDF elements out of order #2448
Comments
@christinestraub - Would your recent reading order updates have fixed this?
@ron-unstructured Can you please share the PDF document used to reproduce this bug?
Hi @christinestraub, please see the attached PDF file: Software_Engineering_for_ML.pdf
Hi @christinestraub, do you have any updates on this? I am still seeing this bug on version 0.15.12. This PDF produces this output (note the wrong order of the titles):
Closing as inactive.
Describe the bug
There is a discrepancy in the element order when partitioning a PDF. From the screenshots, the blue and red circles intended to highlight text are switched in position in the output image, compared to their correct placement in the original PDF.
To Reproduce
Run PDF partition using the Python SDK with the auto, fast, and hi_res strategies.
Expected behavior
The element order in the output should match the original PDF: the text highlighted by the blue and red circles should appear in the same relative positions as in the source document.
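A minimal sketch of the reproduction step, assuming the attached PDF has been saved locally and that the unstructured package with its PDF extras is installed. The helper names (element_texts, first_order_mismatch, print_titles) are hypothetical and only for illustration; partition_pdf and its strategy argument come from the library itself.

```python
from typing import List, Optional, Sequence, Tuple


def element_texts(pdf_path: str, strategy: str) -> List[Tuple[str, str]]:
    """Partition a PDF and return (category, text) pairs in output order."""
    # Imported lazily so the comparison helper below can be used
    # even where unstructured is not installed.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=pdf_path, strategy=strategy)
    return [(el.category, el.text) for el in elements]


def first_order_mismatch(actual: Sequence, expected: Sequence) -> Optional[int]:
    """Return the first index where two sequences differ, or None if equal."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i
    if len(actual) != len(expected):
        # One sequence is a strict prefix of the other.
        return min(len(actual), len(expected))
    return None


def print_titles(pdf_path: str) -> None:
    """Print Title elements per strategy so their order can be inspected."""
    for strategy in ("auto", "fast", "hi_res"):
        for category, text in element_texts(pdf_path, strategy):
            if category == "Title":
                print(f"[{strategy}] {text}")
```

Calling print_titles("Software_Engineering_for_ML.pdf") (the local path is an assumption) lists the titles each strategy emits. To pin down where the order diverges, the titles can be compared against a hand-written list in the order they appear on the page using first_order_mismatch.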
Screenshots
Environment Info
OS version: macOS-14.2.1-arm64-arm-64bit
Python version: 3.10.12
unstructured version: 0.12.1.dev11
unstructured-inference version: 0.7.18
pytesseract version: 0.3.10
Torch version: 2.1.1
Detectron2 is not installed
PaddleOCR is not installed
Libmagic version: ==> libmagic: stable 5.45 (bottled)
LibreOffice version: ==> libreoffice: 7.6.4
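For anyone re-testing on a newer release, the Python-side versions listed above can be gathered with the standard library. This is a sketch only, not the script that produced the report; the function name collect_versions and the package list are assumptions.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def collect_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version or 'not installed'."""
    versions: Dict[str, str] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def print_environment() -> None:
    """Print OS, Python, and package versions in a bug-report-friendly form."""
    print("OS version:", platform.platform())
    print("Python version:", platform.python_version())
    # Package list is an assumption based on the entries in this report.
    packages = ["unstructured", "unstructured-inference", "pytesseract", "torch"]
    for name, version in collect_versions(packages).items():
        print(f"{name} version: {version}")
```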
Additional context
similar issue: #2208