Enhancement: improve reading order #2208

Closed
christinestraub opened this issue Dec 4, 2023 · 0 comments · Fixed by #2219

christinestraub commented Dec 4, 2023

Is your feature request related to a problem? Please describe.

PDF: 124_PDFsam_Basel III - Finalising post-crisis reforms.pdf

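from unstructured.partition.pdf import partition_pdf

# Partition the sample PDF with the hi_res strategy and table structure inference.
elements = partition_pdf(
    filename="124_PDFsam_Basel III - Finalising post-crisis reforms.pdf",
    strategy="hi_res",
    infer_table_structure=True,
)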

There are two cases where formula fragments appear at the beginning of the element for some reason:

= . % kRW 2 25 . Risk weights for both risk-free yield curve and inflation rate are set at
...
= kRW RW ⋅ 6 , where Risk weights for both interest rate and inflation volatilities are set to σRW is set at 55%. σ

Describe the solution you'd like
Merge with XY-cut, with a preference for walking horizontally rather than vertically, which is the current default.
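
To illustrate the idea, here is a minimal sketch of a recursive XY-cut with a configurable axis preference. It is not the library's implementation: the names `xy_cut`, `split_by_gaps`, and `prefer_axis` are hypothetical, boxes are assumed to be `(x0, y0, x1, y1)` tuples with y growing downward, and "walking horizontally" is read here as "cut the page into rows first, then read each row left to right".

```python
from typing import List, Tuple

# Assumed box format: (x0, y0, x1, y1), with y increasing downward.
Box = Tuple[float, float, float, float]


def split_by_gaps(boxes: List[Box], axis: int) -> List[List[Box]]:
    """Group boxes separated by empty gaps in their projection on `axis`.

    axis=0 projects onto x (yields columns), axis=1 projects onto y (yields rows).
    """
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups = [[ordered[0]]]
    reach = ordered[0][axis + 2]  # farthest extent covered by the current group
    for box in ordered[1:]:
        if box[axis] >= reach:
            groups.append([box])  # a gap: start a new group
        else:
            groups[-1].append(box)
        reach = max(reach, box[axis + 2])
    return groups


def xy_cut(boxes: List[Box], prefer_axis: int = 1) -> List[Box]:
    """Return boxes in reading order, trying cuts on `prefer_axis` first.

    prefer_axis=1 splits into rows first (horizontal walk);
    prefer_axis=0 splits into columns first (vertical walk).
    """
    if len(boxes) <= 1:
        return list(boxes)
    for axis in (prefer_axis, 1 - prefer_axis):
        groups = split_by_gaps(boxes, axis)
        if len(groups) > 1:
            ordered: List[Box] = []
            for group in groups:
                ordered.extend(xy_cut(group, prefer_axis))
            return ordered
    # No clean cut on either axis: fall back to top-to-bottom, left-to-right.
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

On a 2x2 grid of boxes, `prefer_axis=1` reads the top row before the bottom row, while `prefer_axis=0` reads the left column before the right column. Note that a row cut only succeeds when the row projections do not overlap, so boxes with slightly offset Y coordinates can still defeat it.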

Describe alternatives you've considered
Pre-process bounding boxes before the merge, resizing them so that boxes with tiny Y-axis differences are aligned.
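
A rough sketch of such a pre-processing step follows. It is only an illustration under stated assumptions: the function name `align_rows` and the tolerance `tol` are hypothetical, boxes are assumed to be `(x0, y0, x1, y1)` tuples with y growing downward, and the choice to stretch every box in a row to the row's common top and bottom is one possible way to "alter size", not something the issue specifies.

```python
from typing import List, Tuple

# Assumed box format: (x0, y0, x1, y1), with y increasing downward.
Box = Tuple[float, float, float, float]


def align_rows(boxes: List[Box], tol: float = 2.0) -> List[Box]:
    """Snap boxes whose top edges lie within `tol` of a row's first box
    to that row's shared top and bottom. Input order is preserved."""
    if not boxes:
        return []
    aligned = list(boxes)
    # Visit boxes from top to bottom so that rows form contiguous runs.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][1])

    def flush(row: List[int]) -> None:
        top = min(boxes[i][1] for i in row)
        bottom = max(boxes[i][3] for i in row)
        for i in row:
            x0, _, x1, _ = boxes[i]
            aligned[i] = (x0, top, x1, bottom)

    row = [order[0]]
    for i in order[1:]:
        # Compare against the first box of the row to avoid chained drift.
        if boxes[i][1] - boxes[row[0]][1] <= tol:
            row.append(i)
        else:
            flush(row)
            row = [i]
    flush(row)
    return aligned
```

After this step, boxes on the same visual line share identical Y extents, so a later sort or cut on the Y axis no longer separates them because of sub-pixel offsets. The right tolerance likely depends on font size and page resolution.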

Additional context

  • Elements extracted by pdfminer
    124_PDF_sam_Basel_1_extracted
@christinestraub christinestraub added the enhancement New feature or request label Dec 4, 2023
@christinestraub christinestraub self-assigned this Dec 4, 2023
cragwolfe pushed a commit that referenced this issue Dec 8, 2023