Convert PDF to PNG for OCR #381

Merged
merged 5 commits into from
Jul 10, 2023

Conversation

alexk307
Contributor

Describe the change
Adds the ability to take a snapshot of a PDF file's first page as a PNG so that OCR can be run on the PDF.
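
For readers unfamiliar with the approach, here is a minimal sketch of rendering a PDF's first page to PNG bytes with PyMuPDF's `get_page_pixmap`, the call mentioned in this conversation. It is not the code merged in this PR: the helper name `first_page_png` and the `dpi` default are assumptions made for illustration, and the `dpi` keyword assumes a reasonably recent PyMuPDF release.

```python
def first_page_png(pdf_bytes: bytes, dpi: int = 150) -> bytes:
    """Render page 0 of a PDF to PNG bytes (hypothetical helper, not the PR's code)."""
    # PyMuPDF is imported here so the sketch only needs it when called.
    import fitz  # PyMuPDF

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        if doc.page_count == 0:
            raise ValueError("PDF has no pages")
        # Rasterize only the first page; the dpi value is an assumed default.
        pix = doc.get_page_pixmap(0, dpi=dpi)
        return pix.tobytes("png")
```

The returned bytes could then be handed to whatever OCR step consumes PNG images.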

Describe testing procedures
Exploded a PDF file and checked that the OCR response contained the text.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of and tested my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

@phutelmyer
Contributor

@alexk307 This is a great idea - had no clue about get_page_pixmap. There's probably not much value in adding the ability to convert additional pages, right?

@phutelmyer phutelmyer self-requested a review July 3, 2023 20:40
@phutelmyer phutelmyer added the enhancement New feature or request label Jul 3, 2023
@alexk307
Contributor Author

alexk307 commented Jul 5, 2023

This is a great idea - had no clue about get_page_pixmap. There's probably not much value in adding the ability to convert additional pages, right?

Your call! I think for my purposes, it's fine to grab the first page. But we could add another config option in the future to set the max number of pages to scan.
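
A rough sketch of how such an option might bound the pages to render. The option name `max_pages` and the helper `pages_to_scan` are hypothetical and are not part of this PR; a default of 1 would match the first-page-only behavior described here.

```python
def pages_to_scan(page_count: int, max_pages: int = 1) -> range:
    """Return the zero-based page indices to render, capped at max_pages.

    Both names are hypothetical; max_pages=1 means only the first page.
    """
    # Clamp negative limits to zero and never exceed the document length.
    limit = min(page_count, max(max_pages, 0))
    return range(limit)
```

For example, `list(pages_to_scan(5, 2))` yields the first two page indices, and a limit larger than the document simply yields every page.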

@phutelmyer phutelmyer merged commit 0d8cdae into target:master Jul 10, 2023