OCR - remove PDFBox dependencies #130

Closed
rsoika opened this issue Nov 6, 2020 · 0 comments

rsoika commented Nov 6, 2020

We can remove the Apache PDFBox dependencies and rely on Tika support only.

The advantage is that we reduce complexity. For this technology stack there is no need to support a setup without a Tika server instance. If a project needs this feature, it can use a custom implementation based on PDFBox.

So in future the OCRService will throw an exception if no Tika service endpoint is defined. All OCR functionality is handed over to Tika only!
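
The fail-fast behaviour described above could look roughly like the following. This is only a minimal sketch, not the actual OCRService code: the class `OcrEndpointGuard`, the exception `MissingTikaEndpointException`, and the method `requireTikaEndpoint` are hypothetical names chosen for illustration, and how the endpoint value is configured in the real project is not shown here.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the project: illustrates rejecting OCR
// requests when no Tika service endpoint is configured.
class OcrEndpointGuard {

    // Hypothetical exception type; the real service may use its own.
    static class MissingTikaEndpointException extends RuntimeException {
        MissingTikaEndpointException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Returns the configured endpoint as a URI, or throws if it is
    // missing, blank, or not a syntactically valid URI.
    static URI requireTikaEndpoint(String configuredEndpoint) {
        if (configuredEndpoint == null || configuredEndpoint.trim().isEmpty()) {
            throw new MissingTikaEndpointException(
                    "No Tika service endpoint defined; OCR is not available.", null);
        }
        try {
            return new URI(configuredEndpoint.trim());
        } catch (URISyntaxException e) {
            throw new MissingTikaEndpointException(
                    "Invalid Tika service endpoint: " + configuredEndpoint, e);
        }
    }
}
```

A caller would invoke `OcrEndpointGuard.requireTikaEndpoint(value)` once before handing a document to Tika, so a missing configuration surfaces as an exception instead of a silent fallback to PDFBox. The value passed in (for example `http://localhost:9998/tika`) is an assumption for illustration only.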

rsoika changed the title from "OCR - determine DPIs for parsing" to "OCR - remove PDFBox dependencies" Nov 6, 2020
rsoika added this to the 2.2.7 milestone Nov 6, 2020
rsoika added a commit that referenced this issue Nov 6, 2020
rsoika added the testing label Nov 14, 2020
rsoika closed this as completed Nov 16, 2020