OCR - remove PDFBox dependencies #130

rsoika · 2020-11-06T14:11:21Z

We can remove the Apache PDFBox dependencies, as we can switch to Tika support only.

The advantage is that we reduce complexity. For this technology stack there is no need to support a setup without a Tika server instance. If a project needs this feature, it can use a custom implementation based on PDFBox.

So in future the OCRService will throw an exception if no Tika service endpoint is defined. All OCR functionality is handed over to Tika only!
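
To illustrate the intended fail-fast behaviour, here is a minimal sketch. It is not the actual OCRService code: the class name `TikaEndpointGuard`, the method `requireEndpoint`, and the use of `IllegalStateException` are assumptions made for illustration only.

```java
// Hypothetical sketch: names and exception type are illustrative,
// not taken from the project's OCRService.
final class TikaEndpointGuard {

    private TikaEndpointGuard() {
        // static helper, no instances
    }

    /**
     * Returns the trimmed endpoint if one is configured.
     * Throws if the value is null or blank, because there is
     * no PDFBox fallback any more.
     */
    static String requireEndpoint(String configuredEndpoint) {
        if (configuredEndpoint == null || configuredEndpoint.trim().isEmpty()) {
            throw new IllegalStateException(
                    "No Tika service endpoint defined. OCR is only available through Tika.");
        }
        return configuredEndpoint.trim();
    }
}
```

A caller could run this check once before any OCR request, for example `TikaEndpointGuard.requireEndpoint(System.getenv("TIKA_SERVICE_ENDPOINT"))`, where the variable name is again an assumption.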

rsoika changed the title ~~OCR - determine DPIs for parsing~~ OCR - remove PDFBox dependencies Nov 6, 2020

rsoika added enhancement feature OCR labels Nov 6, 2020

rsoika added this to the 2.2.7 milestone Nov 6, 2020

rsoika added a commit that referenced this issue Nov 6, 2020

refactoring, documentation

427584e

Issue #130

rsoika added the testing label Nov 14, 2020

rsoika closed this as completed Nov 16, 2020
