OCRTool

When applying fancy new AI models on top of documents, the first step in NLP pipelines is often to extract data out of myriad formats like PDFs and image files.

With the release of the Live Text feature on iOS and macOS, Apple also unveiled the VisionKit APIs for extracting text programmatically from documents using the same industry-leading OCR engine.

We find anecdotally that it's superior to running the same documents through Tesseract, so it seemed worth wrapping in a little CLI tool.
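
For context, here is a minimal sketch of the kind of recognition call such a tool wraps, using the Vision framework's VNRecognizeTextRequest (the API underlying Live Text) on an image input. This illustrates the general approach, not necessarily OCRTool's actual implementation:

import AppKit
import Vision

// Load the input image and get a CGImage for Vision to work on.
let url = URL(fileURLWithPath: CommandLine.arguments[1])
guard let image = NSImage(contentsOf: url),
      let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    fatalError("Could not load image at \(url.path)")
}

// Configure a text-recognition request; .accurate trades speed for quality.
let request = VNRecognizeTextRequest { request, _ in
    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    for observation in observations {
        // Print the highest-confidence candidate for each detected line.
        if let candidate = observation.topCandidates(1).first {
            print(candidate.string)
        }
    }
}
request.recognitionLevel = .accurate

// Run the request synchronously over the image.
let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try handler.perform([request])

PDF inputs would additionally need their pages rendered to images (e.g. via PDFKit) before being handed to Vision.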

Using

Grab the latest Universal macOS binary from the Releases page.

From there, you can run it on any document. Example:

$ OCRTool us_passport.jpg
$ OCRTool invoice.pdf

Building

Either open the project in Xcode and click Build, or check out the repo and run xcodebuild from the root.

The output files will be generated in the build/Release/OCRTool directory.