Tesseract (with English training data), compressed with Brotli to 11 MB to fit inside AWS Lambda
Inspired by chrome-aws-lambda & lambda-scanner-ocr
$ yarn add @shelf/aws-lambda-tesseract
This package contains an archive with Tesseract 4.0 beta compiled for use in the AWS Lambda environment.
When a Lambda starts, the package unpacks the archive containing the binary to the /tmp
folder, and it does so only once per Lambda cold start.
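As a rough illustration of the "unpack once per cold start" idea, here is a minimal sketch. It is not the package's actual code: `unpackOnce` is a hypothetical helper, and it assumes the archive is a single Brotli-compressed file, which may not match the real archive layout. It relies on the fact that module-level variables survive between warm invocations of the same Lambda container.

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Module scope is initialized once per cold start and reused by warm
// invocations, so this variable remembers whether unpacking already happened.
let unpackedPath = null;

// Hypothetical helper (not part of @shelf/aws-lambda-tesseract):
// decompress a Brotli-compressed binary to destPath, at most once.
function unpackOnce(archivePath, destPath) {
  if (unpackedPath === null) {
    const compressed = fs.readFileSync(archivePath);
    // Write the decompressed bytes and mark the file as executable.
    fs.writeFileSync(destPath, zlib.brotliDecompressSync(compressed), {mode: 0o755});
    unpackedPath = destPath;
  }
  return unpackedPath;
}
```

Calling `unpackOnce` at the top of a handler would pay the decompression cost only on the first invocation after a cold start; later calls return the cached path immediately.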
const {getTextFromImage, isSupportedFile} = require('@shelf/aws-lambda-tesseract');

module.exports.handler = async event => {
  // assuming there is a photo.jpg inside /tmp dir
  // original file will be deleted afterwards
  if (!isSupportedFile('/tmp/photo.jpg')) {
    return false;
  }

  return getTextFromImage('/tmp/photo.jpg');
};
isSupportedFile
checks that the file has an image-like file extension and that the extension is not in the list of
file extensions unsupported by Tesseract.
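An extension check of the kind described above might look roughly like the sketch below. The function name `looksSupported` and both extension lists are hypothetical and chosen only for illustration; the lists used by the package itself may differ.

```javascript
const path = require('path');

// Hypothetical lists, for illustration only; the package's real lists may differ.
const IMAGE_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png', '.tif', '.tiff', '.bmp', '.svg', '.psd']);
const UNSUPPORTED_EXTENSIONS = new Set(['.svg', '.psd']);

// Hypothetical stand-in for isSupportedFile: true when the extension looks
// like an image and is not on the unsupported list.
function looksSupported(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return IMAGE_EXTENSIONS.has(ext) && !UNSUPPORTED_EXTENSIONS.has(ext);
}
```

Note that a check like this inspects only the file name, not the file contents, so a mislabeled file would still pass.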
See the compile-tesseract.sh and compress-with-brotli.sh files.
MIT © Shelf