Droplr/aws-lambda-tesseract

 
 

aws-lambda-tesseract

11 MB Tesseract (with English training data) to fit inside AWS Lambda compressed with Brotli

Inspired by chrome-aws-lambda & lambda-scanner-ocr

Install

$ yarn add @shelf/aws-lambda-tesseract

How does it work?

This package contains an archive with Tesseract 4.0 beta compiled for use in the AWS Lambda environment.

On a cold start, the package unpacks the archive containing the binary into the /tmp folder. The unpacking is done only once per Lambda cold start, so later invocations in the same container reuse the extracted files.
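
The once-per-cold-start behaviour can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: `unpackOnce` is a hypothetical helper, and it assumes a single Brotli-compressed file for simplicity, whereas the real archive may contain several files. It relies only on Node's built-in `zlib.brotliDecompressSync`.

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Module-scope state survives warm invocations of the same Lambda container,
// so each target path is unpacked at most once per cold start.
const unpacked = new Set();

// Hypothetical helper (not the package's API): decompress the Brotli file at
// `archivePath` into `targetPath` and return `targetPath`.
function unpackOnce(archivePath, targetPath) {
  if (unpacked.has(targetPath)) {
    // Already handled earlier in this process; skip all file system work.
    return targetPath;
  }

  if (!fs.existsSync(targetPath)) {
    const compressed = fs.readFileSync(archivePath);
    const raw = zlib.brotliDecompressSync(compressed);
    // 0o755 so that an extracted binary is executable.
    fs.writeFileSync(targetPath, raw, {mode: 0o755});
  }

  unpacked.add(targetPath);
  return targetPath;
}
```

Keeping the cache at module scope is what ties the work to the cold start: the module is loaded once per container, so warm invocations hit the early return.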

Usage

const {getTextFromImage, isSupportedFile} = require('@shelf/aws-lambda-tesseract');

module.exports.handler = async event => {
  // assuming there is a photo.jpg inside /tmp dir
  // original file will be deleted afterwards

  if (!isSupportedFile('/tmp/photo.jpg')) {
    return false;
  }

  return getTextFromImage('/tmp/photo.jpg');
};

isSupportedFile checks that the file has an image-like file extension and that the extension is not on the list of extensions Tesseract does not support.
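
A check of this kind might look like the sketch below. The function name `looksSupported` and both extension lists are assumptions made for illustration; the lists the package actually uses may differ.

```javascript
const path = require('path');

// Assumed, illustrative lists; the package's real lists may differ.
const IMAGE_EXTENSIONS = new Set([
  'jpg', 'jpeg', 'png', 'gif', 'bmp', 'tif', 'tiff', 'webp', 'svg', 'psd', 'ico',
]);
const UNSUPPORTED_EXTENSIONS = new Set(['svg', 'psd', 'ico']);

// Hypothetical stand-in for isSupportedFile: true only when the extension
// looks like an image and is not on the unsupported list.
function looksSupported(filePath) {
  // path.extname returns e.g. ".jpg"; drop the dot and ignore case.
  const ext = path.extname(filePath).slice(1).toLowerCase();
  return IMAGE_EXTENSIONS.has(ext) && !UNSUPPORTED_EXTENSIONS.has(ext);
}
```

Note that a check like this inspects only the file name, not the file contents, so a mislabelled file would still pass.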

Compile It Yourself

See compile-tesseract.sh & compress-with-brotli.sh files

See Also

License

MIT © Shelf
