This repository demonstrates implementations of Optical Character Recognition (OCR) for text in the wild, designed for both cloud and edge deployment scenarios. Our goal is to provide flexible, efficient OCR solutions that can be adapted to various use cases and hardware constraints.
Our OCR system employs a two-step process for accurate text recognition in natural scenes:
-
Text Detection: This step identifies and locates all instances of text within an image. It produces bounding boxes around detected text areas.
-
Text Recognition: After detection, the identified text regions are cropped and batched. These batches are then passed through a recognition model that converts the image of text into machine-readable characters.
This repository is divided into two main sections:
- Cloud OCR: A scalable, high-performance OCR system designed for cloud deployment.
- Edge OCR: A lightweight, efficient OCR model suitable for deployment on edge devices.
Our cloud-based OCR system leverages pre-trained models from PaddlePaddleOCR, converted to ONNX format for improved performance and compatibility. It uses Triton Inference Server for efficient model serving and exposes its functionality through a Flask API.
Key features of the cloud implementation:
- Uses state-of-the-art PaddlePaddleOCR models for text detection and recognition
- ONNX-format models for cross-platform compatibility
- Triton Inference Server for scalable model serving
- Flask API for easy integration with various applications
- Designed for deployment in cloud environments
- Efficient batching of detected text regions for recognition
For more details, see the Cloud OCR README.
Our edge-based OCR system is designed to run efficiently on resource-constrained devices. It uses custom, lightweight model architectures for both detection and recognition that prioritize speed and low memory footprint without significantly compromising accuracy.
Key features of the edge implementation:
- Custom CNN architectures optimized for edge devices
- Uses only convolutional operations for wide hardware compatibility
- Text detection model designed for quick region proposals
- Text recognition model trained with CTC loss for efficient sequence learning
- Configurable for different input sizes and character sets
- Easily convertible to ONNX and TensorRT for further optimization
- Optimized pipeline for efficient processing of detected text regions
For more details, see the Edge OCR README.
Feature | Cloud OCR | Edge OCR |
---|---|---|
Deployment Target | High-performance cloud servers | Resource-constrained edge devices |
Model Source | Pre-trained PaddlePaddleOCR | Custom-trained lightweight models (Compatible ops) |
Text Detection | Advanced detection model | Yolov8 |
Text Recognition | Advanced LSTM-based model | Compact CNN with CTC loss |
Inference Server | Triton Inference Server | Direct inference |
Primary Advantages | High accuracy, scalability, centralized | Low latency, minimal resource usage, privacy, offline, cost |
To get started with either the cloud or edge implementation:
- Clone this repository:
git clone https://github.com/your-username/ocr-implementations.git
- Navigate to either the
cloud
oredge
directory:cd ocr-implementations/cloud # or cd ocr-implementations/edge
- Follow the setup instructions in the respective README files.