Learn to build OCR and image understanding systems

cuongngm/text-in-image

Quickstart

pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install --upgrade ultocr  # install this project as a package

# Inference
from ultocr.inference import OCR
from PIL import Image

# Build the pipeline: DB for text detection, MASTER for text recognition
model = OCR(det_model='DB', reg_model='MASTER')
image = Image.open('..')  # '..' is the path to the image
result = model.get_result(image)

Or try it in the Google Colab demo.
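
To run the quickstart model over a whole folder rather than a single image, a small helper like the one below may be enough. It is a minimal sketch: `IMAGE_SUFFIXES`, `list_images`, and `run_batch` are hypothetical names that are not part of ultocr, and the only project call it relies on is the `get_result` method shown above. The return format of `get_result` is not documented here, so each result is stored untouched.

```python
from pathlib import Path

# Hypothetical helper names; only model.get_result(image) comes from the quickstart.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(folder):
    """Return the image files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def run_batch(model, folder, open_image):
    """Call model.get_result on every image in `folder`.

    `open_image` is any callable that turns a path into an image,
    for example PIL's Image.open.
    """
    results = {}
    for path in list_images(folder):
        # Keyed by file name so results can be matched back to inputs.
        results[path.name] = model.get_result(open_image(path))
    return results
```

With the quickstart objects this would be called as `run_batch(model, 'images/', Image.open)`, where `images/` is a placeholder folder name.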

Install

git clone https://github.com/cuongngm/text-in-image
pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
bash scripts/download_weights.sh

Prepare data

Pretrained model

Model    Size (MB)
DB       140
MASTER   261

Train

Customize the parameters in the relevant config file in the config folder, then run one of the commands below.

Single-GPU training:

python train.py --config config/db_resnet50.yaml --use_dist False
# track the run with MLflow
mlflow run text-in-image -P config=config/db_resnet50.yaml -P use_dist=False -P device=1
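
Note that `--use_dist False` passes the string "False" on the command line. How train.py parses it is not shown here, so the sketch below is an assumption and not the project's code. It shows one common way to turn such a value into a real boolean with argparse, since `type=bool` would treat any non-empty string, including "False", as true. `str2bool` and `build_parser` are hypothetical names, and the default of `True` is a guess.

```python
import argparse


def str2bool(value):
    """Convert common true/false spellings to a bool."""
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "y"}:
        return True
    if text in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical training CLI mirroring the two flags used above."""
    parser = argparse.ArgumentParser(description="hypothetical training CLI")
    parser.add_argument("--config", required=True, help="path to a YAML config")
    # default=True is an assumption made for this sketch only.
    parser.add_argument("--use_dist", type=str2bool, default=True,
                        help="enable distributed training")
    return parser
```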

Multi-GPU training:

# assuming 2 GPUs on a single machine
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=2 --master_addr=127.0.0.1 --master_port=5555 train.py --config config/db_resnet50.yaml
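
For orientation, the launcher starts `nproc_per_node` processes on each node and gives every process a local rank, a global rank, and the world size. The arithmetic can be sketched as follows. `process_layout` is a hypothetical helper written for illustration; it does not call PyTorch.

```python
def process_layout(nnodes, node_rank, nproc_per_node):
    """Describe the processes the launcher starts on one node.

    Returns one dict per process with its local rank (index on this node),
    global rank (index across all nodes), and the total world size.
    """
    world_size = nnodes * nproc_per_node
    layout = []
    for local_rank in range(nproc_per_node):
        layout.append({
            "local_rank": local_rank,
            "rank": node_rank * nproc_per_node + local_rank,
            "world_size": world_size,
        })
    return layout
```

For the command above, `process_layout(nnodes=1, node_rank=0, nproc_per_node=2)` describes two processes with ranks 0 and 1 and a world size of 2.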

Serve and Inference

python run.py

Then open http://127.0.0.1:8000/docs in your browser and submit the URL of an image to get the OCR result.
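
The route names served by run.py are not listed here. FastAPI apps publish their OpenAPI schema at `/openapi.json` by default, so the available routes can also be listed from a script. This is a minimal sketch using only the standard library. It assumes the server is running at the address above and that the default schema URL has not been changed. `fetch_schema` and `list_routes` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # address from the step above


def fetch_schema(base_url=BASE_URL):
    """Download the OpenAPI schema (requires the server to be running)."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


def list_routes(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-method keys such as "parameters" or "summary".
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)
```

With the server up, `for method, path in list_routes(fetch_schema()): print(method, path)` prints the routes that accept requests.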

Todo

  • Multi-GPU training
  • Tracking experiments with MLflow
  • Model serving with FastAPI
  • Add more text detection and recognition models
