```bash
pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install --upgrade ultocr  # install our project as a package
```
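Optionally, a quick sanity check (not required) that the CUDA 10.1 build of PyTorch was installed and can see a GPU:

```python
import torch

print(torch.__version__)          # expected: 1.7.0+cu101
print(torch.cuda.is_available())  # True if a GPU is visible to the CUDA build
```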
```python
# inference
from PIL import Image

from ultocr.inference import OCR

model = OCR(det_model='DB', reg_model='MASTER')
image = Image.open('..')  # '..' is the path to your image
result = model.get_result(image)
```
Or try the demo in Google Colab.
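As a small illustration (not part of the documented API beyond `OCR.get_result`), the same call can be applied to every image in a folder; the folder path below is a placeholder, and the structure of `result` depends on the chosen models:

```python
from pathlib import Path

from PIL import Image

from ultocr.inference import OCR

model = OCR(det_model='DB', reg_model='MASTER')

# run detection + recognition on every .jpg in a (hypothetical) folder
results = {}
for img_path in sorted(Path('images').glob('*.jpg')):
    image = Image.open(img_path).convert('RGB')
    results[img_path.name] = model.get_result(image)

for name, result in results.items():
    print(name, result)
```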
To train your own models or use the pretrained weights, install from source:

```bash
git clone https://github.com/cuongngm/text-in-image
pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
bash scripts/download_weights.sh
```
| Model  | Size (MB) |
|--------|-----------|
| DB     | 140       |
| MASTER | 261       |
Customize the parameters in the config files under the `config` folder, then:

Single GPU training:

```bash
python train.py --config config/db_resnet50.yaml --use_dist False

# track the experiment with MLflow
mlflow run text-in-image -P config=config/db_resnet50.yaml -P use_dist=False -P device=1
```
Multi-GPU training:

```bash
# assuming 2 GPUs on a single node
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=2 --master_addr=127.0.0.1 --master_port=5555 train.py --config config/db_resnet50.yaml
```
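For context, `torch.distributed.launch` starts one process per GPU and passes a `--local_rank` argument to each process; a training script consumes it roughly as in the minimal sketch below (an illustration of the launcher's contract, not the actual `train.py` of this repo):

```python
import argparse

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torch.distributed.launch (PyTorch 1.7) passes --local_rank to each worker process
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
args, _ = parser.parse_known_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend='nccl', init_method='env://')  # MASTER_ADDR/PORT come from the launcher

model = torch.nn.Linear(8, 8).cuda()             # placeholder for the real detection/recognition model
model = DDP(model, device_ids=[args.local_rank])
# ...build DataLoaders with DistributedSampler and run the usual training loop
```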
Model serving with FastAPI:

```bash
python run.py
```

Then open your browser at http://127.0.0.1:8000/docs and send a request with the URL of an image to get the OCR result.
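For reference, a minimal sketch of such a serving script (hypothetical: the endpoint name, request parameter, and response shape are assumptions, not the contents of this repo's `run.py`):

```python
import io

import requests
import uvicorn
from fastapi import FastAPI
from PIL import Image

from ultocr.inference import OCR

app = FastAPI()
model = OCR(det_model='DB', reg_model='MASTER')

@app.post('/ocr')  # hypothetical endpoint
def ocr_from_url(url: str):
    # fetch the image from the given URL and run detection + recognition
    resp = requests.get(url, timeout=10)
    image = Image.open(io.BytesIO(resp.content)).convert('RGB')
    return {'result': model.get_result(image)}

if __name__ == '__main__':
    uvicorn.run(app, host='127.0.0.1', port=8000)
```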
- Multi-GPU training
- Tracking experiments with MLflow
- Model serving with FastAPI
- Add more text detection and recognition models