
Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detect, recognize and segment text instances, with several downstream tasks, e.g., Text Removal and Text Inpainting


yeungchenwa/OCR-SAM


Optical Character Recognition with Segment Anything (OCR-SAM)

🐇 Introduction 🐙

Can SAM be applied to OCR? We make a simple attempt: combining two off-the-shelf OCR models from MMOCR with SAM to build several OCR-related application demos, including SAM for Text, Text Removal and Text Inpainting. We also provide a Gradio WebUI for easier interaction.

📅 Updates 👀

  • 2023.08.23: 🔥 We created the repo yeungchenwa/Recommendations-Diffusion-Text-Image, a paper collection on recent diffusion models for text-image generation tasks.
  • 2023.04.14: 📣 Our repository has been migrated to open-mmlab/playground.
  • 2023.04.12: Repository Release
  • 2023.04.12: Added Inpainting, combining DBNet++, SAM and Stable-Diffusion.
  • 2023.04.11: Added Erasing, combining DBNet++, SAM and Latent-Diffusion / Stable-Diffusion.
  • 2023.04.10: Added SAM for Text, combining DBNet++ and SAM.
  • 2023.04.09: We discuss how effective SAM is on OCR text images in the Blog.

📸 Demo Zoo 🔥

This project includes:

🚧 Installation 🛠️

Prerequisites (Recommended)

  • Linux | Windows
  • Python 3.8
  • PyTorch 1.12
  • CUDA 11.3

Environment Setup

Clone this repo:

git clone https://github.com/yeungchenwa/OCR-SAM.git

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n ocr-sam python=3.8 -y
conda activate ocr-sam

Step 2: Install the matching version of PyTorch by following the instructions here.

# Suggested
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
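
Once Step 2 finishes, a quick check along the following lines can confirm that the interpreter and the PyTorch build match the recommended versions. This is a minimal sketch: `describe_environment` is a hypothetical helper name, not part of this repo, and it only reports what it finds.

```python
import sys


def describe_environment():
    """Collect the Python, PyTorch and CUDA versions visible to this interpreter."""
    info = {"python": "{}.{}".format(*sys.version_info[:2])}
    try:
        import torch  # only importable after Step 2
    except ImportError:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__            # e.g. 1.12.1+cu113
    info["cuda_build"] = torch.version.cuda      # CUDA version torch was built against
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `None`, the install in Step 2 did not land in the active conda environment.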

Step 3: Install mmengine, mmcv, mmdet, mmcls and mmocr.

pip install -U openmim
mim install mmengine
mim install mmocr
# On Windows, replace the single quotes ' below with double quotes "
mim install 'mmcv==2.0.0rc4'
mim install 'mmdet==3.0.0rc5'
mim install 'mmcls==1.0.0rc5'


# Install sam
pip install git+https://github.com/facebookresearch/segment-anything.git

# Install required packages
pip install -r requirements.txt
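
Because Step 3 pins release-candidate versions, it can help to confirm that the pins actually took effect. The sketch below compares installed versions against the pins from the commands above using the standard library; `check_pins` is a hypothetical helper, and it assumes the distributions are registered under the names `mmcv`, `mmdet` and `mmcls`.

```python
from importlib import metadata

# Pins copied from the Step 3 commands above.
PINNED = {"mmcv": "2.0.0rc4", "mmdet": "3.0.0rc5", "mmcls": "1.0.0rc5"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("all pinned packages match")
```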

Step 4: Install the dependencies for diffusers and latent-diffusion.

# Install Gradio
pip install gradio

# Install the diffusers
pip install diffusers

# Install the pytorch_lightning for ldm
pip install pytorch-lightning==2.0.1.post0

📒 Model checkpoints 🖥

We retrained DBNet++ with Swin Transformer V2 as the backbone on a combination of multiple scene text datasets (e.g. HierText, TextOCR). The DBNet++ checkpoint is available on Google Drive (1G).

Then create the following directories and move the downloaded checkpoint into place:

mkdir checkpoints
mkdir checkpoints/mmocr
mkdir checkpoints/sam
mkdir checkpoints/ldm
mv db_swin_mix_pretrain.pth checkpoints/mmocr

Download the remaining checkpoints to their respective paths (skip this if you have already done so):

# mmocr recognizer ckpt
wget -O checkpoints/mmocr/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_20e_st-an_mj/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth

# sam ckpt, more details: https://github.com/facebookresearch/segment-anything#model-checkpoints
wget -O checkpoints/sam/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# ldm ckpt
wget -O checkpoints/ldm/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1
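
Before running any demo, it may be worth checking that all four checkpoints ended up where the commands above put them. The following sketch does only that; `missing_checkpoints` is a hypothetical helper, and the file names are taken from the `mv` and `wget` commands above.

```python
from pathlib import Path

# Relative paths taken from the mv/wget commands above.
EXPECTED = [
    "mmocr/db_swin_mix_pretrain.pth",
    "mmocr/abinet_20e_st-an_mj_20221005_012617-ead8c139.pth",
    "sam/sam_vit_h_4b8939.pth",
    "ldm/last.ckpt",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected checkpoint paths that are not present under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("all checkpoints found" if not missing else f"missing: {missing}")
```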

🏃🏻‍♂️ Run Demo 🏊‍♂️

SAM for Text🧐

Run the following script:

python mmocr_sam.py \
    --inputs /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda

  • --inputs: path to the input image.
  • --outdir: output directory.
  • --device: device used for inference.
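
This demo pairs a text detector with SAM. One plausible way to connect the two, which is an assumption here rather than something this README spells out, is to turn each detected text polygon into an axis-aligned box and pass it to SAM as a box prompt. The helper below sketches only that conversion; `polygon_to_box` is a hypothetical name, and the flat `[x1, y1, x2, y2, ...]` polygon layout is assumed.

```python
def polygon_to_box(polygon):
    """Convert a flat polygon [x1, y1, x2, y2, ...] to an [x_min, y_min, x_max, y_max] box."""
    if len(polygon) < 6 or len(polygon) % 2:
        raise ValueError("expected at least three (x, y) pairs")
    xs, ys = polygon[0::2], polygon[1::2]  # split interleaved coordinates
    return [min(xs), min(ys), max(xs), max(ys)]


if __name__ == "__main__":
    # A slightly rotated quadrilateral around a word.
    print(polygon_to_box([10, 22, 90, 18, 92, 40, 12, 44]))
```

A box is a looser prompt than the polygon itself, so the segmentation model still has to separate the character strokes from the background inside it.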

Erasing🤓

In this demo, text is erased with either latent-diffusion inpainting or Stable-Diffusion inpainting guided by a text prompt; choose between them with the --diffusion_model parameter. You can also choose whether to erase using the SAM output mask with the --use_sam parameter. More implementation details are listed here

Run the following script:

python mmocr_sam_erase.py \
    --inputs /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda \
    --use_sam True \
    --dilate_iteration 2 \
    --diffusion_model latent-diffusion \
    --sd_ckpt None \
    --prompt None \
    --img_size (512, 512)

  • --inputs: path to the input image.
  • --outdir: output directory.
  • --device: device used for inference.
  • --use_sam: whether to use SAM for segmentation.
  • --dilate_iteration: number of dilation iterations applied to the SAM mask.
  • --diffusion_model: choose 'latent-diffusion' or 'stable-diffusion'.
  • --sd_ckpt: path to the Stable-Diffusion checkpoint.
  • --prompt: text prompt used with Stable-Diffusion; set to 'None' to use the default erasing prompt.
  • --img_size: image size for latent-diffusion.
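
To illustrate what --dilate_iteration controls, here is a dependency-free sketch of binary mask dilation: each iteration grows the mask by one pixel in every direction, so the inpainting region also covers the thin halo around the text strokes. The 3x3 neighbourhood and the `dilate` helper are assumptions made for illustration; the script itself may use a library routine with a different kernel.

```python
def dilate(mask, iterations=1):
    """Binary dilation of a 2D 0/1 mask (list of lists) with a 3x3 square neighbourhood."""
    height, width = len(mask), len(mask[0])
    for _ in range(iterations):
        grown = [row[:] for row in mask]  # copy so the input is never modified
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    # Switch on every pixel in the 3x3 window, clipped to the image.
                    for ny in range(max(0, y - 1), min(height, y + 2)):
                        for nx in range(max(0, x - 1), min(width, x + 2)):
                            grown[ny][nx] = 1
        mask = grown
    return mask


if __name__ == "__main__":
    seed = [[0] * 5 for _ in range(5)]
    seed[2][2] = 1  # a single foreground pixel in the centre
    for row in dilate(seed, iterations=1):
        print(row)
```

A larger iteration count erases more of the surroundings along with the text, so it trades cleaner removal against more hallucinated background.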

Run the WebUI: see here

Note: The first run may take a while because the Stable-Diffusion checkpoint has to be downloaded, so please wait patiently👀

Inpainting

More implementation details are listed here

Run the following script:

python mmocr_sam_inpainting.py \
    --img_path /YOUR/INPUT/IMG_PATH \
    --outdir /YOUR/OUTPUT_DIR \
    --device cuda \
    --prompt YOUR_PROMPT \
    --select_index 0

  • --img_path: path to the input image.
  • --outdir: output directory.
  • --device: device used for inference.
  • --prompt: the text prompt.
  • --select_index: index of the text instance to inpaint.
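
The --select_index flag picks one detected text instance out of possibly many. The sketch below shows the idea under two assumptions that this README does not confirm: indices are zero-based, and they follow the order in which the recognizer returns the texts. Both helper names are hypothetical.

```python
def list_instances(texts):
    """Label each recognized text with the index that --select_index would refer to."""
    return [f"{index}: {text}" for index, text in enumerate(texts)]


def select_instance(instances, select_index):
    """Return the instance at select_index, failing clearly when the index is out of range."""
    if not 0 <= select_index < len(instances):
        raise IndexError(
            f"select_index {select_index} is out of range "
            f"for {len(instances)} detected text instance(s)"
        )
    return instances[select_index]


if __name__ == "__main__":
    recognized = ["OPEN", "24H", "CAFE"]  # made-up recognizer output
    print("\n".join(list_instances(recognized)))
    print("selected:", select_instance(recognized, 0))
```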

Run WebUI

This repo also provides a WebUI (built with Gradio) for Erasing and Inpainting.

Before running the script, you should install the gradio package:

pip install gradio

Erasing

python mmocr_sam_erase_app.py
  • Example:

Detector and Recognizer WebUI Result

Erasing WebUI Result

In our WebUI, users can interactively choose the SAM output and the diffusion model. In particular, users can choose which text to erase.

Inpainting🥸

python mmocr_sam_inpainting_app.py
  • Example:

Inpainting WebUI Result

Note: The web page may take some time to open, so please wait patiently👀

💗 Acknowledgement
