Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai.
git clone --recursive git@github.com:ZJULearning/pixel_link.git
Denote the root directory path of pixel_link by ${pixel_link_root}.
Add the path of ${pixel_link_root}/pylib/src to your PYTHONPATH:
export PYTHONPATH=${pixel_link_root}/pylib/src:$PYTHONPATH
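To confirm that the export took effect, a small check like the one below can be run from the same shell. It is only a sketch: the helper name pylib_src_on_path is hypothetical and not part of the repository, and it assumes the directory layout described above.

```python
import os
import sys


def pylib_src_on_path(pixel_link_root, search_path=None):
    """Return True if <pixel_link_root>/pylib/src is on the import search path.

    search_path defaults to sys.path, which Python fills from PYTHONPATH
    at start-up. (Hypothetical helper, not part of pixel_link.)
    """
    if search_path is None:
        search_path = sys.path
    target = os.path.normpath(os.path.join(pixel_link_root, "pylib", "src"))
    # normpath removes trailing slashes so "/a/b/" and "/a/b" compare equal.
    return any(os.path.normpath(p) == target for p in search_path if p)
```

For example, pylib_src_on_path(os.environ["pixel_link_root"]) can be called once that variable has been exported, or the root path can be passed as a literal string.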
Only tested on Ubuntu 14.04 and 16.04 with:
- Python 2.7
- Tensorflow-gpu >= 1.1
- opencv2
- setproctitle
- matplotlib
Anaconda is recommended for an easier installation:
- Install Anaconda
- Create and activate the required virtual environment by:
conda env create --file pixel_link_env.txt
source activate pixel_link
- PixelLink + VGG16 4s, trained on IC15
- PixelLink + VGG16 2s, trained on IC15
Unzip the downloaded model. It contains 4 files:
- config.py
- model.ckpt-xxx.data-00000-of-00001
- model.ckpt-xxx.index
- model.ckpt-xxx.meta
Denote their parent directory as ${model_path}.
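Before running a test script, it can help to verify that the unzipped directory really contains the four files listed above. The following is a minimal sketch; the function name missing_model_files is hypothetical, and the xxx step number is matched with a wildcard because it differs between checkpoints.

```python
import glob
import os


def missing_model_files(model_path):
    """Return the expected file patterns that have no match in model_path.

    (Hypothetical helper, not part of pixel_link.)
    """
    patterns = [
        "config.py",
        "model.ckpt-*.data-00000-of-00001",
        "model.ckpt-*.index",
        "model.ckpt-*.meta",
    ]
    # A pattern counts as missing when glob finds no file for it.
    return [p for p in patterns if not glob.glob(os.path.join(model_path, p))]
```

An empty list means all four files are present.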
The reported results on ICDAR2015 are:
Model | Recall | Precision | F-mean |
---|---|---|---|
PixelLink+VGG16 2s | 82.0 | 85.5 | 83.7 |
PixelLink+VGG16 4s | 81.7 | 82.9 | 82.3 |
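The F-mean column is the harmonic mean of recall and precision. The short sketch below recomputes it from the other two columns of the table; the function name f_mean is chosen here for illustration only.

```python
def f_mean(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


rows = [
    ("PixelLink+VGG16 2s", 82.0, 85.5),
    ("PixelLink+VGG16 4s", 81.7, 82.9),
]
for name, recall, precision in rows:
    # Prints 83.7 for the 2s model and 82.3 for the 4s model.
    print("%s: %.1f" % (name, f_mean(recall, precision)))
```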
Assuming you have downloaded the ICDAR2015 dataset, execute the following commands to test the model on ICDAR2015:
cd ${pixel_link_root}
./scripts/test.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${path_to_icdar2015}/ch4_test_images
For example:
./scripts/test.sh 3 ~/temp/conv3_3/model.ckpt-38055 ~/dataset/ICDAR2015/Challenge4/ch4_test_images
The program will create a zip file of detection results, which can be submitted to the ICDAR2015 server directly.
The detection results can be visualized via scripts/vis.sh.
Put the images to be tested in a single directory, denoted ${image_dir}. Then:
cd ${pixel_link_root}
./scripts/test_any.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${image_dir}
For example:
./scripts/test_any.sh 3 ~/temp/conv3_3/model.ckpt-38055 ~/dataset/ICDAR2015/Challenge4/ch4_training_images
The program will visualize the detection results directly on the images. If the detection results are not satisfactory, try to:
- Adjust the inference parameters, such as eval_image_width, eval_image_height, pixel_conf_threshold, and link_conf_threshold.
- Or train your own model.
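To give an intuition for what the two thresholds control, here is a simplified, pure-Python sketch of the idea in the paper: pixels scoring above pixel_conf_threshold are kept, and neighbouring kept pixels are merged into one instance when the link between them scores above link_conf_threshold. This is not the repository's implementation. The function name group_pixels, the input layout, and the restriction to right and down links are assumptions made for brevity (the paper uses 8 neighbours).

```python
def group_pixels(pixel_scores, right_links, down_links,
                 pixel_conf_threshold, link_conf_threshold):
    """Label groups of linked positive pixels on a small 2-D grid.

    pixel_scores[y][x]: text score of pixel (y, x).
    right_links[y][x]:  score of the link from (y, x) to (y, x + 1).
    down_links[y][x]:   score of the link from (y, x) to (y + 1, x).
    Returns a grid of group ids; 0 means "not text".
    (Illustrative only: 4-neighbour links instead of the paper's 8.)
    """
    height, width = len(pixel_scores), len(pixel_scores[0])
    parent = {}  # union-find forest over positive pixels

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Step 1: keep pixels whose score clears the pixel threshold.
    for y in range(height):
        for x in range(width):
            if pixel_scores[y][x] >= pixel_conf_threshold:
                parent[(y, x)] = (y, x)

    # Step 2: merge neighbouring positive pixels joined by a confident link.
    for (y, x) in list(parent):
        if (y, x + 1) in parent and right_links[y][x] >= link_conf_threshold:
            union((y, x), (y, x + 1))
        if (y + 1, x) in parent and down_links[y][x] >= link_conf_threshold:
            union((y, x), (y + 1, x))

    # Step 3: give each group a consecutive id, in scan order.
    ids = {}
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if (y, x) in parent:
                root = find((y, x))
                labels[y][x] = ids.setdefault(root, len(ids) + 1)
    return labels
```

In this sketch, raising pixel_conf_threshold removes low-confidence pixels, while raising link_conf_threshold splits groups apart. For instance, with one row of scores [[0.9, 0.9, 0.1, 0.9]] and right links [[0.8, 0.2, 0.2, 0.0]], thresholds of 0.5 and 0.5 give [[1, 1, 0, 2]], whereas a link threshold of 0.9 gives [[1, 2, 0, 3]].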
Scripts for converting ICDAR2015 and SynthText datasets have been provided in the datasets directory.
It is not hard to write a conversion script for your own dataset.
- Modify scripts/train.sh to configure your dataset name and dataset path like:
DATASET=icdar2015
DATASET_DIR=$HOME/dataset/pixel_link/icdar2015
- Start training
./scripts/train.sh ${GPU_IDs} ${IMG_PER_GPU}
For example, ./scripts/train.sh 0,1,2 8.
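The two arguments together determine how many images are processed per training step. The sketch below assumes that the total is the number of listed GPUs multiplied by ${IMG_PER_GPU}, which is what the argument names suggest; check scripts/train.sh to confirm. The function name images_per_step is hypothetical.

```python
def images_per_step(gpu_ids, img_per_gpu):
    """Total images per step, assuming every listed GPU gets img_per_gpu images.

    gpu_ids is the comma-separated string passed as the first argument,
    for example "0,1,2". (Hypothetical helper; the multiplication is an
    assumption about how scripts/train.sh combines its arguments.)
    """
    ids = [g for g in gpu_ids.split(",") if g.strip()]
    return len(ids) * img_per_gpu
```

Under that assumption, the example command with 0,1,2 and 8 would use 3 x 8 = 24 images per step.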
The existing training strategy in scripts/train.sh is configured for icdar2015; modify it if necessary. Many training and model options are available in config.py; try them yourself if you are interested.