The official code of LPV
LPV proposes a Cascade Position Attention (CPA) strategy and a Global Linguistic Reconstruction Module (GLRM) to aggregate linguistic information in both queries and features. The pipeline is shown in the figure below.
- Release code
- Document for Installation
- Document for testing and training
- Trained models
- Chinese implementation
This work was tested with PyTorch 1.7.0, CUDA 10.1, Python 3.6 and Ubuntu 16.04.
To install other dependencies:
```bash
pip install -r requirements.txt
```
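To double-check that the environment matches the tested versions, a quick probe in Python (a minimal sketch; other versions may also work but are untested here):

```python
import sys
import torch

# The repo was tested with PyTorch 1.7.0, CUDA 10.1 and Python 3.6.
print("python:", sys.version.split()[0])
print("torch :", torch.__version__)       # expect 1.7.0
print("cuda  :", torch.version.cuda)      # expect 10.1
print("gpu ok:", torch.cuda.is_available())
```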
Download the lmdb datasets from Scene Text Recognition with Permuted Autoregressive Sequence Models (PARSeq).
The structure of the data folder is as below:
```
dataset
├── evaluation
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k
│   ├── SVT
│   └── SVTP
├── training
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   └── ST
```
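Each leaf folder above is an lmdb archive. These archives typically follow the key scheme popularized by deep-text-recognition-benchmark (`num-samples`, `image-%09d`, `label-%09d`); a minimal inspection sketch, assuming that scheme:

```python
import io
import lmdb
from PIL import Image

# Point this at any of the lmdb folders listed above.
env = lmdb.open("dataset/evaluation/IIIT5k", readonly=True, lock=False)
with env.begin() as txn:
    n = int(txn.get("num-samples".encode()))
    print("samples:", n)
    # Keys are 1-indexed and zero-padded to nine digits.
    img = Image.open(io.BytesIO(txn.get("image-000000001".encode())))
    label = txn.get("label-000000001".encode()).decode()
    print(img.size, label)
```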
Available model weights:
| Tiny | Small | Base |
|---|---|---|
| best_tiny_model | best_small_model | best_base_model |
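To sanity-check a downloaded weight file before training or testing, a plain PyTorch load is enough (a sketch; the file name and the internal layout of the checkpoint are assumptions):

```python
import torch

# Hypothetical file name; substitute the weight file you downloaded.
ckpt = torch.load("best_tiny_model.pth", map_location="cpu")
# The file may be a raw state dict or a wrapper dict, so probe both cases.
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))
```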
The training is divided into two stages. Four RTX 3090 GPUs are used in this implementation.
Stage 1 (w/o mask in GLRM)
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_port 29501 train_final_dist.py \
    --isrand_aug --backbone svtr_tiny --trans_ln 2 --exp_name svtr-tiny-exp \
    --batch_size 96 --num_iter 413940 --drop_iter 240000
```
Stage 2 (with mask in GLRM)
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_port 29501 train_final_dist.py \
    --isrand_aug --backbone svtr_tiny --trans_ln 2 --exp_name svtr-tiny-exp-mask \
    --batch_size 96 --num_iter 413940 --drop_iter 240000 \
    --mask --saved_model [dir_to_checkpoint_of_the_first_stage]
```
Explanation of parameters:
--backbone: Can be chosen from [svtr_tiny, svtr_small, svtr_base].
--trans_ln: The number of layers in GLRM. We set it to 2 in LPV-Tiny and 3 in LPV-Small and LPV-Base.
--exp_name: The name of experiment folder to save logs and checkpoints.
--batch_size: The batch size per GPU. Default is 96.
--num_iter: The total number of training steps. Default is 413940, which corresponds to 10 epochs when training on MJ and ST (see the arithmetic sketch below).
--drop_iter: The drop position in iterations. Default is 240000.
--mask: Whether to use mask in GLRM.
--saved_model: The checkpoint to resume training from.
--imgH: The height of input image.
--imgW: The width of input image.
The input image size is set to 48×160 for LPV-Base, so it is necessary to add two parameters, --imgH 48 and --imgW 160, when training it.
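As a sanity check on the defaults above, the 10-epoch claim for --num_iter can be reproduced with a little arithmetic (a sketch; the ~15.9M combined size of MJ + ST is an assumption based on the usual synthetic training setup):

```python
# Global batch: 4 GPUs x 96 images each.
global_batch = 4 * 96                  # 384 images per optimization step

num_iter = 413940                      # the default --num_iter
epochs = 10
steps_per_epoch = num_iter // epochs   # 41,394 steps
images_per_epoch = steps_per_epoch * global_batch
print(images_per_epoch)                # 15,895,296 ~ size of MJ + ST
```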
```bash
CUDA_VISIBLE_DEVICES=0 python test_final.py --benchmark_all_eval \
    --exp_name [the_exp_name] --backbone svtr_tiny --trans_ln 2 \
    --model_dir [dir_to_your_checkpoint] --eval_data [dir_to_your_evaluated_data] \
    --batch_size 96 --mask --show attn --fast_acc
```
Explanation of parameters:
--exp_name: The name of experiment folder.
--backbone: Can be chosen from [svtr_tiny, svtr_small, svtr_base].
--trans_ln: The number of layers in GLRM. We set it to 2 in LPV-Tiny and 3 in LPV-Small and LPV-Base.
--model_dir: The directory of the checkpoint.
--eval_data: The directory of the evaluation data.
--fast_acc: Test on the six benchmarks (the aggregation is sketched below).
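The six benchmarks differ considerably in size, so a single overall score is normally the size-weighted average of the per-benchmark accuracies. A sketch of that aggregation, using the standard split sizes implied by the folder names above and hypothetical accuracy values:

```python
# Standard split sizes (IC13_857 and IC15_1811 match the folder names).
sizes = {"IIIT5k": 3000, "SVT": 647, "IC13_857": 857,
         "IC15_1811": 1811, "SVTP": 645, "CUTE80": 288}

# Hypothetical per-benchmark accuracies (%), only to illustrate the formula.
acc = {"IIIT5k": 99.0, "SVT": 97.0, "IC13_857": 98.0,
       "IC15_1811": 88.0, "SVTP": 91.0, "CUTE80": 94.0}

total = sum(sizes.values())
weighted = sum(acc[k] * sizes[k] for k in sizes) / total
print(f"weighted average accuracy: {weighted:.2f}%")
```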
If you find our method useful for your research, please cite:
```bibtex
@article{zhang2023linguistic,
  title={Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition},
  author={Zhang, Boqiang and Xie, Hongtao and Wang, Yuxin and Xu, Jianjun and Zhang, Yongdong},
  journal={arXiv preprint arXiv:2305.05140},
  year={2023}
}
```
This implementation is based on these repositories: CLOVA AI's deep-text-recognition-benchmark and Advanced Literate Machinery's MGP-STR.
Suggestions and discussions are greatly welcomed. Please contact the authors by sending an email to [email protected]