- We collected labeled data from 16 publicly available real datasets to construct REBU-Syn. The details of these datasets are listed in the following table.
- We collected labeled data from 4 publicly available synthetic datasets to construct REBU-Syn. The details of these datasets are listed in the following table.
| Data file name | Size | Link | License |
|---|---|---|---|
| MJ | 6M | https://www.robots.ox.ac.uk/~vgg/data/text/ | Unknown |
| ST | 9M | https://www.robots.ox.ac.uk/~vgg/data/scenetext/ | Unknown |
| Curved SynthText | 1.7M | https://github.com/Jyouhou/ICDAR2019-ArT-Recognition-Alchemy | Apache License 2.0 |
| SynthAdd | 1.2M | https://github.com/wangpengnorman/SAR-Strong-Baseline-for-Text-Recognition | Unknown |
Download the training datasets from the following links:
- LMDB archives for MJ, ST, IIIT5k, SVT, SVTP, IC13, IC15, CUTE80, ArT, RCTW17, ReCTS, LSVT, MLT19, COCO-Text, and Uber-Text.
- LMDB archives for TextOCR and OpenVINO.
- LMDB archives for Union14M_L_lmdb_format.
- CTW1500
- Total-Text
- SynthAdd
- Curved SynthText
Then, organize the data as follows:
```
REBU-Syn
├── train
│   └── synth_and_real
│       ├── Curved_SynthText
│       │   ├── syntext1
│       │   └── syntext2
│       ├── SynthAdd
│       │   ├── data.mdb
│       │   └── lock.mdb
│       ├── Union14M_L_lmdb_format
│       │   ├── difficult
│       │   ├── hard
│       │   ├── hell
│       │   ├── medium
│       │   └── simple
│       ├── benchmark
│       │   ├── ICDAR2013
│       │   ├── ICDAR2015
│       │   ├── IIIT5K
│       │   └── SVT
│       ├── extra
│       │   ├── CTW1500
│       │   └── total_text
│       ├── real_data
│       │   ├── ArT
│       │   ├── COCOv2.0
│       │   ├── LSVT
│       │   ├── MLT19
│       │   ├── OpenVINO
│       │   ├── RCTW17
│       │   ├── ReCTS
│       │   ├── TextOCR
│       │   └── Uber
│       └── mj_st
│           ├── data.mdb
│           └── lock.mdb
├── val
│   ├── CUTE80
│   ├── IC13_1015
│   ├── IC15_1811
│   ├── IIIT5k
│   ├── SVT
│   └── SVTP
└── test
    ├── CUTE80
    ├── IC13_1015
    ├── IC13_857
    ├── IC15_1811
    ├── IIIT5k
    ├── SVT
    └── SVTP
```
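
Once the files are in place, it can help to confirm the layout before training. The following is a minimal sketch, not part of the official tooling: the helper names `missing_dirs` and `lmdb_dirs` are hypothetical, and the expected paths are simply copied from the directory tree above. It uses only the Python standard library and does not open the LMDB files themselves.

```python
from pathlib import Path

# Relative paths copied from the directory tree above (assumed to be complete
# at this level; deeper sub-directories are not checked).
EXPECTED_DIRS = [
    "train/synth_and_real/Curved_SynthText",
    "train/synth_and_real/SynthAdd",
    "train/synth_and_real/Union14M_L_lmdb_format",
    "train/synth_and_real/benchmark",
    "train/synth_and_real/extra",
    "train/synth_and_real/real_data",
    "train/synth_and_real/mj_st",
    "val",
    "test",
]


def missing_dirs(root):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


def lmdb_dirs(root):
    """Return sorted relative paths of directories that directly contain data.mdb."""
    root = Path(root)
    return sorted(
        p.parent.relative_to(root).as_posix() for p in root.rglob("data.mdb")
    )
```

Calling `missing_dirs("REBU-Syn")` should return an empty list when every top-level folder is present, and `lmdb_dirs("REBU-Syn")` lists the folders that hold an LMDB archive, which makes it easy to spot an archive that was extracted one level too deep.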
We generated MJST+ (60M) using TextRecognitionDataGenerator and SynthText. For the specific generation method, please refer to GenData.md.
We sincerely thank the creators of all 20 datasets used in REBU-Syn.