In the spirit of repeatability, we have released the code used to
generate the main results in our paper, "End-to-end Scene Text
Recognition," K. Wang, B. Babenko, S. Belongie. ICCV 2011. We hope you
find this helpful!
Email contact: [email protected]
Project site: http://vision.ucsd.edu/project/grocr
This document walks through how to do three things [estimated time]:
I. [1 minute] QUICK DEMO. A simple demo of running our pre-trained
system on an image. Note: our system was trained for the ICDAR and
SVT settings. If your images differ significantly from the images
in those datasets, re-training the system should make a big
difference in performance.
II. [30 minutes] EVALUATION CODE. A walkthrough of the evaluation
code. You can run your method on the same datasets, format your
output in the same way, and use our evaluation code. The code
creates precision/recall curves and performs non-maximum suppression
at the word level.
III. [overnight] REPRODUCE RESULTS. A guide to training our system
from scratch and reproducing most of the results presented in the
paper.
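The evaluation in item II combines word-level non-maximum suppression with precision/recall curves. The Python sketch below illustrates both ideas under stated assumptions; it is not the released MATLAB code, and the names `iou`, `nms`, `pr_curve` and the tuple formats are hypothetical. It assumes boxes are (x, y, w, h), that NMS greedily drops a detection whose intersection-over-union (IoU) with a higher-scoring kept detection exceeds 0.5, and that a detection counts as correct when it matches a still-unmatched ground-truth word in the same image (ignoring case) with IoU of at least 0.5.

```python
from typing import Dict, List, Tuple

# Assumed formats (hypothetical, not the release's MATLAB structures):
Box = Tuple[float, float, float, float]   # (x, y, w, h)
Det = Tuple[Box, str, float]              # (box, word, score) within one image


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Det], overlap: float = 0.5) -> List[Det]:
    """Greedy word-level NMS for one image.

    Visits detections from highest to lowest score and keeps one only if
    its IoU with every already-kept detection is at most `overlap`.
    """
    kept: List[Det] = []
    for det in sorted(dets, key=lambda d: d[2], reverse=True):
        if all(iou(det[0], k[0]) <= overlap for k in kept):
            kept.append(det)
    return kept


def pr_curve(dets: List[Tuple[str, Box, str, float]],
             gts: Dict[str, List[Tuple[Box, str]]],
             min_iou: float = 0.5) -> List[Tuple[float, float]]:
    """(precision, recall) points from sweeping the score threshold.

    dets pools (image_id, box, word, score) over all images after NMS;
    gts maps image_id to its ground-truth (box, word) pairs. A detection
    is a true positive if it matches a still-unmatched ground-truth word
    in the same image, ignoring case, with IoU >= min_iou.
    """
    n_gt = sum(len(v) for v in gts.values())
    matched = {img: [False] * len(v) for img, v in gts.items()}
    tp = 0
    points = []
    ranked = sorted(dets, key=lambda d: d[3], reverse=True)
    for n, (img, box, word, _score) in enumerate(ranked, start=1):
        for i, (gbox, gword) in enumerate(gts.get(img, [])):
            if (not matched[img][i] and word.lower() == gword.lower()
                    and iou(box, gbox) >= min_iou):
                matched[img][i] = True
                tp += 1
                break
        # Each ranked detection acts as one score threshold.
        points.append((tp / n, tp / n_gt if n_gt else 0.0))
    return points
```

Each detection in score order defines one threshold, so the list of points traces the curve from high precision toward high recall. The released code may use a different overlap measure or suppression rule.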
======================================================================
Prerequisites
======================================================================
- Install libsvm
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- Install Piotr Dollar's Matlab Toolbox
http://vision.ucsd.edu/~pdollar/toolbox/doc/
======================================================================
I. QUICK DEMO
======================================================================
This will display the result of running our system on an image.
>> demoImg
======================================================================
II. EVALUATION CODE
======================================================================
This describes how to use our evaluation code. Some preparation is
needed to set up the ground truth labels, etc. The goal is to make it
easier to compare results on the same datasets. To see how the output
should be formatted, look at the pre-generated results that we have
posted.
1. Download relevant data and run 'prep' scripts to get it into a
common format (supported by Piotr's toolbox).
- Identify a folder that will store all the data. We will refer to
this as dPath. Update the globals.m file to reflect this.
- Download ICDAR ROBUST READING (ICDAR) from,
http://algoval.essex.ac.uk/icdar/Datasets.html#RobustReading.html
Move downloaded files here,
[dPath]/icdar/raw/
After moving, the folder should look like,
[dPath]/icdar/raw/SceneTrialTest/.
[dPath]/icdar/raw/SceneTrialTrain/.
- Download STREET VIEW TEXT (SVT) from,
http://vision.ucsd.edu/~kai/svt/
Move the img folder and xml files here,
[dPath]/svt/raw/
After moving, your folder should look like,
[dPath]/svt/raw/img/.
[dPath]/svt/raw/test.xml
[dPath]/svt/raw/train.xml
- Prepare the raw folders to put them into a common format
>> prepIcdar
[An error is expected on image I00797. There is a missing
character level bounding box in the word.]
>> prepSvt
- Download the pre-generated lexicons (alternatively, you can
generate these again -- but since they're generated randomly,
using the same lexicons will make for a direct comparison).
Download from,
http://vision.ucsd.edu/~kai/grocr/release/icdar_test_lex.zip
Move downloaded files here,
[dPath]/icdar/test/
After moving, your folder should look like,
[dPath]/icdar/test/lex5
[dPath]/icdar/test/lex20
[dPath]/icdar/test/lex50
- Download the pre-generated ICDAR and SVT output.
Download from,
http://vision.ucsd.edu/~kai/grocr/release/icdar_plex+r.zip
Move downloaded files here,
[dPath]/icdar/test/EZ/
After moving, your folder should look like,
[dPath]/icdar/test/EZ/plex+r/images
Download from,
http://vision.ucsd.edu/~kai/grocr/release/icdar_swt+plex+r.zip
Move downloaded files here,
[dPath]/icdar/test/EZ/
After moving, your folder should look like,
[dPath]/icdar/test/EZ/swt+plex+r/images
Download from,
http://vision.ucsd.edu/~kai/grocr/release/svt_plex+r.zip
Move downloaded files here,
[dPath]/svt/test/EZ/
After moving, your folder should look like,
[dPath]/svt/test/EZ/plex+r/images
- Finally, run the evaluation code (this needs to be run separately
for ICDAR and SVT; see the comments in the code).
>> genPrCurvesEZ
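Before running genPrCurvesEZ, it can help to confirm that the folders match the layouts listed in the steps above. The Python sketch below is not part of the release: the path list is copied from this section, and `EXPECTED_EVAL_PATHS` and `missing_paths` are hypothetical names. Since the listing does not say whether lex5, lex20, and lex50 are files or folders, the check uses `os.path.exists`, which accepts either.

```python
import os
from typing import List, Sequence

# Paths (relative to dPath) that section II says should exist after setup.
EXPECTED_EVAL_PATHS = (
    "icdar/raw/SceneTrialTest",
    "icdar/raw/SceneTrialTrain",
    "svt/raw/img",
    "svt/raw/test.xml",
    "svt/raw/train.xml",
    "icdar/test/lex5",
    "icdar/test/lex20",
    "icdar/test/lex50",
    "icdar/test/EZ/plex+r/images",
    "icdar/test/EZ/swt+plex+r/images",
    "svt/test/EZ/plex+r/images",
)


def missing_paths(d_path: str,
                  expected: Sequence[str] = EXPECTED_EVAL_PATHS) -> List[str]:
    """Return the expected paths (file or folder) not found under d_path."""
    return [p for p in expected
            if not os.path.exists(os.path.join(d_path, *p.split("/")))]
```

For example, `print(missing_paths("/path/to/dPath"))` prints an empty list once every step above has been completed; replace the path with the folder set as dPath in globals.m.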
======================================================================
III. REPRODUCE RESULTS
======================================================================
1. Download relevant data and run 'prep' scripts to get it into a
common format (supported by Piotr's toolbox).
- Identify a folder that will store all the data. We will refer to
this as dPath. Update the globals.m file to reflect this.
- Download ICDAR ROBUST READING (ICDAR) from,
http://algoval.essex.ac.uk/icdar/Datasets.html#RobustReading.html
Move downloaded files here,
[dPath]/icdar/raw/
After moving, the folder should look like,
[dPath]/icdar/raw/SceneTrialTest/.
[dPath]/icdar/raw/SceneTrialTrain/.
- Download STROKE WIDTH TRANSFORM (SWT) output from,
http://vision.ucsd.edu/~kai/grocr/release/swt_train.txt
http://vision.ucsd.edu/~kai/grocr/release/swt_test.txt
Rename both files to swt.txt and move each into its respective
train or test directory,
[dPath]/icdar/raw/SceneTrialTrain/
[dPath]/icdar/raw/SceneTrialTest/
After moving, your folder should look like,
[dPath]/icdar/raw/SceneTrialTrain/swt.txt
[dPath]/icdar/raw/SceneTrialTest/swt.txt
- Download the pre-generated ABBYY OCR results from,
http://vision.ucsd.edu/~kai/grocr/release/abbyyout.tar
Move the output files here,
[dPath]/icdar/train/abbyy
[dPath]/icdar/test/abbyy
[dPath]/svt/train/abbyy
[dPath]/svt/test/abbyy
After moving, the folder should look like,
[dPath]/icdar/train/abbyy/words/.
[dPath]/icdar/train/abbyy/wordsPad/.
[dPath]/icdar/train/abbyy/wordsSWT/.
[dPath]/icdar/train/abbyy/wordsSWTpad/.
[dPath]/icdar/test/abbyy/words/.
[dPath]/icdar/test/abbyy/wordsPad/.
[dPath]/icdar/test/abbyy/wordsSWT/.
[dPath]/icdar/test/abbyy/wordsSWTpad/.
[dPath]/svt/train/abbyy/wordsPad/.
[dPath]/svt/test/abbyy/wordsPad/.
- Download STREET VIEW TEXT (SVT) from,
http://vision.ucsd.edu/~kai/svt/
Move the img folder and xml files here,
[dPath]/svt/raw/
After moving, your folder should look like,
[dPath]/svt/raw/img/.
[dPath]/svt/raw/test.xml
[dPath]/svt/raw/train.xml
- Download the pre-rendered synthetic character training data
(SYNTH) from,
http://vision.ucsd.edu/~kai/grocr/release/synth_release.zip
Move data here,
[dPath]/synth/
After moving, the folder should look like,
[dPath]/synth/train/.
[dPath]/synth/test/.
[dPath]/synth/clfs/.
- Download the Microsoft Research Cambridge Object Recognition Image
Database from,
http://research.microsoft.com/en-us/downloads/b94de342-60dc-45d0-830b-9f6eff91b301/default.aspx
Move the scenes, buildings, and miscellaneous folders here,
[dPath]/msrc/raw/
After moving, the folder should look like,
[dPath]/msrc/raw/scenes/.
[dPath]/msrc/raw/scenes/countryside/.
[dPath]/msrc/raw/scenes/office/.
[dPath]/msrc/raw/scenes/urban/.
[dPath]/msrc/raw/buildings/.
[dPath]/msrc/raw/miscellaneous/.
- Prepare the raw folders to put them into a common format
>> prepIcdar
[An error is expected on image I00797. There is a missing
character level bounding box in the word.]
>> prepSvt
>> prepMsrc
2. Train character classifiers
>> trainChClfs
3. Generate results
- Cropped word recognition. Results will be written to a text file
named table2_<timestamp>.txt
>> createTable2
>> createTable2Abbyy
- Full image results. Results will be stored in per-image mat files
for use in the evaluation step. The workspace variables are
also stored.
>> precompFullImage
>> precompSwtPlex
>> precompSwtAbbyy
- Generate various lexicons for ICDAR
>> genLexIcdar
- Train the word-level SVM
>> trainWdClfs
- Collect results and create figures
>> genPrCurves
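The lexicons are generated randomly, which is why section II recommends using the pre-generated ones for a direct comparison. As a hedged illustration only, the Python sketch below assumes each image's lexicon consists of that image's ground-truth words plus K distractor words sampled at random from words appearing elsewhere in the test set, with words upper-cased. It is not genLexIcdar, and `make_lexicon` is a hypothetical name. A fixed seed makes the output repeatable.

```python
import random
from typing import Dict, List


def make_lexicon(gt_words: Dict[str, List[str]], n_distractors: int,
                 seed: int = 0) -> Dict[str, List[str]]:
    """Build a per-image lexicon: ground-truth words plus random distractors.

    gt_words maps an image id to the ground-truth words in that image.
    Distractors are sampled without replacement from the upper-cased test
    vocabulary, excluding the image's own words (assumed scheme).
    """
    rng = random.Random(seed)  # fixed seed -> same lexicons on every run
    vocab = sorted({w.upper() for words in gt_words.values() for w in words})
    lexicons: Dict[str, List[str]] = {}
    for img in sorted(gt_words):
        own = sorted({w.upper() for w in gt_words[img]})
        pool = [w for w in vocab if w not in own]
        k = min(n_distractors, len(pool))  # small test sets may run short
        lexicons[img] = own + rng.sample(pool, k)
    return lexicons
```

Under this assumption, a lexicon size such as 5, 20, or 50 would be passed as `n_distractors`.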
======================================================================
IV. MORE DEMOS
======================================================================
ICDAR DEMO: pass an image number from the ICDAR test set,
>> demoIcdar(23)
SVT DEMO: pass an image number from the SVT test set,
>> demoSVT(18)