<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!--
Copyright 1999-2002 Carnegie Mellon University.
Portions Copyright 2002 Sun Microsystems, Inc.
Portions Copyright 2002 Mitsubishi Electric Research Laboratories.
All Rights Reserved. Use is subject to license terms.
See the file "license.terms" for information on usage and
redistribution of this file, and for a DISCLAIMER OF ALL
WARRANTIES.
-->
<html>
<head>
<title>Sphinx-4 - A speech recognizer written entirely in the
Java(TM) programming language</title>
<STYLE TYPE="text/css">
pre { font-size: medium; background: #f0f8ff; padding: 2mm; border-style: ridge ; color: teal}
code {font-size: medium; color: teal}
s4keyword { color: red; font-weight: bold }
</STYLE></head>
<body bgcolor="white">
<center>
<table bgcolor="#99CBFF" border="0" width="100%">
<tbody>
<tr>
<td align="center" width="100%">
<h1><i>Sphinx-4</i><br><font size="+1">
A speech recognizer written entirely in the
Java<sup><font size="-1">TM</font></sup>
programming language</font></h1>
</td>
</tr>
</tbody>
</table>
</center>
<font face="Arial" size="2">
<table border="0" width="100%">
<tbody>
<tr>
<td bgcolor="#F0F8FF" valign="top" width="20%">
<br>
<b><font>Sphinx-4 Links</font></b>
<p>
<font size="-1">
SourceForge
<li><a href="http://sourceforge.net/projects/cmusphinx">Project Page</a>
<li><a href="https://sourceforge.net/forum/?group_id=1904">Forums</a>
<li><a href="http://sourceforge.net/project/showfiles.php?group_id=1904&package_id=117949">Download</a>
<li><a href="http://sourceforge.net/cvs/?group_id=1904">CVS Repository</a>
<p>
<a href="http://cmusphinx.org">CMU Sphinx</a>
</p>
<p>
<a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/index.html">Sphinx-4 Javadocs</a>
</p>
<hr>
<b>ZipCity</b> -
A demonstration of Sphinx-4 using Java Web Start technology. <a
href="demo/sphinx/zipcity/README.html"> Read more</a> about
the ZipCity demo, or <a
href="http://cmusphinx.sourceforge.net/sphinx4/zipcity/zipcity.jnlp"> Try it </a>. <p>
<a href="http://cmusphinx.sourceforge.net/sphinx4/zipcity/zipcity.jnlp"><img src="doc-files/zipcity.gif"/></a>
<hr>
<center><a href="http://www.sourceforge.net">
<img src="http://sourceforge.net/sflogo.php?group_id=1904&type=1"
width="88" height="31" border="0" alt="SourceForge Logo"></a>
<br>Hosted by SourceForge.net</center>
</font></b></td>
<td width="5%"><br>
</td>
<td valign="top"><br>
<h3>General Information</h3>
<ul>
<li><a href="#what_is_sphinx4">Introduction</a></li>
<li><a href="#capabilities">Capabilities</a></li>
<li><a href="#speed_and_accuracy">Performance</a></li>
</ul>
<h3>Installation</h3>
<ul>
<li><a href="#download_and_install">Required Software</a></li>
<li><a href="#source">Downloading Sphinx-4</a></li>
<li><a href="#how_build">Building Sphinx-4</a></li>
<li><a href="#create_javadocs">Creating Javadocs</a></li>
<li><a href="#faq">Troubleshooting</a></li>
</ul>
<h3>Demos</h3>
<ul>
<li><a href="#demos">Demos</a></li>
<li><a href="#demos_faq">Troubleshooting</a></li>
</ul>
<h3>Sphinx-4 in Detail</h3>
<ul>
<li><a href="#sphinx_properties">Understanding Sphinx-4
Configuration Management</a></li>
<li><a href="#sphinx_instrumentation">Understanding Sphinx-4
Instrumentation </a></li>
<li><a href="javadoc/edu/cmu/sphinx/frontend/doc-files/FrontEndFAQ.html">Front End</a></li>
<ul>
<li><a href="javadoc/edu/cmu/sphinx/frontend/doc-files/FrontEndConfiguration.html">Configuration</a></li>
<li><a href="javadoc/edu/cmu/sphinx/frontend/doc-files/FrontEndFAQ.html#create_cepstra">Creating spectrum/cepstrum</a></li>
<li><a href="javadoc/edu/cmu/sphinx/frontend/doc-files/FrontEndFAQ.html#decode_cepstra">Decoding cepstra</a></li>
<li><a href="javadoc/edu/cmu/sphinx/frontend/doc-files/FrontEndFAQ.html#enable_endpointer">Enabling the endpointer</a></li>
</ul>
<li><a href="#batch_tests">Running the Regression Tests</a></li>
<li><a href="#setup_test">Setting up a Regression Test</a>
<ul>
<li><a href="#batch_files">Batch Files</a>
<li><a href="#input_files">Input Audio/Cepstral Files</a>
<li><a href="#an4_walkthrough">Example: Setting up AN4 tests</a>
</ul>
</li>
<li><a href="#acoustic_models">Acoustic Model Package</a></li>
<li><a href="#language_models">Creating Language Models</a></li>
<li><a href="#bnf_grammars">BNF Style Grammars</a></li>
<li><a href="#architecture_and_api1">Architecture and API</a></li>
<li><a href="doc/ProgrammersGuide.html">Programmer's Guide</a></li>
</ul>
<p>
<br>
<font size="-1">
<b>NOTE</b>: This page contains links to javadocs that need to be
created. If the links are broken, you should either follow instructions
in <a href="#create_javadocs">Creating Javadocs</a>, or view this page
online at <a href="http://cmusphinx.sourceforge.net/sphinx4/">
http://cmusphinx.sourceforge.net/sphinx4/</a>.
</font>
<p>
</td>
</tr>
</tbody>
</table>
<hr>
<h2>General Information about Sphinx-4</h2>
<ul>
<li><a name="what_is_sphinx4"><b>Introduction</b></a>
<p>Sphinx-4 is a state-of-the-art speech recognition system
written entirely in the Java<sup><font size="-1">TM</font></sup>
programming language.
It was created via a joint collaboration between the Sphinx group
at Carnegie Mellon University, Sun Microsystems Laboratories,
Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP),
with contributions from the University of California at Santa
Cruz (UCSC) and the Massachusetts Institute of Technology
(MIT).
<p>Sphinx-4 started out as a port of Sphinx-3 to the
Java(TM) programming language, but evolved into a recognizer
designed to be much more flexible than Sphinx-3, thus
becoming an excellent platform for speech research.
</li>
<br>
<li><a name="capabilities"><b>Capabilities</b></a>
<p>Live mode and batch mode speech recognizers, capable of recognizing
discrete and continuous speech.
<p>Generalized pluggable <a href="./javadoc/edu/cmu/sphinx/frontend/package-summary.html"><b>front end</b></a>
architecture.
Includes pluggable implementations of
<a href="./javadoc/edu/cmu/sphinx/frontend/filter/Preemphasizer.html">preemphasis</a>,
<a href="./javadoc/edu/cmu/sphinx/frontend/window/RaisedCosineWindower.html">Hamming window</a>,
<a href="./javadoc/edu/cmu/sphinx/frontend/transform/DiscreteFourierTransform.html">FFT</a>,
<a href="./javadoc/edu/cmu/sphinx/frontend/frequencywarp/MelFrequencyFilterBank.html">Mel frequency filter bank</a>,
<a href="./javadoc/edu/cmu/sphinx/frontend/transform/DiscreteCosineTransform.html">discrete cosine transform</a>,
<a href="./javadoc/edu/cmu/sphinx/frontend/feature/BatchCMN.html">cepstral mean normalization</a>, and
<a href="./javadoc/edu/cmu/sphinx/frontend/feature/DeltasFeatureExtractor.html">feature extraction</a> of cepstra,
delta cepstra, double delta cepstra features.
<p>Generalized pluggable <b>language model</b> architecture. Includes
pluggable language model support for
<a href="./javadoc/edu/cmu/sphinx/linguist/language/ngram/SimpleNGramModel.html">ASCII</a> and
<a href="./javadoc/edu/cmu/sphinx/linguist/language/ngram/large/LargeTrigramModel.html">binary</a>
versions of unigram, bigram, trigram,
<a href="./javadoc/edu/cmu/sphinx/jsapi/JSGFGrammar.html">Java Speech API Grammar Format (JSGF)</a>, and
<a href="./javadoc/edu/cmu/sphinx/linguist/language/grammar/FSTGrammar.html">ARPA-format FST grammars</a>.
<p>Generalized <a href="./javadoc/edu/cmu/sphinx/linguist/acoustic/package-summary.html"><b>acoustic model</b></a>
architecture. Includes pluggable support for
<a href="./javadoc/edu/cmu/sphinx/linguist/acoustic/tiedstate/Sphinx3Loader.html">Sphinx-3 acoustic models</a>.
<p>Generalized <a href="./javadoc/edu/cmu/sphinx/decoder/search/package-summary.html"><b>search management</b></a>.
Includes pluggable support for
<a href="./javadoc/edu/cmu/sphinx/decoder/search/SimpleBreadthFirstSearchManager.html">breadth first</a> and
<a href="./javadoc/edu/cmu/sphinx/decoder/search/WordPruningBreadthFirstSearchManager.html">word pruning</a> searches.
<p>Speech tools. Includes tools for
<a href="./javadoc/edu/cmu/sphinx/tools/audio/package-summary.html">displaying waveforms and spectrograms</a> and
<a href="./javadoc/edu/cmu/sphinx/tools/feature/package-summary.html">generating features from audio</a>.
<p>
(<b>NOTE:</b> The links in this section point to local files created by
javadoc. If they are broken, please follow the instructions on
<a href="#create_javadocs">Creating Javadocs</a> to create these links.)
</li>
<br>
<li><a name="speed_and_accuracy"><b>Performance</b></a>
<p>Sphinx-4 is a very flexible system capable of performing many
different types of recognition tasks. As such, it is
difficult to characterize its speed and accuracy with just
a few numbers. Instead, we regularly run regression tests on
Sphinx-4 to determine how it performs on a variety of
tasks. These tasks and their latest results are as follows
(each task is progressively more difficult than the previous
one):
<ul>
<li><a href="http://cmusphinx.sourceforge.net/IsolatedDigitsResults.html">
Isolated Digits (TI46)</a>: Runs Sphinx-4 with pre-recorded
test data to gather performance metrics for recognizing
just one word at a time. The vocabulary is merely the
spoken digits from 0 through 9, with a single utterance
containing just one digit.
<br><i>(TI46 refers to the "NIST CD-ROM Version of the Texas
Instruments-developed 46-Word Speaker-Dependent Isolated Word
Speech Database".)</i>
</li>
<br>
<li><a href="http://cmusphinx.sourceforge.net/ConnectedDigitsResults.html">
Connected Digits (TIDIGITS)</a>: Extends the Isolated
Digits test to recognize more than one word at a time
(i.e., continuous speech). The vocabulary is merely the
spoken digits from 0 through 9, with a single utterance
containing a sequence of digits.
<br><i>(TIDIGITS refers to the "NIST CD-ROM Version of the Texas
Instruments-developed Studio Quality Speaker-Independent
Connected-Digit Corpus".)</i>
</li>
<br>
<li><a href="http://cmusphinx.sourceforge.net/SmallVocabResults.html">
Small Vocabulary (AN4)</a>: Extends the vocabulary
to approximately 100 words, with input data ranging
from spoken words to words spelled out
letter by letter.
</li>
<br>
<li><a href="http://cmusphinx.sourceforge.net/MediumVocabResults.html">
Medium Vocabulary (RM1)</a>: Extends the vocabulary
to approximately 1,000 words.
</li>
<br>
<li><a href="http://cmusphinx.sourceforge.net/MediumVocabResults.html">
Medium Vocabulary (WSJ5K)</a>: Extends the vocabulary
to approximately 5,000 words.
</li>
<br>
<li><a href="http://cmusphinx.sourceforge.net/LargeVocabResults.html">
Medium Vocabulary (WSJ20K)</a>: Extends the vocabulary
to approximately 20,000 words.
</li>
<br>
<li><a href="http://cmusphinx.sourceforge.net/LargeVocabResults.html">
Large Vocabulary (HUB4)</a>: Extends the vocabulary
to approximately 64,000 words.
</li>
</ul>
</li>
<p>
The following table compares the performance of Sphinx 3.3 with Sphinx-4.
<p>
<table border="1" cellspacing="0" cellpadding="1"> <tr><th bgcolor="#e0e8FF">
<strong>Test</strong> </th><th bgcolor="#e0e8FF"> <strong>S3.3 WER</strong> </th><th
bgcolor="#e0e8FF"> <strong>S4 WER</strong> </th><th bgcolor="#e0e8FF"> <strong>S3.3 RT</strong>
</th><th bgcolor="#e0e8ff"> <strong>S4 RT(1)</strong> </th><th bgcolor="#e0e8ff"> <strong>S4 RT
(2)</strong> </th><th bgcolor="#e0e8ff"> <strong>Vocabulary Size</strong> </th><th
bgcolor="#e0e8ff"> <strong>Language Model</strong> </th></tr>
<tr><th bgcolor="#F0F8FF"> <strong>TI46</strong> </th><td align="right"> 1.217 </td><td
align="right"> 0.168 </td><td align="right"> 0.14 </td><td align="right"> .03 </td><td
align="right"> .02 </td><td align="right"> 11 </td><td> isolated digits recognition
</td></tr>
<tr><th bgcolor="#F0F8FF"> <strong>TIDIGITS</strong> </th><td align="right"> 0.661 </td><td
align="right"> 0.549 </td><td align="right"> 0.16 </td><td align="right"> 0.07 </td><td
align="right"> 0.05 </td><td align="right"> 11 </td><td> continuous digits </td></tr>
<tr><th bgcolor="#F0F8FF"> <strong>AN4</strong> </th><td align="right"> 1.300 </td><td
align="right"> 1.192 </td><td align="right"> 0.38 </td><td align="right"> 0.25 </td><td
align="right"> 0.20 </td><td align="right"> 79 </td><td> trigram </td></tr>
<tr><th bgcolor="#F0F8FF"> <strong>RM1</strong> </th><td align="right"> 2.746 </td><td
align="right"> 2.739 </td><td align="right"> 0.50 </td><td align="right"> 0.50 </td><td
align="right"> 0.40 </td><td align="right"> 1,000 </td><td> trigram </td></tr>
<tr><th bgcolor="#F0F8FF"> <strong>WSJ5K</strong> </th><td align="right"> 7.323 </td><td
align="right"> 7.174 </td><td align="right"> 1.36 </td><td align="right"> 1.22
</td><td align="right"> 0.96 </td><td align="right"> 5,000 </td><td> trigram </td></tr>
<tr><th bgcolor="#F0F8FF"> <strong>HUB4</strong> </th><td align="right"> 18.845 </td><td
align="right"> 18.878 </td><td align="right"> 3.06 </td><td align="right"> ~4.4
</td><td align="right"> 3.8 </td><td align="right"> 60,000 </td><td> trigram </td></tr>
</table>
<p />
Note that performance work on the HUB4 test is not complete.
<p />
<p />
Key:
<ul>
<li> <strong>WER</strong> - Word error rate (%) (lower is better)
</li>
<li> <strong>RT</strong> - Real Time - Ratio of processing time to audio time - (lower is better)
</li>
<li> <strong>S3.3 RT</strong> - Results for a single or dual CPU configuration
</li>
<li> <strong>S4 RT(1)</strong> - Results on a single-CPU configuration
</li>
<li> <strong>S4 RT(2)</strong> - Results for a dual-CPU configuration
</li>
</ul>
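<p>
As a rough illustration of the two metrics in the key, the sketch below
computes a real-time factor and a word error rate with <code>awk</code>.
The function names and the sample numbers are hypothetical, and the WER
formula used (substitutions plus deletions plus insertions, divided by the
number of reference words) is the conventional definition, assumed here
rather than taken from the Sphinx-4 test harness.
</p>

```shell
# rt_factor PROCESSING_SECONDS AUDIO_SECONDS
# Ratio of processing time to audio time (lower is better).
rt_factor() {
  awk -v proc="$1" -v audio="$2" 'BEGIN { printf "%.2f\n", proc / audio }'
}

# wer SUBSTITUTIONS DELETIONS INSERTIONS REFERENCE_WORDS
# Word error rate as a percentage (lower is better).
wer() {
  awk -v s="$1" -v d="$2" -v i="$3" -v n="$4" \
    'BEGIN { printf "%.3f\n", 100 * (s + d + i) / n }'
}

# Hypothetical run: 30 s of processing for 60 s of audio
rt_factor 30 60
# Hypothetical run: 2 substitutions, 1 deletion, 0 insertions in 100 words
wer 2 1 0 100
```

<p>
An RT value below 1.0 means that the recognizer processes audio faster
than it is spoken.
</p>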
<p>
This data was collected on a dual-CPU UltraSPARC-III running at 1015 MHz with 2 GB of memory.
</ul>
<hr>
<h2>Installation</h2>
<ul>
<li><a name="download_and_install"><b>Required Software</b></a>
<p>Sphinx-4 has been built and tested on the Solaris<sup> <font
size="-1">TM</font></sup> Operating Environment, Mac OS X, Linux
and Win32 operating systems. Running, building, and testing
Sphinx-4 requires additional software. Before you start,
you will need the following software available on your machine.</p>
<ul>
<li><b>Java 2 SDK, Standard Edition, v1.4</b> or better.
Go to <a href="http://java.sun.com">java.sun.com</a>,
and select "J2SE". At the time of writing, the latest release
version is 1.4.2, which is the one we recommend.
</li>
<br>
<li><b>Ant 1.6.0</b> or better, available at <a
href="http://ant.apache.org">ant.apache.org</a>. The site has a manual
with instructions on how to download, install, and use ant.
The gist of it, just to get started:
<ul>
<li> On <a href="http://ant.apache.org">ant.apache.org</a> click on
"binary distributions" under "Download", on the left-hand side.
</li>
<li> Go to the title "Current release of ant", and select the
archive file of your preference. The most common formats are
"tar.gz" and "zip". Clicking on any of them should start downloading
the file to your machine.
</li>
<li> Save the file locally, and extract the files from the
archive. On Windows machines, clicking on the archive file should
start "WinZip". On Unix/Linux machines, depending on which file you
chose, you'll need to "unzip" or "gunzip" and subsequently "tar xf"
the file.
</li>
<li> After extracting the files from the archive, you'll get a
directory named something like "apache-ant-1.6.0". Rename this
directory to "ant", and move it to a convenient location. On Windows
machines, this location is usually "c:\ant". On Unix/Linux, this
location is usually "/usr/local/ant".
</li>
<li> Define variables to support ant and Java.
This will depend on your platform (assuming that your Java version
is 1.4.2):
<ul>
<li> Windows
<pre>
set JAVA_HOME=c:\j2sdk1.4.2
set ANT_HOME=c:\ant
set PATH=%PATH%;%ANT_HOME%\bin;%JAVA_HOME%\bin
</pre>
</li>
<li> Unix (bash)
<pre>
export JAVA_HOME=/usr/java/j2sdk1.4.2
export ANT_HOME=/usr/local/ant
export PATH=${PATH}:${ANT_HOME}/bin:${JAVA_HOME}/bin
</pre>
</li>
<li> Unix (csh)
<pre>
setenv JAVA_HOME /usr/java/j2sdk1.4.2
setenv ANT_HOME /usr/local/ant
setenv PATH ${PATH}:${ANT_HOME}/bin:${JAVA_HOME}/bin
</pre>
</li>
</ul>
</li>
<li> Installation of ant is complete.</li>
</ul>
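<p>
Once the variables are defined, it can help to confirm that both tools
are reachable from the command line. The following is a minimal sketch for
a Bourne-compatible shell (bash or cygwin); the helper name
<code>check_tool</code> is hypothetical and not part of Ant or the JDK.
</p>

```shell
# check_tool NAME
# Reports whether NAME can be resolved through the current PATH.
check_tool() {
  # command -v prints the resolved location when the tool is found
  if tool_path=$(command -v "$1"); then
    echo "$1 found at $tool_path"
  else
    echo "$1 is NOT on PATH"
  fi
}

check_tool java
check_tool ant
echo "JAVA_HOME is set to: ${JAVA_HOME:-(unset)}"
echo "ANT_HOME is set to: ${ANT_HOME:-(unset)}"
```

<p>
If either tool is reported as missing, recheck the <code>PATH</code>
setting for your platform before continuing.
</p>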
</li>
<br>
<li> <b>CVS</b> and <b>SSH</b>, but only if you want to interact directly
with the CVS tree (which we recommend). The canonical places to get them
are <a href="http://www.cvshome.org">www.cvshome.org</a> and
<a href="http://openssh.org">openssh.org</a>. If you are using Windows,
your best choice is to install <a
href="http://cygwin.com">cygwin</a>, which will give you a
Linux-like environment in a command prompt window. Make sure to
choose "ssh" and "cvs" when you install cygwin.
</li>
</ul>
</li>
<br>
<li>
<a name="source"><b>Downloading Sphinx-4</b></a>
<br>
<ul>
<li>
<b>Instructions for retrieving code from a release package.</b>
<p>
Sphinx-4 has two packages available for <a href="http://sourceforge.net/project/showfiles.php?group_id=1904&package_id=117949">download</a>:
<ul>
<li>
<b>sphinx4-{version}-bin.zip</b>:
provides the jar files, documentation, and demos
</li>
<li>
<b>sphinx4-{version}-src.zip</b>:
provides the sources, documentation, demos, unit tests
and regression tests.
</li>
</ul>
</p>
<p>
After you have downloaded these files, unjar the ZIP files using the
<code>jar</code> command which is in the <code>bin</code> directory of
your Java installation:
<pre>
jar xvf sphinx4-{version}-bin.zip
jar xvf sphinx4-{version}-src.zip</pre>
</p>
<p>
For both downloads, a directory called "sphinx4-{version}" will be
created.
</p>
<p>
There are also the RM1 acoustic model, and HUB4 acoustic and language
models, available for download at the same location on SourceForge.
Download them only if you want to run the regression tests for RM1 and
HUB4.
</p>
</li>
<li><a name="cvs"><b>Instructions for retrieving code from the cvs trees</b></a>
<p>If you want to be able to get the latest updates from the CVS source
tree, you should retrieve the code from the CVS source tree on
SourceForge. The Sphinx-4 code is located at
<a href="http://sourceforge.net/projects/cmusphinx">sourceforge.net
</a> as open source. Please follow the instructions below to retrieve
it.</p>
<ul>
<li>Make sure that you set the environment variable CVS_RSH to
ssh. See the <a href="#faq">troubleshooting</a> section for more
details.
</li>
<li>Get the code from sourceforge.net. If you are a developer in the
cmusphinx project, then do (assuming you use bash shell):
<pre>
% export CVS_RSH=ssh
% cvs -z3 -d:ext:<i>developername</i>@cvs.sourceforge.net:/cvsroot/cmusphinx co sphinx4
</pre>
where <i>developername</i> is your sourceforge developer name.
</li>
<li>If you are not a developer, you have to get the code
anonymously. When prompted for a password, simply press
&lt;ENTER&gt;:
<pre>
% cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx login
% cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx co sphinx4
</pre>
</li>
</ul>
</li>
</ul>
</li>
<br>
<li><a name="how_build"><b>Building Sphinx-4</b></a>
<p>
Since the sphinx4-{version}-bin.zip distribution does not contain
the source code, you must download sphinx4-{version}-src.zip, or
retrieve the code from SourceForge using CVS, in order to
build from the sources. The software required for building Sphinx-4 is
listed in the <a href="#download_and_install">Required Software</a>
section.
</p>
<p><b>Set Up JSAPI 1.0</b></p>
<p>
Before you build Sphinx-4, it is important to
<a href="doc/jsapi_setup.html">set up your JSAPI environment</a>,
because a number of tests and demos rely on having JSAPI
installed.
</p>
<p>
To build Sphinx-4, at the command prompt change to
the directory where you installed Sphinx-4 (usually, a simple
"cd sphinx4" will do). Set your <code>JAVA_HOME</code>,
<code>ANT_HOME</code> and <code>PATH</code> environment variables
as described above. Then type the following:
</p>
<pre>ant</pre>
<p>
This executes the <a href="http://ant.apache.org/">Apache Ant</a>
command to build the Sphinx-4 classes under the <code>bld</code>
directory, the jar files under the <code>lib</code> directory,
and the demo jar files under the <code>bin</code> directory.
</p>
<p>
To delete all the output from the build to give you a fresh start:
<pre>ant clean</pre>
</li>
<br>
<li><a name="create_javadocs"><b>Creating Javadocs</b></a>
<p>
The javadocs have already been built if you downloaded the
sphinx4-{version}-bin.zip. In order to build the javadocs yourself, you must
download the sphinx4-{version}-src.zip distribution instead. To build the
javadocs, go to the top level directory ("sphinx4-{version}"), and type:
</p>
<pre>ant javadoc</pre>
<p>This will build javadocs from public classes, displaying only the public methods and fields. In general, this is all the information you will need. If you need more detail, such as private or protected members, you can generate the corresponding javadoc by running, for example:
</p>
<pre>ant -Daccess=private javadoc</pre>
</li>
<br>
<li><a name="faq"><b>Troubleshooting/FAQ</b></a><br>
<ol>
<li>How can I specify the location of the Java JDK?
<ul>
<li>Set the JAVA_HOME environment variable. Depending on your
operating system and shell:
<ul>
<li>
Unix (t)csh: <code>setenv JAVA_HOME /lab/speech/java/j2sdk1.4.0</code>
</li>
<li>
Unix (ba)sh: <code>export JAVA_HOME='/lab/speech/java/jdk1.4.1_01'</code>
</li>
<li>
Windows/cygwin: <code>export JAVA_HOME='c:/Progra~1/J2SDK_Forte/jdk1.4.0'</code>
</li>
</ul>
</li>
</ul>
</li>
<br>
<li>How can I update the code in the current working copy that I have?
<ul>
<li>If you are using CVS, you can do a "cvs update -d". The
switch "-d" will also get new directories or files that have
been added to the cvs module since the last time you
downloaded.</li>
</ul>
</li>
<br>
<li>I always get a "timeout" message when I try to download/update
the cvs tree.
<ul>
<li>Make sure you have defined the CVS_RSH environment variable as
'ssh'. If ssh is not on your path, specify its
full path. Depending on your operating system and shell:
<ul>
<li>
Unix (t)csh: <code>setenv CVS_RSH ssh</code>
</li>
<li>
Unix (ba)sh: <code>export CVS_RSH='/usr/local/bin/ssh'</code>
</li>
<li>
Windows/cygwin: <code>export CVS_RSH='ssh'</code>
</li>
</ul>
</li>
</ul>
</li>
<br>
<li><a name="cygwin">What are the limitations/hacks if I am using
cygwin?</a></li>
<ul>
<li>Make sure that you install "ssh" and "cvs" when you install
cygwin. If you do not want to go through the list of packages
trying to decide what you need, installing everything is a good
solution.
</li>
<li>Cygwin does not define a home directory, which is used by
cvs. You have to explicitly define it. You are better off creating
an environment variable HOME, which will be used by every cygwin
window that you open. Follow the instructions below to create this
variable on Windows 2000.
<ul>
<li>Assume your userid is "johndoe", and you created a
directory named "/home/johndoe" in the cygwin environment
</li>
<li>Right click on the "My Computer" icon
</li>
<li>On the menu, click on "Properties"
</li>
<li>Click the "Advanced" tab
</li>
<li>Select "Environment Variables"
</li>
<li>Click "New" on the "User variables for johndoe" box
</li>
<li>On "Variable Name", type "HOME"
</li>
<li>On "Variable Value", type "c:\cygwin\home\johndoe"
</li>
</ul>
When you start a new cygwin window, if you type "echo $HOME", you
should get "/home/johndoe".
</li>
<li>Handling absolute paths: Cygwin interprets directory paths in
a unix-like way, relative to its own "root" directory (which
cygwin sees as "/" but Windows sees as "c:\cygwin"). However, the
Java platform interprets directory paths via the operating
system. Therefore, absolute paths, such as "/lab/speech", will be
interpreted differently by cygwin and by the Java platform. One
workaround is to keep the absolute path in Windows (e.g.
"c:\lab\") and create a link, in cygwin, from that location (such
as "ln -s /cygdrive/c/lab /lab"). If you use absolute paths to
identify data locations and the directories are not set up this way,
you may encounter error messages such as:
</li>
<pre>
d:\work\sphinx4>ant clean
/c: Can't open /c: No such file or directory
</pre>
<li>Handling DOS 8.3 filename format: Since cygwin deals with unix
and DOS filename formats at the same time, filenames in different
files (e.g., build.xml, .props file) have different
requirements. Paths interpreted directly by cygwin can be as long
as you want. Paths passed to the program, though, conform to the
8.3 convention of DOS filenames. Therefore you have to define the
path to, say, junit as something like
/lab/speech/ThirdP~1/junit3.7, since this string is passed from
the environment to the Java code to the OS. </li>
</ul>
</ol>
</ul>
<hr>
<h2><a name="demos">Demos</a></h2>
<p>
Sphinx-4 contains a number of demo programs. If you downloaded the
binary distribution (sphinx4-{version}-bin.zip), the JAR files of the demos
are already built, so you can just run them directly. However, if you
downloaded the source distribution (sphinx4-{version}-src.zip or via CVS),
you need to build the demos. Click on the links below for instructions on
how to build and run the demos.
<ul>
<li>
<a href="demo/sphinx/helloworld/README.html">Hello World Demo</a>: a command line application that recognizes simple phrases
</li>
<li>
<a href="demo/sphinx/hellodigits/README.html">Hello Digits Demo</a>: a command line application that recognizes connected digits
</li>
<li>
<a href="demo/sphinx/hellongram/README.html">Hello N-Gram Demo</a>: a command line application using an N-gram language model for speech recognition
</li>
<li>
<a href="demo/sphinx/zipcity/README.html">ZipCity Demo</a>: a
Java Web Start technology application that recognizes spoken zip
codes and locates the associated city and state.
</li>
</ul>
</p>
<p>
There is also a <a href="tests/live/README.html">live-mode test program</a>.
It is included only in the source distribution
(sphinx4-{version}-src.zip), so this link works only if you downloaded
that distribution rather than sphinx4-{version}-bin.zip.
</p>
<p>
The <a href="javadoc/edu/cmu/sphinx/tools/audio/doc-files/HowToRunAudioTool.html">AudioTool</a> is a visual tool that records and displays the waveform
and spectrogram of an audio signal. It is available in both the binary
and source releases.
</p>
</li>
<h2><a name="demos_faq">Troubleshooting</a></h2>
<ol>
<li>Why do the demos not work on my Linux machine?
<ul>
<li>
There seems to be a significant difference in how different
versions of the JDK determine which audio resources are
available on Linux. This difference seems to affect different
machines in different ways. We are working with the Java Sound
folks to get to the root cause of the problem. In the
meantime, if you are having trouble getting the demos to work on
your Linux box, try the following:
<ul>
<li> Try a native sound recording application (such as
gnome-sound-recorder) to ensure that you can actually capture
audio on your system.
</li>
<li> Try the <a
href="javadoc/edu/cmu/sphinx/tools/audio/doc-files/HowToRunAudioTool.html">AudioTool</a>
demo to see if you can record audio from a Java application.
</li>
<li>
Check to see if any sound daemons like esd, gstreamer or
artsd are running. These daemons may have exclusive access to the
sound device. If any of these are running, kill them
and try running again.
</li>
<li>
Try switching to another version of the JDK. If JDK 1.4 doesn't
work, try 1.5 and vice versa.
</li>
</ul>
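<p>
The check for running sound daemons can be scripted. The sketch below is
one possible approach, not an official Sphinx-4 tool: the helper name
<code>filter_sound_daemons</code> is hypothetical, and the patterns assume
that the daemons appear in the process list under the names mentioned
above, which may differ on your distribution.
</p>

```shell
# filter_sound_daemons
# Reads one command name per line on standard input and prints only the
# names that match a known sound daemon (assumed process names).
filter_sound_daemons() {
  grep -E -x 'esd|artsd|gstreamer.*'
}

# List the command names of all processes and keep only the matches
ps -e -o comm= | filter_sound_daemons || echo "no known sound daemon is running"
```

<p>
Any name printed by the filter is a candidate to stop before you run the
demos again.
</p>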
</li>
</ol>
<hr>
<h2>Sphinx-4 in Detail</h2>
<ul>
<li><a name="sphinx_properties"><b>Understanding Sphinx-4
Configuration Management </b><p>
</a> The document <a
href="javadoc/edu/cmu/sphinx/util/props/doc-files/ConfigurationManagement.html"> Sphinx-4
Configuration Management</a> describes, in detail, how to configure a
Sphinx-4 system.
<br>
<li><a name="sphinx_instrumentation"><b>Understanding Sphinx-4
Instrumentation </b><p>
</a> The document <a
href="javadoc/edu/cmu/sphinx/instrumentation/doc-files/Instrumentation.html">
Sphinx-4 Instrumentation </a> describes, in detail, how to use the
instrumentation facilities of the Sphinx-4 system.
<br>
<li><a name="batch_tests"><b>Running the Regression Tests</b></a>
<p>
Sphinx-4 contains a number of regression tests using common speech databases.
Again, you must download the source distribution or check out the
source tree using CVS in order to get the regression tests directory.
The regression tests we have are:
</p>
<ul>
<li><a href="#isolated_digits_test">Isolated Digits - TI46</a></li>
<li><a href="#connected_digits_test">Connected Digits - TIDIGITS</a></li>
<li><a href="#small_vocab_test">Small Vocabulary - AN4</a></li>
<li><a href="#medium_vocab_test">Medium Vocabulary - RM1</a></li>
<li><a href="#large_vocab_test">Large Vocabulary - HUB4</a></li>
</ul>
<p>
Before you run any of the tests, make sure that you have built Sphinx-4
already. To do so, go to the top level and type:
<pre>ant</pre>
</p>
<p>
You also need to make sure you have the appropriate acoustic model(s)
installed. More details below.
</p>
<p>
The Sphinx-4 regression tests have different directories for the
different tasks. The directory sphinx4/tests/performance contains
directories named ti46, tidigits, an4, rm1, hub4, and some other tests.
Each of these directories contains a build.xml with targets specific
to the particular task. The build.xml allows you to run a number of
different tests. Type:
<pre>ant -projecthelp</pre>
to list the available targets.
</p>
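<p>
To see the targets of several tasks at once, a loop along the following
lines may help. It is a sketch that assumes you run it from the directory
containing <code>sphinx4</code> and that the directory names contain no
spaces; <code>list_test_dirs</code> is a helper invented for this example.
</p>

```shell
#!/bin/sh
# Print each test directory under ROOT that contains a build.xml.
# Usage: list_test_dirs ROOT NAME...
list_test_dirs() {
    root=$1
    shift
    for name in "$@"; do
        if [ -f "$root/$name/build.xml" ]; then
            echo "$root/$name"
        fi
    done
}

# Show the Ant targets for every task directory that is present.
for dir in $(list_test_dirs sphinx4/tests/performance ti46 tidigits an4 rm1 hub4); do
    (cd "$dir" && ant -projecthelp)
done
```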
<p>
<a name="isolated_digits_test"><br><b>Isolated Digits - TI46</b></a>
</p>
<p>
The TIDIGITS models are already included as part of the distribution.
Therefore, you do not need to download them separately.
You must have the TI46 test data, available from the
<a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S9">LDC TI46</a> website.
</p>
<p>
You need to edit the batch file called <code>ti46.batch</code>,
located in the <code>tests/performance/ti46</code> directory,
so that it matches where you stored the TI46 test files. Refer to the section
<a href="#batch_files">Batch Files</a> for details about the format of
batch files.
</p>
<p>
To run the tests:
</p>
<pre>
% cd sphinx4/tests/performance/ti46
% ant -projecthelp # to see a list of possible targets
% ant ti46_wordlist
</pre>
<p>
<a name="connected_digits_test"><br><b>Connected Digits - TIDIGITS</b></a>
</p>
<p>
The TIDIGITS models are already included as part of the distribution.
Therefore, you do not need to download them separately.
<p>
You must have the TIDIGITS test data, available from the
<a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S10">LDC TIDIGITS</a> website.
<p>
You need to edit the batch file called <code>tidigits.batch</code>,
located in the <code>tests/performance/tidigits</code> directory,
so that it matches where you stored the TIDIGITS test files. Refer to the section
<a href="#batch_files">Batch Files</a> for details about the format of
batch files.
<p>
To run the tests:
<pre>
% cd sphinx4/tests/performance/tidigits
% ant -projecthelp # to see a list of possible targets
% ant tidigits_flat_unigram
</pre>
<p>
<a name="small_vocab_test"><br><b>Small Vocabulary - AN4</b></a>
<p>
The Wall Street Journal (WSJ) models are already included as part of
the distribution. Therefore, you do not need to download them separately.
<p>
Download the big-endian raw audio version of the
<a href="http://www.speech.cs.cmu.edu/databases/an4/">AN4 Database</a>
and unpack it in a directory of your choice:
<pre>
% gunzip an4_raw.bigendian.tar.gz
% tar -xvf an4_raw.bigendian.tar
</pre>
<p>
Then update the following batch files (located in the
<code>tests/performance/an4</code>
directory), so that they match up with where you unpacked the AN4 data.
You probably just need to replace all instances of the string
<code>"/lab/speech/sphinx4/data"</code> inside these batch files.
Please refer to the <a href="#batch_files">Batch Files</a> section for
details about batch files:
<p><code>an4_full.batch<br>an4_spelling.batch<br>an4_words.batch</code>
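<p>
One way to do the replacement is with <code>sed</code>, as sketched below.
The helper name <code>relocate_batch</code> and the target directory
<code>/home/me/data</code> are placeholders chosen for this example, and the
sketch assumes that neither path contains a <code>|</code> or
<code>&amp;</code> character. Each file is backed up with a
<code>.bak</code> suffix before it is changed.
</p>

```shell
#!/bin/sh
# Replace OLD_ROOT with NEW_ROOT in each batch file, keeping a .bak backup.
# Usage: relocate_batch OLD_ROOT NEW_ROOT FILE...
# relocate_batch is a hypothetical helper, not part of Sphinx-4.
relocate_batch() {
    old_root=$1
    new_root=$2
    shift 2
    for f in "$@"; do
        # '|' is the sed delimiter so the slashes in the paths need no escaping
        sed -i.bak "s|$old_root|$new_root|g" "$f"
    done
}

# Example call, run from sphinx4/tests/performance/an4.
# /home/me/data is a placeholder for the directory where you unpacked AN4.
# relocate_batch /lab/speech/sphinx4/data /home/me/data \
#     an4_full.batch an4_spelling.batch an4_words.batch
```

<p>
Check the edited files against the <code>.bak</code> copies before running
the tests.
</p>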
<p>
After you have updated the batch files, you can run the tests as follows:
<pre>
% cd sphinx4/tests/performance/an4
% ant -projecthelp # to see a list of possible targets
% ant an4_words_unigram
</pre>
<p>
<a name="medium_vocab_test"><br><b>Medium Vocabulary - RM1</b></a>
<p>
Make sure that you have downloaded the binary RM1 model file, called
<code>RM1_13dCep_16k_40mel_130Hz_6800Hz.jar</code>, located in the
<code>sphinx4</code> package on the <a href="http://sourceforge.net/project/showfiles.php?group_id=1904&package_id=117949">downloads page</a>.
<br>
Then, in the build file for the RM1 tests,
<code>sphinx4/tests/performance/rm1/build.xml</code>,
change the <code>classpath</code> property to point to
the location of your <code>RM1_13dCep_16k_40mel_130Hz_6800Hz.jar</code>.
<p>
You must have the RM1 test data, available from the
<a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S3B">LDC RM1</a> website.
<p>
You also need to prepare a batch file called <code>rm1.batch</code>,
by following instructions in the <a href="#batch_files">Batch Files</a>
section. There is already one in the RM1 test directory, but it will
not work for you, since the paths to test files will not match your setup.
<p>
To run the tests:
<pre>
% cd sphinx4/tests/performance/rm1
% ant -projecthelp # to see a list of possible targets
% ant rm1_bigram
</pre>
<p>
<a name="large_vocab_test"><br><b>Large Vocabulary - HUB4</b></a>
<p>
You must have the HUB4 test data, available from the
<a href="http://wave.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2000S88">LDC HUB4</a> website.
</p>
<p>
You must download the binary HUB4 model file, called
<code>HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.jar</code>, and the
binary HUB4 trigram language model, called <code>HUB4_trigram_lm.zip</code>,
both located in the <code>sphinx4</code> package on the
<a href="http://sourceforge.net/project/showfiles.php?group_id=1904&package_id=117949">
downloads page</a>. Unpack the trigram language model file with:
<pre>jar xvf HUB4_trigram_lm.zip</pre>
The trigram model file is called <code>language_model.arpaformat.DMP</code>.
Then, in the build file for the HUB4 tests,
<code>sphinx4/tests/performance/hub4/build.xml</code>,
change the <code>classpath</code> property to
point to the location of your
<code>HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.jar</code>.
In the configuration file,
<code>tests/performance/hub4/hub4.config.xml</code>, change the 'location'
of the 'trigramModel' component to where your
<code>language_model.arpaformat.DMP</code>
file is located.
</p>
<p>
You also need to prepare a batch file, which is currently called
<code>f0_hub4.batch</code> in the build.xml file, by following instructions
in the <a href="#batch_files">Batch Files</a> section.
</p>
<p>
To run the test:
<pre>
% cd sphinx4/tests/performance/hub4
% ant -projecthelp # to see a list of possible targets
% ant hub4_trigram
</pre>
<p>
</li>
<br>
<li><a name="setup_test"><b>Setting up a Regression Test</b></a>