voice2json profiles

Speech models and supporting files for voice2json.

Data

Files are contained in <LANGUAGE>/<LOCALE> directories. Each locale directory should contain a SOURCE file describing where it was sourced from. The LICENSE file in each locale directory covers the artifacts for that specific profile.
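
As a rough illustration of the layout described above, the sketch below walks a checkout and reports locale directories that lack a SOURCE or LICENSE file. The function name `missing_files`, the `REQUIRED` tuple, and the rule of skipping hidden top-level directories are assumptions made for this example; none of this is part of voice2json itself.

```python
from pathlib import Path

# File names that each <LANGUAGE>/<LOCALE> directory is expected to contain.
REQUIRED = ("SOURCE", "LICENSE")


def missing_files(root: Path) -> dict[str, list[str]]:
    """Return {"<language>/<locale>": [missing file names]} for incomplete locale directories."""
    problems: dict[str, list[str]] = {}
    for locale_dir in sorted(root.glob("*/*")):
        language = locale_dir.parent.name
        # Assumption: hidden top-level directories (such as .git) are not languages.
        if not locale_dir.is_dir() or language.startswith("."):
            continue
        missing = [name for name in REQUIRED if not (locale_dir / name).is_file()]
        if missing:
            problems[f"{language}/{locale_dir.name}"] = missing
    return problems
```

Calling `missing_files(Path("."))` from the repository root would return an empty dictionary if every locale directory is complete. Other non-language directories that contain subdirectories may need to be excluded as well.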

Directories with pocketsphinx contain CMU Sphinx acoustic models.
Directories with kaldi contain Kaldi acoustic models (either gmm or nnet3).
Directories with deepspeech contain Mozilla DeepSpeech acoustic models (version 0.6).
Directories with julius contain Julius acoustic models (DNN, version 4.5).

Some files are split into multiple parts so that they can be uploaded to GitHub. This is done with the split command:

split -d -b 25M FILE FILE.part-

They can be recombined simply with:

cat FILE.part-* > FILE
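
Where cat is not available (for example on Windows), the same recombination can be sketched in Python. This is a minimal example, assuming the parts sit next to the target file and sort correctly by name, as the numeric suffixes produced by split -d do; the function name `recombine` is hypothetical.

```python
import glob
import shutil
from pathlib import Path


def recombine(target: Path) -> Path:
    """Concatenate FILE.part-* pieces, in name order, into FILE."""
    # Escape the file name so characters such as "[" are not treated as glob syntax.
    pattern = glob.escape(target.name) + ".part-*"
    parts = sorted(target.parent.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no parts found for {target}")
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return target
```

Sorting by name mirrors the order in which the shell expands FILE.part-*, so the result should match the cat command above byte for byte.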

Supported Languages

voice2json supports the following languages/locales. I don't speak or write any language besides U.S. English very well, so please let me know if any profile is broken or could be improved!

Untested profiles (highlighted below) may work, but I don't have the necessary data or enough understanding of the language to test them.

		Language	Locale	System	Closed	Open
View	Download	Catalan	ca-es	pocketsphinx	UNTESTED	UNTESTED
View	Download	Czech	cs-cz	kaldi	UNTESTED	UNTESTED
View	Download	Dutch (Nederlands)	nl	kaldi	★ ★ ★ ★ ★ (2x)	☹ (1x)
View	Download	Dutch (Nederlands)	nl	pocketsphinx	★ ★ ★ ★ (18x)	☹ (3x)
View	Download	English	en-in	pocketsphinx	☹ (4x)	☹ (4x)
View	Download	English	en-us	deepspeech	★ ★ ★ ★ ★ (1x)	★ ★ ★ ★ (1x)
View	Download	English	en-us	julius	★ ★ ★ ★ (1x)	UNTESTED
View	Download	English	en-us	kaldi	★ ★ ★ ★ ★ (3x)	★ ★ ★ ★ (1x)
View	Download	English	en-us	pocketsphinx	★ ★ ★ ★ ★ (9x)	★ ★ ★ ★ (2x)
View	Download	French (Français)	fr	kaldi	★ ★ ★ ★ (4x)	★ ★ ★ ★ (1x)
View	Download	French (Français)	fr	kaldi	★ ★ ★ ★ ★ (3x)	★ ★ ★ ★ ★ (0.5x)
View	Download	French (Français)	fr	pocketsphinx	★ ★ ★ ★ (23x)	☹ (3x)
View	Download	German (Deutsch)	de	pocketsphinx	★ ★ ★ ★ ★ (17x)	★ ★ ★ ★ ★ (3x)
View	Download	German (Deutsch)	de-DE	deepspeech	★ ★ ★ ★ ★ (1x)	★ ★ ★ ★ (1x)
View	Download	German (Deutsch)	de-DE	kaldi	★ ★ ★ ★ ★ (4x)	★ ★ ★ ★ (1x)
View	Download	Greek (Ελληνικά)	el-gr	pocketsphinx	★ ★ ★ ★ ★ (15x)	☹ (1x)
View	Download	Hindi (Devanagari)	hi	pocketsphinx	UNTESTED	UNTESTED
View	Download	Italian (Italiano)	it	pocketsphinx	★ ★ ★ ★ ★ (21x)	★ ★ ★ ★ ★ (7x)
View	Download	Italian (Italiano)	it	kaldi	★ ★ ★ ★ ★ (1x)	★ ★ ★ ★ ★ (1x)
View	Download	Kazakh (қазақша)	kz	pocketsphinx	UNTESTED	UNTESTED
View	Download	Korean	ko-kr	kaldi	☹ (4x)	☹ (4x)
View	Download	Mandarin	zh-cn	pocketsphinx	UNTESTED	UNTESTED
View	Download	Polish (polski)	pl	julius	UNTESTED	UNTESTED
View	Download	Portuguese (Português)	pt-br	pocketsphinx	★ ★ ★ ★ (51x)	☹ (11x)
View	Download	Russian (Русский)	ru	kaldi	★ ★ ★ ★ ★ (2x)	★ ★ ★ ★ ★ (0.5x)
View	Download	Russian (Русский)	ru	pocketsphinx	★ ★ ★ ★ ★ (17x)	☹ (1x)
View	Download	Spanish (Español)	es	kaldi	★ ★ ★ ★ ★ (4x)	★ ★ ★ ★ ★ (1x)
View	Download	Spanish (Español)	es	pocketsphinx	★ ★ ★ ★ (25x)	★ ★ ★ ★ (15x)
View	Download	Spanish	es-mexican	pocketsphinx	★ ★ ★ ★ ★ (9x)	★ ★ ★ ★ (2x)
View	Download	Swedish (svenska)	sv	kaldi	★ ★ ★ ★ (3x)	☹ (1x)
View	Download	Vietnamese (Tiếng Việt)	vi	kaldi	★ ★ ★ ★ ★ (4x)	☹ (1x)

Legend

Each profile is given a ★ rating, indicating how accurate it was at transcribing a set of test WAV files. I'm considering anything below 75% accuracy to be effectively unusable (☹).

Transcription Accuracy
★ ★ ★ ★ ★	[95%, 100%]
★ ★ ★ ★	[90%, 95%)
★ ★ ★	[85%, 90%)
★ ★	[80%, 85%)
★	[75%, 80%)
☹	[0%, 75%)
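
The rating bands can be expressed as a small lookup. The sketch below is only an illustration of the table, not code from voice2json; the function name `star_rating` is hypothetical, and accuracy is assumed to be a fraction between 0 and 1.

```python
def star_rating(accuracy: float) -> str:
    """Map a transcription accuracy in [0, 1] to its rating symbol."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # (inclusive lower bound, number of stars), checked from best to worst.
    bands = [(0.95, 5), (0.90, 4), (0.85, 3), (0.80, 2), (0.75, 1)]
    for lower, stars in bands:
        if accuracy >= lower:
            return " ".join("★" * stars)
    # Anything below 75% is considered effectively unusable.
    return "☹"
```

For example, `star_rating(0.92)` returns "★ ★ ★ ★", and `star_rating(0.60)` returns "☹".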

Profiles are tested in two conditions:

Closed
- All example sentences from the profile's sentences.ini are run through Google WaveNet to produce synthetic speech
- The profile is trained and tested on exactly the sentences it should recognize (ideal case)
- This resembles the intended use case of voice2json, though real-world speech will be less perfect
Open
- Speech examples are provided by contributors, VoxForge, or Mozilla Common Voice
- The profile is tested using the sample WAV files with the --open flag
- This (usually) demonstrates why it's best to define voice commands first!

Transcription speed-up is given as (Nx), where N is the average ratio of real-time duration to transcription time. A value of 2x means that voice2json transcribed the test WAV files twice as fast as their real-time durations on average. The reported values come from an Intel Core i7-based laptop with 16GB of RAM, so expect slower transcriptions on a Raspberry Pi.
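
To make the speed-up figure concrete, here is a minimal sketch of the calculation, assuming each test file contributes one pair of (audio duration, transcription time) in seconds. The function name `speed_up` and the input shape are assumptions for illustration; how voice2json gathers these timings is not shown here.

```python
from statistics import mean


def speed_up(samples: list[tuple[float, float]]) -> float:
    """Average ratio of audio duration to transcription time over all samples."""
    ratios = []
    for audio_seconds, transcribe_seconds in samples:
        if transcribe_seconds <= 0:
            raise ValueError("transcription time must be positive")
        ratios.append(audio_seconds / transcribe_seconds)
    # statistics.mean raises StatisticsError if there are no samples.
    return mean(ratios)
```

Two files of 4 and 6 seconds that take 2 and 3 seconds to transcribe give `speed_up([(4.0, 2.0), (6.0, 3.0)]) == 2.0`, which the table would show as (2x). A value below 1, such as (0.5x), means transcription took longer than the audio itself.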

Acknowledgements

The acoustic models and pronunciation dictionaries come from one of:

When language models or grapheme-to-phoneme models were unavailable, they were generated using:

Data from Universal Dependencies
The Phonetisaurus G2P tool
