synesthesiam/voice2json-profiles

voice2json profiles

Speech models and supporting files for voice2json.

Data

Files are contained in <LANGUAGE>/<LOCALE> directories. Each locale directory should contain a SOURCE file describing where it was sourced from. The LICENSE file in each locale directory covers the artifacts for that specific profile.

  • Directories with pocketsphinx contain CMU Sphinx acoustic models.
  • Directories with kaldi contain Kaldi acoustic models (either gmm or nnet3).
  • Directories with deepspeech contain Mozilla DeepSpeech acoustic models (version 0.6).
  • Directories with julius contain Julius acoustic models (DNN, version 4.5).

Some files are split into multiple parts so that they can be uploaded to GitHub. This is done with the split command:

split -d -b 25M FILE FILE.part-

They can be recombined simply with:

cat FILE.part-* > FILE
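
For environments without a POSIX shell, the recombination step can be sketched in Python. This is a minimal illustration, not part of voice2json: the function name `recombine` is hypothetical, and it assumes the parts sit next to the target file and follow the `FILE.part-NN` naming produced by the split command above, so that sorting the names restores the original order.

```python
import glob
import shutil
from pathlib import Path


def recombine(target: Path) -> int:
    """Concatenate target.part-* files into target and return the part count.

    Hypothetical helper: mirrors `cat FILE.part-* > FILE`.
    """
    # Escape the file name so characters such as '[' are matched literally.
    pattern = glob.escape(target.name) + ".part-*"
    # Numeric suffixes from `split -d` sort correctly as plain strings.
    parts = sorted(target.parent.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no parts found for {target}")
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large models are never fully in memory.
                shutil.copyfileobj(src, out)
    return len(parts)
```

Calling `recombine(Path("FILE"))` in a directory holding `FILE.part-00`, `FILE.part-01`, and so on writes `FILE` alongside the parts and leaves the parts in place.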

Supported Languages

voice2json supports the following languages/locales. I don't speak or write any language besides U.S. English very well, so please let me know if any profile is broken or could be improved!

Untested profiles (marked UNTESTED below) may work, but I don't have the necessary data or enough understanding of the language to test them.

| Language | Locale | System | Closed | Open |
| --- | --- | --- | --- | --- |
| Catalan | ca-es | pocketsphinx | UNTESTED | UNTESTED |
| Czech | cs-cz | kaldi | UNTESTED | UNTESTED |
| Dutch (Nederlands) | nl | kaldi | ★ ★ ★ ★ ★ (2x) | ☹ (1x) |
| Dutch (Nederlands) | nl | pocketsphinx | ★ ★ ★ ★ (18x) | ☹ (3x) |
| English | en-in | pocketsphinx | ☹ (4x) | ☹ (4x) |
| English | en-us | deepspeech | ★ ★ ★ ★ ★ (1x) | ★ ★ ★ ★ (1x) |
| English | en-us | julius | ★ ★ ★ ★ (1x) | UNTESTED |
| English | en-us | kaldi | ★ ★ ★ ★ ★ (3x) | ★ ★ ★ ★ (1x) |
| English | en-us | pocketsphinx | ★ ★ ★ ★ ★ (9x) | ★ ★ ★ ★ (2x) |
| French (Français) | fr | kaldi | ★ ★ ★ ★ (4x) | ★ ★ ★ ★ (1x) |
| French (Français) | fr | kaldi | ★ ★ ★ ★ ★ (3x) | ★ ★ ★ ★ ★ (0.5x) |
| French (Français) | fr | pocketsphinx | ★ ★ ★ ★ (23x) | ☹ (3x) |
| German (Deutsch) | de | pocketsphinx | ★ ★ ★ ★ ★ (17x) | ★ ★ ★ ★ ★ (3x) |
| German (Deutsch) | de-DE | deepspeech | ★ ★ ★ ★ ★ (1x) | ★ ★ ★ ★ (1x) |
| German (Deutsch) | de-DE | kaldi | ★ ★ ★ ★ ★ (4x) | ★ ★ ★ ★ (1x) |
| Greek (Ελληνικά) | el-gr | pocketsphinx | ★ ★ ★ ★ ★ (15x) | ☹ (1x) |
| Hindi (Devanagari) | hi | pocketsphinx | UNTESTED | UNTESTED |
| Italian (Italiano) | it | pocketsphinx | ★ ★ ★ ★ ★ (21x) | ★ ★ ★ ★ ★ (7x) |
| Italian (Italiano) | it | kaldi | ★ ★ ★ ★ ★ (1x) | ★ ★ ★ ★ ★ (1x) |
| Kazakh (қазақша) | kz | pocketsphinx | UNTESTED | UNTESTED |
| Korean | ko-kr | kaldi | ☹ (4x) | ☹ (4x) |
| Mandarin | zh-cn | pocketsphinx | UNTESTED | UNTESTED |
| Polish (polski) | pl | julius | UNTESTED | UNTESTED |
| Portuguese (Português) | pt-br | pocketsphinx | ★ ★ ★ ★ (51x) | ☹ (11x) |
| Russian (Русский) | ru | kaldi | ★ ★ ★ ★ ★ (2x) | ★ ★ ★ ★ ★ (0.5x) |
| Russian (Русский) | ru | pocketsphinx | ★ ★ ★ ★ ★ (17x) | ☹ (1x) |
| Spanish (Español) | es | kaldi | ★ ★ ★ ★ ★ (4x) | ★ ★ ★ ★ ★ (1x) |
| Spanish (Español) | es | pocketsphinx | ★ ★ ★ ★ (25x) | ★ ★ ★ ★ (15x) |
| Spanish | es-mexican | pocketsphinx | ★ ★ ★ ★ ★ (9x) | ★ ★ ★ ★ (2x) |
| Swedish (svenska) | sv | kaldi | ★ ★ ★ ★ (3x) | ☹ (1x) |
| Vietnamese (Tiếng Việt) | vi | kaldi | ★ ★ ★ ★ ★ (4x) | ☹ (1x) |

Legend

Each profile is given a ★ rating, indicating how accurate it was at transcribing a set of test WAV files. I'm considering anything below 75% accuracy to be effectively unusable (☹).

| Rating | Transcription Accuracy |
| --- | --- |
| ★ ★ ★ ★ ★ | [95%, 100%] |
| ★ ★ ★ ★ | [90%, 95%) |
| ★ ★ ★ | [85%, 90%) |
| ★ ★ | [80%, 85%) |
| ★ | [75%, 80%) |
| ☹ | [0%, 75%) |
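
The accuracy bands above can be expressed as a small lookup. The sketch below is illustrative only: `star_rating` is a hypothetical name, accuracy is assumed to be given as a percentage, and the [75%, 80%) band is assumed to map to a single ★, following the pattern of the rows above it.

```python
def star_rating(accuracy_percent: float) -> str:
    """Map a transcription accuracy (0 to 100) to the legend's rating symbol."""
    if not 0.0 <= accuracy_percent <= 100.0:
        raise ValueError("accuracy must be between 0 and 100")
    # (inclusive lower bound, star count), checked from the top band down.
    # The single-star band is an assumption based on the legend's pattern.
    bands = [(95.0, 5), (90.0, 4), (85.0, 3), (80.0, 2), (75.0, 1)]
    for lower_bound, stars in bands:
        if accuracy_percent >= lower_bound:
            return " ".join("★" * stars)
    # Anything below 75% is treated as effectively unusable.
    return "☹"
```

Because each band is closed on the left and open on the right, an accuracy of exactly 90% earns four stars, while 89.9% earns three.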

Profiles are tested in two conditions:

  1. Closed
    • All example sentences from the profile's sentences.ini are run through Google WaveNet to produce synthetic speech
    • The profile is trained and tested on exactly the sentences it should recognize (ideal case)
    • This resembles the intended use case of voice2json, though real world speech will be less perfect
  2. Open
    • Speech examples are provided by contributors, VoxForge, or Mozilla Common Voice
    • The profile is tested using the sample WAV files with the --open flag
    • This (usually) demonstrates why it's best to define voice commands first!

Transcription speed-up is given as (Nx), where N is the average ratio of real-time to transcription time. A value of 2x means that voice2json was able to transcribe the test WAV files twice as fast as their real-time durations on average. The reported values come from an Intel Core i7-based laptop with 16GB of RAM, so expect slower transcriptions on a Raspberry Pi.
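
As a worked illustration of the speed-up figure, the sketch below averages the per-file ratio of audio duration to transcription time. The function name `average_speedup` and the input shape (pairs of seconds) are assumptions made for this example; how the reported numbers were actually collected is not specified here.

```python
from statistics import mean
from typing import Iterable, Tuple


def average_speedup(samples: Iterable[Tuple[float, float]]) -> float:
    """Return the mean of (audio seconds / transcription seconds) per file."""
    ratios = []
    for audio_seconds, transcribe_seconds in samples:
        if transcribe_seconds <= 0:
            raise ValueError("transcription time must be positive")
        # A ratio above 1 means faster than real time.
        ratios.append(audio_seconds / transcribe_seconds)
    if not ratios:
        raise ValueError("at least one sample is required")
    return mean(ratios)
```

For instance, two files of 10 and 6 seconds transcribed in 5 and 3 seconds each have a ratio of 2, so the reported figure would be 2x. A single 4-second file that takes 8 seconds gives 0.5x, which is slower than real time.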

Acknowledgements

The acoustic models and pronunciation dictionaries come from one of:

When language models or grapheme-to-phoneme models were unavailable, they were generated using: