CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

Abstract

We describe our development of CSS10, a collection of single speaker speech datasets for ten languages. It is composed of short audio clips from LibriVox audiobooks and their aligned texts. To validate its quality we train two neural text-to-speech models on each dataset. Subsequently, we conduct Mean Opinion Score tests on the synthesized speech samples. We make our datasets, pretrained models, and test resources publicly available. We hope they will be used for future speech tasks.

For details, see our paper. Kyubyong presented this paper at the 2018 workshop of the Korean Society of Speech Sciences.

Environments & Dependencies

  • Linux
  • Python 2.X or 3.X
  • TensorFlow == 1.3
  • NumPy
  • Librosa
  • Matplotlib
  • tqdm
  • SciPy
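
As a quick sanity check before running anything, the list above can be compared against the current environment. The snippet below is a minimal sketch, not part of the original repository: it assumes Python 3 (for `importlib.util`), the helper name `missing_dependencies` is hypothetical, and it only tests whether each package can be found. It does not verify versions, so the TensorFlow 1.3 requirement still has to be checked by hand.

```python
import importlib.util

# Import names for the packages listed above.
REQUIRED = ["tensorflow", "numpy", "librosa", "matplotlib", "tqdm", "scipy"]


def missing_dependencies(names):
    """Return the subset of `names` that cannot be located for import.

    Hypothetical helper: find_spec() only locates a top-level package,
    it does not import it and does not check its version.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages were found.")
```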

Audiobooks & Datasets

| Code | Language | Audiobook | Running Time | Reader | Dataset |
|---|---|---|---|---|---|
| de | German | 1. Meister Floh; 2. Die acht Gesichter am Biwasee; 3. Auswahl aus Die Serapionsbrüder | 16:42:45 | Hokuspokus | CSS German |
| el | Greek | Παραμύθι χωρίς όνομα (Tale Without Name) | 04:08:14 | Rapunzelina | CSS Greek |
| es | Spanish | 1. Bailén; 2. El 19 de Marzo y el 2 de Mayo; 3. La Batalla de los Arapiles | 23:49:49 | Tux | CSS Spanish |
| fi | Finnish | 1. Gulliverin matkat kaukaisilla mailla; 2. Ensimmäiset novellit; 3. Kaleri-orja; 4. Salmelan heinätalkoot | 10:32:03 | Harri Tapani Ylilammi | CSS Finnish |
| fr | French | 1. Les Misérables - tome 5; 2. Arsène Lupin contre Herlock Sholmès | 19:09:03 | Gilles G. Le Blanc | CSS French |
| hu | Hungarian | Egri csillagok | 10:00:25 | Diana Majlinger | CSS Hungarian |
| ja | Japanese | 明暗 (Meian) | 14:55:36 | ekzemplaro | CSS Japanese |
| nl | Dutch | 20.000 Mijlen onder Zee | 14:06:40 | Bart de Leeuw | CSS Dutch |
| ru | Russian | 1. Ice March - Ледяной поход; 2. Early Short Stories; 3. Short Stories for Children and Adults | 21:22:10 | Mark Chulsky | CSS Russian |
| zh | Chinese | 1. 朝花夕拾 (Chao Hua Si She); 2. 呐喊 (Call to Arms) | 06:27:04 | Jing Li | CSS Chinese |
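
The running times listed above can be added up to get a feel for the overall size of the collection. The following is a small illustrative sketch rather than code from the repository: the durations are copied from the table, while the names `RUNNING_TIMES`, `to_seconds`, and `format_hms` are made up for this example.

```python
# Running times (HH:MM:SS) copied from the table above, keyed by language code.
RUNNING_TIMES = {
    "de": "16:42:45",
    "el": "04:08:14",
    "es": "23:49:49",
    "fi": "10:32:03",
    "fr": "19:09:03",
    "hu": "10:00:25",
    "ja": "14:55:36",
    "nl": "14:06:40",
    "ru": "21:22:10",
    "zh": "06:27:04",
}


def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total):
    """Format a number of seconds as 'H:MM:SS' (hours may exceed 24)."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{:d}:{:02d}:{:02d}".format(hours, minutes, seconds)


total_seconds = sum(to_seconds(t) for t in RUNNING_TIMES.values())
print(format_hms(total_seconds))  # 141:13:49
```

Going by the table, the ten datasets together come to roughly 141 hours of audio, with Spanish the largest and Greek the smallest.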

Pretrained Models & Audio Samples

| Code | Language | Pretrained Models | Audio Samples |
|---|---|---|---|
| de | German | DCTTS \| TACOTRON | DCTTS \| TACOTRON |
| el | Greek | DCTTS | DCTTS |
| es | Spanish | DCTTS \| TACOTRON | DCTTS \| TACOTRON |
| fi | Finnish | DCTTS \| TACOTRON | DCTTS \| TACOTRON |
| fr | French | DCTTS \| TACOTRON | DCTTS \| TACOTRON |
| hu | Hungarian | DCTTS \| TACOTRON | DCTTS \| TACOTRON |
| ja | Japanese | DCTTS \| TACOTRON | DCTTS \| TACOTRON |
| nl | Dutch | DCTTS \| TACOTRON | DCTTS \| TACOTRON |
| ru | Russian | DCTTS \| TACOTRON | DCTTS \| TACOTRON |
| zh | Chinese | DCTTS \| TACOTRON | DCTTS \| TACOTRON |

Cite

@article{park2019css10,
  title={CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages},
  author={Park, Kyubyong and Mulc, Thomas},
  journal={Interspeech},
  year={2019}
}

By Kyubyong Park, Tommy Mulc