Landing page for the data, code, and publications of this project, sponsored by an Imminent Research Grant.
In 2022, we began curating and recording 40 hours of high-fidelity speech data for Yorùbá, the third most widely spoken language in Africa, with over 40 million L1 speakers. We partnered with the YorubaName organization in Nigeria to encourage volunteers, both online and offline, to record their voices.
- Official project blog → www.yorubavoice.com
- The dataset is published in the ELRA catalogue →
  - ELRA Resource description page
  - 012-405-700-001-6 → the corresponding unique ISLRN, for use in citations and publications
- The LREC-COLING 2024 paper → arXiv
- The Speech Recorder App we developed → yoruba-voice-speech-recorder
- The source code and the various tools we used are in this repository
If you use our dataset, please cite our paper:
```bibtex
@misc{ogunremi2023iroyinspeech,
  title={\`{I}r\`{o}y\`{i}nSpeech: A multi-purpose Yor\`{u}b\'{a} Speech Corpus},
  author={Tolulope Ogunremi and Kola Tubosun and Anuoluwapo Aremu and Iroro Orife and David Ifeoluwa Adelani},
  year={2023},
  eprint={2307.16071},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```