Build large audio corpora in various languages → {Yorùbá, Urhobo, Edo, Èʋe, Igbo}
Curate language-specific corpora from the wealth of good-quality audio available on YouTube. The process is as follows:
- Locate a list of existing playlists, e.g. OrisunTV Iroyin
- Alternatively, create a new playlist with a custom set of YouTube videos
- Update yoruba_sources.yml with the reference to the playlist
- Execute
$ python download_youtube.py --output ./audio/
- Python 3.7 or later
- Install the dependencies:
$ pip install -r requirements.txt
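
For orientation, the download step in the process above could look roughly like the sketch below. It is not the contents of download_youtube.py, whose internals are not shown here. It assumes the yt-dlp command-line tool as the downloader, WAV as the target format, and a hypothetical placeholder playlist URL; the helper name build_download_command is also made up for illustration.

```python
from pathlib import Path
from typing import List


def build_download_command(playlist_url: str, output_dir: str) -> List[str]:
    """Build a yt-dlp argument list that extracts audio from a playlist.

    Assumption: yt-dlp is the downloader; the real script may use another tool.
    """
    # Name each file after its video ID so re-runs map to the same paths.
    out_template = str(Path(output_dir) / "%(id)s.%(ext)s")
    return [
        "yt-dlp",
        "--yes-playlist",          # download every video in the playlist
        "--extract-audio",         # keep only the audio track
        "--audio-format", "wav",   # assumed target format
        "--output", out_template,
        playlist_url,
    ]


# Hypothetical playlist reference, standing in for an entry in yoruba_sources.yml.
playlists = ["https://www.youtube.com/playlist?list=PLAYLIST_ID"]
commands = [build_download_command(url, "./audio/") for url in playlists]
```

Each entry in commands can then be passed to subprocess.run(cmd, check=True), provided yt-dlp and ffmpeg are installed.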