This is a script that downloads the WAV files of the BookTubeSpeech dataset.
- Install pytube3:
pip3 install pytube3 --upgrade
- You must have ffmpeg to convert mp4 to wav
- You must have sox to downsample the wav file
- Run the script:
python3 download_data.py --output_dir=/path_to_download_dir
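
To show why both ffmpeg and sox are needed, here is a minimal sketch of the two conversion steps. It is not taken from download_data.py: the function names, the temporary-file layout, and the 16000 Hz target rate are all assumptions made for illustration, since this README does not state the sample rate the script uses.

```python
import shutil
import subprocess

# Assumed target sample rate in Hz; the README does not state one.
TARGET_RATE = 16000


def missing_tools(tools=("ffmpeg", "sox")):
    """Return the names of required command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def ffmpeg_cmd(mp4_path, wav_path):
    """Build the ffmpeg command that extracts the audio of an mp4 as wav."""
    # -y overwrites an existing output file, -vn drops the video stream.
    return ["ffmpeg", "-y", "-i", mp4_path, "-vn", wav_path]


def sox_cmd(wav_in, wav_out, rate=TARGET_RATE):
    """Build the sox command that resamples a wav file."""
    # -r placed before the output file sets the output sample rate.
    return ["sox", wav_in, "-r", str(rate), wav_out]


def mp4_to_downsampled_wav(mp4_path, tmp_wav, final_wav, rate=TARGET_RATE):
    """Hypothetical helper: convert with ffmpeg, then downsample with sox."""
    subprocess.run(ffmpeg_cmd(mp4_path, tmp_wav), check=True)
    subprocess.run(sox_cmd(tmp_wav, final_wav, rate), check=True)
```

Calling missing_tools() before a long download run is a cheap way to find out that ffmpeg or sox is not installed before any video has been fetched.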
Some videos may have become unavailable since the original paper was published, for example because the creator deleted them.
As of 2020.04.20, this script can download 8021 (out of 8450) WAV files successfully.
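
Because some videos can no longer be fetched, a download loop needs to skip failures instead of stopping at the first one. The sketch below shows one way to do that. It is an illustration only: download_all and the fetch callback are hypothetical names, and this is not necessarily how download_data.py handles unavailable videos.

```python
def download_all(video_ids, fetch):
    """Call fetch(video_id) for every id and keep going when one fails.

    fetch is a hypothetical callable that downloads a single video and
    raises an exception if the video is unavailable.
    """
    succeeded, failed = [], []
    for video_id in video_ids:
        try:
            fetch(video_id)
            succeeded.append(video_id)
        except Exception as err:
            # Report the failure and move on to the next video.
            print(f"skipping {video_id}: {err}")
            failed.append(video_id)
    return succeeded, failed
```

With the numbers above, such a loop would finish with 8021 ids in succeeded and 429 in failed.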