Cannot load timit_asr data set #4422
Comments
Thanks for reporting, @bhaddow. I'm fixing it.
Thanks for the quick fix!
@bhaddow we have also made a fix so that you no longer have to convert the file extensions of the LDC data to uppercase. Would you mind checking whether it works for you now and reporting any issues? Thanks.
Hi @albertvillanova. It loads fine on a copy of the data from deepai, although I have to remove the duplicate copies of the .WAV files (with extension .WAV.wav). On a copy of the data obtained from the LDC, the glob still fails to find the files. The LDC copy looks like it was copied from CD in 2004, so its structure may differ from that of a current download.
Ah, if I rename the train/ and test/ directories to TRAIN/ and TEST/, then it works!
Thanks for your investigation and report, @bhaddow. I'm adding another fix to handle both TRAIN/train and TEST/test directory names.
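The thread turns on two case mismatches: `.WAV` versus `.wav` extensions, and `TRAIN/` versus `train/` directories. A third problem is the duplicate `.WAV.wav` copies in the deepai data. The following is a minimal sketch of case-insensitive file discovery under those conditions. It is not taken from the `datasets` loading script. The helper name `find_timit_wavs` is hypothetical, and so is the assumed layout `<data_dir>/<split>/<dialect>/<speaker>/<utterance>.WAV`.

```python
from pathlib import Path


def find_timit_wavs(data_dir, split):
    """Hypothetical helper: collect audio files for one TIMIT split.

    Assumes a layout like <data_dir>/<split>/<dialect>/<speaker>/<utt>.WAV.
    The split directory name and the .wav extension are matched
    case-insensitively, and double-extension copies such as "SA1.WAV.wav"
    are skipped so that each utterance is returned only once.
    """
    root = Path(data_dir)
    # Accept the split directory whatever its case (train/ vs TRAIN/).
    split_dirs = [
        p for p in root.iterdir() if p.is_dir() and p.name.lower() == split.lower()
    ]
    files = []
    for split_dir in split_dirs:
        for path in split_dir.rglob("*"):
            name = path.name.lower()
            # Keep "<utt>.wav" in any case; drop "<utt>.wav.wav" duplicates.
            if path.is_file() and name.endswith(".wav") and not name[:-4].endswith(".wav"):
                files.append(path)
    return sorted(files)
```

Matching on lowercased names rather than on a fixed glob pattern means the same code finds files in the older CD-derived LDC copy and in a current download. The only assumption is that the directory depth under the split folder does not matter.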
Describe the bug
I am trying to load the timit_asr data set. I have tried with a copy from the LDC, and a copy from deepai. In both cases they fail with a "duplicate key" error. With the LDC version I have to convert the file extensions all to upper-case before I can load it at all.
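The report does not say how the loader builds its example keys, so the following is only one plausible way a "duplicate key" error can arise. Assume a hypothetical key scheme of `<speaker>_<utterance>`, where the utterance id is the file name up to its first dot. Under that scheme, an original `SA1.WAV` and a converted `SA1.WAV.wav` in the same speaker folder map to the same key. The helpers `example_key` and `find_duplicate_keys` are hypothetical and are used here only to detect such collisions before examples are generated.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def example_key(path):
    """Hypothetical key scheme: '<speaker>_<utterance>'.

    The utterance id is the file name up to its first dot, so both
    'SA1.WAV' and 'SA1.WAV.wav' yield 'SA1'.
    """
    p = PurePosixPath(path)
    return f"{p.parent.name}_{p.name.split('.')[0]}"


def find_duplicate_keys(paths):
    """Group paths by key and return only keys produced by more than one file."""
    groups = defaultdict(list)
    for path in paths:
        groups[example_key(path)].append(path)
    return {key: files for key, files in groups.items() if len(files) > 1}
```

For example, given `TRAIN/DR1/FCJF0/SA1.WAV`, `TRAIN/DR1/FCJF0/SA1.WAV.wav` and `TRAIN/DR1/FCJF0/SA2.WAV`, only the key `FCJF0_SA1` is reported, with the first two paths.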
Steps to reproduce the bug
Expected results
The data set should load without error. It worked for me before the LDC URL change.
Actual results
Environment info
datasets version: 2.2.2