Remove insignificant test assets #764
Comments
I would like to work on this. Would you prefer this in one PR, or is it okay to split it across 2-3 PRs?
Thanks for signing up. It is preferable to split the work into PRs that are simple enough to review, with each PR addressing a single aspect. For example
as such. So I think it's even better to split this into more than 3 PRs.
A couple of comments. From this test run, I learned:
Are we sure we can remove In file |
Let me get back on this one.
This was my oversight. I only
You are right, we cannot remove the file because it's used by
Update: The following applies to tests other than
I did a grep for the mp3 file
I found the mp3 file in 4 places, but not in
Second issue: I looked for
I am sorry, I replied to you in a very rushed manner and gave you a wrong description, which ended up confusing you.
Yes, that makes sense.
Yes, you are right. I updated the list. Thanks for checking the details I missed.
Part of #764
- Replace `whitenoise.wav` with on-the-fly data generation
- Replace `torchaudio.load` with `common_utils.load_wav`
- Replace `steam-train-whistle-daniel_simon.mp3` with `.wav`
I looked at this one and found out that
Since we have data generation utilities and none of the dataset tests require real data, I think we can remove these assets and move to on-the-fly generation. That way we can delete
Hello, GTZAN indeed has a fixed number of samples, but there are cases where papers change that. Sometimes people use, e.g., different training-testing splits. Other times, papers remove some of the songs or move them to different genres to compensate for some of GTZAN's shortcomings. Of course, now that you mention it, there shouldn't be different genres, or else it's not GTZAN anymore. What I could do is have it still traverse the files, but limit to the
Hi @mmxgn, thanks for the response.
This sounds like a reasonable approach. Let me know if you have time and would like to work on it; if you are busy, I will file an issue and try to get help. One follow-up question: I downloaded
Let me work on this for another dataset first. It will be easier for contributors if there is an example to follow.
Hello,
No problem at all. There is a little timezone difference, but I think I can do it tomorrow.
No, in reality GTZAN doesn't have such things; I just used it for testing. I took a noise sample that was already in the test assets and put it in an imaginary genre `noise`, and since the dataset traverses the directory, there was no problem with that.
Great, shall I then keep the noise sample, just with a different name (e.g. blues.0000.wav)? I didn't want to use an original GTZAN file for testing.
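For illustration, the idea discussed above (still traversing the directory, but skipping anything that is not a GTZAN genre) might look roughly like the sketch below. This is not torchaudio's implementation: `walk_known_genres` is a hypothetical helper, the `<genre>/<genre>.<index>.wav` layout is assumed from the file names mentioned in this thread, and only the standard library is used.

```python
import tempfile
from pathlib import Path

# The ten genres of the GTZAN dataset.
GTZAN_GENRES = (
    "blues", "classical", "country", "disco", "hiphop",
    "jazz", "metal", "pop", "reggae", "rock",
)


def walk_known_genres(root, genres=GTZAN_GENRES):
    """Return wav paths under <root>/<genre>/, ignoring unknown genre directories.

    Hypothetical helper: assumes files are named <genre>.<index>.wav.
    """
    root = Path(root)
    found = []
    for genre in genres:
        genre_dir = root / genre
        if not genre_dir.is_dir():
            continue  # a genre with no files on disk is simply skipped
        found.extend(sorted(genre_dir.glob(f"{genre}.*.wav")))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        root = Path(tmp_dir) / "genres"
        for genre in ("blues", "noise"):
            (root / genre).mkdir(parents=True)
            (root / genre / f"{genre}.0000.wav").touch()
        # The imaginary "noise" genre is not picked up.
        print([p.name for p in walk_known_genres(root)])
```

With a fixed genre list, a test would need its mock files named after a real genre (such as the `blues.0000.wav` suggestion above) rather than an imaginary one.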
I made an example PR (#792) that generates the data on the fly for the YESNO dataset.
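To give a rough idea of what on-the-fly generation for a dataset test can look like, here is a minimal standard-library sketch. It is not the code from #792: `make_mock_yesno`, `write_noise_wav`, and `labels_from_path` are hypothetical names, the sample rate and length are arbitrary placeholder values, and the only things taken from this thread are the `waves_yesno` directory and the `0_1_0_1_0_1_1_0.wav` naming pattern.

```python
import random
import struct
import tempfile
import wave
from pathlib import Path


def write_noise_wav(path, num_frames, sample_rate, seed):
    """Write seeded random 16-bit mono PCM so every run produces the same file."""
    rng = random.Random(seed)
    pcm = [rng.randint(-32768, 32767) for _ in range(num_frames)]
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{num_frames}h", *pcm))


def make_mock_yesno(root, label_sets, sample_rate=8000, num_frames=800):
    """Create <root>/waves_yesno/<labels joined by '_'>.wav for each label tuple.

    sample_rate and num_frames are placeholder values for this sketch.
    """
    base = Path(root) / "waves_yesno"
    base.mkdir(parents=True, exist_ok=True)
    paths = []
    for seed, labels in enumerate(label_sets):
        path = base / ("_".join(str(v) for v in labels) + ".wav")
        write_noise_wav(path, num_frames, sample_rate, seed)
        paths.append(path)
    return paths


def labels_from_path(path):
    """Recover the label list encoded in the file name."""
    return [int(v) for v in Path(path).stem.split("_")]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        for p in make_mock_yesno(tmp_dir, [(0, 1, 0, 1, 0, 1, 1, 0)]):
            print(p.name, labels_from_path(p))
```

Because the files live in a temporary directory and are derived from a seed, nothing needs to be checked in, and a test can compare what the dataset returns against what was generated.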
Replacing the
https://gist.github.com/mthrok/b44af36d7724def6a27811f48f01d42f
Do we just rename all occurrences of this file to the above? (sox_compatibility_test.py, transforms_test.py, batch_consistency_test.py, io_test.py, README.md)
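Before renaming, it may help to list every reference to the old file name, similar to the grep mentioned earlier in the thread. The sketch below is a small standard-library equivalent; `find_references` is a hypothetical helper, and the set of file suffixes it scans is an assumption.

```python
import tempfile
from pathlib import Path


def find_references(root, needle, suffixes=(".py", ".md")):
    """Return sorted (relative_path, line_number) pairs where `needle` appears."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        asset = "steam-train-whistle-daniel_simon.mp3"
        (Path(tmp_dir) / "io_test.py").write_text(f"import os\npath = '{asset}'\n")
        for rel_path, lineno in find_references(tmp_dir, asset):
            print(f"{rel_path}:{lineno}")
```

Running it against the test directory with the `.mp3` name shows which files still need to be updated once the `.wav` replacement is in place.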
Fixes error: `error: You need C++14 to compile PyTorch`.
@astaff introduced a guideline for test assets in #759, and we can get rid of the following existing assets.
- `100Hz_44100Hz_16bit_05sec.wav`: sine wave, should be replaced by on-the-fly generation.
- `440Hz_44100Hz_16bit_05sec.wav`: sine wave, should be replaced by on-the-fly generation.
- `CommonVoice/cv-corpus-4-2019-12-10/tt/clips/common_voice_tt_00000000.mp3`: whitenoise, should be converted to wav so that test does not require mp3 decoder.
- `dtmf_30s_stereo.mp3`: not used.
- `genres/noise/noise.0000.wav`: should be replaced by on-the-fly generation.
- `kaldi_file.wav`: sine wave only contains 20 samples and I do not think this is appropriate for test.
- `kaldi_file_8000.wav`: sine wave, should prefer on-the-fly generation.
- `sinewave.wav`: sine wave, should prefer on-the-fly generation.
- `steam-train-whistle-daniel_simon.mp3`: should be replaced by `steam-train-whistle-daniel_simon.wav`
- `test.wav`: file generated during `test_io.py` accidentally checked in
- `waves_yesno/0_1_0_1_0_1_1_0.wav`
- `whitenoise_1min.mp3`: should be replaced by on-the-fly generation.
- `whitenoise.mp3`: should be replaced by on-the-fly generation.
- `whitenoise.wav`: should be replaced by on-the-fly generation.

General Direction for replacing assets with on-the-fly generation
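As a rough sketch of what replacing the sine wave and white noise assets could look like, the snippet below generates the signals in memory and writes them as 16-bit WAV files into a temporary directory. It deliberately uses only the Python standard library; the project's own data generation utilities are not shown here, and `sine_wave`, `white_noise`, `save_wav_16bit`, and `load_wav_16bit` are hypothetical names chosen for this example. The frequency, sample rate, and duration are arbitrary.

```python
import math
import random
import struct
import tempfile
import wave
from pathlib import Path


def sine_wave(freq_hz, sample_rate, duration_sec, amplitude=0.5):
    """Return float samples in [-amplitude, amplitude] for a pure tone."""
    num_frames = int(sample_rate * duration_sec)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_frames)
    ]


def white_noise(sample_rate, duration_sec, amplitude=0.5, seed=0):
    """Return uniform noise; seeded so repeated test runs see identical data."""
    rng = random.Random(seed)
    num_frames = int(sample_rate * duration_sec)
    return [amplitude * rng.uniform(-1.0, 1.0) for _ in range(num_frames)]


def save_wav_16bit(path, samples, sample_rate):
    """Write mono float samples (expected in [-1, 1]) as 16-bit signed PCM."""
    pcm = [int(round(s * 32767)) for s in samples]
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def load_wav_16bit(path):
    """Read a 16-bit mono WAV back into floats, returning (samples, sample_rate)."""
    with wave.open(str(path), "rb") as f:
        sample_rate = f.getframerate()
        raw = f.readframes(f.getnframes())
    pcm = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [v / 32767 for v in pcm], sample_rate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "sine_440Hz.wav"
        save_wav_16bit(path, sine_wave(440, 44100, 0.5), 44100)
        samples, rate = load_wav_16bit(path)
        print(len(samples), rate)
```

Since the test itself creates the data, it also knows the exact reference signal, so it can check loaded values against the generated ones instead of trusting a checked-in file.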