Added the popular GTZAN dataset: #668
Conversation
mmxgn
commented
May 29, 2020
- Added the GTZAN class in torchaudio.datasets using the same format as the rest of the datasets.
- Added the appropriate test function in test_datasets.py.
- Added the GTZAN class in the datasets.rst documentation file.
Note: the tests seem to fail due to the dataset not being downloaded. I do not know how to mitigate it.

Hi,

GTZAN is a very popular dataset for genre classification, first described by Tzanetakis and Cook (2002). While its popularity might have fallen recently due to larger and better datasets, I believe it is still an essential dataset to have because of its small size and its ubiquity in the literature and elsewhere. Lots of tutorials on the web make use of it.

Example of usage:

import torchaudio
from torchaudio.datasets import GTZAN

FFT_HOP = 256
FFT_SIZE = 512
N_MELS = 96
gtzan_ds = GTZAN('data',
filtered='training',
download=True,
transform=torchaudio.transforms.MelSpectrogram(
n_fft=FFT_SIZE,
hop_length=FFT_HOP,
n_mels=N_MELS)
)
gtzan_ds[0]

Output:

(tensor([[[2.4445e-01, 1.1175e-02, 2.2179e-02, ..., 3.6174e-02,
2.0766e-01, 4.3936e+00],
[5.1548e-01, 2.3564e-02, 4.6768e-02, ..., 7.6280e-02,
4.3789e-01, 9.2648e+00],
[4.2430e-02, 3.6839e+00, 4.0256e+00, ..., 7.7146e+00,
4.7242e-01, 7.4579e+00],
...,
[1.4867e-03, 2.7238e-07, 2.4675e-06, ..., 1.7559e-06,
1.2393e-06, 5.9337e-03],
[1.3977e-03, 1.1830e-07, 1.3188e-07, ..., 1.8985e-07,
2.2827e-07, 6.1415e-03],
[1.3772e-03, 1.3670e-07, 9.7792e-08, ..., 1.2337e-07,
1.6914e-07, 6.3171e-03]]]),
22050,
'blues')

Additionally, since the original dataset does not provide a train/test split, I took the one from jordipons/sklearn-audio-transfer-learning, which mitigates some (all?) of the duplication issues in GTZAN and allows for comparison with the method available in this repository. This split can be enabled with the `filtered` keyword.
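For illustration, a minimal sketch of loading the three subsets; the keyword settled on `subset` later in this review, so the exact argument name and values shown here are assumptions based on that discussion:

from torchaudio.datasets import GTZAN

# Load each split separately; download is only needed the first time.
train_ds = GTZAN('data', download=True, subset='training')
valid_ds = GTZAN('data', subset='validation')
test_ds = GTZAN('data', subset='testing')

waveform, sample_rate, label = train_ds[0]
print(waveform.shape, sample_rate, label)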
Thanks for working on this :)
@@ -55,6 +56,11 @@ def test_speechcommands(self):
        data = SPEECHCOMMANDS(self.path)
        data[0]

    def test_gtzan(self):
        data = GTZAN(self.path)
The other tests include a dummy file, with a directory structure that mimics the dataset, checked into the repository. The default value of `download` is `False`, and we do not want the tests to download files anyway.
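For reference, a minimal sketch of that dummy-asset idea, assuming a layout of `<root>/genres/<genre>/<genre>.00000.wav`; the directory names and paths here are illustrative, not the committed test assets:

import os
import torch
import torchaudio
from torchaudio.datasets import GTZAN

# Create a tiny silent .wav laid out the way the dataset expects, so the test
# can construct GTZAN with the default download=False and no network access.
root = 'test/assets/gtzan_dummy'                       # hypothetical directory
blues_dir = os.path.join(root, 'genres', 'blues')
os.makedirs(blues_dir, exist_ok=True)
torchaudio.save(os.path.join(blues_dir, 'blues.00000.wav'),
                torch.zeros(1, 22050), 22050)          # one second of silence

data = GTZAN(root)    # download defaults to False
# Assuming the walker picks up the dummy file, data[0] would return
# (waveform, sample_rate, 'blues').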
I am sorry it seems I missed this part. I will fix it.
torchaudio/datasets/gtzan.py
Outdated
filtered: bool = False,
subset: str = "",
These two parameters are redundant. We could just have `subset` with a default value of `None`, and then let users pick one or more of train, validation, and test.
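A sketch of the constructor shape this suggests, with illustrative names rather than the exact code that ended up in the PR:

from typing import Optional


class GTZAN:
    def __init__(self, root: str, download: bool = False,
                 subset: Optional[str] = None) -> None:
        # subset=None means "use every track"; otherwise restrict to one split.
        assert subset is None or subset in ('training', 'validation', 'testing'), (
            "`subset` must be None or one of 'training', 'validation', 'testing'."
        )
        self.root = root
        self.subset = subset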
This is interesting. I thought I had removed it. I will be more careful in the future. I will fix it for now.
torchaudio/datasets/gtzan.py
Outdated
transform: Any = None,
target_transform: Any = None,
We have not been including `transform` and `target_transform` in datasets in audio. We can discuss adding this in a separate thread, but not here.
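Instead, a user can apply transforms to the dataset output directly; a minimal sketch, assuming the (waveform, sample_rate, label) tuple shown earlier in this thread:

import torchaudio
from torchaudio.datasets import GTZAN

# Apply the transform per sample rather than passing it to the dataset.
mel = torchaudio.transforms.MelSpectrogram(n_fft=512, hop_length=256, n_mels=96)

dataset = GTZAN('data', download=True)
waveform, sample_rate, label = dataset[0]
features = mel(waveform)    # (channels, n_mels, frames)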
I am sorry about that. I will remove them.
I had started working on the dataset using the YESNO dataset as a template, which has the transforms. Is that intentional? If not, is it a bug? Should I remove the transforms from there while I am at it?
In YESNO, those parameters will give a deprecation warning in `__init__` :)
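For illustration, this is roughly how such a warning can be emitted; a sketch, not the actual YESNO code:

import warnings


class YESNO:
    def __init__(self, root, transform=None, target_transform=None):
        # Warn when the legacy arguments are passed, but keep accepting them.
        if transform is not None or target_transform is not None:
            warnings.warn(
                "transform and target_transform are deprecated; "
                "apply transforms to the dataset output instead.",
                DeprecationWarning,
            )
        self.root = root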
torchaudio/datasets/gtzan.py
Outdated
assert (
    not filtered or subset in ["training", "validation", "testing"] and filtered
), (
    "When `filtered` is True, subset must take a value from "
    + "{'training', 'validation', 'testing'}, otherwise `filtered` must be False."
)
This warning will not be needed after removing `filtered`.
You are correct. I will get to it.
# methods (e.g. the one in jordipons/sklearn-audio-transfer-learning).
#
# Those are used when GTZAN is initialised with the `filtered` keyword.
# The split was taken from (github) jordipons/sklearn-audio-transfer-learning.
Are there other splits that people use instead?
AFAIK yes, but this one has been used in a previous ISMIR late-breaking demo by Jordi Pons and Xavier Serra:

Jordi Pons, Xavier Serra. “musicnn: pre-trained convolutional neural networks for music audio tagging”, Late-Breaking/Demo at the 20th International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands, 2019.

As well as in their transfer learning work with musicnn:

https://github.com/jordipons/sklearn-audio-transfer-learning/
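For context, a sketch of how split lists like the ones in that repository can drive subset selection; the list names and the stems of the form `genre.index` below are illustrative, not the actual split:

# Illustrative split lists (real lists are much longer).
filtered_train = ['blues.00012', 'blues.00013', 'classical.00040']
filtered_valid = ['blues.00000', 'classical.00010']
filtered_test = ['blues.00030', 'classical.00020']


def stems_for_subset(subset):
    # Map the requested subset to its list of track stems.
    return {
        'training': filtered_train,
        'validation': filtered_valid,
        'testing': filtered_test,
    }[subset]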
* Added dummy noise .wav in `test/assets/`.
* Removed transforms of input and output from the dataset `__init__` function, as well as the corresponding methods.
* Replaced the redundant `filtered` and `subset` parameters in class initialization, and also changed the corresponding assertion message.
Hi, thanks for being patient with me :) Some tests failed for unknown reasons when running curl while installing:

Is there a way to rerun the failed builds?
And thanks for your contributions! :)
Unfortunately, we do not have a "bot" to trigger the build. However, as noted in pytorch/pytorch#17057, you can locally run
Force-pushed from fb161e3 to 32bbc7b.
Thanks for working on this! LGTM
…taset (#791)

* Addressed review issues in PR #668.
* Changed GTZAN so that it only traverses filenames belonging to the dataset. Now, instead of walking the whole directory and subdirectories of the dataset, GTZAN only looks for files under a `genre`/`genre`.`5 digit number`.wav format, where `genre` is an allowed GTZAN genre label. This allows moving or removing files from the dataset (e.g. for fixing duplication or mislabeling issues).
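A sketch of the path construction that change describes: build `genre`/`genre`.`index`.wav paths from the split list instead of walking the directory tree. The function and folder names are assumptions, not the exact torchaudio code:

import os


def paths_from_stems(root, stems):
    # Each stem looks like 'blues.00012'; the genre is the part before the dot.
    paths = []
    for stem in stems:
        genre = stem.split('.')[0]
        paths.append(os.path.join(root, 'genres', genre, stem + '.wav'))
    return paths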