Added the popular GTZAN dataset: #668
Conversation
mmxgn
commented
May 29, 2020
- Added the GTZAN class in torchaudio.datasets using the same format as the rest of the datasets.
- Added the appropriate test function in test_datasets.py.
- Added the GTZAN class in the datasets.rst documentation file.
Note: the tests seem to fail due to the dataset not being downloaded. I do not know how to mitigate it.

Hi,

GTZAN is a very popular dataset for genre classification, first described by Tzanetakis and Cook (2002). While its popularity might have fallen recently due to larger and better datasets, I believe it is still an essential dataset to have because of its small size and its ubiquity in the literature and elsewhere. Lots of tutorials on the web make use of it.

Example of usage:

import torchaudio
from torchaudio.datasets import GTZAN

FFT_HOP = 256
FFT_SIZE = 512
N_MELS = 96
gtzan_ds = GTZAN('data',
filtered='training',
download=True,
transform=torchaudio.transforms.MelSpectrogram(
n_fft=FFT_SIZE,
hop_length=FFT_HOP,
n_mels=N_MELS)
)
gtzan_ds[0]

Output:

(tensor([[[2.4445e-01, 1.1175e-02, 2.2179e-02, ..., 3.6174e-02,
2.0766e-01, 4.3936e+00],
[5.1548e-01, 2.3564e-02, 4.6768e-02, ..., 7.6280e-02,
4.3789e-01, 9.2648e+00],
[4.2430e-02, 3.6839e+00, 4.0256e+00, ..., 7.7146e+00,
4.7242e-01, 7.4579e+00],
...,
[1.4867e-03, 2.7238e-07, 2.4675e-06, ..., 1.7559e-06,
1.2393e-06, 5.9337e-03],
[1.3977e-03, 1.1830e-07, 1.3188e-07, ..., 1.8985e-07,
2.2827e-07, 6.1415e-03],
[1.3772e-03, 1.3670e-07, 9.7792e-08, ..., 1.2337e-07,
1.6914e-07, 6.3171e-03]]]),
22050,
'blues')

Additionally, since the original dataset does not provide a train/test split, I took the one from jordipons/sklearn-audio-transfer-learning, which mitigates some (all?) of the duplication issues in GTZAN and allows for comparison with the method available in this repository. This split can be enabled with the `filtered` keyword.
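For illustration, a minimal sketch of loading the three subsets; the keyword settled on `subset` later in this review, so the exact argument name and values shown here are assumptions based on that discussion:

from torchaudio.datasets import GTZAN

# Load each split separately; download is only needed the first time.
train_ds = GTZAN('data', download=True, subset='training')
valid_ds = GTZAN('data', subset='validation')
test_ds = GTZAN('data', subset='testing')

waveform, sample_rate, label = train_ds[0]
print(waveform.shape, sample_rate, label)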
Thanks for working on this :)
@@ -55,6 +56,11 @@ def test_speechcommands(self):
        data = SPEECHCOMMANDS(self.path)
        data[0]

    def test_gtzan(self):
        data = GTZAN(self.path)
The other tests include a dummy file, with a directory structure that mimics the dataset, checked into the repository. The default value of `download` is `False`, and we do not want the tests to download files anyway.
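For reference, a minimal sketch of that dummy-asset idea, assuming a layout of `<root>/genres/<genre>/<genre>.00000.wav`; the directory names and paths here are illustrative, not the committed test assets:

import os
import torch
import torchaudio
from torchaudio.datasets import GTZAN

# Create a tiny silent .wav laid out the way the dataset expects, so the test
# can construct GTZAN with the default download=False and no network access.
root = 'test/assets/gtzan_dummy'                       # hypothetical directory
blues_dir = os.path.join(root, 'genres', 'blues')
os.makedirs(blues_dir, exist_ok=True)
torchaudio.save(os.path.join(blues_dir, 'blues.00000.wav'),
                torch.zeros(1, 22050), 22050)          # one second of silence

data = GTZAN(root)    # download defaults to False
# Assuming the walker picks up the dummy file, data[0] would return
# (waveform, sample_rate, 'blues').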
I am sorry it seems I missed this part. I will fix it.
torchaudio/datasets/gtzan.py
Outdated
filtered: bool = False,
subset: str = "",
These two parameters are redundant. We could just have `subset` with a default value of `None`, and then let users pick one or more of train, validation, and test.
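A sketch of the constructor shape this suggests, with illustrative names rather than the exact code that ended up in the PR:

from typing import Optional


class GTZAN:
    def __init__(self, root: str, download: bool = False,
                 subset: Optional[str] = None) -> None:
        # subset=None means "use every track"; otherwise restrict to one split.
        assert subset is None or subset in ('training', 'validation', 'testing'), (
            "`subset` must be None or one of 'training', 'validation', 'testing'."
        )
        self.root = root
        self.subset = subset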
This is interesting. I thought I had removed it. I will be more careful in the future. I will fix it for now.
torchaudio/datasets/gtzan.py
Outdated
transform: Any = None,
target_transform: Any = None,
We have not been including `transform` and `target_transform` in datasets in audio. We can discuss adding this in a separate thread, but not here.
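Instead, a user can apply transforms to the dataset output directly; a minimal sketch, assuming the (waveform, sample_rate, label) tuple shown earlier in this thread:

import torchaudio
from torchaudio.datasets import GTZAN

# Apply the transform per sample rather than passing it to the dataset.
mel = torchaudio.transforms.MelSpectrogram(n_fft=512, hop_length=256, n_mels=96)

dataset = GTZAN('data', download=True)
waveform, sample_rate, label = dataset[0]
features = mel(waveform)    # (channels, n_mels, frames)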
I am sorry about that. I will remove them.
I had started working on the dataset using the YESNO dataset as a template, which has the transforms. Is that intentional? If not, is it a bug? Should I remove the transforms from there while I am at it?
In YESNO, those parameters will give a deprecation warning in `__init__` :)
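For illustration, this is roughly how such a warning can be emitted; a sketch, not the actual YESNO code:

import warnings


class YESNO:
    def __init__(self, root, transform=None, target_transform=None):
        # Warn when the legacy arguments are passed, but keep accepting them.
        if transform is not None or target_transform is not None:
            warnings.warn(
                "transform and target_transform are deprecated; "
                "apply transforms to the dataset output instead.",
                DeprecationWarning,
            )
        self.root = root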
torchaudio/datasets/gtzan.py
Outdated
assert (
    not filtered or subset in ["training", "validation", "testing"] and filtered
), (
    "When `filtered` is True, subset must take a value from "
    + "{'training', 'validation', 'testing'}, otherwise `filtered` must be False."
)
This warning will not be needed after removing `filtered`.
You are correct. I will get to it.
# methods (e.g. the one in jordipons/sklearn-audio-transfer-learning).
#
# Those are used when GTZAN is initialised with the `filtered` keyword.
# The split was taken from (github) jordipons/sklearn-audio-transfer-learning.
Are there other splits that people use instead?
AFAIK yes, but this one has been used in a previous ISMIR late-breaking demo by Jordi Pons and Xavier Serra:

Jordi Pons, Xavier Serra. “musicnn: pre-trained convolutional neural networks for music audio tagging”, Late-Breaking/Demo at the 20th International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands, 2019.

As well as in their transfer learning work with musicnn:

https://github.com/jordipons/sklearn-audio-transfer-learning/
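For context, a sketch of how split lists like the ones in that repository can drive subset selection; the list names and the stems of the form `genre.index` below are illustrative, not the actual split:

# Illustrative split lists (real lists are much longer).
filtered_train = ['blues.00012', 'blues.00013', 'classical.00040']
filtered_valid = ['blues.00000', 'classical.00010']
filtered_test = ['blues.00030', 'classical.00020']


def stems_for_subset(subset):
    # Map the requested subset to its list of track stems.
    return {
        'training': filtered_train,
        'validation': filtered_valid,
        'testing': filtered_test,
    }[subset]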
* Added dummy noise .wav in `test/assets/`.
* Removed transforms of input and output from the dataset `__init__` function, as well as the corresponding methods.
* Replaced the redundant `filtered` and `subset` parameters in class initialization, and also changed the corresponding assertion message.
Hi, thanks for being patient with me :) Some tests failed for unknown reasons when running curl while installing:

Is there a way to rerun the failed builds?
And thanks for your contributions! :)
Unfortunately, we do not have a "bot" to trigger the build. However, as noted in pytorch/pytorch#17057, you can locally run
Force-pushed from fb161e3 to 32bbc7b.
Thanks for working on this! LGTM
…taset (#791)

* Addressed review issues in PR #668.
* Changed GTZAN so that it only traverses filenames belonging to the dataset. Now, instead of walking the whole directory and subdirectories of the dataset, GTZAN only looks for files under a `genre`/`genre`.`5 digit number`.wav format, where `genre` is an allowed GTZAN genre label. This allows moving or removing files from the dataset (e.g. for fixing duplication or mislabeling issues).
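A sketch of the path construction that change describes: build `genre`/`genre`.`index`.wav paths from the split list instead of walking the directory tree. The function and folder names are assumptions, not the exact torchaudio code:

import os


def paths_from_stems(root, stems):
    # Each stem looks like 'blues.00012'; the genre is the part before the dot.
    paths = []
    for stem in stems:
        genre = stem.split('.')[0]
        paths.append(os.path.join(root, 'genres', genre, stem + '.wav'))
    return paths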