Added the popular GTZAN dataset: #668

Merged
merged 7 commits into from
Jun 2, 2020

Conversation

mmxgn
Contributor

@mmxgn mmxgn commented May 29, 2020

  • Added the GTZAN class in torchaudio.datasets using the same format as the rest of the datasets.
  • Added the appropriate test function in test_datasets.py.
  • Added the GTZAN class in the datasets.rst documentation file.


mmxgn commented May 29, 2020

Note: The tests seem to fail because the dataset is not downloaded. I do not know how to mitigate this.

Hi,

GTZAN is a very popular dataset for genre classification, first described in:
"Musical genre classification of audio signals" by G. Tzanetakis and P. Cook, IEEE Transactions on Speech and Audio Processing, 2002.

While its popularity may have fallen recently due to larger and better datasets, I believe it is still an essential dataset to have because of its small size and its ubiquity in the literature and beyond. Many tutorials on the web make use of it.

Example of usage:

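import torchaudio
from torchaudio.datasets import GTZAN

# Mel-spectrogram parameters
FFT_HOP = 256
FFT_SIZE = 512
N_MELS = 96

# `filtered` selects the split; `transform` is applied to each waveform
gtzan_ds = GTZAN(
    'data',
    filtered='training',
    download=True,
    transform=torchaudio.transforms.MelSpectrogram(
        n_fft=FFT_SIZE,
        hop_length=FFT_HOP,
        n_mels=N_MELS,
    ),
)

gtzan_ds[0]  # (mel spectrogram, sample rate, genre label)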

Output:

(tensor([[[2.4445e-01, 1.1175e-02, 2.2179e-02,  ..., 3.6174e-02,
           2.0766e-01, 4.3936e+00],
          [5.1548e-01, 2.3564e-02, 4.6768e-02,  ..., 7.6280e-02,
           4.3789e-01, 9.2648e+00],
          [4.2430e-02, 3.6839e+00, 4.0256e+00,  ..., 7.7146e+00,
           4.7242e-01, 7.4579e+00],
          ...,
          [1.4867e-03, 2.7238e-07, 2.4675e-06,  ..., 1.7559e-06,
           1.2393e-06, 5.9337e-03],
          [1.3977e-03, 1.1830e-07, 1.3188e-07,  ..., 1.8985e-07,
           2.2827e-07, 6.1415e-03],
          [1.3772e-03, 1.3670e-07, 9.7792e-08,  ..., 1.2337e-07,
           1.6914e-07, 6.3171e-03]]]),
 22050,
 'blues')

Additionally, since the original dataset does not provide a train/test split, I took the one from

https://github.com/jordipons/sklearn-audio-transfer-learning/tree/master/data/index/GTZAN

This split mitigates some (all?) of the duplication issues in GTZAN and allows for comparison with the method available in that repository. It can be enabled by passing filtered=set during initialization, where set is one of training, validation, or testing.

@mmxgn mmxgn marked this pull request as ready for review May 29, 2020 14:17
Contributor

@vincentqb vincentqb left a comment

Thanks for working on this :)

@@ -55,6 +56,11 @@ def test_speechcommands(self):
        data = SPEECHCOMMANDS(self.path)
        data[0]

    def test_gtzan(self):
        data = GTZAN(self.path)
Contributor

The other tests rely on a dummy first file, stored in the repository inside a directory structure that mimics the dataset. The default value of download is False, and we do not want the tests to download files anyway.

Contributor Author

I am sorry, it seems I missed this part. I will fix it.
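
The approach described in this thread (a small dummy file placed in a directory tree that mimics the real dataset, so the test never downloads anything) can be sketched with the standard library alone. The `genres/blues/blues.00000.wav` layout and the helper name `make_mock_gtzan` are assumptions for illustration, not the files actually added to `test/assets/`.

```python
import os
import random
import struct
import tempfile
import wave


def make_mock_gtzan(root, genre="blues", n_samples=2205, sample_rate=22050):
    """Write one short white-noise .wav under root/genres/<genre>/ (assumed layout)."""
    folder = os.path.join(root, "genres", genre)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"{genre}.00000.wav")

    rng = random.Random(0)  # fixed seed keeps the asset reproducible
    samples = [rng.randint(-32768, 32767) for _ in range(n_samples)]

    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{n_samples}h", *samples))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = make_mock_gtzan(tmp)
        with wave.open(wav_path, "rb") as f:
            print(f.getframerate(), f.getnframes())
```

A dataset test could then point the dataset class at such a directory and read the first item, leaving `download` at its default of False.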

Comment on lines 1000 to 1001
filtered: bool = False,
subset: str = "",
Contributor

These two parameters are redundant. We could just have subset with a default value of None, and then let users pick one or many of train, validation, test.

Contributor Author

This is interesting. I thought I had removed it. I will be more careful in the future. I will fix it for now.
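
The suggestion in this thread, a single `subset` argument defaulting to `None` in place of the `filtered`/`subset` pair, might look roughly like the following. The class name `GenreDataset`, the `SPLITS` table, and the file names in it are hypothetical placeholders, not the code that was merged.

```python
from typing import Dict, List, Optional

# Hypothetical split lists; the real ones come from the split referenced in the PR.
SPLITS: Dict[str, List[str]] = {
    "training": ["blues.00029", "blues.00030"],
    "validation": ["blues.00000"],
    "testing": ["blues.00012"],
}
ALL_FILES: List[str] = sorted(name for names in SPLITS.values() for name in names)


class GenreDataset:
    def __init__(self, root: str, subset: Optional[str] = None) -> None:
        # One argument replaces the `filtered` flag: None means "use everything".
        if subset is not None and subset not in SPLITS:
            raise ValueError(
                f"subset must be None or one of {sorted(SPLITS)}, got {subset!r}"
            )
        self.root = root
        self.subset = subset
        self._walker = ALL_FILES if subset is None else list(SPLITS[subset])

    def __len__(self) -> int:
        return len(self._walker)
```

This sketch accepts a single subset name only; allowing several subsets at once, as the review comment leaves open, would need a collection-valued argument.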

Comment on lines 1002 to 1003
transform: Any = None,
target_transform: Any = None,
Contributor

We have not been including transform and target_transform in datasets in audio. We can discuss adding this in a separate thread, but not here.

Contributor Author

I am sorry about that. I will remove them.

I had started working on the dataset using the YESNO dataset as a template, which has the transforms. Is that intentional? If not, is it a bug? Should I remove the transforms from there while I am at it?

Contributor

In YESNO, those parameters will give a deprecation warning in __init__ :)
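
A deprecation warning of the kind mentioned here can be emitted from `__init__` with the standard `warnings` module. The class below is a hypothetical stand-in, not the actual YESNO implementation, and the message text is invented for illustration.

```python
import warnings
from typing import Any, Callable, Optional


class LegacyDataset:
    """Hypothetical dataset that still accepts, but discourages, transform arguments."""

    def __init__(
        self,
        root: str,
        transform: Optional[Callable[[Any], Any]] = None,
        target_transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        if transform is not None or target_transform is not None:
            # stacklevel=2 points the warning at the caller, not at this line
            warnings.warn(
                "`transform` and `target_transform` are deprecated; "
                "apply transforms to the returned items instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        self.root = root
        self.transform = transform
        self.target_transform = target_transform
```

Existing callers keep working under this pattern, while new datasets can leave the parameters out entirely.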

Comment on lines 1016 to 1021
assert (
    not filtered or subset in ["training", "validation", "testing"] and filtered
), (
    "When `filtered` is True, subset must take a value from "
    + "{'training', 'validation', 'testing'}, otherwise `filtered` must be False."
)
Contributor

This assertion will not be needed after removing filtered.

Contributor Author

You are correct. I will get to it.

# methods (e.g. the one in jordipons/sklearn-audio-transfer-learning).
#
# Those are used when GTZAN is initialised with the `filtered` keyword.
# The split was taken from (github) jordipons/sklearn-audio-transfer-learning.
Contributor

Are there other splits that people use instead?

Contributor Author

AFAIK yes, but this one has been used in a previous ISMIR late-breaking demo by Jordi Pons and Xavier Serra:

Jordi Pons, Xavier Serra. “musicnn: pre-trained convolutional neural networks for music audio tagging”, Late-Breaking/Demo at the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019.

As well as their transfer learning with MusiCNN:
https://github.com/jordipons/sklearn-audio-transfer-learning/

* Added a dummy noise .wav file in `test/assets/`
* Removed the input and output transforms from the dataset
  `__init__` function, as well as the corresponding methods.
* Replaced the redundant `filtered` and `subset` parameters in
  class initialization and changed the corresponding
  assertion message.

mmxgn commented May 29, 2020

Hi,

Thanks for being patient with me :)

Some tests failed for reasons unknown when running curl while installing:

Fetching liblame from https://downloads.sourceforge.net/project/lame/lame/3.99/lame-3.99.5.tar.gz
curl: (7) Failed to connect to 2607:f748:10:12::5f:2: Cannot assign requested address

Is there a way to rerun the failed builds?

@vincentqb
Contributor

> Thanks for being patient with me :)

And thanks for your contributions! :)

> Some tests failed for reasons unknown when running curl while installing:
>
> Fetching liblame from https://downloads.sourceforge.net/project/lame/lame/3.99/lame-3.99.5.tar.gz
> curl: (7) Failed to connect to 2607:f748:10:12::5f:2: Cannot assign requested address
>
> Is there a way to rerun the failed builds?

Unfortunately, we do not have a "bot" to trigger the build. However, as noted in pytorch/pytorch#17057, you can locally run git commit --amend and then force push to the branch. This will change the hash of your latest commit (without changing the content or message) and trigger the tests again.

@mmxgn mmxgn force-pushed the master branch 7 times, most recently from fb161e3 to 32bbc7b Compare June 2, 2020 11:24
Contributor

@vincentqb vincentqb left a comment

Thanks for working on this! LGTM

@vincentqb vincentqb merged commit b036725 into pytorch:master Jun 2, 2020
mthrok pushed a commit that referenced this pull request Jul 17, 2020
…taset (#791)

* Addressed review issues in PR #668

* Changed GTZAN so that it only traverses filenames belonging to the dataset

Now, instead of walking the whole directory and subdirectories of the dataset, GTZAN only looks for files matching the `genre`/`genre`.`5 digit number`.wav format, where `genre` is an allowed GTZAN genre label.
This allows moving or removing files from the dataset (e.g. for fixing duplication or mislabeling issues).
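
The lookup described in this commit message could be sketched as follows. The tuple lists the ten GTZAN genre labels; the function name `list_gtzan_files`, the regular expression, and the assumption that `root` directly contains the genre folders are illustrative and not taken from the merged code.

```python
import os
import re
import tempfile
from typing import List

GENRES = (
    "blues", "classical", "country", "disco", "hiphop",
    "jazz", "metal", "pop", "reggae", "rock",
)


def list_gtzan_files(root: str) -> List[str]:
    """Return names such as 'blues.00000' for files matching <genre>/<genre>.<5 digits>.wav."""
    found = []
    for genre in GENRES:
        folder = os.path.join(root, genre)
        if not os.path.isdir(folder):
            continue  # a missing genre folder is tolerated
        pattern = re.compile(rf"{re.escape(genre)}\.\d{{5}}\.wav")
        for name in sorted(os.listdir(folder)):
            if pattern.fullmatch(name):
                found.append(name[: -len(".wav")])
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "blues"))
        for name in ("blues.00000.wav", "notes.txt"):
            open(os.path.join(tmp, "blues", name), "w").close()
        print(list_gtzan_files(tmp))
```

Because only names matching the pattern are collected, stray files and unrelated folders are ignored, and a user can delete or relocate individual tracks without breaking the listing.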