Starter kit for the WWW2018 challenge "Learning to Recognize Musical Genre" hosted on CrowdAI. The overview paper summarizes our experience running a challenge with open data for musical genre recognition. It motivates the task and the challenge design, shows some statistics about the submissions, and presents the results.
The data used for this challenge comes from the FMA dataset. You are encouraged to check out that repository for Jupyter notebooks showing how to use the data, explore it, and train baseline models. This challenge uses the `rc1` version of the data, so make sure to check out that version of the code. The associated paper describes the data.
Team | Round 1 leaderboard (35k clips) log loss | Round 1 subset (3k clips) log loss | Round 2 secret (3k clips) log loss | Rank | Round 1 leaderboard (35k clips) F1 score | Round 1 subset (3k clips) F1 score | Round 2 secret (3k clips) F1 score |
---|---|---|---|---|---|---|---|
minzwon & jaehun | 0.55 | 0.67 | 1.31 | 1 | 85% | 80% | 63% |
hglim | 0.33 | 0.34 | 1.34 | 2 | 92% | 92% | 64% |
benjamin_murauer | 0.82 | 0.86 | 1.44 | 3 | 74% | 74% | 60% |
gg12 & check | 0.66 | 0.49 | 1.50 | 4 | 80% | 86% | 61% |
viper & algohunt | 0.66 | 0.65 | 1.52 | 5 | 80% | 81% | 60% |
mimbres | 0.41 | 0.43 | 2.08 | 6 | 90% | 90% | 60% |
The three columns per metric refer to:
- the best scores obtained on the public leaderboard during the first round,
- the scores obtained by the submitted systems on a subset of the public test set,
- the scores obtained by the submitted systems on a private test set collected for the second round.
Find more details in the slides used to announce the results and in the overview paper.
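As a rough illustration of the log loss columns above, here is a minimal sketch of a mean multi-class log loss. The function name `mean_log_loss` and the clipping constant `eps` are assumptions made for this example; the exact implementation used by the challenge grader is not shown in this document.

```python
import math


def mean_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices, one per clip.
    y_prob: list of per-class probability lists, one per clip.
    eps:    clipping constant (an assumption) to avoid log(0).
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# A uniform prediction over the 16 genres of this challenge.
uniform = [[1 / 16] * 16 for _ in range(2)]
print(mean_log_loss([0, 3], uniform))
```

Under this definition a uniform prediction over 16 genres scores ln(16), about 2.77, whatever the true labels are, which gives a reference point for reading the log loss values in the table.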
In the interest of reproducibility and transparency for interested researchers, you'll find below links to the source code repositories of all systems submitted by the participants for the second round of the challenge. Thanks to all the participants for making this happen!
- Transfer Learning of Artist Group Factors to Musical Genre Classification
  - Jaehun Kim (@jaehun), TU Delft and Minz Won (@minzwon), Universitat Pompeu Fabra
  - Code: https://gitlab.crowdai.org/minzwon/WWWMusicalGenreRecognitionChallenge
  - Paper: https://doi.org/10.1145/3184558.3191823
- Ensemble of CNN-based Models using various Short-Term Input
  - Hyungui Lim (@hglim), http://cochlear.ai
  - Code: https://gitlab.crowdai.org/hglim/WWWMusicalGenreRecognitionChallenge
- Detecting Music Genre Using Extreme Gradient Boosting
  - Benjamin Murauer (@benjamin_murauer), Universität Innsbruck
  - Code: https://gitlab.crowdai.org/Benjamin_Murauer/WWWMusicalGenreRecognitionChallenge
  - Paper: https://doi.org/10.1145/3184558.3191822
- ConvNet on STFT spectrograms
  - Daniyar Chumbalov (@check), EPFL and Philipp Pushnyakov (@gg12), Moscow Institute of Physics and Technologies (MIPT)
  - Code: https://gitlab.crowdai.org/gg12/WWWMusicalGenreRecognitionChallenge
- Xception on mel-scaled spectrograms
- Audio Dual Path Networks on mel-scaled spectrograms
  - Sungkyun Chang (@mimbres), Seoul National University
  - Code: https://gitlab.crowdai.org/mimbres/WWWMusicalGenreRecognitionChallenge
The repositories should be self-contained and easily executable. You can execute any of the systems on your own mp3s by following these steps:
- Clone the git repository.
- Build a docker image with `repo2docker`.
- Execute the docker image.
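The three steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper `build_commands`, the image name `genre-system`, and the working directory `system` are hypothetical, and the mount point `/crowdai-payload` and entry point `/home/run.sh` are borrowed from the round 2 description further down. The function only builds the command lines; it runs nothing.

```python
import shlex


def build_commands(repo_url, mp3_dir, image="genre-system", workdir="system"):
    """Return the clone, build and run commands as argument lists."""
    return [
        # 1. Clone the git repository of the chosen system.
        ["git", "clone", repo_url, workdir],
        # 2. Build a docker image; --no-run skips starting a container.
        ["jupyter-repo2docker", "--no-run", "--image-name", image, workdir],
        # 3. Run the image with the mp3 folder mounted read-only
        #    (mp3_dir should be an absolute path).
        ["docker", "run", "--rm", "-v", f"{mp3_dir}:/crowdai-payload:ro",
         image, "/home/run.sh"],
    ]


repo = "https://gitlab.crowdai.org/hglim/WWWMusicalGenreRecognitionChallenge"
for cmd in build_commands(repo, "/path/to/mp3s"):
    print(shlex.join(cmd))
```

Where each system writes its predictions, and whether it needs GPU access, depends on the repository; check its README before running the printed commands.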
Download and extract the datasets such that:
- Training metadata `csv` files from `fma_metadata.zip` are accessible at `data/fma_metadata/*.csv`.
- Training `mp3` files from `fma_medium.zip` are accessible at `data/fma_medium/*/*.mp3`.
- Test `mp3` files from `fma_crowdai_www2018_test.tar.gz` are accessible at `data/crowdai_fma_test/*.mp3`.
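To confirm that the archives were extracted to the expected locations, a small check along these lines may help. The helper `count_files` and the `EXPECTED` mapping are hypothetical names for this sketch; only the three glob patterns come from the list above.

```python
from pathlib import Path

# Glob patterns taken from the layout described above, relative to data/.
EXPECTED = {
    "metadata csv": "fma_metadata/*.csv",
    "training mp3": "fma_medium/*/*.mp3",
    "test mp3": "crowdai_fma_test/*.mp3",
}


def count_files(data_dir="data"):
    """Count the files matching each expected pattern under data_dir."""
    root = Path(data_dir)
    return {name: sum(1 for _ in root.glob(pattern))
            for name, pattern in EXPECTED.items()}


if __name__ == "__main__":
    for name, n in count_files().items():
        status = "ok" if n else "MISSING"
        print(f"{name:>12}: {n} files ({status})")
```

A count of zero for any entry suggests that the corresponding archive was extracted into a different folder.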
git clone https://github.com/crowdAI/crowdai-musical-genre-recognition-starter-kit
cd crowdai-musical-genre-recognition-starter-kit
pip install -r requirements.txt
NOTE: This challenge requires `crowdai` version 1.0.14 or later.
The code in this repository and the FMA repository has been tested with Python 3.6 only.
Run `python convert.py` to convert `data/fma_metadata/tracks.csv` to a simpler `data/train_labels.csv` file where the first column is the `track_id` and the second column is the target musical genre.
You can now load the training labels with:
import pandas as pd
labels = pd.read_csv('data/train_labels.csv', index_col=0)
The path to the training mp3 with a `track_id` of 2 is given by:
import fma
path = fma.get_audio_path(2)
and can be loaded as a numpy array with:
import librosa
x, sr = librosa.load(path, sr=None, mono=False)
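For intuition about how a `track_id` maps to a file under `data/fma_medium/*/*.mp3`, here is a sketch that assumes the starter kit follows the FMA repository convention: the id is zero-padded to six digits and its first three digits name the sub-folder. The function `audio_path` is a hypothetical stand-in, not the code of `fma.get_audio_path`.

```python
import os


def audio_path(track_id, audio_dir="data/fma_medium"):
    """Build the mp3 path for a track id (assumed FMA layout)."""
    tid = "{:06d}".format(track_id)  # e.g. 2 -> '000002'
    return os.path.join(audio_dir, tid[:3], tid + ".mp3")


print(audio_path(2))
```

On a POSIX system this prints `data/fma_medium/000/000002.mp3`.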
The list of testing file IDs can be obtained with:
import glob
test_ids = sorted(glob.glob('data/crowdai_fma_test/*.mp3'))
test_ids = [path.split('/')[-1][:-4] for path in test_ids]
and the path to a testing mp3 is given by:
path = 'data/crowdai_fma_test/{}.mp3'.format(test_ids[0])
The submission file can be created with:
CLASSES = ['Blues', 'Classical', 'Country', 'Easy Listening', 'Electronic',
'Experimental', 'Folk', 'Hip-Hop', 'Instrumental', 'International',
'Jazz', 'Old-Time / Historic', 'Pop', 'Rock', 'Soul-RnB', 'Spoken']
submission = pd.DataFrame(1/16, pd.Index(test_ids, name='file_id'), CLASSES)
submission.to_csv('data/submission.csv', header=True)
and then submitted with:
import crowdai
API_KEY = '<your_crowdai_api_key_here>'
challenge = crowdai.Challenge('WWWMusicalGenreRecognitionChallenge', API_KEY)
response = challenge.submit('data/submission.csv')
print(response['message'])
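Before submitting, it can be useful to sanity-check the file locally. The sketch below assumes that the grader expects one probability per genre for each file and that each row sums to 1; both are assumptions, as the exact validation rules are not stated here. The function `check_submission` is hypothetical and is not part of the `crowdai` package.

```python
import csv

CLASSES = ['Blues', 'Classical', 'Country', 'Easy Listening', 'Electronic',
           'Experimental', 'Folk', 'Hip-Hop', 'Instrumental', 'International',
           'Jazz', 'Old-Time / Historic', 'Pop', 'Rock', 'Soul-RnB', 'Spoken']


def check_submission(path, classes=CLASSES, tol=1e-6):
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != ["file_id"] + list(classes):
            problems.append("header does not match 'file_id' + class names")
        for row in reader:
            if not row:
                continue  # skip blank lines
            file_id, values = row[0], row[1:]
            try:
                probs = [float(v) for v in values]
            except ValueError:
                problems.append(f"{file_id}: non-numeric value")
                continue
            if len(probs) != len(classes):
                problems.append(f"{file_id}: expected {len(classes)} values")
            elif any(p < 0 or p > 1 for p in probs):
                problems.append(f"{file_id}: value outside [0, 1]")
            elif abs(sum(probs) - 1.0) > tol:
                problems.append(f"{file_id}: row does not sum to 1")
    return problems
```

Calling `check_submission('data/submission.csv')` on the uniform submission created above should return an empty list.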
The random_submission.py script submits random predictions, to be run as:
python random_submission.py --round=1 --api_key=<YOUR CROWDAI API KEY>
The `features.py` script extracts many audio features (with the help of librosa) from all training and testing mp3s. Extracted features are stored in `data/features.csv`. Script to be run as:
python features.py
Note that this script can take many hours to complete on all 60k tracks. To let you start working with the data right away, the pre-computed features are available on the challenge's dataset page.
The `baseline_svm.py` script trains a support vector classifier (SVC) with `data/train_labels.csv` as target and `data/features.csv` as features. The predictions are stored in `data/submission_svm.csv`. Script to be run as:
python baseline_svm.py
Finally, a prediction can be submitted with the submit.py script:
python submit.py --api_key=<YOUR CROWDAI API KEY> data/submission.csv
The second round requires all participants to submit their code. It will be used by our grading orchestrator to predict the genres for all the files in a secret test set. The systems have to be submitted as binder compatible repositories. You'll find all the details to package and submit your code in the following documents:
Predictions will be made on an arbitrary number of mp3 files of at most 30 seconds each.
During the execution of the container, all the mp3 files will be mounted at `/crowdai-payload`.
Execution of your container will be initiated by executing `/home/run.sh`.
During the runtime, the container will not have access to the external Internet, and will have access to:
- 1 NVIDIA GeForce GTX 1080 Ti (11 GB GDDR5X),
- 5 cores of an Intel Xeon E5-2650 v4 (2.20-2.90 GHz),
- 60 GB of RAM,
- 100 GB of disk,
- and a timeout of 10 hours.
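As an illustration of what a script started from `/home/run.sh` might do, the sketch below lists the mp3 files mounted at `/crowdai-payload` and emits a uniform prediction for each. Several things are assumptions: that the files sit directly in that folder, that the output uses the same CSV layout as the round 1 submission, and that it is written to standard output. The packaging documents define the actual contract. The function `predict_uniform` is a hypothetical placeholder for a real model.

```python
import csv
import sys
from pathlib import Path

CLASSES = ['Blues', 'Classical', 'Country', 'Easy Listening', 'Electronic',
           'Experimental', 'Folk', 'Hip-Hop', 'Instrumental', 'International',
           'Jazz', 'Old-Time / Historic', 'Pop', 'Rock', 'Soul-RnB', 'Spoken']


def predict_uniform(payload_dir="/crowdai-payload", out=sys.stdout):
    """Write one uniform-probability row per mp3 file; return the file count."""
    writer = csv.writer(out)
    writer.writerow(["file_id"] + CLASSES)
    files = sorted(Path(payload_dir).glob("*.mp3"))
    for mp3 in files:
        # A real system would load the audio and run its model here.
        writer.writerow([mp3.stem] + [1 / len(CLASSES)] * len(CLASSES))
    return len(files)


if __name__ == "__main__":
    predict_uniform()
```

Because the number of files is arbitrary and the run is limited to 10 hours, a real system would also want to keep its per-file processing time bounded.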
At the end of the process, your model will simply be an "executable" git repository. Please provide an open-source license and a README with an executive summary of how your system works. At the end of the challenge, we'll make all these repositories public. The public list of repositories will allow anybody to easily reproduce and reuse any of your systems!
The content of this repository is released under the terms of the MIT license. Please cite our paper if you use it.
@inproceedings{fma_crowdai_challenge,
title = {Learning to Recognize Musical Genre from Audio},
author = {Defferrard, Micha\"el and Mohanty, Sharada P. and Carroll, Sean F. and Salath\'e, Marcel},
booktitle = {WWW '18 Companion: The 2018 Web Conference Companion},
year = {2018},
url = {https://arxiv.org/abs/1803.05337},
}