A subset of the MARD dataset was created for genre classification experiments. It contains 100 albums per genre, by different artists, for 13 genres (Alternative Rock, Classical, Country, Dance & Electronic, Folk, Jazz, Latin Music, Metal, New Age, Pop, R&B, Rap & Hip-Hop, Rock), i.e. 1,300 albums in total. All the albums have been mapped to MusicBrainz and AcousticBrainz, and the subset includes linguistic and sentiment features. It is stored as a JSON dictionary whose keys are Amazon IDs, in the file classification_dataset.json.
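As a minimal sketch, the dictionary can be loaded with Python's standard json module and the albums counted per genre. The per-record field name "genre" used below is a hypothetical assumption, not a documented key; inspect one record to find where the genre label is actually stored.

```python
import json
from collections import Counter


def load_dataset(path):
    """Load classification_dataset.json: a dict keyed by Amazon ID."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def albums_per_genre(dataset, genre_field="genre"):
    """Count albums per genre label.

    Assumption: each record is a dict holding its label under `genre_field`
    ("genre" is a hypothetical default, not a confirmed key).
    """
    return Counter(record[genre_field] for record in dataset.values())
```

With the real file, `albums_per_genre(load_dataset("classification_dataset.json"))` should report 100 albums for each of the 13 genres, provided the field name matches.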
We also provide all the files needed to reproduce the genre classification experiments in the paper referenced below. entity_features_dataset.json contains the entities and categories identified in the reviews of each album; entity_features_dataset_broader.json additionally contains the broader Wikipedia categories; genre_classification.py is the Python script used for the experiments. Finally, the train_x.csv and test_x.csv files contain the 5 splits of the dataset used for cross-validation.
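A minimal sketch of pairing one train/test split with the dataset dictionary follows. It assumes, without confirmation from this README, that each split file lists one Amazon ID per row in its first column with no header row; the helper names read_ids and split_dataset are hypothetical, and this is not the logic of genre_classification.py.

```python
import csv


def read_ids(path):
    """Read Amazon IDs from a split file.

    Assumption: one ID per row in the first column, no header row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0].strip() for row in csv.reader(f) if row]


def split_dataset(dataset, train_ids, test_ids):
    """Select train and test records from the dataset dictionary by Amazon ID."""
    overlap = set(train_ids) & set(test_ids)
    if overlap:
        # A cross-validation fold should never share albums between train and test.
        raise ValueError(f"{len(overlap)} IDs appear in both train and test")
    # IDs missing from the dictionary are skipped rather than raising KeyError.
    train = {i: dataset[i] for i in train_ids if i in dataset}
    test = {i: dataset[i] for i in test_ids if i in dataset}
    return train, test
```

For each of the 5 folds, call read_ids on the matching train_x.csv and test_x.csv pair and pass the results to split_dataset together with the dictionary loaded from classification_dataset.json.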
If you use this code for research purposes, please cite our paper:
Oramas, S., Espinosa-Anke, L., Lawlor, A., Serra, X., & Saggion, H. (2016). Exploring Customer Reviews for Music Genre Classification and Evolutionary Studies. 17th International Society for Music Information Retrieval Conference (ISMIR 2016).
This project is licensed under the terms of the MIT license.