Repository for the visual experiments of our ISMIR 2017 paper (Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features)

fvancesco/music_resnet_classification

ResNet training in Torch

Visual experiments for the paper: Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features. The audio and textual experiments are in a separate repo.

This is a modified version of Facebook resnet (torch).

Requirements and installation: see the installation instructions.

See the training recipes for complete examples.

Finetuning on our dataset

Download the dataset. Note that you have to organize the files as follows:

  • train/
    • label1/
    • ...
    • labeln/
  • test/
    • label1/
    • ...
    • labeln/
  • val/
    • label1/
    • ...
    • labeln/
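Before training, it can help to confirm that the folders match the layout above. The following is a minimal Python sketch, not part of this repo: the function name `check_layout` is hypothetical, and it assumes that every split is expected to contain the same set of label folders.

```python
from pathlib import Path

# Split folders named in the layout above.
SPLITS = ("train", "test", "val")


def check_layout(root):
    """Return the sorted label folder names shared by all splits.

    Raises ValueError if a split folder is missing or if the label
    folders differ between splits (an assumption of this sketch).
    """
    root = Path(root)
    labels_per_split = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            raise ValueError(f"missing split directory: {split_dir}")
        # Only sub-directories count as labels; stray files are ignored.
        labels_per_split[split] = sorted(
            p.name for p in split_dir.iterdir() if p.is_dir()
        )

    reference = labels_per_split["train"]
    for split, labels in labels_per_split.items():
        if labels != reference:
            raise ValueError(f"label folders in {split}/ differ from train/")
    return reference
```

Calling `check_layout("misc/dataset/")` would return the list of label names, whose length can be compared with the number of classes you intend to train on.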

Download the ImageNet pretrained model (resnet-101).

To finetune a resnet-101 pretrained model on the dataset, run:

th main.lua -save <PATH_TO_NEW_MODEL> -LR 0.0001 -batchSize 50 -retrain <PATH_TO_PRETRAINED_MODEL> -data <PATH_TO_DATA> -resetClassifier true -nClasses 250

For example:

th main.lua -save checkpoints/ -LR 0.0001 -batchSize 50 -retrain misc/resnet-101.t7 -data misc/dataset/ -resetClassifier true -nClasses 250
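If you adapt the command to another dataset, the argument list can be assembled programmatically. The sketch below is a hypothetical Python helper, not part of this repo. It only builds the argument list and does not run it. It assumes that the value passed to -nClasses should equal the number of label folders under train/; the flags and default values are the ones shown in the command above.

```python
from pathlib import Path


def finetune_command(data_dir, pretrained, save_dir,
                     lr="0.0001", batch_size="50"):
    """Build the argument list for the finetuning command shown above.

    Assumption of this sketch: -nClasses is the number of label
    folders found in <data_dir>/train/.
    """
    train_dir = Path(data_dir) / "train"
    n_classes = sum(1 for p in train_dir.iterdir() if p.is_dir())
    return [
        "th", "main.lua",
        "-save", str(save_dir),
        "-LR", str(lr),
        "-batchSize", str(batch_size),
        "-retrain", str(pretrained),
        "-data", str(data_dir),
        "-resetClassifier", "true",
        "-nClasses", str(n_classes),
    ]
```

The returned list can be printed with `" ".join(...)` for inspection, or passed to `subprocess.run` from the repository root once Torch is installed.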

Extract visual features

Download our resnet model (model_best.t7).

th extract-features.lua misc/model_best.t7 30 <IMAGES_LIST>

A NumPy array of shape N x D is saved in the main directory, where N is the number of files and D is the dimensionality of the feature vectors (2048). <IMAGES_LIST> is a text file listing the full paths of the N images that you want to process.
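One way to produce the <IMAGES_LIST> file is sketched below in Python. This helper is hypothetical and not part of this repo. It assumes one full path per line and a fixed set of image extensions; adjust both if the script expects something different.

```python
from pathlib import Path

# Assumption: which file extensions count as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def write_image_list(image_dir, list_path):
    """Write the absolute path of every image under image_dir to list_path.

    Paths are sorted so that row i of the extracted feature array can be
    matched to line i of the list. Returns the list of paths written.
    """
    paths = sorted(
        str(p.resolve())
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    # One full path per line (assumed format of the list file).
    Path(list_path).write_text("".join(path + "\n" for path in paths))
    return paths
```

The number of paths returned is the N you should expect as the first dimension of the saved array. If that array is stored in the standard .npy format, `numpy.load` should be able to read it, though the output file name is not stated here.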
