Chord predictor using a Convolutional Neural Network on image representations of audio\
This project was part of the Lighthouse Labs data science bootcamp and took two weeks to complete. It consisted of:
- researching audio transformations and manipulations in Python
- getting a dataset with chord annotations
- data processing
- extracting chromagram features from the audio data
- training a deep learning model
- making predictions on a new audio sample, either from the command line or through a Streamlit web app.
The dataset
The dataset is composed of 180 tracks, each with one comp audio file and one solo audio file, for a total of 360 audio files of approximately 30 seconds each. The audio data comes with precise chord annotations. There are two pieces of information for each chord: the chord instructed to the player and the chord they actually played (the two differ slightly, since the musicians had some leeway to fit a specific music style; they were improvising around a root chord in a given style).
After exploring the data, I noticed that in the solo audio files the musicians strayed further from the instructed chord than in the comp audio files. I narrowed down the dataset in the following way during the data processing phase:
- use only the comp audio tracks (180)
- remove very under-represented classes (all hdim and 7 chords)
- keep only slices of audio where the chord played is "close enough" to the chord instructed (see the sketch after this list)
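Below is a hypothetical sketch of that last filter, assuming chord labels in the root:quality format listed further down; the helper names and the exact notion of "close enough" are illustrative, not the project's actual code.

```python
# Illustrative only: keep a slice when the played chord matches the instructed
# chord's root and coarse maj/min quality. Label format and helpers are assumptions.
def simplify(label):
    """Reduce a label like 'A#:min7' to its root and a coarse maj/min quality."""
    root, _, quality = label.partition(":")
    return root, ("min" if quality.startswith("min") else "maj")

def close_enough(instructed, played):
    return simplify(instructed) == simplify(played)

slices = [
    {"instructed": "A:maj", "played": "A:maj7"},   # kept
    {"instructed": "D:min", "played": "F:maj"},    # dropped
]
kept = [s for s in slices if close_enough(s["instructed"], s["played"])]
print(kept)
```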
The chords that this model is capable of transcribing are the 24 most common chords:
A:maj, A:min, A#:maj, A#:min
B:maj, B:min
C:maj, C:min, C#:maj, C#:min
D:maj, D:min, D#:maj, D#:min
E:maj, E:min
F:maj, F:min, F#:maj, F#:min
G:maj, G:min, G#:maj, G#:min\
I used CENS chromagrams (Chroma Energy Normalized) computed with the Librosa package. This transformation smooths out local deviations by taking statistics over large windows, which gave better performance on this particular dataset because of the high variability around the root chords. I also used the magphase transformation to de-noise the images.
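A minimal sketch of that feature-extraction step, assuming Librosa and Matplotlib; the sample rate, figure size, and file names are illustrative, not the project's exact settings:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

def chroma_cens_image(audio_path, out_png, sr=22050):
    """Compute a CENS chromagram for one clip and save it as an image."""
    y, _ = librosa.load(audio_path, sr=sr)
    # magphase (mentioned above for de-noising) splits the complex STFT into
    # magnitude and phase; shown here for reference, not wired into the CENS step
    S_mag, _ = librosa.magphase(librosa.stft(y))
    # CENS chroma: energy-normalized chroma smoothed over large windows,
    # which tolerates the local variation around the root chord
    chroma = librosa.feature.chroma_cens(y=y, sr=sr)
    fig, ax = plt.subplots(figsize=(4, 2))
    librosa.display.specshow(chroma, sr=sr, ax=ax)
    ax.set_axis_off()  # strip axes so the saved image is pure model input
    fig.savefig(out_png, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    return chroma
```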
I used Keras (TensorFlow) to build a convolutional neural network with two hidden layers and max-pooling layers.
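A hedged sketch of what such a network could look like in Keras; the input shape (12 chroma bins by an assumed number of frames), filter counts, and optimizer are assumptions, not the exact architecture saved in the pickles folder.

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(12, 130, 1), n_classes=24):
    """Small CNN: two convolutional blocks with max pooling, then a classifier."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                      input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # one of the 24 chords
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```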
Clone the repo and install requirements.txt:
- Clone this repo
  $ git clone https://github.com/Isabelle-Dr/WhatTheChord.git
  $ cd WhatTheChord
- Install a virtual environment using pip, activate it, and install requirements.txt
  $ pip install virtualenv
  $ virtualenv .venv
  $ source .venv/bin/activate
  $ pip install -r requirements.txt
- Run the following command
  $ streamlit run streamlit.py
- The webapp is now launched in your browser! If not, you can open it at http://localhost:8501. Play around with it: you can either record your own sample or upload a file.
Make sure the audio files you feed the model only contain a variation of one chord for best results.
- Put the audio files you want to predict in the prediction folder (.wav or .mp3). There are already some files in that folder if you want to use them for prediction.
- Then, run this command from the command line (make sure you're in the what-the-chord directory):
  $ python app.py <yourfilename.wav>
- That's it! You'll see the predicted chord right after this command :) In the prediction folder, you'll also see the chromagram image of your song!
- Clone this repo
  $ git clone https://github.com/Isabelle-Dr/what-the-chord.git
  $ cd what-the-chord
- Install a virtual environment using pip, activate it, and install requirements.txt
  $ pip install virtualenv
  $ virtualenv .venv
  $ source .venv/bin/activate
  $ pip install -r requirements.txt
- Download the GuitarSet dataset, store all the audio files in the audio folder and the annotation files in the annotations folder under the data_raw folder
- Run python data_extraction.py. It might take a while. You'll see chromagrams being created in the data_prepared/images folder, exciting!
- Run python modeling.py. It might take 15-20 min. After that, you're done! Your brand new model will be saved in the pickles folder, and it's ready for prediction!
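As a hedged illustration of what prediction with the saved artifacts could look like (the file names under pickles/, the sample path, and the input shape are assumptions, not the actual app.py code):

```python
import pickle
import numpy as np
import librosa
from tensorflow.keras.models import load_model

# Assumed file names; the real ones live in the pickles folder.
model = load_model("pickles/model.h5")
with open("pickles/label_encoder.pkl", "rb") as f:
    label_encoder = pickle.load(f)

# Compute the same CENS chromagram used during training for a new clip.
y, sr = librosa.load("temp_prediction/sample.wav", sr=22050)
chroma = librosa.feature.chroma_cens(y=y, sr=sr)

# Add batch and channel dimensions for the Conv2D input; in practice the
# chromagram would also be trimmed or resized to the training input shape.
x = chroma[np.newaxis, ..., np.newaxis]
probs = model.predict(x)
chord = label_encoder.inverse_transform([int(probs.argmax(axis=1)[0])])[0]
print("Predicted chord:", chord)
```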
WhatTheChord/
├── data_prepared/
| └── images/ # where chromagrams will be stored when running data_extraction.py
├── data_raw/
| ├── annotations/ # contains raw annotations data
| └── audio/ # contains raw audio data
├── readme_images/ # contains images used in the readme
├── pickles/ # contains the model and label encoder files
├── temp_prediction/ # used to store audio files used for predictions from the command line
├── streamlit/
| └── images/ # images for the streamlit app background and the chromagram image
├── notebooks/
├── app.py
├── data_extraction.py
├── modeling.py
├── settings.py
├── streamlit.py
└── requirements.txt
The Music Information Retrieval Community for their help and resources, with a special thanks to napulen
The MIREX website
Streamlit app design by Paul Geiben\
Papers:
A Fully Convolutional Deep Auditory Model for Musical Chord Recognition
Audiolabs - Hidden Markov Models
Audiolabs - Chord Recognition