Mood classification using spectrograms of sound
Music streaming platforms such as Spotify and Apple Music recommend lists of songs that are generally known to be associated with a certain mood. We explore the possibility of sorting songs by mood programmatically, without a person tagging each song individually.
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. We used these images of sound to train our convolutional neural network (CNN) based model.
Each song was sampled from minute 1 to minute 2 (a one-minute span), and the resulting image was converted to grayscale.
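As a rough sketch of that step, the snippet below shells out to ffmpeg's `showspectrumpic` filter to trim the 60-120 second span, render a spectrogram, and convert it to grayscale. The file names and image size are placeholders, and this is an illustration of the technique rather than the project's actual `spectrogram-creator` code; it assumes ffmpeg is on the PATH.

```ts
import { execFile } from "child_process";
import { promisify } from "util";

const run = promisify(execFile);

// Trim minute 1..2 of the track, render a spectrogram, and force grayscale output.
async function makeSpectrogram(audioPath: string, outputPath: string): Promise<void> {
  await run("ffmpeg", [
    "-ss", "60",   // start at minute 1
    "-t", "60",    // take a one-minute span
    "-i", audioPath,
    // showspectrumpic renders a single spectrogram image; format=gray drops color.
    "-lavfi", "showspectrumpic=s=1024x512:legend=0,format=gray",
    "-y",          // overwrite any existing output
    outputPath,
  ]);
}

// Hypothetical file names for illustration.
makeSpectrogram("song.mp3", "song.png").catch(console.error);
```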
As implausible as it may sound, this project does not contain a single line of neural-network code. Moodsic utilizes cloud-based machine intelligence, in this case AutoML on Google Cloud Platform (GCP). We fed in labeled data (supervised learning) to train a general-purpose image classification model provided by GCP. Check AutoML for more information.
We manually tagged a hundred different songs to train the model. The names and labels of the songs are listed in DATA.md. With these songs, we trained our AutoML model to a precision of 75%.
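For illustration, querying such a trained model might look like the sketch below, using the `@google-cloud/automl` Node.js client. The project and model IDs are placeholders, and this is an assumption about how the backend could call the model, not the project's actual code.

```ts
import { readFileSync } from "fs";
import { PredictionServiceClient } from "@google-cloud/automl";

// Hypothetical IDs; replace with your own GCP project and AutoML Vision model.
const PROJECT_ID = "my-gcp-project";
const MODEL_ID = "ICN0000000000000000000";

// Classify one spectrogram image with the trained AutoML model.
async function predictMood(imagePath: string): Promise<void> {
  const client = new PredictionServiceClient();
  const [response] = await client.predict({
    name: client.modelPath(PROJECT_ID, "us-central1", MODEL_ID),
    payload: { image: { imageBytes: readFileSync(imagePath) } },
  });
  // Each payload entry is a candidate mood label with a confidence score.
  for (const result of response.payload ?? []) {
    console.log(result.displayName, result.classification?.score);
  }
}

predictMood("song.png").catch(console.error);
```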
This is a multi-module project that consists of the following.
- moodsic-web
- moodsic-web-backend
- spectrogram-creator
- `npm run dev`: for development; runs `web` and `backend`
- `npm run spectrogram [audioPath?] [outputPath?]`: runs the spectrogram generator
- `npm run web`: runs the web-based visualizer
Most likely, `moodsic-web-backend` won't work on your machine right after you install it, because you don't yet have a model on GCP and the related token. The path to `GOOGLE_APPLICATION_CREDENTIALS` is set to `$HOME_PATH/.gcloud/key-1.json`. To learn how to train models on AutoML, check its website.
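As a minimal sketch, the backend could verify that the credential file exists before starting. The default path below mirrors the one mentioned above; the fallback logic is an assumption for illustration, not the project's actual startup code.

```ts
import { existsSync } from "fs";
import { homedir } from "os";
import { join } from "path";

// Default key location used by this project; override via the env var if yours differs.
const keyPath =
  process.env.GOOGLE_APPLICATION_CREDENTIALS ?? join(homedir(), ".gcloud", "key-1.json");

if (!existsSync(keyPath)) {
  throw new Error(
    `GCP service-account key not found at ${keyPath}; train a model on AutoML and download a key first.`
  );
}

// Ensure the GCP client libraries pick up the key.
process.env.GOOGLE_APPLICATION_CREDENTIALS = keyPath;
```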
Sound may be processed by neural networks in different ways: as a sequence of frequency data, where the value at a given timeframe may be relevant to the ones before it, or as a set of quantities (pixels) in a coordinate system (a spectrogram). Unlike real-time audio recordings, songs have a known, fixed length, which suits the latter approach. Treating sound as imagery yielded some interesting observations in our experiment, and similar approaches have been proposed for a while.
By now, many people are aware of the use and power of artificial neural networks, and neural networks have become more accessible through cloud AI platforms. We expect various novel attempts at exploiting cloud AI in the near future.
This product was built during HackSC 2020 by the following team:
- Elden Park
- Victoria Shin
- Jennie Jeh
- Hyunjae Cho
- Prasanna Natarajan