Music Recommendation System Using Python
Team Members: Pragya Raghuvanshi, Sukhpreet Sahota, Dingkun Yang
This GitHub repo supports a song recommendation system built on Spotify data that was extracted via the Spotify API and downloaded from Kaggle. The recommendation system gives users song suggestions based on song popularity.
The dataset used for this project is the Spotify dataset extracted by VATSAL MAVANI and posted on Kaggle, link: Dataset
Some features of the data:
Release years: 1921 - 2020
Songs Count: 133,638
Genres Count: 2,973
To create this recommendation system, we divided the project into the following key steps:
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Building out a song popularity recommender
- Incorporating the song features recommender into the song popularity recommender
- Testing all components
Below is an illustrative flow chart outlining these steps:
To replicate the results, please fork this git repository or clone it using git clone git@github.com:Yer1k/Song_Recommender.git
Once you have done so, please make sure all test files and source/code files are within the same directory. If you are running this code locally, please move all test files into your tests/ directory. Please also install all libraries listed in the requirements.txt file.
In addition to the libraries specified in the requirements file, please also pip install pytest before importing or executing the test files.
test_song_class and test_parse_data Example
Start by creating a song and passing it to the Song class to check that the song attributes are returned correctly. To test the parse_data function, build a fake song file with the same fields, pass it through parse_data, and assert that the songs it returns match the dictionaries created when using the data.txt file.
Example usage:
    from parse_data import (
        Song,  # assumption: Song is defined in parse_data; adjust if it lives elsewhere
        parse_data,
    )
    from fake_files import fake_files


    def test_song_class() -> None:
        """Test Song class."""
        song = Song(
            "4BJqT0PrAfrxzMOxytFOIz",
            "Piano Concerto No. 3 in D Minor",
            "Sergei Rachmaninoff & James Levine & Berliner Philharmoniker",
            "1921",
            "4",
        )
        # every constructor argument should be exposed as an attribute
        assert song.song_id == "4BJqT0PrAfrxzMOxytFOIz"
        assert song.song_name == "Piano Concerto No. 3 in D Minor"
        assert (
            song.artist_name
            == "Sergei Rachmaninoff & James Levine & Berliner Philharmoniker"
        )
        assert song.year == "1921"
        assert song.popularity == "4"
        assert song.__repr__() == (
            "Song Name: Piano Concerto No. 3 in D Minor "
            + "by Sergei Rachmaninoff & James Levine & Berliner Philharmoniker, "
            + "Year: 1921"
        )


    def test_parse_data() -> None:
        """Test parse_data function."""
        # first row is the header, second row is a single song
        with fake_files(
            [
                [
                    "id",
                    "name",
                    "artists",
                    "year",
                    "popularity",
                ],
                [
                    "4BJqT0PrAfrxzMOxytFOIz",
                    "Piano Concerto No. 3 in D Minor",
                    "Sergei Rachmaninoff & James Levine & Berliner Philharmoniker",
                    "1921",
                    "4",
                ],
            ]
        ) as (song_file,):
            song_dict = parse_data(song_file)
            # songs are keyed by their id
            assert (
                song_dict["4BJqT0PrAfrxzMOxytFOIz"].song_id
                == "4BJqT0PrAfrxzMOxytFOIz"
            )
            assert song_dict["4BJqT0PrAfrxzMOxytFOIz"].song_name == (
                "Piano Concerto No. 3 in D Minor"
            )
            assert (
                song_dict["4BJqT0PrAfrxzMOxytFOIz"].artist_name
                == "Sergei Rachmaninoff & James Levine & Berliner Philharmoniker"
            )
            assert song_dict["4BJqT0PrAfrxzMOxytFOIz"].year == "1921"
            assert song_dict["4BJqT0PrAfrxzMOxytFOIz"].popularity == "4"
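For readers who want to see what these tests exercise, here is a minimal sketch of a Song class and a parse_data function that would satisfy the assertions above. It is not the repository's implementation: the constructor argument order and the repr format are inferred from the tests, and the assumption that the song file is a comma-separated file with a header row is ours.

```python
import csv
from typing import Dict


class Song:
    """Hypothetical stand-in for the repository's Song class."""

    def __init__(
        self,
        song_id: str,
        song_name: str,
        artist_name: str,
        year: str,
        popularity: str,
    ) -> None:
        # Values are kept as strings, matching the assertions in the tests.
        self.song_id = song_id
        self.song_name = song_name
        self.artist_name = artist_name
        self.year = year
        self.popularity = popularity

    def __repr__(self) -> str:
        return (
            f"Song Name: {self.song_name} by {self.artist_name}, "
            f"Year: {self.year}"
        )


def parse_data(song_file: str) -> Dict[str, Song]:
    """Read a song file and return a dictionary of Song objects keyed by id.

    Assumption: the file is CSV with the header columns
    id, name, artists, year, popularity.
    """
    songs: Dict[str, Song] = {}
    with open(song_file, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            songs[row["id"]] = Song(
                row["id"],
                row["name"],
                row["artists"],
                row["year"],
                row["popularity"],
            )
    return songs
```

Keying the dictionary by song id keeps lookups constant-time and mirrors how the tests index song_dict.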
test_popularity_recommender Example
To test the popularity recommender, we inspect the results computed from the list of artists and songs that fake_files generates. The assert statements check that the average popularity per artist is returned and that the number of songs recommended to the user matches the number they asked for. We also check that the user gets summary statistics on the number of songs and artists within a given file.
Example usage:
    from fake_files import fake_files
    from popularity_recommender import SongRecommendationSystem


    def test_calculate_artist_avg_popularity() -> None:
        """Test the popularity recommender."""
        # set up: first row is the header, remaining rows are songs
        with fake_files(
            [
                [
                    "id",
                    "name",
                    "artists",
                    "year",
                    "popularity",
                ],
                [
                    "4BJqT0PrAfrxzMOxytFOIz",
                    "Pragya is a genius",
                    "Dingkun Yang",
                    "1921",
                    "4",
                ],
                ...
            ]
        ) as (song_file,):
            # run
            s = SongRecommendationSystem(song_file)
            # assert: average popularity is reported per individual artist
            assert s.calculate_artist_avg_popularity() == {
                "Sergei Rachmaninoff": 5,
                "James Levine": 5,
                "Berliner Philharmoniker": 5,
                "Dingkun Yang": 3,
                "Pragya R": 3,
                "Sukhpreet S": 6,
            }
            # assert: the user gets exactly the number of songs requested
            assert s.recommend_songs(2) == [
                "Dingkun is a genius",
                "Piano Concerto No. 3 in C Major",
            ]
            # assert: summary of songs and artists in the file
            assert (
                s.__str__()
                == "SongRecommendationSystem with 5 songs and 6 artists."
            )
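The logic this test checks can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's SongRecommendationSystem: the function names, the in-memory row format, and the sample rows are hypothetical, artists are assumed to be joined with " & " (as in the test data), and songs are assumed to be ranked by their own popularity score. The sketch returns float averages, whereas the test above compares against whole numbers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row format: (song name, artists joined by " & ", popularity)
SongRow = Tuple[str, str, int]


def artist_avg_popularity(rows: List[SongRow]) -> Dict[str, float]:
    """Average the popularity of every song each individual artist appears on."""
    scores: Dict[str, List[int]] = defaultdict(list)
    for _name, artists, popularity in rows:
        # A song credited to several artists counts toward each of them.
        for artist in artists.split(" & "):
            scores[artist.strip()].append(popularity)
    return {artist: sum(vals) / len(vals) for artist, vals in scores.items()}


def recommend_songs(rows: List[SongRow], count: int) -> List[str]:
    """Return the names of the `count` most popular songs."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return [name for name, _artists, _popularity in ranked[:count]]


# Made-up sample data, for illustration only
rows = [
    ("Song A", "Artist One & Artist Two", 4),
    ("Song B", "Artist One", 8),
    ("Song C", "Artist Three", 6),
]

print(artist_avg_popularity(rows))
print(recommend_songs(rows, 2))
```

With these sample rows, Artist One averages 6.0 across two songs, and the two recommended songs are Song B and Song C.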
As we build out this song recommendation system and make it more comprehensive, we will add the ability to recommend songs by artist and by the distinguishing musical characteristics of a song.
To build on our existing recommendation system, we will incorporate the following steps into this project:
- Building out a song similarity recommender based on artist
- Testing all components
- Combining into a single song recommendation system
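As one possible direction for the similarity-based recommenders listed above, the sketch below ranks songs by cosine similarity between numeric feature vectors. Everything here is an assumption for illustration: the project has not committed to cosine similarity, and the function names, feature values, and song names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    target: str, features: Dict[str, Sequence[float]], count: int
) -> List[str]:
    """Return the `count` songs whose features are closest to the target's."""
    scored = [
        (cosine_similarity(features[target], vector), name)
        for name, vector in features.items()
        if name != target
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _score, name in scored[:count]]


# Made-up two-dimensional feature vectors, for illustration only
features = {
    "Song A": [0.9, 0.1],
    "Song B": [0.8, 0.2],
    "Song C": [0.1, 0.9],
}

print(most_similar("Song A", features, 1))
```

Cosine similarity compares the direction of the feature vectors rather than their magnitude, which is one common reason to prefer it when features are on similar scales.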
Below is an illustrative flow chart depicting how our future work will integrate into our existing system: