viagogo-personalization

############################################ Installation and Dependencies:

Python 2.7
Dato's GraphLab-Create. Download from here:

https://dato.com/download/[email protected]&key=0CA5-F126-6C7E-263B-A391-32E0-1DEB-8F66&utm_medium=email&utm_source=transactional&utm_campaign=beta_registration_confirmation

Numpy (python)
matplotlib (python)
R
reshape2 (R-library)
MASS (R-library)
RODBC (R-library)
spotipy, the Spotify client library (python)

############################################# Script Execution Order: Matrix Generation

1. Execute the R script R\db-code.R. This script retrieves data from the SQL backend. To change the parameters (dates, categories, etc.), edit the SQL script "query.sql" directly; the R script executes "query.sql" as is, without modification.
2. Execute the R script R\process-data.R. This script converts the retrieved SQL data into the desired formats. Change its parameters in the script itself. The most important are "K" and "M", which control the size and density of the matrix.
3. Execute the Python script python\tickets-gl.py. This script contains the GraphLab collaborative filtering code. It currently uses the "item-similarity" algorithm, which does not use any side features. The code to retrieve side data for the Concert category is included and tested; the call that retrieves it can be uncommented (as indicated in the script) when needed. The constants used by the script are defined at the beginning of the script itself.
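
To make the idea behind an item-similarity recommender concrete, here is a minimal pure-Python sketch. It is not the code in python\tickets-gl.py and does not call GraphLab; the Jaccard measure, the function names, and the toy purchase data are all assumptions chosen for illustration.

```python
from collections import defaultdict


def jaccard_item_similarity(interactions):
    """Return {(item_a, item_b): similarity} built from (user, item) pairs."""
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)

    sims = {}
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(users_by_item[a] & users_by_item[b])
            if shared:
                union = len(users_by_item[a] | users_by_item[b])
                # Jaccard: users who have both items / users who have either.
                sims[(a, b)] = sims[(b, a)] = shared / float(union)
    return sims


def recommend(interactions, user, top_n=3):
    """Rank unseen items by their summed similarity to the user's items."""
    sims = jaccard_item_similarity(interactions)
    seen = {item for u, item in interactions if u == user}
    all_items = {item for _, item in interactions}
    scores = {}
    for candidate in all_items - seen:
        score = sum(sims.get((candidate, s), 0.0) for s in seen)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical toy data: (user id, event category) pairs.
purchases = [
    ("u1", "rock"), ("u1", "pop"),
    ("u2", "rock"), ("u2", "pop"), ("u2", "jazz"),
    ("u3", "jazz"), ("u3", "opera"),
]
print(recommend(purchases, "u1"))
```

Only the user-item pairs are used here, which mirrors the point above that this algorithm works without side features.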

############################################# Script Execution Order: Spotify Side-Data

Execute the Python script python\spotify.py. This script retrieves metadata (genre, popularity, similar artists) for each Concert category in our database for which we find a match in the Spotify database. Note that it uses both the spotipy library and the Echo Nest wrappers, which are more robust. Because of API rate limits, there is currently a sleep period of 5 seconds (see the main() function to change it). All constants for this file are set in the main() function. The script produces the data files that python\tickets-gl.py can use as side data (see above).
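
The rate-limit handling described above can be sketched as a loop that pauses between lookups. This is an illustration only, not the code in python\spotify.py: the function names, the fake lookup, and the assumption that the pause falls between consecutive requests are all hypothetical, and no real Spotify or Echo Nest call is made.

```python
import time


def fetch_with_pause(names, fetch, pause_seconds=5.0, sleep=time.sleep):
    """Call fetch(name) for each name, sleeping between consecutive calls."""
    results = {}
    for index, name in enumerate(names):
        if index > 0:
            sleep(pause_seconds)  # wait before every call except the first
        record = fetch(name)
        if record is not None:  # keep only the names that produced a match
            results[name] = record
    return results


def fake_lookup(name):
    """Hypothetical stand-in for a metadata lookup against a remote API."""
    catalog = {"Artist A": {"genre": "rock", "popularity": 71}}
    return catalog.get(name)


# pause_seconds=0 keeps the demonstration instant; the README mentions 5.
metadata = fetch_with_pause(["Artist A", "Artist B"], fake_lookup, pause_seconds=0)
print(metadata)
```

Passing the sleep function in as a parameter makes the pause easy to shorten or replace when experimenting.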

############################################# Batch Execution

I have attempted a batch execution script to automate the whole process, but it can definitely be improved. The script is located under the batch\ directory.
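
As a rough idea of what such automation could look like, the sketch below runs the three matrix-generation scripts in order and stops at the first failure. It is not the script under batch\; the "Rscript" and "python" launcher names and the run_pipeline helper are assumptions, and the demonstration only prints the commands instead of executing them.

```python
import os
import subprocess

# Script paths are the ones named in this README; the launcher commands
# ("Rscript", "python") are assumptions about the local installation.
PIPELINE = [
    ["Rscript", os.path.join("R", "db-code.R")],
    ["Rscript", os.path.join("R", "process-data.R")],
    ["python", os.path.join("python", "tickets-gl.py")],
]


def run_pipeline(steps=PIPELINE, runner=subprocess.check_call):
    """Run each step in order and return the scripts that completed."""
    completed = []
    for command in steps:
        # With the default runner, a non-zero exit code raises
        # CalledProcessError, so later steps are skipped.
        runner(command)
        completed.append(command[-1])
    return completed


def dry_run(command):
    """Print the command instead of executing it."""
    print(" ".join(command))


finished = run_pipeline(runner=dry_run)
```

Calling run_pipeline() with no arguments would execute the scripts for real, provided both launchers are on the PATH.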

############################################# CLUSTERING

Script Execution Order: Matrix Generation

1. Execute the R script clustering\clustering-data-processing.R. This script retrieves data from the SQL backend. To change the parameters (dates, etc.), edit the SQL scripts under the clustering\ directory directly; the R script executes them as is, without modification.
2. Execute the Python script clustering\clustering.py. This script contains the GraphLab k-means clustering code. It first reads the data extracts produced in step (1), then assembles them into the input for the GraphLab clustering library, and finally prints the cluster centers and related results. You can change the global variables (including "k") at the head of the file. All starting and intermediate data is under clustering\data\
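
For readers unfamiliar with k-means, the following pure-Python sketch shows the assignment and update steps that produce cluster centers. It does not use GraphLab and is not the code in clustering\clustering.py; the deterministic initialization (first k points), the function names, and the two-feature toy data are assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = float(len(points))
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100):
    """Lloyd's algorithm, using the first k points as initial centers."""
    centers = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = mean_point(members)
    return centers, assignments


# Hypothetical two-feature rows, one per customer.
rows = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centers, labels = k_means(rows, k=2)
print(centers)
print(labels)
```

Changing k here plays the same role as changing the "k" global variable mentioned above.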
