This repository has been archived by the owner on May 27, 2024. It is now read-only.

feat: Add initial implementation of data analysis tools #1

Merged: 21 commits on Feb 10, 2021

AnkitRajSri
Contributor

I have added the muser-data-app project to the branch of the same name.

The app currently offers the following functionality:
• The app can now extract data from the Spotify API, save a timestamped raw CSV file, and run an ETL operation that loads the extracted data into a SQL Server table.
• In addition to data extraction, we have added functionality to build a doc2vec NLP model and train it on the data loaded into the SQL table.
• A final feature expands the muser data, collected from Firebase, with metadata collected from Spotify. The app first queries Spotify for an exact match with the muser record (artist, track, and album). If there is no exact match, the app uses the doc2vec model to predict the most similar match in the SQL table.
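The lookup described in the last bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the app's code: `match_record`, `record_key`, and `text_similarity` are hypothetical names, the in-memory `catalog` list stands in for both the Spotify query and the SQL table, and `difflib.SequenceMatcher` stands in for the doc2vec similarity so the example runs without gensim.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence, Tuple

FIELDS = ("artist", "track", "album")


def record_key(record: dict) -> Tuple[str, ...]:
    """Normalize the three matching fields so comparison ignores case and padding."""
    return tuple(record[field].strip().lower() for field in FIELDS)


def text_similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; the real app would use doc2vec here."""
    return SequenceMatcher(None, a, b).ratio()


def match_record(
    muser_record: dict,
    catalog: Sequence[dict],
    similarity: Callable[[str, str], float] = text_similarity,
) -> Tuple[Optional[dict], str]:
    """Return (best match, how it was found): 'exact', 'similar', or 'none'."""
    if not catalog:
        return None, "none"

    target = record_key(muser_record)

    # Step 1: look for an exact artist/track/album match.
    for candidate in catalog:
        if record_key(candidate) == target:
            return candidate, "exact"

    # Step 2: no exact match, so fall back to the most similar candidate.
    target_text = " ".join(target)
    best = max(
        catalog,
        key=lambda c: similarity(target_text, " ".join(record_key(c))),
    )
    return best, "similar"
```

Returning the match type alongside the record lets downstream code flag rows that were filled in by similarity rather than by an exact match.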

We are still working to make the UI more interactive.

@CLAassistant

CLAassistant commented Dec 8, 2020

CLA assistant check
All committers have signed the CLA.

@AnkitRajSri AnkitRajSri closed this Dec 8, 2020
Member

@barbeau barbeau left a comment

Thanks @AnkitRajSri! Some comments are in-line below.

@pradeepsalunke after @AnkitRajSri makes changes to the hard-coded paths, could you please try cloning this project and running it from the command line? See the bottom of the PR for instructions on how to do this. I want to make sure that anyone can clone this project and run it on their own machine without needing to edit any files (e.g., fixing hard-coded paths).

Also, the project needs a README in the root of the project that explains what the project does and how to run it. Please see https://github.com/CUTR-at-USF/muser-firebase-export for an example.

All the .pyc files should also be ignored and not included in the Git repository because they are generated by the Python compiler. You can add them to the .gitignore, either in the root directory or another one in the sub-directory.

You can make these changes to your local branch on your computer, commit them, and then push to GitHub and they will show up in this PR.

AI/spotifydataextractor.py
AI/muserdatabuilder.py
AI/models.py
AI/models.py (outdated)
from gensim.models import doc2vec
import os

os.chdir(r'C:\Users\sriva\Desktop\edu.usf.sas.pal.muser\SpotifyDataExtractor')
Member

This path is hard-coded and won't work on someone else's computer. Could you please change to a relative project path that will work across different computers?
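One way to make such a path portable is to resolve it relative to the module's own location instead of changing the working directory. The sketch below is only an illustration: `module_dir` and `data_path` are hypothetical helpers, and the `data/doc2vec.model` file name is an assumed example, not a file confirmed to exist in this project.

```python
from pathlib import Path


def module_dir(module_file: str) -> Path:
    """Return the absolute directory that contains the given module file."""
    return Path(module_file).resolve().parent


def data_path(module_file: str, *parts: str) -> Path:
    """Build a path relative to the module's directory, independent of the cwd."""
    return module_dir(module_file).joinpath(*parts)


# Typical use inside a module such as AI/models.py (file names are assumptions):
#   model_file = data_path(__file__, "data", "doc2vec.model")
```

Because the path is derived from `__file__`, it stays valid wherever the repository is cloned and from whichever directory the app is started.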

AI/__init__.py (outdated)
app/connectionmanager.py
app/view.py
app/view.py (outdated)
config.py (outdated)
env (outdated)
@barbeau barbeau mentioned this pull request Dec 8, 2020
@pradeepsalunke

pradeepsalunke commented Dec 8, 2020 via email

@barbeau
Member

barbeau commented Dec 8, 2020

@AnkitRajSri Either way works - I'd prefer to continue in this pull request since I already left review comments here. So please re-open this and you'll just need to push new commits to the branch on your fork repository at AnkitRajSri to update it - https://github.com/AnkitRajSri/muser-data-analysis/tree/muser-data-app.

@AnkitRajSri
Contributor Author

@barbeau Is it okay to continue working from the forked repository on my profile, as I just created it for testing and the pull request was created from that repository by mistake?

@barbeau
Member

barbeau commented Dec 8, 2020

Is it okay to continue working from the forked repository on my profile, as I just created it for testing and the pull request was created from that repository by mistake?

Yes, that's fine, as long as all the changes are merged back into this project via pull requests.

@AnkitRajSri
Contributor Author

Reopening the pull request as per Sean's suggestion.

@AnkitRajSri AnkitRajSri reopened this Dec 11, 2020
@AnkitRajSri
Contributor Author

Hey Sean,

I have addressed all the comments. Could you review and let me know if any further updates are needed?

Thanks,
Ankit Raj

@barbeau barbeau changed the title from "Chore: Re-pushing the Muser data project to a fresh branch" to "feat: Add initial implementation of data analysis tools" Dec 16, 2020
Member

@barbeau barbeau left a comment

Thanks @AnkitRajSri. Another comment in-line.

Also, the .pyc files are still under version control - you'll need to delete them and commit that.

@pradeepsalunke Were you able to clone this repository and run it locally on your machine without changing any code?

config.py (outdated)
@pradeepsalunke

pradeepsalunke commented Dec 17, 2020 via email

@AnkitRajSri
Contributor Author

@barbeau I have deleted the .pyc files from the remote branch, hidden the secret key, and added instructions for the config.py file to the README.
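A common way to keep a secret key out of version control is to read it from the environment at start-up. The following is a sketch of that general pattern, not the contents of this project's config.py: `load_setting` is a hypothetical helper, and `SPOTIFY_CLIENT_SECRET` is an assumed variable name.

```python
import os
from typing import Mapping, Optional


def load_setting(
    name: str,
    default: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Read a setting from the environment, failing loudly if it is missing."""
    env = os.environ if environ is None else environ
    value = env.get(name, default)
    if value is None:
        raise RuntimeError(
            f"Missing required setting {name!r}; set it as an environment variable."
        )
    return value


# Hypothetical use in a config module (the variable name is an assumption):
#   SPOTIFY_CLIENT_SECRET = load_setting("SPOTIFY_CLIENT_SECRET")
```

Failing at start-up with a clear message makes a missing key easier to diagnose than an authentication error later in the run.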

@pradeepsalunke Let me know if you face any issues while testing the application on your local system.

@pradeepsalunke

Hi Team,
I have tested the application locally and it's working fine, with a few errors.

@pradeepsalunke

  1. Previous message doesn't get hidden even after clicking another button
  2. @AnkitRajSri is taking the muser data in the form of a CSV that we provided to expedite things. It would be better if we could access the data directly from the muser app database, combine it with the Spotify features, and then save it as a SQL table.

@pradeepsalunke

pradeepsalunke commented Dec 30, 2020 via email

@AnkitRajSri
Contributor Author

@barbeau @pradeepsalunke Sorry for the delayed response, I was traveling back to the US and couldn't look into your comment.
@pradeepsalunke thanks for the update, I am looking into the third functionality and will fix it asap.

@barbeau barbeau added this to the MVP milestone Jan 14, 2021
@AnkitRajSri
Contributor Author

@barbeau @pradeepsalunke It seems we were exhausting the Spotify API limit when we used the muser data builder functionality right after harvesting data from Spotify.
I wasn't able to replicate the issue when I ran just the data builder functionality.
However, I have added a sleep condition to handle the API exhaustion. This should take care of the issue even if we use all the functionalities together.
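A sleep-and-retry wrapper of the kind described above might look like the sketch below. It is an assumption-laden illustration rather than the code added in this PR: `RateLimitError` and `call_with_backoff` are hypothetical names, and the caller is expected to raise `RateLimitError` when the API answers with HTTP 429, passing along the `Retry-After` value if one is provided.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the caller when the API reports a rate limit."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limit reached")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), sleeping and retrying whenever it raises RateLimitError."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Prefer the server's hint; otherwise back off exponentially.
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting, and capping the number of attempts avoids an endless loop if the quota stays exhausted.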

@barbeau
Member

barbeau commented Feb 10, 2021

My understanding is that all the issues are resolved and @AnkitRajSri and @pradeepsalunke are both able to clone this PR and run from scratch on their own machines, with the exception of #4, which doesn't always happen because it's related to an API quota. So, I'm going to go ahead and merge so we can iterate on the work in this PR.

@barbeau barbeau merged commit 04fa9c3 into CUTR-at-USF:main Feb 10, 2021