A Word-Trend Business Intelligence Dashboard That Provides Actionable Business Insights.
Business owners don’t have time to decode what people are saying about their business online - they just want to know what to improve - so our goal for Tally AI was to provide actionable suggestions to help businesses grow profit.
The app is currently piloting its functionality on hundreds of cafes and restaurants around the Phoenix, AZ area.
Tally is a one-stop snapshot for understanding your business's Yelp reviews.
"Data analytics is not just for big corporations.
Your small business can stay on top of an ever changing marketplace
with the power of Tally."
Wenjing Liu | Lily Su | Rohan Kulkarni |
---|---|---|
Patrick Stevenson | Steve Renner | Rohan Kulkarni | David Downes |
---|---|---|---|
Elizabeth Ter Sahakyan | Colton Mortensen |
---|---|
I might look into training my staff on customer service etiquette
since people are complaining about the service.
I'm relieved that my half-price bottle service is getting
buzz from the word trend chart.
Seeing a snapshot of trending phrases from my competitors
has made me realize that I might think about introducing happy hour.
This is a Django app for a data science microservice,
running locally on Windows 10 and deployed on AWS Elastic Beanstalk.
【Tally AI Front End】 for work with Front End UI Design
【Tally AI Back End】 for additional repos regarding authentication
【Tally AI Documentation】 for technical details on our project.
【AWS EB deployment logs】 for logs of our AWS Elastic Beanstalk Deployments
【All SQLs used in this project】 for useful SQL queries we used
【A D3.js line chart】 for exploratory data visualization work prior to migrating to Recharts
React, Material UI, Recharts, Python, Django, Postgres, AWS
Spacy, Textrank, Scattertext
- Yelp Dataset Stored in an AWS RDS Database
- Release Canvas 1 & 2 January 14, 2020
- Release Canvas 3 February 3, 2020
Exploratory Data Analysis Yelp Dataset
NLP - BERT, word vectors, sentence vectors
Calculating Word Frequency Correlations with Ratings
NLP - Spacy Named Entity Recognition POS Tagging Exploration
Finding Context in Words Correlated with Highest and Lowest Ratings
Refactored Context in Words Correlated with Highest and Lowest Ratings
WordNet and Vader Sentiment Explorations
LDA Topic Modeling Explorations
Web Scraped Endpoints
Returns 10 positive and 10 negative word phrases associated with a business.
http://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com/yelp/jga_2HO_j4I7tSYf5cCEnQ?viztype=0
{
viztype0: {
positive: [
{
term: "cool cats",
score: 0.08981400595659608
},
{
term: "rescued cats",
score: 0.08956279306536073
}
],
negative: [
{
term: "just bad business",
score: 0.0442848147595502
},
{
term: "a refund",
score: 0.03511932390225489
}
]
},
Cumulative average of review star ratings for the past 8 weeks vs. the average rating per week. Timespan: 8 weeks. E.g.
8 weeks ago: 1,1,1,1,1, weekly_avg_rating=1, cumulative_avg_rating=1
7 weeks ago: 2,2,2,2,2, weekly_avg_rating=2, cumulative_avg_rating=1.5
6 weeks ago: 3,3,3,3,3, weekly_avg_rating=3, cumulative_avg_rating=2
http://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com/yelp/jga_2HO_j4I7tSYf5cCEnQ?viztype=0
[
{
date: '2020-01-10',
cumulative_avg_rating: 3,
weekly_avg_rating: 2
},
{
date: 'Date 2',
cumulative_avg_rating: 4,
weekly_avg_rating: 3
}
]
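The arithmetic in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function name `weekly_and_cumulative_averages` is hypothetical, and it assumes the cumulative average is taken over all reviews seen so far in the window (which reproduces the 1, 1.5, 2 sequence in the example).

```python
def weekly_and_cumulative_averages(weekly_ratings):
    """Compute weekly and running averages.

    weekly_ratings: list of (label, [star ratings]) tuples, oldest week first.
    """
    results = []
    total_stars = 0
    total_reviews = 0
    for label, ratings in weekly_ratings:
        if not ratings:
            continue  # assumption: weeks without reviews are skipped
        total_stars += sum(ratings)
        total_reviews += len(ratings)
        results.append({
            "date": label,
            "weekly_avg_rating": sum(ratings) / len(ratings),
            "cumulative_avg_rating": total_stars / total_reviews,
        })
    return results


rows = weekly_and_cumulative_averages([
    ("8 weeks ago", [1, 1, 1, 1, 1]),
    ("7 weeks ago", [2, 2, 2, 2, 2]),
    ("6 weeks ago", [3, 3, 3, 3, 3]),
])
for row in rows:
    print(row)
```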
Endpoints Looking Through Yelp Dataset
Returns “Trending” word phrases and their comparative fluctuations over segments of time.
http://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com/yelp/jga_2HO_j4I7tSYf5cCEnQ?viztype=1
[
{
date: 'string with date',
data: [ { phrase: "phrase 1", rank: 1},
{ phrase: "phrase 2", rank: 1},
{ phrase: "phrase 3", rank: 1} ]
},
{
date: 'string with date',
data: [ { phrase: "phrase 1", rank: 2},
{ phrase: "phrase 2", rank: 2},
{ phrase: "phrase 3", rank: 1.5} ]
},
{
date: 'string with date',
data: [ { phrase: "phrase 1", rank: 2},
{ phrase: "phrase 2", rank: 4},
{ phrase: "phrase 3", rank: 2} ]
},
]
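When charting this response, it can help to pivot it into one series per phrase. The following is a sketch only: it assumes the response has already been parsed into Python lists and dicts with the shape shown above, and the function name `ranks_by_phrase` is hypothetical rather than part of the project.

```python
from collections import defaultdict


def ranks_by_phrase(segments):
    """Turn a list of {date, data: [{phrase, rank}]} into {phrase: [(date, rank), ...]}."""
    series = defaultdict(list)
    for segment in segments:
        for item in segment["data"]:
            series[item["phrase"]].append((segment["date"], item["rank"]))
    return dict(series)


sample = [
    {"date": "segment 1", "data": [{"phrase": "phrase 1", "rank": 1},
                                   {"phrase": "phrase 2", "rank": 1}]},
    {"date": "segment 2", "data": [{"phrase": "phrase 1", "rank": 2},
                                   {"phrase": "phrase 2", "rank": 4}]},
]
print(ranks_by_phrase(sample))
```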
Review frequency: shows the change in the number of reviews over time.
http://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com/yelp/jga_2HO_j4I7tSYf5cCEnQ?viztype=2
[{"date": "2017-8-31", "reviews": 4}, {"date": "2017-12-31", "reviews": 2},
{"date": "2018-1-31", "reviews": 1}, {"date": "2018-2-28", "reviews": 2},
{"date": "2018-3-31", "reviews": 1}, {"date": "2018-4-30", "reviews": 4},
{"date": "2018-5-31", "reviews": 2}, {"date": "2018-6-30", "reviews": 1},
{"date": "2018-7-31", "reviews": 3}, {"date": "2018-8-31", "reviews": 1},
{"date": "2018-9-30", "reviews": 1}, {"date": "2018-11-30", "reviews": 1}]
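A response of this shape can be produced by bucketing review dates by month. The sketch below is an assumption about how such counts could be computed, not the project's actual query: the function name `monthly_review_counts` is hypothetical, each bucket is labelled with the last day of its month (unpadded, as in the sample), and months without reviews are omitted.

```python
import calendar
from collections import Counter
from datetime import date


def monthly_review_counts(review_dates):
    """Count reviews per calendar month, labelling each month by its last day."""
    counts = Counter((d.year, d.month) for d in review_dates)
    result = []
    for (year, month), n in sorted(counts.items()):
        last_day = calendar.monthrange(year, month)[1]
        result.append({"date": f"{year}-{month}-{last_day}", "reviews": n})
    return result


print(monthly_review_counts([date(2017, 8, 1), date(2017, 8, 15), date(2018, 2, 3)]))
```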
【Testing URLs】
【Testing data documents】
【Testing script Colab】
Miniconda3 or Anaconda3 Python 3.7 【Logs】
(If you are using Python 3.6 or manage your environments in some other way, skip this step.)
$ conda create -n python3.6 python=3.6
$ pip install pipenv
$ conda activate python3.6
(base) PS D:\github\django-tally>
$ pipenv install
$ pipenv shell
Install dependencies:
(If you have downloaded the repo, you can skip this step.)
$ pipenv install django psycopg2-binary djangorestframework pyyaml lxml "spacy>=2.0.0,<3.0.0" https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz pytextrank "apscheduler>=3.6.3" django-apscheduler gensim sklearn
Generate requirements.txt
$ pip freeze > requirements.txt
Or $ pip freeze | Out-File -Encoding UTF8 requirements.txt
In the requirements.txt file, remove entries for spacy and en_core_web_sm, and add the following lines.
spacy>=2.0.0,<3.0.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
$ python manage.py runserver
$ python manage.py makemigrations
$ python manage.py migrate
$ python manage.py test --keepdb
$ python manage.py inspectdb > models.py
$ python manage.py collectstatic
$ python -m django --version
During deployment, you may need the following EB CLI and AWS CLI commands.
$ eb init -p python-3.6 django-tally
$ eb create django-tally
$ eb status
$ eb deploy
$ eb open
$ eb logs
$ eb config
$ eb terminate django-tally
$ aws elasticbeanstalk restart-app-server --environment-name django-tally
- Install the AWS CLI version 1 Using Python and pip on Windows
- Configuring the AWS CLI
- Install Python, pip, and the EB CLI on Windows
- AWS eb config
【Logs】
(base) PS C:\Users\guido> aws2 --version
aws-cli/2.0.0dev3 Python/3.7.5 Windows/10 botocore/2.0.0dev2
(base) PS C:\Users\guido> python --version
Python 3.7.4
(base) PS C:\Users\guido> aws --version
File association not found for extension .py
aws-cli/1.17.5 Python/3.7.4 Windows/10 botocore/1.13.50
(base) PS C:\Users\guido> aws2 --version
aws-cli/2.0.0dev3 Python/3.7.5 Windows/10 botocore/2.0.0dev2
(base) PS C:\Users\guido> eb --version
EB CLI 3.17.0 (Python 3.7.4)
(django-tally-QTYVOJb0) (python3.6) D:\github\django-tally>python manage.py collectstatic
163 static files copied to 'D:\github\django-tally\static'.
【AWS Elastic Beanstalk Configuration】
All Applications -> django-tally -> Configuration -> Software -> Change:
Set WSGIPath = tally/wsgi.py
Set system environment variables here too
http://127.0.0.1:8000/admin
http://127.0.0.1:8000/admin/django_apscheduler/
The links below are for 【testing】.
http://127.0.0.1:8000/yelp/index
https://www.yelp.com/biz/aunt-jakes-new-york
http://127.0.0.1:8000/yelp/aunt-jakes-new-york (by business alias)
http://127.0.0.1:8000/yelp/I2lgw_7DUnwD92ND4PN-Ow?viztype=0 (by business ID)
http://127.0.0.1:8000/yelp/DR22QPe3A52diajwPuooVA?viztype=0
https://www.yelp.com/biz/Iq7NqQD-sESu3vr9iEGuTA (Butters Pancakes & Café)
http://127.0.0.1:8000/yelp/Iq7NqQD-sESu3vr9iEGuTA?viztype=1
https://www.yelp.com/biz/y0GZCNHDbFYr6Rjk3OzgYg (Jarrod's Coffee, Tea & Gallery)
http://127.0.0.1:8000/yelp/y0GZCNHDbFYr6Rjk3OzgYg?viztype=1
You should get trendy phrases such as "beautiful art", "art gallery", "downtown mesa", etc.
http://127.0.0.1:8000/jobs/logs/jga_2HO_j4I7tSYf5cCEnQ?num=20 (view job logs by business ID)
The links below are 【examples】.
http://127.0.0.1:8000/yelp/y0GZCNHDbFYr6Rjk3OzgYg?viztype=2
You should get monthly review counts like the ones below.
[{"date": "2017-8-31", "reviews": 4}, {"date": "2017-12-31", "reviews": 2},
{"date": "2018-1-31", "reviews": 1}, {"date": "2018-2-28", "reviews": 2},
{"date": "2018-3-31", "reviews": 1}, {"date": "2018-4-30", "reviews": 4},
{"date": "2018-5-31", "reviews": 2}, {"date": "2018-6-30", "reviews": 1},
{"date": "2018-7-31", "reviews": 3}, {"date": "2018-8-31", "reviews": 1},
{"date": "2018-9-30", "reviews": 1}, {"date": "2018-11-30", "reviews": 1}]
http://127.0.0.1:8000/bucketlists (create)
http://127.0.0.1:8000/bucketlists/1 (get, put, delete)
http://127.0.0.1:8000/jobs/example (APScheduler background job)
【Example】
$ cd C:\Users\guido\.virtualenvs\django-tally-QTYVOJb0\Scripts\
$ python django-admin.py startproject tally D:\github\django-tally
project name: tally
project created in directory: D:\github\django-tally
$ cd path/to/django-tally
$ python manage.py runserver
【Logs】
Watching for file changes with StatReloader
Performing system checks...
System check identified no issues (0 silenced).
You have 17 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
January 07, 2020 - 01:05:29
Django version 3.0.2, using settings 'tally.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.
[07/Jan/2020 01:05:55] "GET / HTTP/1.1" 200 16351
[07/Jan/2020 01:05:55] "GET /static/admin/css/fonts.css HTTP/1.1" 200 423
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Light-webfont.woff HTTP/1.1" 200 85692
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Bold-webfont.woff HTTP/1.1" 200 86184
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Regular-webfont.woff HTTP/1.1" 200 85876
(If you have downloaded the repo, you can skip this step.)
# Internationalization
# https://docs.djangoproject.com/en/3.0/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'US/Central' # 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True
- AWS: Python RDS Connect: Connecting to a Database
- AWS: Adding a Database to Your Elastic Beanstalk Environment (Not in use)
In the tally/settings.py file, edit the database connection configuration.
(If you have downloaded the repo, you can skip this step.)
# Database
# https://docs.djangoproject.com/en/3.0/ref/settings/#databases
import os
if 'RDS_HOSTNAME' in os.environ:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': os.environ['RDS_DB_NAME'],
            'USER': os.environ['RDS_USERNAME'],
            'PASSWORD': os.environ['RDS_PASSWORD'],
            'HOST': os.environ['RDS_HOSTNAME'],
            'PORT': os.environ['RDS_PORT'],
            'OPTIONS': {
                'options': '-c search_path=django'
            },
            'TEST': {
                'ENGINE': 'django.db.backends.sqlite3',
            },
        }
    }
【Local Environment】
Add system environment variables in the Python virtual environment (NO quotation marks).
You can add a .env file in the django-tally folder, then add the following lines to the file (replace * with your credentials). Every time you start the virtual environment, those variables will be set automatically. (Make sure that .env is listed in the .gitignore file; otherwise you will expose your credentials on the Internet.)
RDS_DB_NAME=*
RDS_USERNAME=*
RDS_PASSWORD=*
RDS_HOSTNAME=*
RDS_PORT=*
【Manually】
Or you can add them manually every time you start the virtual environment.
For the Windows Command Prompt, use set VARNAME=value. (In PowerShell, the equivalent is $env:VARNAME = "value".)
For macOS/Linux, use export VARNAME=value.
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_DB_NAME=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_USERNAME=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_PASSWORD=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_HOSTNAME=*.*.us-east-2.rds.amazonaws.com
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_PORT=*
【Verification】
To make sure the variables are properly set, type python, then print os.environ[<varname>].
(django-tally-QTYVOJb0) (base) D:\github\django-tally>python
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Warning:
This Python interpreter is in a conda environment, but the environment has
not been activated. Libraries may fail to load. To activate this environment
please see https://conda.io/activation
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['RDS_DB_NAME']
【Deployment】
To configure the instance deployed on AWS Elastic Beanstalk, go to the application's Configuration page and choose Software.
Add the system environment variables there.
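Whether the variables come from a .env file, the shell, or the Elastic Beanstalk console, all five can be checked at once instead of one at a time in the interpreter. This is a small sketch under stated assumptions: the helper name `missing_rds_variables` is hypothetical, and the variable names are the ones read in tally/settings.py above.

```python
import os

# The variables that tally/settings.py reads for the database connection.
REQUIRED_VARS = ("RDS_DB_NAME", "RDS_USERNAME", "RDS_PASSWORD",
                 "RDS_HOSTNAME", "RDS_PORT")


def missing_rds_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_rds_variables()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it inside the virtual environment; only variable names are printed, never their values.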
- Grant permissions to the database users accordingly.
- For all the SQLs used in this project, refer to these documents.
If you have downloaded this repo, you can skip this step.
$ cd path/to/django-tally
$ python manage.py migrate
【Logs】
Operations to perform:
Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
Applying contenttypes.0001_initial... OK
Applying auth.0001_initial... OK
Applying admin.0001_initial... OK
Applying admin.0002_logentry_remove_auto_add... OK
Applying admin.0003_logentry_add_action_flag_choices... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying auth.0007_alter_validators_add_error_messages... OK
Applying auth.0008_alter_user_username_max_length... OK
Applying auth.0009_alter_user_last_name_max_length... OK
Applying auth.0010_alter_group_name_max_length... OK
Applying auth.0011_update_proxy_permissions... OK
Applying sessions.0001_initial... OK
Django migration will create tables automatically in the database.
$ cd path/to/django-tally
$ python manage.py createsuperuser
【Logs】
Username (leave blank to use 'guido'): ***
Email address: [email protected]
Password:
Password (again):
This password is too short. It must contain at least 8 characters.
This password is too common.
This password is entirely numeric.
Bypass password validation and create user anyway? [y/N]: n
Password:
Password (again):
Superuser created successfully.
(If you have downloaded the repo, you can skip this step.)
PS D:\github\django-tally>
# D:\github\django-tally\tally\settings.py
...
# Application definition
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',  # Add this line; the name must be exactly "rest_framework"
    'example',  # Add this line; you can use app names other than "example"
    'yelp',  # Add this app as well for this project
]
Create an app called "example".
$ python manage.py startapp example
Setting up URL patterns
E.g. a regular expression that matches a UUID as the primary key, (?P<pk>[0-9a-f-]+):
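from django.conf.urls import url  # on Django 2.0+, django.urls.re_path is the preferred equivalent
# Assumes both views are defined in this app's views.py
from .views import YelpYelpScrapingCreateView, YelpYelpScrapingDetailsView
urlpatterns = [  # a list (not a set) keeps the patterns in a fixed order
    url(r'^yelp/$',
        YelpYelpScrapingCreateView.as_view(), name="create"),
    url(r'^yelp/(?P<pk>[0-9a-f-]+)/$',
        YelpYelpScrapingDetailsView.as_view(), name="details"),
]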
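The primary-key pattern can be tried out with Python's `re` module before wiring it into Django. This is only an illustrative sketch; `extract_pk` is a hypothetical helper, and the path strings are made up for the demonstration.

```python
import re
import uuid

# Same regular expression as in the "details" URL pattern.
PK_PATTERN = re.compile(r'^yelp/(?P<pk>[0-9a-f-]+)/$')


def extract_pk(path):
    """Return the captured primary key, or None if the path does not match."""
    match = PK_PATTERN.match(path)
    return match.group("pk") if match else None


some_id = str(uuid.uuid4())          # lowercase hex digits and hyphens
print(extract_pk(f"yelp/{some_id}/") == some_id)
print(extract_pk("yelp/jga_2HO_j4I7tSYf5cCEnQ/"))
```

Note that the character class accepts any run of lowercase hex digits and hyphens, not only well-formed UUIDs, and that Yelp business IDs (which contain uppercase letters and underscores) do not match it.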
E.g. query strings
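# urls.py (assumed file layout)
from django.urls import path
from .views import home
urlpatterns = [path('<slug:business_id>', home, name='home')]

# views.py (assumed file layout)
import json
from django.http import HttpResponse
# yelpTrendyPhrases, yelpReviewCountMonthly and getDataViztype0 are the project's own helpers
def home(request, business_id):
    viztype = request.GET.get('viztype')  # query string value, e.g. ?viztype=1
    if viztype == '1':
        result = json.dumps(yelpTrendyPhrases(business_id))
    elif viztype == '2':
        result = json.dumps(yelpReviewCountMonthly(business_id))
    else:
        result = json.dumps(getDataViztype0(business_id))
    return HttpResponse(result)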
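The branching on `viztype` can also be written as a lookup table, which keeps the view short as more visualization types are added. The sketch below is framework-free so it can run on its own; all function names are hypothetical stand-ins for the project's data functions, and the returned payloads are placeholders.

```python
import json


# Hypothetical stand-ins for the project's data functions.
def get_viztype0(business_id):
    return {"viztype0": {"positive": [], "negative": []}}


def get_trendy_phrases(business_id):
    return []


def get_monthly_review_counts(business_id):
    return []


# Maps the ?viztype= query string value to the function that builds the payload.
HANDLERS = {"1": get_trendy_phrases, "2": get_monthly_review_counts}


def render(business_id, viztype=None):
    """Pick a handler by viztype, falling back to viztype 0 as the view above does."""
    handler = HANDLERS.get(viztype, get_viztype0)
    return json.dumps(handler(business_id))


print(render("some-business-id", "2"))
print(render("some-business-id"))
```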
Follow this tutorial to build a REST API.
$ python manage.py inspectdb > models.py
After running this command, modify the class names in the models.py file.
Prefix every class name with the app name. E.g.
For app "example", change class Bucketlist -> class ExampleBucketlist
For app "yelp", change class Business -> class YelpBusiness
Follow the instructions in the models.py file and make sure the model definitions are correct.
Then move the models.py file to the corresponding app folder.
This way, every app has its own models without conflicting with other apps.
This is an example of the Django data models created.
https://github.com/Nov05/django-tally/blob/master/example/models.py
You can query with or without Django data models. E.g.
https://github.com/Nov05/django-tally/blob/master/tallylib/sql.py
【Debug】
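Issue: Django “ValueError: source code string cannot contain null bytes”
Solution: Create a new .py file, copy and paste the content of models.py into it, then replace the models.py file with it. (The null bytes typically appear when Windows PowerShell's > redirection saves the file as UTF-16 rather than UTF-8.)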
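If the file was saved as UTF-16 (as Windows PowerShell's `>` redirection does), re-encoding it is an alternative to copying and pasting. This is a hedged sketch, assuming the file really is UTF-16 with a byte order mark; `reencode_to_utf8` is a hypothetical helper, so keep a backup of the file before trying it.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="utf-16"):
    """Rewrite a text file in place as UTF-8 (without a byte order mark)."""
    path = Path(path)
    # Decoding with "utf-16" consumes the byte order mark and removes the null bytes.
    text = path.read_bytes().decode(source_encoding)
    # Writing bytes avoids any newline translation by the platform.
    path.write_bytes(text.encode("utf-8"))


# Example usage (path is an assumption):
# reencode_to_utf8("models.py")
```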
spaCy models
https://spacy.io/usage/models
How to install models
https://pypi.org/project/spacy/
Download spaCy model manually (Not in use)
https://github.com/explosion/spacy-models/releases
You can install spaCy models just like installing a Python package.
pipenv install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
Then import the models in your code.
import en_core_web_sm
nlp = en_core_web_sm.load()
or
import spacy
nlp = spacy.load("en_core_web_sm")
【Deployment】
Make sure the following 2 lines are in the requirements.txt file.
spacy>=2.0.0,<3.0.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
Make sure to remove spacy==2.2.3 and en_core_web_sm==2.2.5 from the file, or you will get an error when deploying: "Could not find a version that satisfies the requirement en-core-web-sm==2.2.5".
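The manual edit to requirements.txt can be scripted so it is repeatable after every `pip freeze`. The following is a sketch under stated assumptions, not a tool shipped with the project: `fix_requirements` and `_package_name` are hypothetical helpers, and the two replacement lines are the ones listed above.

```python
import re

SPACY_LINES = [
    "spacy>=2.0.0,<3.0.0",
    "https://github.com/explosion/spacy-models/releases/download/"
    "en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz",
]


def _package_name(line):
    """Return the normalized package name at the start of a requirements line."""
    name = re.split(r"[=<>!~@\s\[;]", line.strip(), maxsplit=1)[0]
    return name.lower().replace("_", "-")


def fix_requirements(lines):
    """Drop pinned spacy / en_core_web_sm entries and append the two lines above."""
    kept = [
        line.rstrip("\n")
        for line in lines
        if line.strip()
        and line.strip() not in SPACY_LINES
        and _package_name(line) not in {"spacy", "en-core-web-sm"}
    ]
    return kept + SPACY_LINES


print(fix_requirements(["Django==3.0.2\n", "spacy==2.2.3\n", "en-core-web-sm==2.2.5\n"]))
```

To apply it, read requirements.txt into a list of lines, pass it through `fix_requirements`, and write the result back with UTF-8 encoding.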
【Manually】
Put the following folder in the repo (at the same level as manage.py).
spacy.load("en_core_web_sm/en_core_web_sm-2.2.5") with __init__.py
CAUTION: This works, but deploying from Windows 10 to AWS Elastic Beanstalk may raise a UnicodeDecodeError when loading the model, whereas running the server locally on Windows 10 and deploying from macOS both seem fine.
- Django Documentation
- Python Django Web Framework - Full Course for Beginners Jan 2, 2019
- Django Message Framework
- AWS: Deploying a Django Application to Elastic Beanstalk
- AWS: Configure the EB CLI
- AWS: My security credentials
- Build a REST API with Django – A Test Driven Approach: Part 1
- List of Useful URL Patterns
- Understanding file encoding in VSCode and PowerShell 02/27/2019
- 【Repo】flask-yelp-reviews
- 【Repo】yelp-dataset-challenge
When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.
Please note we have a code of conduct. Please follow it in all your interactions with the project.
If you are having an issue with the existing project code, please submit a bug report under the following guidelines:
- Check first to see if your issue has already been reported.
- Check to see if the issue has recently been fixed by attempting to reproduce the issue using the latest master branch in the repository.
- Create a live example of the problem.
- Submit a detailed bug report including your environment & browser, steps to reproduce the issue, actual and expected outcomes, where you believe the issue is originating from, and any potential solutions you have considered.
We would love to hear from you about new features which would improve this app and further the aims of our project. Please provide as much detail and information as possible to show us why you think your new feature should be implemented.
If you have developed a patch, bug fix, or new feature that would improve this app, please submit a pull request. It is best to communicate your ideas with the developers first before investing a great deal of time into a pull request to ensure that it will mesh smoothly with the project.
Remember that this project is licensed under the MIT license, and by submitting a pull request, you agree that your work will be, too.
- Ensure any install or build dependencies are removed before the end of the layer when doing a build.
- Update the README.md with details of changes to the interface, including new plist variables, exposed ports, useful file locations and container parameters.
- Ensure that your code conforms to our existing code conventions and test coverage.
- Include the relevant issue number, if applicable.
- You may merge the Pull Request in once you have the sign-off of two other developers, or if you do not have permission to do that, you may request the second reviewer to merge it for you.
These contribution guidelines have been adapted from this good-Contributing.md-template.
See Project Documentation for technical details on our project.