MSDS: Data Management Final Project

This is the GitHub repo for Team 12's Final Project for the class Special Topics in Data Science, a.k.a. the Database Management class. The final project brings together concepts from the class, including how to set up relational and NoSQL datastores, query them, and configure indices to optimize performance. The app demonstrates processing, analyzing, and searching a Twitter feed.

Stack

Backend

The stack is composed of a backend and a frontend. The backend data stores are as follows (a short connection sketch appears after the list):

Redis: an in-memory cache used to optimize certain queries

MongoDB: a document-based database used to store the actual tweets. While the test data is given a priori, the app demonstrates ingesting it one record at a time to simulate the streaming nature of the data.

Postgres: used for data that changes infrequently, such as user/profile records
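To make the division of labor concrete, here is a minimal sketch of how a notebook or the app might connect to the three stores. The package names (redis, pymongo, psycopg2), the database/collection/user names, and the STACK_PASSWORD environment variable are assumptions for illustration; the hostnames and ports come from the table under "Accessing the Application and Services".

```python
# Minimal connection sketch (assumes the Docker hostnames/ports from the
# services table and the password entered when running run.sh).
import os

import redis                      # pip install redis
from pymongo import MongoClient   # pip install pymongo
import psycopg2                   # pip install psycopg2-binary

PASSWORD = os.environ.get("STACK_PASSWORD", "changeme")  # hypothetical env var

# Redis: in-memory cache for hot query results
cache = redis.Redis(host="redisdb", port=25379,
                    password=PASSWORD, decode_responses=True)

# MongoDB: document store for the tweets themselves
mongo = MongoClient("mongodb", 25017, username="root", password=PASSWORD)
tweets = mongo["twitter"]["tweets"]          # hypothetical db/collection names

# Postgres: relational store for slowly changing user/profile data
pg = psycopg2.connect(host="pgdb", port=25432, user="postgres",
                      password=PASSWORD, dbname="postgres")

print(cache.ping(), tweets.estimated_document_count(), pg.status)
```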

Frontend & Search App

The right option here is still being discussed and debated. The "low tech" solution would be ipywidgets in a Jupyter notebook, and that is the "simple" fallback. We will also consider a Pythonic framework like Django, which supports routing (endpoints) and web templates for a simple, mostly static web page. The HTML front end will call the Python middleware, which in turn interacts with the data stores.
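If Django ends up being the choice, the routing piece could look roughly like the sketch below. The view name, template name, and the search_tweets helper are invented for illustration and are not part of this repo.

```python
# Hypothetical Django routing/view sketch for the search front end.
from django.urls import path
from django.shortcuts import render

def search(request):
    """Render a simple search page; the Python middleware would query
    MongoDB/Redis here before handing results to the template."""
    query = request.GET.get("q", "")
    results = []  # e.g. results = search_tweets(query)  (hypothetical helper)
    return render(request, "search.html", {"query": query, "results": results})

urlpatterns = [
    path("search/", search, name="search"),
]
```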

Other Scripts

The repo also includes some notebooks for analyzing the data and will include a Python script for processing the input data one record at a time, once again simulating streaming data. To work with these notebooks, copy the *.ipynb files from the notebooks subdirectory to the input in the runtime directory, as specified in the run.sh script.
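As a rough illustration of the one-record-at-a-time approach, the ingestion script might look something like this sketch. The input file name and format (one JSON tweet per line), the database/collection names, and the Redis counter key are assumptions, not the script shipped in this repo.

```python
# Sketch of simulated streaming ingestion: read tweets one record at a time,
# insert each into MongoDB, and keep a running counter in the Redis cache.
import json
import time

import redis
from pymongo import MongoClient

cache = redis.Redis(host="redisdb", port=25379)
tweets = MongoClient("mongodb", 25017)["twitter"]["tweets"]

with open("tweets.jsonl") as fh:            # one JSON tweet per line (assumed)
    for line in fh:
        tweet = json.loads(line)
        tweets.insert_one(tweet)            # ingest a single record
        cache.incr("ingested_count")        # cheap progress counter in the cache
        time.sleep(0.1)                     # throttle to mimic a live feed
```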

Beyond the run script, the repo will likely include build scripts for creating images with the database and other dependencies.

Accessing the Application and Services

All apps and services in this stack use the port range 25xxx, where xxx differentiates among the services. To run the stack, simply cd into services and execute ./run.sh. Follow the prompts, which ask for a password for the data stores and a runtime directory where the app data, database data, and notebooks will live.

To access the programmatic interface via the integrated Jupyter service, type http://<hostname>:25888 into the browser and JupyterHub should come up. The notebooks will be in the /work directory.

The Docker hostname and ports for the respective services are as follows:

| Service  | Host       | Port  | Purpose                                                    |
|----------|------------|-------|------------------------------------------------------------|
| Jupyter  | jupyterhub | 25888 | Programmatic access to data/app via Jupyter notebook       |
| Postgres | pgdb       | 25432 | Store static/structured data like users/locations          |
| MongoDB  | mongodb    | 25017 | Store tweets and other dynamic data with flexible schema   |
| Redis    | redisdb    | 25379 | Cache important data during ingestion as well as querying  |

Each service is containerized, and the respective CLI tools for each database can be accessed by executing docker exec -it <container_name> bash. Once you see the bash prompt, you can follow the instructions on how to access the CLI tools for each of the respective DBs. Please use the password you passed in to the run script.
