Project Summary: Data Warehouse Project

Sparkify is a startup that previously implemented an on-premise data warehouse, which satisfied all of the analytics team's requirements.

As time passed, more users joined Sparkify's streaming music service, and the company's analytics and IT infrastructure grew more complex.

The purpose of this project is to make analyzing massive amounts of data fast and simple, and to make infrastructure scaling worry-free.

Cloud Data Warehouse

To support Sparkify's growth, the data engineering team reassessed the entire data analytics environment and came up with a new design for a Cloud Data Warehouse.

Data:

data/song_data: Contains metadata about a song and the artist of that song.
data/log_data: Consists of log files generated by the streaming app, based on the songs in the dataset above.

Example data:

Song data files are stored at paths such as:
song_data/A/B/C/TRABCEI128F424C983.json
song_data/A/A/B/TRAABJL12903CDCF1A.json

Log data files are stored at paths such as:
log_data/2018/11/2018-11-12-events.json
log_data/2018/11/2018-11-13-events.json
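The path layout can be sketched in a few lines of Python. This is only an illustration inferred from the example paths shown here: song files appear to be nested under the third, fourth, and fifth characters of the track ID, and log files under year and month. The helper names `song_path` and `log_path` are hypothetical and not part of the project scripts.

```python
from datetime import date


def song_path(track_id: str) -> str:
    """Build a song file path from a track ID.

    Assumption: the three directory levels are characters 3 to 5 of the
    track ID (the letters right after the "TR" prefix), as the example
    paths suggest.
    """
    first, second, third = track_id[2:5]
    return f"song_data/{first}/{second}/{third}/{track_id}.json"


def log_path(day: date) -> str:
    """Build a log file path for one day of events.

    Assumption: one file per day, grouped by year and zero-padded month.
    """
    return f"log_data/{day.year}/{day.month:02d}/{day.isoformat()}-events.json"


if __name__ == "__main__":
    print(song_path("TRABCEI128F424C983"))
    print(log_path(date(2018, 11, 12)))
```

Both helpers reproduce the example paths listed for the song and log data.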

Sample song record:

{"num_songs": 1, "artist_id": "ARJIE2Y1187B994AB7", "artist_latitude": null, "artist_longitude": null, "artist_location": "", "artist_name": "Line Renaud", "song_id": "SOUPIRU12A6D4FA1E1", "title": "Der Kleine Dompfaff", "duration": 152.92036, "year": 0}
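A minimal sketch of reading such a record in Python follows. The `clean_song` helper is hypothetical, and its rules are assumptions rather than the project's actual transform: it treats `year` 0 and an empty `artist_location` as missing values.

```python
import json

# The sample song record, copied verbatim.
RAW_SONG = (
    '{"num_songs": 1, "artist_id": "ARJIE2Y1187B994AB7", '
    '"artist_latitude": null, "artist_longitude": null, '
    '"artist_location": "", "artist_name": "Line Renaud", '
    '"song_id": "SOUPIRU12A6D4FA1E1", "title": "Der Kleine Dompfaff", '
    '"duration": 152.92036, "year": 0}'
)


def clean_song(record: dict) -> dict:
    """Return a copy of the record with placeholder values set to None.

    Assumption: year == 0 and an empty artist_location mean "unknown".
    """
    cleaned = dict(record)
    if cleaned.get("year") == 0:
        cleaned["year"] = None
    if cleaned.get("artist_location") == "":
        cleaned["artist_location"] = None
    return cleaned


if __name__ == "__main__":
    song = json.loads(RAW_SONG)  # JSON null becomes Python None
    print(clean_song(song))
```

Whether placeholders should be normalized this way depends on how the analytics queries treat missing values, so this is a design choice to confirm against the real tables.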

Scripts Usage:

etl.py: Runs the entire data pipeline: extracts data from the JSON source files, transforms it, and loads it into Redshift tables.
create_tables.py: Creates the database and tables.
sql_queries.py: Contains the DDL statements that create the tables and the DML statements that insert data.
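As an illustration of the load step, the sketch below builds a Redshift `COPY` statement for JSON files. It is not taken from the project's `sql_queries.py`. The table name, S3 path, IAM role ARN, and region are all hypothetical placeholders, and it assumes the JSON files are staged in S3 before loading.

```python
def build_copy_sql(
    table: str,
    s3_path: str,
    iam_role: str,
    region: str = "us-west-2",
    json_option: str = "auto",
) -> str:
    """Build a Redshift COPY statement that loads JSON files from S3.

    json_option is 'auto' (match JSON keys to column names) or the S3
    path of a JSONPaths file.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"REGION '{region}'\n"
        f"FORMAT AS JSON '{json_option}';"
    )


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    print(
        build_copy_sql(
            table="staging_songs",
            s3_path="s3://example-bucket/song_data",
            iam_role="arn:aws:iam::123456789012:role/example-redshift-role",
        )
    )
```

The resulting string could then be executed through a database connection. Presumably create_tables.py runs first so that the target tables exist before etl.py loads data into them.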

Narengowda/udacity_data_engineering_datawarehouse
