Data Engineering Nanodegree

Udacity Nanodegree
Explore the repository»

postgres, cassandra, aws, redshift, s3, emr, spark, airflow, ETL, ELT, data modelling, database schema, data warehousing, data lakes, data engineering, udacity

About The Nanodegree

Data engineers are responsible for making data accessible to all the people who use it across an organization. That could mean creating a data warehouse for the analytics team, building a data pipeline for a front-end application, or summarizing massive datasets to be more user-friendly.

Certificate

Program Details

During this program, we will complete four courses and five projects. Throughout the projects, we will play the part of a data engineer at a music streaming company. We will work with the same type of data in each project, but with increasing data volume, velocity, and complexity. Here’s a course-by-course breakdown.

Course 1 – Data Modeling

In this course, we will learn to create relational and NoSQL data models to fit the diverse needs of data consumers. In the project, we will build SQL (Postgres) and NoSQL (Apache Cassandra) data models using user activity data for a music streaming app.
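As a rough illustration of the relational side, the sketch below builds a tiny star schema with one fact table and two dimension tables, then answers an analytics question with a join. It is only a sketch under stated assumptions: in-memory SQLite stands in for Postgres so the example runs without a server, and every table name, column name, and row is hypothetical rather than taken from the project.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (            -- dimension table
    user_id    INTEGER PRIMARY KEY,
    first_name TEXT,
    level      TEXT
);
CREATE TABLE songs (            -- dimension table
    song_id TEXT PRIMARY KEY,
    title   TEXT
);
CREATE TABLE songplays (        -- fact table: one row per play event
    songplay_id INTEGER PRIMARY KEY,
    user_id     INTEGER REFERENCES users(user_id),
    song_id     TEXT REFERENCES songs(song_id),
    start_time  TEXT
);
""")

# Illustrative rows only, not real user activity data.
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'free')")
conn.execute("INSERT INTO users VALUES (2, 'Lin', 'paid')")
conn.execute("INSERT INTO songs VALUES ('S1', 'Example Song')")
conn.executemany(
    "INSERT INTO songplays (user_id, song_id, start_time) VALUES (?, ?, ?)",
    [
        (1, "S1", "2018-11-01T10:00:00"),
        (2, "S1", "2018-11-01T11:00:00"),
        (2, "S1", "2018-11-02T09:30:00"),
    ],
)


def plays_per_level(connection):
    """Count play events per subscription level by joining fact and dimension."""
    rows = connection.execute("""
        SELECT u.level, COUNT(*)
        FROM songplays AS sp
        JOIN users AS u ON u.user_id = sp.user_id
        GROUP BY u.level
        ORDER BY u.level
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(plays_per_level(conn))
```

A Cassandra model would usually be designed the other way around, starting from the queries and creating one table per query, so the join above would not carry over directly.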

Associated notebooks for this course can be found here.

Project 1 can be found here.

Project 2 can be found here.

Course 2 – Cloud Data Warehouses

In this course, we will learn to create cloud-based data warehouses. In the project, we will build an ELT pipeline that extracts data from Amazon S3, stages it in Amazon Redshift, and transforms it into a set of dimensional tables.
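The ELT order described above (load raw data into staging first, transform inside the warehouse afterwards) can be sketched as follows. This is a minimal sketch, not the project's code: in-memory SQLite stands in for Redshift, a Python list of JSON strings stands in for files in S3, and the table names, column names, and events are hypothetical. On Redshift the load step would typically be a COPY from S3 rather than row inserts.

```python
import json
import sqlite3

# Assumption: these lines stand in for raw JSON log files stored in S3.
RAW_EVENTS = [
    '{"userId": 1, "level": "free", "song": "Example Song", "ts": 1541106106796}',
    '{"userId": 2, "level": "paid", "song": "Another Song", "ts": 1541106352796}',
    '{"userId": 2, "level": "paid", "song": "Example Song", "ts": 1541106496796}',
]


def run_elt(raw_lines):
    """Load raw events into a staging table, then transform them with SQL."""
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse

    # Extract + Load: copy the raw records into staging without reshaping them.
    conn.execute(
        "CREATE TABLE staging_events (user_id INTEGER, level TEXT, song TEXT, ts INTEGER)"
    )
    records = [json.loads(line) for line in raw_lines]
    conn.executemany(
        "INSERT INTO staging_events VALUES (:userId, :level, :song, :ts)", records
    )

    # Transform: derive dimensional tables inside the database itself.
    conn.execute(
        "CREATE TABLE dim_users AS SELECT DISTINCT user_id, level FROM staging_events"
    )
    conn.execute(
        "CREATE TABLE fact_songplays AS "
        "SELECT ts AS start_time, user_id, song FROM staging_events"
    )
    return conn


if __name__ == "__main__":
    warehouse = run_elt(RAW_EVENTS)
    print(warehouse.execute("SELECT * FROM dim_users ORDER BY user_id").fetchall())
```

Keeping the staging table untouched means the transform step can be rerun or changed without fetching the raw data again.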

Associated notebooks for this course can be found here.

Project 3 can be found here.

Course 3 – Data Lakes with Apache Spark

In this course, we will learn more about the big data ecosystem, how to work with massive datasets with Apache Spark, and how to store big data in a data lake. In the project, we will build an ETL pipeline for a data lake using Apache Spark and S3.
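One idea behind a data lake is that transformed data is written back as files laid out by partition, so later queries can skip folders they do not need. The sketch below imitates that layout in plain Python. It deliberately does not use Spark's API: a local temporary directory stands in for S3, CSV stands in for a columnar format, and the function, field, and folder names are hypothetical.

```python
import csv
import json
import tempfile
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

# Assumption: illustrative events with millisecond timestamps, not project data.
RAW_EVENTS = [
    '{"userId": 1, "song": "Example Song", "ts": 1541106106796}',
    '{"userId": 2, "song": "Another Song", "ts": 1541106352796}',
    '{"userId": 2, "song": "Example Song", "ts": 1543708800000}',
]


def write_partitioned(raw_lines, lake_root):
    """Transform raw events and write them under year=/month= folders."""
    partitions = defaultdict(list)
    for line in raw_lines:
        event = json.loads(line)
        played_at = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc)
        partitions[(played_at.year, played_at.month)].append(
            {
                "user_id": event["userId"],
                "song": event["song"],
                "start_time": played_at.isoformat(),
            }
        )

    written = []
    for (year, month), rows in sorted(partitions.items()):
        part_dir = Path(lake_root) / "songplays" / f"year={year}" / f"month={month}"
        part_dir.mkdir(parents=True, exist_ok=True)
        out_file = part_dir / "part-0000.csv"
        with out_file.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=["user_id", "song", "start_time"])
            writer.writeheader()
            writer.writerows(rows)
        written.append(out_file.relative_to(lake_root).as_posix())
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for path in write_partitioned(RAW_EVENTS, root):
            print(path)
```

In Spark the same layout is usually produced by partitioning the DataFrame writer on the year and month columns, and the work is distributed across a cluster instead of running in a single loop.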

Associated notebooks for this course can be found here.

Project 4 can be found here.

Course 4 – Data Pipelines with Apache Airflow

In this course, we will learn to schedule, automate, and monitor data pipelines using Apache Airflow. In the project, we will continue our work on the music streaming company’s data infrastructure by creating and automating a set of data pipelines.
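A pipeline of this kind is a directed acyclic graph of tasks, where each task runs only after the tasks it depends on have finished. The sketch below shows that ordering idea with the standard library's `graphlib` (Python 3.9 or later). It is not Airflow code and uses none of Airflow's API; the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Assumption: hypothetical task names. Each task maps to the set of tasks
# that must complete before it can start.
PIPELINE = {
    "stage_events": set(),
    "stage_songs": set(),
    "load_songplays_fact": {"stage_events", "stage_songs"},
    "load_user_dimension": {"load_songplays_fact"},
    "run_quality_checks": {"load_user_dimension"},
}


def run_pipeline(dependencies, tasks):
    """Run every task once, in an order that respects the dependencies."""
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()  # a real scheduler would also handle retries and logging
        completed.append(name)
    return completed


# Placeholder callables; real tasks would stage, load, and validate data.
TASKS = {name: (lambda name=name: print(f"running {name}")) for name in PIPELINE}

if __name__ == "__main__":
    print(run_pipeline(PIPELINE, TASKS))
```

Airflow adds scheduling, retries, and monitoring on top of this dependency ordering, which is what the course project exercises.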

Associated notebooks for this course can be found here.

Project 5 can be found here.

Capstone Project

In the Capstone project, we combine Twitter data, World Happiness Index data, and Earth surface temperature data to explore whether there is any correlation among them. The Twitter data is dynamic, while the other two datasets are static. The general idea of this project is to extract Twitter data, analyze its sentiment, and use the resulting data to gain insights alongside the other datasets.
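To make "explore whether there is any correlation" concrete, the sketch below joins two per-country series on their shared keys and computes a Pearson correlation coefficient. It is one possible approach under stated assumptions, not necessarily the method used in the project: the function names are hypothetical, and the numbers are placeholders for illustration only, not results.

```python
from math import sqrt


def align_by_key(left, right):
    """Keep only keys present in both mappings and return paired value lists."""
    shared = sorted(set(left) & set(right))
    return [left[key] for key in shared], [right[key] for key in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Placeholder values only: average tweet sentiment and a happiness score
    # keyed by a made-up country label.
    sentiment = {"A": 0.10, "B": 0.35, "C": -0.05, "D": 0.20}
    happiness = {"A": 5.1, "B": 6.8, "C": 4.4, "E": 7.0}
    xs, ys = align_by_key(sentiment, happiness)
    print(round(pearson(xs, ys), 3))
```

A coefficient near 1 or -1 suggests a strong linear relationship and a value near 0 suggests none, but with only a handful of countries the estimate is noisy and does not imply causation.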

Capstone Project can be found here.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Vineeth S - [email protected]

Project Link: https://github.com/vineeths96/Data-Engineering-Nanodegree

