Sparkify, a music streaming company, wants to automate their ETL pipeline using Apache Airflow.
This project builds a data pipeline that is optimized for reusable tasks, can be monitored, and allows for backfills. Data quality checks are run against the datasets after the ETL steps complete to assure data quality.
- Load data from S3 into an Amazon Redshift warehouse (a sketch of the staging COPY statement follows this list)
- Two S3 datasets are used:
  - Song data files: s3://udacity-dend/song_data
  - Event data, JSON log files of what users are listening to: s3://udacity-dend/log_data
- Perform data quality checks on the tables created in Redshift
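Under the hood, staging comes down to issuing Redshift COPY statements against the buckets above. Below is a minimal sketch of how such a statement might be templated in Python; the staging table name, credentials, and JSON format option are placeholders, not the project's actual values.

```python
# Hypothetical COPY template a StageToRedshiftOperator could render.
# Table name, credentials, and the JSON option are placeholders.
COPY_SQL = """
    COPY {table}
    FROM 's3://udacity-dend/{s3_prefix}'
    ACCESS_KEY_ID '{access_key}'
    SECRET_ACCESS_KEY '{secret_key}'
    FORMAT AS JSON '{json_option}';
"""

staging_events_copy = COPY_SQL.format(
    table="staging_events",      # placeholder staging table
    s3_prefix="log_data",        # event data path from above
    access_key="<AWS_ACCESS_KEY>",
    secret_key="<AWS_SECRET_KEY>",
    json_option="auto",          # let Redshift map JSON keys to columns
)
```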
The project is organized into the following components:
- Dag: contains all tasks to be run
- Operators: operator templates (dedicated tasks)
- Helpers: SQL files with the different transformations (see the sketch after this list)
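As an illustration of what a helper might hold, the transformations can live as plain SQL strings in a Python module; the class name, file path, and column names below are assumptions rather than the project's actual schema.

```python
# plugins/helpers/sql_queries.py (hypothetical path and schema)
class SqlQueries:
    # Example transformation: derive songplay records by joining the
    # staged event and song data. Column names are illustrative only.
    songplay_table_insert = """
        SELECT
            events.ts        AS start_time,
            events.userid    AS user_id,
            events.level,
            songs.song_id,
            songs.artist_id,
            events.sessionid AS session_id
        FROM staging_events events
        LEFT JOIN staging_songs songs
            ON events.song = songs.title
           AND events.artist = songs.artist_name
        WHERE events.page = 'NextSong'
    """
```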
The following custom operators are created in Airflow:
- StageToRedshiftOperator: stages data from S3 into tables in Redshift
- LoadDimensionOperator: loads data into the dimension tables on Redshift
- LoadFactOperator: loads data into the fact table on Redshift
- DataQualityOperator: performs data quality checks (a minimal sketch follows this list)
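For example, here is a minimal sketch of the DataQualityOperator, assuming Airflow 2.x with the Postgres provider installed; the constructor parameters (redshift_conn_id, tables) and the row-count check are assumptions about how the checks are configured.

```python
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class DataQualityOperator(BaseOperator):
    """Runs a simple row-count check against each listed Redshift table."""

    def __init__(self, redshift_conn_id="redshift", tables=None, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.tables = tables or []

    def execute(self, context):
        redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        for table in self.tables:
            records = redshift.get_records(f"SELECT COUNT(*) FROM {table}")
            if not records or not records[0] or records[0][0] < 1:
                raise ValueError(f"Data quality check failed: {table} has no rows")
            self.log.info("Data quality check passed for table %s", table)
```

In the DAG, this task would be instantiated with the list of fact and dimension tables to verify after the load steps finish.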
Below is the Sparkify Airflow dependency graph
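In code, the dependency graph corresponds to task wiring roughly like the following; this is a placeholder sketch assuming Airflow 2.x, with EmptyOperator standing in for the custom operators and illustrative task ids.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# EmptyOperator placeholders keep the sketch runnable; in the real DAG
# these would be the custom operators described above.
with DAG(
    "sparkify_etl",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    start = EmptyOperator(task_id="Begin_execution")
    stage_events = EmptyOperator(task_id="Stage_events")
    stage_songs = EmptyOperator(task_id="Stage_songs")
    load_songplays = EmptyOperator(task_id="Load_songplays_fact_table")
    load_dims = [
        EmptyOperator(task_id=f"Load_{name}_dim_table")
        for name in ("user", "song", "artist", "time")
    ]
    quality_checks = EmptyOperator(task_id="Run_data_quality_checks")
    end = EmptyOperator(task_id="Stop_execution")

    # Stage both datasets in parallel, load the fact table, fan out to the
    # dimension tables, then run the quality checks before finishing.
    start >> [stage_events, stage_songs] >> load_songplays
    load_songplays >> load_dims >> quality_checks >> end
```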