aws_de_project repo
This course provides an overview of using various AWS services for data engineering. It is divided into 6 weeks, each covering a different aspect of data engineering on AWS. Each week consists of course content, labs and homework questions, and is designed to take 8-10 hours of effort.
The following prerequisites are required to complete the course:
- An AWS account
- Basic knowledge of Python
- Basic knowledge of SQL
- Terraform basics
Objective: Understanding the fundamentals of data ingestion and how to use AWS S3 for storing and retrieving data.
Objective: Understanding the concept of data warehousing and how to use AWS Redshift for data analysis.
Objective: Understanding the need for orchestration in managing complex workflows and gaining hands-on experience with an orchestration tool.
Objective: Understanding the concept of Analytics Engineering and how to use dbt and AWS ECS for data transformation.
Objective: Understanding the concept of batch processing and how to use AWS EMR for large-scale data processing.
Objective: Understanding the concept of stream processing and how to use Apache Kafka on AWS for real-time data processing.
I created this course because I enjoyed the DataTalks.Club Data Engineering Zoomcamp but could not find a similar course that focused primarily on AWS services.
Students will use the NYC Taxi Trip dataset for the labs and homework questions. The dataset can be downloaded from the DataTalks.Club Zoomcamp materials or from the link below:
https://github.com/DataTalksClub/nyc-tlc-data/releases/tag/yellow
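Downloading files by hand works for a single month, but the labs may need several. The following is a minimal standard-library Python sketch. It assumes that the release assets follow the naming pattern `yellow_tripdata_YYYY-MM.csv.gz` and are served under `.../releases/download/yellow/`. Check the release page for the exact file names before relying on it. The helper names (`yellow_trip_filename`, `yellow_trip_url`, `download_month`) are hypothetical and are not part of the course materials.

```python
import urllib.request
from pathlib import Path

# Assumed base URL for the release assets listed on
# https://github.com/DataTalksClub/nyc-tlc-data/releases/tag/yellow
BASE_URL = "https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow"


def yellow_trip_filename(year: int, month: int) -> str:
    """Return the assumed asset name, e.g. yellow_tripdata_2021-01.csv.gz."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Zero-pad the month so January becomes "01".
    return f"yellow_tripdata_{year}-{month:02d}.csv.gz"


def yellow_trip_url(year: int, month: int) -> str:
    """Build the (assumed) download URL for one month of yellow taxi data."""
    return f"{BASE_URL}/{yellow_trip_filename(year, month)}"


def download_month(year: int, month: int, dest_dir: str = "data") -> Path:
    """Download one monthly file into dest_dir and return its local path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / yellow_trip_filename(year, month)
    # urlretrieve goes through urllib's default opener, which follows the
    # HTTP redirect GitHub returns for release assets.
    urllib.request.urlretrieve(yellow_trip_url(year, month), target)
    return target
```

For example, `download_month(2021, 1)` would save `data/yellow_tripdata_2021-01.csv.gz` locally. You could then upload that file to S3 as part of the ingestion labs.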