Data Engineering Zoomcamp

Basics
- What is Kafka
- Internals of Kafka, broker
- Partitioning of Kafka topic
- Replication of Kafka topic
Consumer-producer
Schemas (Avro)
Streaming
- Kafka Streams
Kafka Connect
Alternatives (PubSub/Pulsar)

Duration: 1.5h

Week 7, 8 & 9: Project

Putting everything we learned into practice

Duration: 2-3 weeks

Upcoming buzzwords
- Delta Lake/Lakehouse
- Databricks
- Apache Iceberg
- Data mesh
- ksqlDB
- Streaming analytics
- MLOps

Duration: 30 mins

Overview

Architecture diagram

Technologies

Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
- Google Cloud Storage (GCS): Data Lake
- BigQuery: Data Warehouse
Terraform: Infrastructure-as-Code (IaC)
Docker: Containerization
SQL: Data Analysis & Exploration
Airflow: Pipeline Orchestration
dbt: Data Transformation
Spark: Distributed Processing
Kafka: Streaming

Prerequisites

To get the most out of this course, you should feel comfortable with coding and the command line, and know the basics of SQL. Prior experience with Python will be helpful, but you can pick up Python relatively quickly if you have experience with other programming languages.

Prior experience with data engineering is not required.

Instructors

Ankush Khanna (https://linkedin.com/in/ankushkhanna2)
Sejal Vaidya (https://linkedin.com/in/vaidyasejal)
Victoria Perez Mola (https://www.linkedin.com/in/victoriaperezmola/)
Alexey Grigorev (https://linkedin.com/in/agrigorev)

Tools

For this course you'll need to have the following software installed on your computer:

Docker and Docker-Compose
Python 3 (e.g. via Anaconda)
Google Cloud SDK
Terraform

See Week 1 for more details about installing these tools.
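
As a quick sanity check before starting, the tool list above can be verified with a short script. This is only a sketch: the executable names (`docker`, `docker-compose`, `python3`, `gcloud`, `terraform`) are assumptions about a typical installation and are not specified by the course. For example, on Windows the Python executable is often `python`, and newer Docker releases ship Compose as the `docker compose` subcommand instead of a separate binary.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = {
    "Docker": "docker",
    "Docker-Compose": "docker-compose",
    "Python 3": "python3",
    "Google Cloud SDK": "gcloud",
    "Terraform": "terraform",
}


def find_missing(tools, which=shutil.which):
    """Return the display names of tools whose executable is not on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The script only checks that each executable can be found on `PATH`. It does not check versions or whether the tools are configured correctly.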

Questions

Asking questions in Slack

You can ask questions in the #course-data-engineering channel in the DataTalks.Club Slack.

Please follow the recommendations in asking-questions.md when asking for help.

FAQ

Q: I registered, but haven't received a confirmation email. Is that normal? A: Yes, it's normal. The process is not automated, but you will receive an email eventually.
Q: At what time of day will it happen? A: Office hours will happen on Mondays at 17:00 CET. Everything will be recorded, so you can watch it whenever it's convenient for you.
Q: Will there be a certificate? A: Yes, if you complete the project.
Q: I'm not 100% sure I'll be able to attend. Can I still sign up? A: Yes, please do! You'll receive all the updates and then you can watch the course at your own pace.
Q: Do you plan to run an ML engineering course as well? A: Glad you asked. We do :)

Our friends

Big thanks to other communities for helping us spread the word about the course:

Check them out - they are cool!
