ETL Data Pipeline

Objective

Create a data pipeline that ingests user data via an API, processes and stores it, and then retrieves it in a serialized format.

Components

Data Source: An API that returns randomly generated (fake) user data.
Python & Pandas: For programming and data manipulation.
Redis: Caching recent data for quick access.
Postgres: Long-term data storage.
FastAPI: API endpoint for data retrieval.
Docker: Containerization of the entire pipeline.

Steps

Data Ingestion:
- Python script to fetch random user data from an API.
- Validate the data before processing.
- Pandas for data cleaning and transformation.
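
The ingestion step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record shape (an `email` string plus a nested `name` with `first` and `last`), the `results` key in the payload, and all function names are assumptions. It uses only the standard library; the flat dicts it returns could be passed to `pandas.DataFrame(...)` for further transformation.

```python
import json
import re
from urllib.request import urlopen

# Assumed record shape (not taken from the project): each user is a dict with
# an "email" string and a nested "name" dict holding "first" and "last".
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def fetch_users(url: str, timeout: float = 10.0) -> list:
    """Fetch a JSON payload and return its "results" list (assumed key)."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("results", [])


def is_valid(user: dict) -> bool:
    """Accept a record only if it has a plausible email and both name parts."""
    name = user.get("name")
    email = user.get("email")
    if not isinstance(name, dict) or not isinstance(email, str):
        return False
    if not EMAIL_RE.match(email.strip()):
        return False
    return all(
        isinstance(name.get(part), str) and name[part].strip()
        for part in ("first", "last")
    )


def clean(user: dict) -> dict:
    """Flatten a validated record and normalise whitespace and email case."""
    return {
        "first_name": user["name"]["first"].strip(),
        "last_name": user["name"]["last"].strip(),
        "email": user["email"].strip().lower(),
    }


def process(users: list) -> list:
    """Drop invalid records, then clean the rest."""
    return [clean(user) for user in users if is_valid(user)]
```

Keeping validation separate from cleaning means bad records are rejected before any transformation touches them, and each function can be unit tested without a network call.
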
Caching Layer:
- Redis setup for caching recent user data, with a TTL on each entry.
- Python logic for data retrieval from Redis and Postgres.
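
One common way to combine the two stores is a read-through cache: check Redis first, fall back to Postgres on a miss, then cache the result with a TTL. The sketch below assumes that pattern; the project may do it differently. The `cache` argument is expected to expose `get` and `setex`, as the `redis.Redis` client from redis-py does, and `load_from_db`, the key scheme, and the TTL value are all hypothetical.

```python
import json
from typing import Callable, Optional

CACHE_TTL_SECONDS = 300  # assumed TTL; the project does not state a value


def get_user(
    user_id: str,
    cache,
    load_from_db: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Return a user from the cache, falling back to the database on a miss."""
    key = f"user:{user_id}"  # hypothetical key scheme

    cached = cache.get(key)
    if cached is not None:
        # Redis returns bytes or str; json.loads accepts both.
        return json.loads(cached)

    user = load_from_db(user_id)
    if user is not None:
        # Store the serialized record and let it expire after the TTL.
        cache.setex(key, CACHE_TTL_SECONDS, json.dumps(user))
    return user
```

Passing the cache client and the database loader in as arguments keeps the function free of connection details, so it can be exercised with in-memory stand-ins.
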
Data Storage:
- Design and implement a Postgres database schema for the user data.
- Hash PII before it is written to storage.
- Store processed data into Postgres.
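
As an illustration of the PII step, the sketch below hashes selected fields before a record is stored. The project only says that PII is hashed; this example uses HMAC-SHA256 with a secret key, which is one option among several. The field list, the function names, and how the key is supplied are assumptions.

```python
import hashlib
import hmac

# Assumed PII fields; the project does not list which columns it treats as PII.
PII_FIELDS = ("email", "phone")


def hash_value(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of a normalised value as a hex string."""
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def hash_pii(record: dict, key: bytes) -> dict:
    """Return a copy of the record with its PII fields replaced by digests."""
    hashed = dict(record)  # shallow copy so the caller's record is untouched
    for field in PII_FIELDS:
        if hashed.get(field) is not None:
            hashed[field] = hash_value(str(hashed[field]), key)
    return hashed
```

A keyed hash is used here because plain unsalted digests of low-entropy values such as email addresses can be reversed by hashing a list of candidates. The same input and key always give the same digest, so hashed columns can still be used for equality lookups.
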
Data Retrieval:
- API endpoint (e.g., using FastAPI) for data retrieval.
Dockerization:
- Dockerfile for the Python application.
- Docker Compose for orchestrating Redis and Postgres services.
Testing and Deployment:
- Unit tests for pipeline components.
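
A unit test for a single pipeline component could be as small as the sketch below. It uses the standard library's `unittest`; the project's own test runner may differ, and `normalise_email` is a hypothetical helper defined here only so the example is self-contained.

```python
import unittest


def normalise_email(raw: str) -> str:
    """Hypothetical pipeline helper: trim whitespace and lowercase an email."""
    return raw.strip().lower()


class NormaliseEmailTests(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalise_email("  Ada@Example.COM "), "ada@example.com")

    def test_is_idempotent(self):
        # Cleaning an already clean value must not change it.
        once = normalise_email(" A@B.C ")
        self.assertEqual(normalise_email(once), once)
```

Saved in a test module, tests like these can be run with `python -m unittest`.
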

Learning Outcomes

Data pipeline architecture.
Skills in Python, Pandas, Redis, Postgres, FastAPI and Docker.

Further Enhancements

~~Front-end dashboard for data display.~~
Advanced data processing features.

How to test the project

Clone the repo

git clone https://github.com/mrpbennett/etl-pipeline.git

cd into the cloned repo and run docker compose up

docker compose up

Then open the URL below to reach the front end, which shows where the data is stored

http://127.0.0.1:5173
