Sakila Company

Sakila Company is a reference project that delivers an analytics platform, built solely from open-source software, for Sakila, a fictional DVD rental company.

Getting Started

$ ./scripts/init.sh
$ docker compose --env-file compose/.env up app-db analytics-dwh dagster-db metabase-db  # press Ctrl+C once the databases have initialized
$ docker compose --env-file compose/.env up

The initialization script downloads required files and generates credential and environment files for every component.
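As a rough illustration of the credential-generation step, the sketch below writes one random password per component into an env file. It is not the contents of scripts/init.sh: the function names, the `<COMPONENT>_PASSWORD` naming scheme, and the file permissions are all assumptions made for the example. Only the component names come from the commands above.

```python
import secrets
from pathlib import Path


def render_env(components: list[str], password_bytes: int = 24) -> str:
    """Return env-file text with one random password per component."""
    lines = []
    for name in components:
        # Hypothetical naming scheme: "app-db" -> "APP_DB_PASSWORD".
        key = name.upper().replace("-", "_") + "_PASSWORD"
        lines.append(f"{key}={secrets.token_urlsafe(password_bytes)}")
    return "\n".join(lines) + "\n"


def write_env(path: Path, components: list[str]) -> None:
    """Write the file only if it is missing, so reruns keep existing credentials."""
    if path.exists():
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_env(components))
    path.chmod(0o600)  # readable and writable by the owner only
```

Calling `write_env(Path("compose/.env"), ["app-db", "analytics-dwh", "dagster-db", "metabase-db"])` would produce a file that `docker compose --env-file` can read. Skipping an existing file matters because databases that were already initialized keep their original passwords.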

Background

The author worked as a data engineer, and the first data person, at a SaaS company in Thailand from 2022 to 2024. It was a gold rush for big data management tools and platforms. However, in a developing country such as Thailand, the company's profit margin was too thin to afford the commercial platforms showcased in developed countries. He had to build and maintain data pipelines and data quality without the niceties of modern tools, as if it were still the pre-data-science era.

He no longer wants executives to blame data engineering for being a "cost center". One effective way to cut costs is to self-host as much software as possible, which is how this project came about. Any company struggling with the operating cost of its data platform can use it as a reference and adapt it to its budget and its team's skill level.

Architecture

This project uses Pagila, a PostgreSQL port of the MySQL Sakila sample database, as its application data.

The project follows an ELT approach: load as much data as possible into analytics storage, such as a data warehouse or data lake, and transform it there. The application data is loaded into a ClickHouse data warehouse that serves analytics. Dagster sits at the center, orchestrating the loading, transformation, and testing of data. Loading and transforming data across data stores is powered by the dlt library, while dbt handles modeling, transforming, and testing data inside the data warehouse.
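The ELT flow can be illustrated with a minimal, standard-library-only sketch. It does not use Dagster, dlt, dbt, PostgreSQL, or ClickHouse: two in-memory SQLite databases stand in for the application database and the warehouse, and the `rental` table is a simplified subset of the Pagila table. The function names and the `raw_` prefix are assumptions for the example, not the project's conventions.

```python
import sqlite3

# Stand-in for the application database (PostgreSQL/Pagila in the project).
app_db = sqlite3.connect(":memory:")
app_db.executescript("""
    CREATE TABLE rental (rental_id INTEGER, customer_id INTEGER, rental_date TEXT);
    INSERT INTO rental VALUES
        (1, 10, '2022-02-14 15:16:03'),
        (2, 11, '2022-02-14 18:02:11'),
        (3, 10, '2022-02-15 09:45:30');
""")

# Stand-in for the warehouse (ClickHouse in the project).
warehouse = sqlite3.connect(":memory:")


def extract_and_load(source, target, table):
    """E + L: copy rows unchanged into a raw_<table> landing table."""
    cursor = source.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    target.execute(f"DROP TABLE IF EXISTS raw_{table}")
    target.execute(f"CREATE TABLE raw_{table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    target.executemany(f"INSERT INTO raw_{table} VALUES ({placeholders})", rows)
    return len(rows)


def transform(target):
    """T: build a model with SQL inside the warehouse, after loading."""
    target.executescript("""
        DROP TABLE IF EXISTS rentals_per_day;
        CREATE TABLE rentals_per_day AS
        SELECT date(rental_date) AS rental_day, COUNT(*) AS rentals
        FROM raw_rental
        GROUP BY rental_day
        ORDER BY rental_day;
    """)


def check_model(target):
    """Data test: the model must neither lose nor duplicate rows."""
    (raw,) = target.execute("SELECT COUNT(*) FROM raw_rental").fetchone()
    (modeled,) = target.execute("SELECT SUM(rentals) FROM rentals_per_day").fetchone()
    return raw == modeled


# An orchestrator would run these steps in order and stop on failure.
loaded = extract_and_load(app_db, warehouse, "rental")
transform(warehouse)
passed = check_model(warehouse)
```

The point of the ordering is that raw data lands in the warehouse untouched, so a model can be rebuilt or corrected later without going back to the application database.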

Once the data has been modeled into a star schema in the data warehouse, Metabase serves it as dashboards and analyses.
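To show the shape of a star schema and the kind of query a dashboard might issue against it, here is a small sketch using an in-memory SQLite database in place of the warehouse. The table names (`fact_rental`, `dim_film`, `dim_customer`), their columns, and the sample rows are hypothetical and are not taken from the project's dbt models.

```python
import sqlite3

dwh = sqlite3.connect(":memory:")
dwh.executescript("""
    -- Dimension tables describe the "what" and the "who".
    CREATE TABLE dim_film (film_key INTEGER PRIMARY KEY, title TEXT, category TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, city TEXT);
    -- The fact table holds one row per rental, with a key into each dimension.
    CREATE TABLE fact_rental (
        rental_key INTEGER PRIMARY KEY,
        film_key INTEGER REFERENCES dim_film (film_key),
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        amount REAL
    );
    INSERT INTO dim_film VALUES (1, 'Film A', 'Action'), (2, 'Film B', 'Comedy');
    INSERT INTO dim_customer VALUES (1, 'Bangkok'), (2, 'Chiang Mai');
    INSERT INTO fact_rental VALUES
        (1, 1, 1, 4.99), (2, 1, 2, 4.99), (3, 2, 1, 2.99);
""")

# A dashboard-style aggregate: one join from the fact table to a dimension.
revenue_by_category = dwh.execute("""
    SELECT f.category, COUNT(*) AS rentals, ROUND(SUM(r.amount), 2) AS revenue
    FROM fact_rental AS r
    JOIN dim_film AS f ON f.film_key = r.film_key
    GROUP BY f.category
    ORDER BY revenue DESC
""").fetchall()
```

Grouping by city instead only requires swapping the join to `dim_customer`, which is what makes this layout convenient for a BI tool.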

All of these components are deployed with a single, monolithic Docker Compose file.

Future Work

The project will be migrated to Kubernetes to ease deployment and scaling on either on-premises servers or cloud providers.

DataHub will be added as a data catalog where teams can look up datasets.

The project will support adding and modifying application data and will include a wider variety of data sources, e.g. web/app analytics trackers and open datasets, to mimic day-to-day operations and enrich the analyses.
