This repository contains whole codebase developed within the final assigment for Modern Data Analytics [B-KUL-G0Z39B] course at KU Leuven, academic year 2021-2022, by a group of six students. The course itself was a broad introduction to Python for data science, with elements of ML and graph analysis.
Our team was given a topic of "COVID-19 pandemic in US". In specific we were asked to collect data and then develop: visualisations, explanatory data analysis and model-based analysis. We developed complex dashboard using such a tech stack:
- Python 3.9.5
- pip and venv for dependency managment and environment handling
- Apache Airflow for data pipeline
- Apache Parquet for initial storage
- MongoDB (hosted on Atlas Cloud) as database
- Dash + Flask for dashboard and webservice creation
- Plotly for visualisations
- Heroku for hosting our dashboard
Our model-based analysis included:
- univariate time-series forecasting using ETS model
- counterfactual analysis using XGBoost
- causal inference using linear mixed random effects model
The deployed dashboard is no longer available due to limitations of a free subscription bu can be rebuild if needed.