This data engineering project helps a graduate student (the user) who works part-time track and visualize their work hours for better time management. While the college provides a website with basic work-hour data, the user built a solution that extracts, stores, and visualizes additional insights.
The primary objectives of this project include:
- Data Extraction: Develop a process to collect work-hours data from the Kronos website using its API.
- Data Storage: Create and maintain a relational database to securely store the extracted data.
- Data Transformation: Implement procedures for data cleaning, validation, and standardization.
- Dashboard Creation: Build a Streamlit dashboard for visualizing work-hours data.
A script or program interacts with the Kronos API to extract work hours data, including date, clock-in time, and clock-out time.
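The extraction step can be sketched as below. The endpoint path, auth scheme, and response field names here are assumptions for illustration only; the real Kronos API will differ.

```python
# Sketch of pulling punch data from a Kronos-style API and flattening the
# response into (date, clock_in, clock_out) rows. The endpoint path and
# response shape are hypothetical, not the real Kronos schema.
import requests


def fetch_punches(base_url: str, token: str) -> dict:
    # Hypothetical endpoint; the real Kronos API path and auth will differ.
    resp = requests.get(
        f"{base_url}/timekeeping/punches",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def parse_punches(payload: dict) -> list:
    """Flatten an assumed payload into (date, clock_in, clock_out) rows."""
    rows = []
    for rec in payload.get("punches", []):
        rows.append((rec["date"], rec["in"], rec["out"]))
    return rows
```

Keeping the flattening logic in a separate function makes it easy to unit-test without hitting the live API.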
A relational database (Postgres on Docker) is used to store the extracted data, maintaining data integrity.
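A minimal version of the punches table might look like the following; column names are assumptions, and sqlite3 stands in for the project's Postgres-on-Docker instance so the snippet runs without a database server.

```python
# Illustrative schema for storing punches. A composite primary key on
# (work_date, clock_in) prevents duplicate rows for the same shift.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS work_hours (
    work_date  TEXT NOT NULL,
    clock_in   TEXT NOT NULL,
    clock_out  TEXT NOT NULL,
    PRIMARY KEY (work_date, clock_in)
)
"""

conn = sqlite3.connect(":memory:")  # Postgres in the real project
conn.execute(DDL)
conn.execute("INSERT INTO work_hours VALUES ('2023-09-01', '09:00', '13:00')")
count = conn.execute("SELECT COUNT(*) FROM work_hours").fetchone()[0]
```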
Procedures are implemented to clean, validate, and standardize the extracted data.
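A typical cleaning rule is sketched below: standardize timestamps and reject punches whose clock-out precedes clock-in. The field layout is an assumption for illustration.

```python
# Illustrative cleaning/validation step for a single punch record.
from datetime import datetime


def clean_punch(date: str, clock_in: str, clock_out: str):
    """Return (date, clock_in, clock_out, hours) or None if invalid."""
    start = datetime.fromisoformat(f"{date}T{clock_in}")
    end = datetime.fromisoformat(f"{date}T{clock_out}")
    if end <= start:
        return None  # reject zero-length or negative shifts
    hours = (end - start).total_seconds() / 3600
    return (date, clock_in, clock_out, round(hours, 2))
```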
A real-time Streamlit dashboard displays work hours data, providing insights into trends, hours worked, and relevant metrics.
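One metric behind such a dashboard, weekly hours worked, can be computed with pandas as sketched here; in the Streamlit app the resulting series would be passed to a chart call such as `st.bar_chart`. Column names are assumptions.

```python
# Weekly-hours aggregation that a dashboard chart would display.
import pandas as pd

df = pd.DataFrame({
    "work_date": ["2023-09-01", "2023-09-02", "2023-09-08"],
    "hours": [4.0, 3.5, 5.0],
})
df["work_date"] = pd.to_datetime(df["work_date"])

# Total hours per calendar week -- a typical trend the dashboard plots.
weekly = df.set_index("work_date")["hours"].resample("W").sum()
```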
This project offers the following benefits:
- Efficiently tracks work hours.
- Provides data visualization for better insights.
- Facilitates better management of part-time work.
- API: Kronos API to extract work hours data
- JSON: Backed-up JSON file containing work hours data from 2023-2024
- Database: Postgres on Docker
The catchup DAG extracts data from the source API, creates the tables in the database, and loads the data into them. It is typically run once per user to backfill all data available from the API.
A second DAG extracts punch data from the source API and updates the database with new records for every user on a daily basis.
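The daily update can be written as an idempotent upsert so that re-running the DAG never duplicates rows. The sketch below uses sqlite3 in place of Postgres (both support the same `ON CONFLICT` clause); table and column names are assumptions.

```python
# Idempotent daily load: insert new punches, update any that changed.
import sqlite3

conn = sqlite3.connect(":memory:")  # Postgres in the real project
conn.execute("""CREATE TABLE work_hours (
    work_date TEXT, clock_in TEXT, clock_out TEXT,
    PRIMARY KEY (work_date, clock_in))""")


def upsert(rows):
    conn.executemany(
        "INSERT INTO work_hours VALUES (?, ?, ?) "
        "ON CONFLICT(work_date, clock_in) "
        "DO UPDATE SET clock_out = excluded.clock_out",
        rows,
    )


upsert([("2023-09-01", "09:00", "12:00")])
upsert([("2023-09-01", "09:00", "13:00"),   # corrected clock-out
        ("2023-09-02", "10:00", "14:00")])  # new day
count = conn.execute("SELECT COUNT(*) FROM work_hours").fetchone()[0]
```

Because the primary key covers (work_date, clock_in), replaying a day's extract simply overwrites the existing rows instead of duplicating them.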
- Dashboard: Streamlit dashboard to visualize data from the database.
- Dashboard is updated in real-time with data from the database.
- Dashboard initial view:
The dashboard initially shows synthetic (fake) data; a user can log in with their credentials to view their own data. (The dashboard is not yet deployed to a server, so it can only be viewed locally after running the catchup DAG with the user's credentials.)
To run the project, simply execute `docker-compose up`.