[WIP] An automated ML pipeline to forecast hourly electricity demand for the PJM balancing authority.
The goal of this project is to demonstrate:
- MLOps
  - ML workflow orchestration.
  - Versioning:
    - Dataset versioning.
    - Model versioning and experiment tracking.
  - ETL pipeline.
  - Reliability:
    - Pipeline performance visibility and alerting (WIP).
    - Automated unit tests (WIP).
  - Isolation between development and deployed/production environment infrastructure.
  - Performance comparisons across (model, version) pairs and against a non-ML baseline.
  - Hyperparameter tuning.
  - Online prediction service.
- ML
  - Timeseries feature engineering.
  - Timeseries forecasting with XGBoost.
  - Timeseries cross-validation.
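Timeseries cross-validation and the non-ML baseline can be illustrated together. The following is a minimal stdlib-only sketch, not the project's code. It assumes a gap-free list of hourly demand values and uses a hypothetical expanding-window splitter: each fold trains on every hour before its test window, so no future data leaks into training. It then scores a seasonal-naive baseline that predicts each hour with the value observed `season` hours earlier. The function names and default fold sizes are illustrative assumptions.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices); every fold trains only on earlier hours."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        # The training window grows with each fold; the test window follows it directly.
        yield range(test_start), range(test_start, test_start + test_size)


def seasonal_naive(y, train_idx, test_idx, season=24):
    """Predict each test hour with the value observed `season` hours earlier."""
    preds = []
    for t in test_idx:
        src = t - season
        # Only values inside the training window may be used as forecasts.
        if src < 0 or src >= len(train_idx):
            raise ValueError("test window must not exceed the season, "
                             "and training data must cover one season")
        preds.append(y[src])
    return preds


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def cross_validate_baseline(y, n_splits=3, test_size=24, season=24):
    """Return the baseline's MAE for each fold, oldest fold first."""
    scores = []
    for train_idx, test_idx in expanding_window_splits(len(y), n_splits, test_size):
        preds = seasonal_naive(y, train_idx, test_idx, season)
        actual = [y[t] for t in test_idx]
        scores.append(mean_absolute_error(actual, preds))
    return scores


if __name__ == "__main__":
    # Synthetic week of hourly demand (MW): a daily cycle plus a slow upward trend.
    demand = [1000.0 + 100.0 * (h % 24) + h for h in range(24 * 7)]
    print(cross_validate_baseline(demand))  # prints [24.0, 24.0, 24.0]
```

An XGBoost model would be scored on the same folds, so its per-fold MAE can be compared directly with the baseline's. scikit-learn's `TimeSeriesSplit` implements a similar expanding-window scheme.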
Data sources:
- Electricity demand: EIA Open Data's hourly electricity demand data.
- Weather: Open-Meteo.
- Holiday calendar: Calendarific.
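The three sources above could be combined into model-ready rows roughly as follows. This minimal sketch assumes each source has already been fetched and aligned to the same gap-free hourly index in local time, with holidays reduced to a set of dates (for example, parsed from Calendarific responses). The function name, feature names, and default lags (24 h and 168 h, the same hour yesterday and last week) are illustrative assumptions, not the project's actual feature set.

```python
from datetime import date, datetime, timedelta


def build_feature_rows(timestamps, demand, temperature, holidays, lags=(24, 168)):
    """Join demand, weather, and holiday data into one feature row per hour.

    Assumes all inputs share the same gap-free hourly index in local time.
    Hours without enough history for the longest lag are skipped.
    """
    rows = []
    for i in range(max(lags), len(timestamps)):
        ts = timestamps[i]
        row = {
            "hour": ts.hour,
            "day_of_week": ts.weekday(),  # Monday is 0
            "month": ts.month,
            "is_holiday": int(ts.date() in holidays),
            "temperature": temperature[i],
        }
        for lag in lags:
            # Lagged demand: the load observed `lag` hours before this hour.
            row[f"demand_lag_{lag}h"] = demand[i - lag]
        row["target"] = demand[i]
        rows.append(row)
    return rows


if __name__ == "__main__":
    start = datetime(2024, 6, 27)  # rows begin one week later, on 2024-07-04
    stamps = [start + timedelta(hours=h) for h in range(24 * 9)]
    load = [float(h) for h in range(len(stamps))]
    temps = [25.0] * len(stamps)
    rows = build_feature_rows(stamps, load, temps, holidays={date(2024, 7, 4)})
    print(rows[0])
```

Observed temperature is used here for simplicity; when forecasting ahead, only a weather forecast is available for the target hour, so training on observations can make offline scores look better than live ones. Likewise, a 24 h lag is only available when the forecast horizon is at most 24 hours.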
Tech stack:
- Development environment: Docker Compose
- ML workflow orchestration: Prefect
- Experiment tracking: MLflow
- Model registry: MLflow
- Dataset version tracking and 'Data Warehouse': DVC (+ Git repository)
- Runtime data validation: Great Expectations
- Model: XGBoost
- Production environment: AWS Copilot-managed containers on AWS Fargate
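Great Expectations expresses runtime checks as declarative expectations (for example `expect_column_values_to_not_be_null` and `expect_column_values_to_be_between`). The stdlib sketch below does not use its API; it only illustrates the kind of checks a batch of hourly demand data might pass before training or inference. The function name and the 0 to 200,000 MW bounds are hypothetical placeholders, not thresholds taken from the project.

```python
from datetime import datetime, timedelta


def validate_hourly_demand(timestamps, demand, min_mw=0.0, max_mw=200_000.0):
    """Return a list of problems found in a batch; an empty list means it passed."""
    if len(timestamps) != len(demand):
        return ["timestamps and demand have different lengths"]
    problems = []
    # Consecutive timestamps must be exactly one hour apart
    # (catches gaps, duplicates, and out-of-order rows).
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev != timedelta(hours=1):
            problems.append(f"non-hourly step from {prev} to {cur}")
    # Demand must be present and within a plausible range (NaN also fails the range check).
    for ts, value in zip(timestamps, demand):
        if value is None:
            problems.append(f"missing demand at {ts}")
        elif not min_mw <= value <= max_mw:
            problems.append(f"demand {value} MW out of range at {ts}")
    return problems


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    stamps = [start + timedelta(hours=h) for h in (0, 1, 2, 4)]  # hour 3 is missing
    for problem in validate_hourly_demand(stamps, [90_000.0, None, -5.0, 95_000.0]):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a pipeline step log every issue in a batch before deciding whether to halt.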
Prefect flow deployment:
`prefect deploy --name DEPLOYMENT_NAME --prefect-file flows/deployments/FLOW_DEPLOYMENT_CONFIG.yaml`
References:
- Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, 2022.
- Mulla, Rob. Kaggle tutorial on timeseries forecasting with XGBoost.
- Prefect documentation: ECS workers.