forecastframe generates interpretable forecasts using best-in-class feature-engineering, modeling, and validation strategies. It's designed to abstract away hierarchical relationships (e.g., [[Country -> State -> Store], [Category -> Brand -> Product]]) and common time-series issues so that you can focus on feature creation, model interpretation, and delivery.
See this notebook for an exploratory example.
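To make the hierarchy idea concrete, here is a minimal sketch in plain pandas of the kind of bookkeeping involved: computing lagged and rolling features separately within each [State -> Store] group so values never bleed across series. This is not forecastframe's API; the `add_lag_features` helper and the column names are assumptions made up for illustration.

```python
import pandas as pd

# Toy daily sales for two stores; column names are illustrative assumptions.
df = pd.DataFrame(
    {
        "state": ["CA", "CA", "CA", "CA", "TX", "TX", "TX", "TX"],
        "store": ["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"],
        "date": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"] * 2
        ),
        "sales": [10.0, 12.0, 11.0, 15.0, 3.0, 0.0, 4.0, 5.0],
    }
)


def add_lag_features(frame, hierarchy, target, lag=1, window=2):
    """Hypothetical helper: add a lag and a shifted rolling mean per hierarchy group."""
    out = frame.sort_values(hierarchy + ["date"]).copy()
    grouped = out.groupby(hierarchy, sort=False)[target]
    # Shift within each group so a row only sees earlier dates of its own series.
    out[f"{target}_lag{lag}"] = grouped.shift(lag)
    # Shift before rolling so the current day's value is excluded from the window.
    out[f"{target}_roll{window}_mean"] = grouped.transform(
        lambda s: s.shift(lag).rolling(window).mean()
    )
    return out


features = add_lag_features(df, hierarchy=["state", "store"], target="sales")
print(features)
```

The first row of each store has a missing lag rather than a value borrowed from the previous store, which is the behavior a hierarchy-aware tool needs to guarantee.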
- Feature engineering, modeling, and interpretation algorithms inspired by the M5 competition.
- Intuitive, inheritable class design simplifies complicated operations (e.g., rolling cross-validation without leakage, model ensembling, etc.).
- Built for speed and scale: asynchronous components, generators, and distributed frameworks like mxnet and Ray let it run efficiently on billion-row datasets.
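As an illustration of what "rolling cross-validation without leakage" means in practice, the sketch below builds expanding-window folds in which every training row is dated strictly before its validation window. It is written in plain pandas and is not forecastframe's implementation; the function name `rolling_cv_splits` and its `n_folds`, `horizon`, and `gap` parameters are assumptions.

```python
import pandas as pd


def rolling_cv_splits(dates, n_folds=3, horizon=7, gap=0):
    """Hypothetical helper: yield (train_mask, valid_mask) boolean arrays.

    Each validation window spans `horizon` days, and training data stops
    `gap` days before the window opens, so no future rows reach the model.
    """
    dates = pd.Series(pd.to_datetime(dates)).reset_index(drop=True)
    last_day = dates.max()
    # Walk from the oldest fold to the newest so the training set grows.
    for fold in range(n_folds, 0, -1):
        valid_end = last_day - pd.Timedelta(days=horizon * (fold - 1))
        valid_start = valid_end - pd.Timedelta(days=horizon - 1)
        train_end = valid_start - pd.Timedelta(days=gap + 1)
        train_mask = dates <= train_end
        valid_mask = (dates >= valid_start) & (dates <= valid_end)
        yield train_mask.to_numpy(), valid_mask.to_numpy()


days = pd.date_range("2021-01-01", periods=60, freq="D")
for train_mask, valid_mask in rolling_cv_splits(days, n_folds=3, horizon=7):
    train_days, valid_days = days[train_mask], days[valid_mask]
    print("train through", train_days.max().date(),
          "| validate", valid_days.min().date(), "to", valid_days.max().date())
```

Splitting by date is only half of the problem: features such as rolling means must also be computed from past values only, or the validation target can still leak into the training rows.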
- Base classes
- pandas ✅
- mxnet
- Ray
- Intake
- Drag n' drop ✅
- AWS, BigQuery, and Azure connectors
- Preprocessing
- Scaling
- Log1p ✅
- Standardization ✅
- Normalization ✅
- Encodings
- Categorical encodings ✅
- One-hot encodings
- NLP features
- Computer vision features
- Automated Feature Engineering
- Seasonality
- Seasonality features (day, week, month, year, etc.) ✅
- Seasonality features with added Gaussian noise
- Statistical Features
- Lagged (shifted) features ✅
- Rolling, shifted aggregations (mean, median, max, min, skew, etc.) with momentums and rolling percentages ✅
- Exponential moving averages with crossovers ✅
- Percent changes ✅
- Percent of features over some threshold in a rolling window (e.g., percent of weeks with non-zero sales per month) ✅
- Quantiles
- Kurtosis features
- Retail Features
- New product flags (days since first purchase) ✅
- High and low velocity flags
- Recency, frequency, and monetary value (RFM) features
- Flag if not sold up to current day
- Out-of-stock flags
- External Features
- Demographics ✅
- Holidays
- Sporting events (e.g., number of events on a given day, time until next event, etc.)
- Weather
- Structural breaks
- CUSUM tests
- Explosiveness tests
- Right-tail unit-root tests
- Sub/super-martingale tests
- Submodel features
- Kalman filter predictions
- FB Prophet predictions
- ARIMA / ARMA predictions
- Pareto-NBD predictions and parameters
- Pareto-GGG predictions and parameters
- Modeling
- Parameter Tuning
- Grid Search ✅
- Random Search ✅
- Bayesian Optimization
- Modeling Libraries (with smart defaults and abstractions to make confidence intervals easy)
- LightGBM (regression, tweedie, and quantile regressors) ✅
- XGBoost
- Random Forest
- sklearn Random Forest and GBM
- Catboost
- Prophet
- Pareto NBD and other Bayesian MMs
- Model fitting behavior
- Ensembling
- Recursive modeling
- Dynamic modeling
- Dynamic / recursive hybrid
- Hurdle modeling
- Ability to ignore certain time periods during modeling
- Validation Strategies
- Rolling Cross-Validation ✅
- Sliding-Window Cross-Validation
- Purged K-Fold Cross-Validation
- Combinatorial Purged Cross-Validation
- Interpretation & Visualization
- Error Comparisons
- Predictions vs. Actuals Curves ✅
- Table of error metrics by fold
- Visualizing error metrics by fold
- Model interpretation
- Training and validation curves
- Textual alerts and summaries
- Mean Decrease Accuracy (MDA)
- Mean Decrease Impurity (MDI)
- Single Feature Importance
- SHAP values
- Dependence plots
- Accumulated Local Effects
- Ability to view feature importances by quantile (for quantile regression)
- Partial dependency plots
- ACF plots
- ICE curves
- Data interpretation
- Clustering at different levels using target variable
- Anomaly detection for continuous timeseries
- Forward-Looking Predictions
- Ability to generate a forward-looking dataframe
- Function for running best estimator on forward-looking dataframe
- Ability to ensemble multiple stored models to predict forward-looking dataframe
- Utilities
- Automated downcasting and categorical conversion ✅
- RAM and memory checks ✅
- Ability to save and load fframes ✅
- Filling gaps over time ✅
- Export to database
- Ability to add noise to ratio features
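For readers unfamiliar with the "automated downcasting and categorical conversion" utility listed above, the following is a rough sketch of the general technique using only pandas. It is not taken from forecastframe; `downcast_frame` and its `max_category_ratio` threshold are hypothetical names chosen for this example.

```python
import pandas as pd
from pandas.api import types as ptypes


def downcast_frame(frame, max_category_ratio=0.5):
    """Hypothetical helper: shrink numeric dtypes and categorize repetitive strings."""
    out = frame.copy()
    for col in out.columns:
        series = out[col]
        if ptypes.is_integer_dtype(series):
            # Picks the smallest integer type that holds every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif ptypes.is_float_dtype(series):
            # float64 -> float32 saves memory but reduces precision.
            out[col] = pd.to_numeric(series, downcast="float")
        elif ptypes.is_object_dtype(series) or ptypes.is_string_dtype(series):
            # Only convert columns whose values repeat often enough to pay off.
            if series.nunique() / max(len(series), 1) <= max_category_ratio:
                out[col] = series.astype("category")
    return out


df = pd.DataFrame(
    {
        "store": ["s1", "s2"] * 50,
        "units": list(range(100)),
        "price": [1.5, 2.5] * 50,
    }
)
small = downcast_frame(df)
print(small.dtypes)
print(df.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Downcasting floats trades precision for memory, so a real implementation would likely want an opt-out for columns where rounding error matters.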
See the latest examples in /examples
$ git clone https://www.github.com/ntlind/forecastframe