feat: add moirai benchmark (#271)
AzulGarza authored and MMenchero committed Apr 23, 2024
1 parent 5c70931 commit 9402f93
Showing 6 changed files with 620 additions and 0 deletions.
63 changes: 63 additions & 0 deletions experiments/salesforce-moirai/README.md
@@ -0,0 +1,63 @@
# Salesforce's Moirai performs well on hourly data and is much faster than Chronos, but it is still up to 33% less accurate and less efficient than statistical models on monthly, weekly, and yearly data

We present a comprehensive, reproducible evaluation showing that a Statistical Ensemble (comprising AutoARIMA, AutoETS, AutoCES, and DynamicOptimizedTheta) substantially surpasses [Salesforce Moirai](https://github.com/SalesforceAIResearch/uni2ts), a foundational time series forecasting model with over 311 million parameters. The **Statistical Ensemble is 33%, 33%, and 15% more accurate than Moirai in CRPS, MASE, and SMAPE, respectively**, across the M1, M3, M4, and Tourism benchmark datasets. A **simple Seasonal Naive is 17% and 0.5% more accurate in MASE and SMAPE, respectively; however, Moirai is 25% more accurate than the Seasonal Naive in terms of CRPS**.
These datasets cover more than **100,000 unique time series**, offering a robust comparison of the models. Efficiency-wise, **Moirai is 3.5x faster than the Statistical Ensemble but 160x slower than a Seasonal Naive forecast**, highlighting the trade-off between speed and accuracy across forecasting frequencies.


# Introduction

Following our recent [benchmark demonstrating Amazon Chronos's lower accuracy and slower speed compared to classical statistical models](https://github.com/Nixtla/nixtla/tree/main/experiments/amazon-chronos), the community sought a similar analysis for Moirai. We commend the Salesforce AI team for releasing the first fully open-source foundational time series model, complete with weights, data, and code. Moirai's accuracy shines on hourly data, a noteworthy achievement we're eager to highlight. Our acknowledgment extends to Salesforce for recognizing our prior contributions to this research field.

Foundational models like Salesforce's Moirai signify a notable advance in time series forecasting, leveraging deep learning and extensive pre-training datasets to enhance predictions. Despite Moirai's impressive parameter count (311 million) and scope, our findings suggest that traditional forecasting methods grouped into a Statistical Ensemble often outperform it in accuracy. This benchmark continues our exploration of statistical versus deep learning models in forecasting.

In our assessment, Salesforce's Moirai shows a more promising path than Amazon Chronos in handling hourly data, hinting at the potential to eventually surpass classical statistical methods.


## Empirical Evaluation

Expanding upon our prior work, this study evaluates over 100,000 unique time series from the M1, M3, M4, and Tourism datasets across various frequencies. Our analysis also benchmarks against the Seasonal Naive model, a staple in traditional forecasting methods.
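For reference, the Statistical Ensemble combines AutoARIMA, AutoETS, AutoCES, and DynamicOptimizedTheta with a simple median, in the spirit of the ensemble paper referenced at the end of this README. The snippet below is a minimal sketch of that idea using `statsforecast`, assuming a long-format dataframe with `unique_id`, `ds`, and `y` columns; the filename and the `AutoCES`/`DOT` aliases are placeholders, and the exact pipeline lives in `src/statsforecast_pipeline.py`, which is not shown in this commit.

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoCES, AutoETS, DynamicOptimizedTheta

# Hypothetical long-format input: one row per (unique_id, ds) with the target in `y`.
df = pd.read_csv("monthly_series.csv", parse_dates=["ds"])

season_length = 12  # monthly data
models = [
    AutoARIMA(season_length=season_length),
    AutoETS(season_length=season_length),
    AutoCES(season_length=season_length, alias="AutoCES"),
    DynamicOptimizedTheta(season_length=season_length, alias="DOT"),
]

sf = StatsForecast(models=models, freq="MS", n_jobs=-1)
fcst = sf.forecast(df=df, h=12)

# Combine the four point forecasts with a median to form the ensemble.
fcst["StatisticalEnsemble"] = fcst[["AutoARIMA", "AutoETS", "AutoCES", "DOT"]].median(axis=1)
```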

## Results

The **Statistical Ensemble is 33%, 33%, and 15% more accurate than Moirai in CRPS, MASE, and SMAPE, respectively**, across the M1, M3, M4, and Tourism benchmark datasets. A **simple Seasonal Naive is 17% and 0.5% more accurate in MASE and SMAPE, respectively; however, Moirai is 25% more accurate than the Seasonal Naive in terms of CRPS**.

Efficiency-wise, **Moirai is 3.5x faster than the Statistical Ensemble but 160x slower than a Seasonal Naive forecast**, highlighting the trade-off between speed and accuracy across forecasting frequencies.

It is important to highlight that Moirai may have an unfair advantage over the Statistical Ensemble due to its training methodology. Specifically, Moirai was trained on the same datasets that are used here for evaluation, whereas the Statistical Ensemble was never exposed to the test sets.

![Main results](https://github.com/Nixtla/nixtla-backup/assets/4086186/71cf04f5-a48d-455e-8508-a0c393beed6e)

The complete code to replicate all results is available at [GitHub](https://github.com/Nixtla/nixtla/tree/main/experiments/salesforce-moirai). This study underscores statistical models' continued relevance and superiority in specific scenarios, challenging the assumption that foundational deep-learning models are always the best solution for time series forecasting.
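For readers mapping the reported percentages back to definitions: MASE scales the absolute forecast error by the in-sample seasonal naive error, sMAPE is a symmetric percentage error, and CRPS is approximated from the predicted quantiles that the pipelines store. The snippet below is only a self-contained illustration of the two point metrics on a toy series, not the evaluation code used in this experiment.

```python
import numpy as np

def mase(y, y_hat, y_train, season_length):
    # Mean absolute error scaled by the in-sample seasonal naive error.
    scale = np.mean(np.abs(y_train[season_length:] - y_train[:-season_length]))
    return np.mean(np.abs(y - y_hat)) / scale

def smape(y, y_hat):
    # Symmetric MAPE, reported here in percent (bounded by 200).
    return 200 * np.mean(np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))

# Toy monthly example: 36 in-sample points, a 12-step horizon, and a noisy forecast.
rng = np.random.default_rng(0)
y_train = np.arange(1, 37, dtype=float)
y_test = np.arange(37, 49, dtype=float)
y_hat = y_test + rng.normal(0, 1, size=12)

print(mase(y_test, y_hat, y_train, season_length=12))
print(smape(y_test, y_hat))
```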


## Reproducibility

To ensure the reproducibility of our findings, the Statistical Ensemble experiments were conducted on an AWS c5a.24xlarge instance, equipped with 96 vCPUs and 192 GiB of RAM. In contrast, the experiments for Salesforce Moirai were carried out on an AWS g5.4xlarge GPU instance, which includes 16 vCPUs, 64 GiB of RAM, and an NVIDIA A10G Tensor Core GPU with 24 GiB. All necessary code and detailed instructions for reproducing the experiments are available in this directory.

### Instructions

1. Set up a Python environment:

```bash
mamba env create -f environment.yml
conda activate moirai
pip install git+https://github.com/SalesforceAIResearch/uni2ts.git
```

2. Run the experiments as reported in the table:

```bash
python -m src.main --mode fcst_statsforecast
python -m src.main --mode fcst_moirai
```

3. Evaluate the results using:

```bash
python -m src.main --mode evaluation
```
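Each forecasting mode in `src/main.py` loops over every dataset listed there and shells out to the corresponding pipeline module. For a quick smoke test before launching the full run, you can confirm that PyTorch sees the GPU (the Moirai pipeline falls back to CPU otherwise) and forecast a single dataset directly; `m4_hourly` below is just one of the dataset names defined in `src/main.py`:

```bash
python -c "import torch; print(torch.cuda.is_available())"
python -m src.moirai_pipeline --dataset m4_hourly
```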

### References
- **Statistical Ensemble Paper**: [A Simple Combination of Univariate Models](https://www.sciencedirect.com/science/article/abs/pii/S0169207019300585?via%3Dihub)
- **Salesforce Moirai Paper**: [Unified Training of Universal Time Series Forecasting Transformers](https://arxiv.org/abs/2402.02592)
18 changes: 18 additions & 0 deletions experiments/salesforce-moirai/environment.yml
@@ -0,0 +1,18 @@
name: moirai
channels:
  - conda-forge
  - defaults
  - anaconda
dependencies:
  - jupyterlab
  - pip
  - python=3.10
  - pip:
      - datasetsforecast
      - fire
      - huggingface_hub[cli]
      - neuralforecast
      - orjson
      - statsforecast
      - utilsforecast

75 changes: 75 additions & 0 deletions experiments/salesforce-moirai/src/main.py
@@ -0,0 +1,75 @@
import logging
import subprocess
from typing import Literal

import fire
import pandas as pd


logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

datasets = [
    "m1_yearly",
    "m1_quarterly",
    "m1_monthly",
    "m3_yearly",
    "m3_quarterly",
    "m3_monthly",
    "m3_other",
    "m4_yearly",
    "m4_quarterly",
    "m4_monthly",
    "m4_weekly",
    "m4_daily",
    "m4_hourly",
    "tourism_yearly",
    "tourism_quarterly",
    "tourism_monthly",
]


def main(mode: Literal["fcst_statsforecast", "fcst_moirai", "evaluation"]):
    prefix_process = ["python", "-m"]

    if mode in ["fcst_statsforecast", "fcst_moirai"]:
        # Each forecasting mode shells out to the corresponding pipeline module,
        # one dataset at a time.
        for dataset in datasets:
            logger.info(f"Forecasting {dataset}...")
            suffix_process = ["--dataset", dataset]

            def process(middle_process):
                return prefix_process + middle_process + suffix_process

            if mode == "fcst_statsforecast":
                logger.info("Running StatisticalEnsemble")
                subprocess.run(process(["src.statsforecast_pipeline"]))
            elif mode == "fcst_moirai":
                logger.info("Running SalesforceMoirai")
                subprocess.run(process(["src.moirai_pipeline"]))
    elif mode == "evaluation":
        from src.utils import ExperimentHandler

        # Collect per-dataset evaluations and concatenate them into a single table.
        eval_df = []
        for dataset in datasets:
            logger.info(f"Evaluating {dataset}...")
            exp = ExperimentHandler(dataset)
            try:
                eval_dataset_df = exp.evaluate_models(
                    [
                        "SalesforceMoirai",
                        "StatisticalEnsemble",
                        "SeasonalNaive",
                    ]
                )
                print(eval_dataset_df)
                eval_df.append(eval_dataset_df)
            except Exception as e:
                logger.error(e)
        eval_df = pd.concat(eval_df).reset_index(drop=True)
        exp.save_dataframe(eval_df, "complete-results.csv")
    else:
        raise ValueError(f"mode {mode} not found")


if __name__ == "__main__":
    fire.Fire(main)
114 changes: 114 additions & 0 deletions experiments/salesforce-moirai/src/moirai_pipeline.py
@@ -0,0 +1,114 @@
from time import time
from typing import Iterable, List, Tuple

import fire
import pandas as pd
import torch
from gluonts.dataset import Dataset
from gluonts.model.forecast import Forecast
from gluonts.torch.model.predictor import PyTorchPredictor
from huggingface_hub import hf_hub_download
from tqdm import tqdm
from uni2ts.model.moirai import MoiraiForecast

from src.utils import ExperimentHandler


def get_morai_predictor(
    model_size: str,
    prediction_length: int,
    target_dim: int,
    batch_size: int,
) -> PyTorchPredictor:
    # Download the pretrained Moirai checkpoint from the Hugging Face Hub and
    # wrap it in a GluonTS predictor.
    model = MoiraiForecast.load_from_checkpoint(
        checkpoint_path=hf_hub_download(
            repo_id=f"Salesforce/moirai-1.0-R-{model_size}",
            filename="model.ckpt",
        ),
        prediction_length=prediction_length,
        context_length=200,
        patch_size="auto",
        num_samples=100,
        target_dim=target_dim,
        feat_dynamic_real_dim=0,
        past_feat_dynamic_real_dim=0,
        map_location="cuda:0" if torch.cuda.is_available() else "cpu",
    )

    predictor = model.create_predictor(batch_size)

    return predictor


def gluonts_instance_fcst_to_df(
    fcst: Forecast,
    quantiles: List[float],
    model_name: str,
) -> pd.DataFrame:
    # Convert a single GluonTS forecast into a long-format dataframe with the
    # mean forecast and one column per requested quantile.
    point_forecast = fcst.mean
    h = len(point_forecast)
    dates = pd.date_range(
        fcst.start_date.to_timestamp(),
        freq=fcst.freq,
        periods=h,
    )
    fcst_df = pd.DataFrame(
        {
            "ds": dates,
            "unique_id": fcst.item_id,
            model_name: point_forecast,
        }
    )
    for q in quantiles:
        fcst_df[f"{model_name}-q-{q}"] = fcst.quantile(q)
    return fcst_df


def gluonts_fcsts_to_df(
    fcsts: Iterable[Forecast],
    quantiles: List[float],
    model_name: str,
) -> pd.DataFrame:
    df = []
    for fcst in tqdm(fcsts):
        fcst_df = gluonts_instance_fcst_to_df(fcst, quantiles, model_name)
        df.append(fcst_df)
    return pd.concat(df).reset_index(drop=True)


def run_moirai(
    gluonts_dataset: Dataset,
    model_size: str,
    horizon: int,
    target_dim: int,
    batch_size: int,
    quantiles: List[float],
) -> Tuple[pd.DataFrame, float, str]:
    # Time the full pipeline: predictor construction, inference, and conversion
    # of the forecasts into a dataframe.
    init_time = time()
    predictor = get_morai_predictor(model_size, horizon, target_dim, batch_size)
    fcsts = predictor.predict(gluonts_dataset)
    model_name = "SalesforceMoirai"
    fcsts_df = gluonts_fcsts_to_df(
        fcsts,
        quantiles=quantiles,
        model_name=model_name,
    )
    total_time = time() - init_time
    return fcsts_df, total_time, model_name


def main(dataset: str):
    exp = ExperimentHandler(dataset)
    fcst_df, total_time, model_name = run_moirai(
        gluonts_dataset=exp.gluonts_train_dataset,
        model_size="large",
        horizon=exp.horizon,
        target_dim=1,
        batch_size=32,
        quantiles=exp.quantiles,
    )
    exp.save_results(fcst_df, total_time, model_name)


if __name__ == "__main__":
    fire.Fire(main)