docs for forecasting task #443

Merged · 7 commits · Jul 6, 2022
97 changes: 95 additions & 2 deletions README.md
@@ -4,8 +4,9 @@ Copyright (C) 2021 [AutoML Groups Freiburg and Hannover](http://www.automl.org/)

While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed **Auto-PyTorch**, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting).
The newest features in Auto-PyTorch for tabular data are described in the paper ["Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"](https://arxiv.org/abs/2006.13799) (see below for bibtex ref).
Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper ["Efficient Automated Deep Learning for Time Series Forecasting"](https://arxiv.org/abs/2205.05511) (also see below for bibtex ref).

Also, find the documentation [here](https://automl.github.io/Auto-PyTorch/master).

@@ -27,7 +28,9 @@ In other words, we evaluate the portfolio on the provided data as initial configurations.
Then the API starts the following procedures:
1. **Validate input data**: Process each data type (e.g., encode categorical data) so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that can be handled in this API with a choice of cross validation or holdout splits.
3. **Evaluate baselines**
   * ***Tabular dataset*** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, plus a dummy model from `sklearn.dummy` that represents the worst possible performance.
   * ***Time Series Forecasting dataset***: Train a dummy predictor that repeats the last observed value in each series (see the sketch after this list).
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
b. Sample a pipeline hyperparameter configuration *2 by SMAC\
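
As a rough illustration of the two baselines above, here is a hedged sketch (`naive_last_value_forecast` is a hypothetical helper for exposition, not Auto-PyTorch's internal API):

```py
import numpy as np
from sklearn.dummy import DummyClassifier

# tabular baseline: a dummy model from sklearn.dummy, e.g. predicting the most frequent class
X_toy, y_toy = np.zeros((4, 1)), np.array([0, 0, 1, 0])
baseline = DummyClassifier(strategy="most_frequent").fit(X_toy, y_toy)
print(baseline.predict(np.zeros((2, 1))))  # -> [0 0]

# forecasting baseline: repeat the last observed value over the whole horizon
def naive_last_value_forecast(y_train: np.ndarray, n_prediction_steps: int) -> np.ndarray:
    return np.full(n_prediction_steps, y_train[-1])

print(naive_last_value_forecast(np.array([1.0, 2.0, 3.0]), 3))  # -> [3. 3. 3.]
```
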
@@ -50,6 +53,14 @@ pip install autoPyTorch

```

Auto-PyTorch for Time Series Forecasting requires additional dependencies:

```sh

pip install autoPyTorch[forecasting]

```

### Manual Installation

We recommend using Anaconda for developing as follows:
@@ -70,6 +81,20 @@ python setup.py install

```

Similarly, to install all the dependencies for Auto-PyTorch-TimeSeriesForecasting:


```sh

git submodule update --init --recursive

conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"

```

## Examples

In a nutshell:
@@ -105,6 +130,63 @@ score = api.score(y_pred, y_test)
print("Accuracy score", score)
```

For time series forecasting tasks:
```py

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# each series represents an element in the list
# we take the last forecasting_horizon points as test targets; the items before them
# serve as training targets

# Normally, the values to be forecast directly follow the training set
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# same for features. For univariate models, X_train and X_test can be omitted
X_train = [features[: -forecasting_horizon]]
# Here X_test indicates the 'known future features': features whose values are known in advance.
# Unknown future features can be replaced with NaN or zeros (they will not be used by our networks).
# If no feature is known beforehand, X_test can also be omitted
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # currently, forecasting models request much more memory than they actually use
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets; only applied to tasks with more than 1000 series
    known_future_features=known_future_features,
)

# the dataset object can directly generate test sequences for new data
test_sets = api.dataset.generate_test_seqs()

# Calculate the test performance
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
```

For more examples, including customising the search space, parallelising the code, etc., check out the `examples` folder

@@ -163,6 +245,17 @@ Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
}
```

```bibtex
@inproceedings{deng-ecml22,
author = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
title = {Efficient Automated Deep Learning for Time Series Forecasting},
year = {2022},
booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track
- European Conference, {ECML} {PKDD} 2022},
url = {https://doi.org/10.48550/arXiv.2205.05511},
}
```

## Contact

Auto-PyTorch is developed by the [AutoML Groups of the University of Freiburg and Hannover](http://www.automl.org/).
17 changes: 17 additions & 0 deletions docs/api.rst
@@ -25,6 +25,15 @@ Regression
:members:
:inherited-members: search, refit, predict, score

~~~~~~~~~~~~~~~~~~~~~~~
Time Series Forecasting
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: autoPyTorch.api.time_series_forecasting.TimeSeriesForecastingTask
:members:
:inherited-members: search, refit, predict, score



=========
Pipelines
@@ -50,6 +59,14 @@ Tabular Regression
.. autoclass:: autoPyTorch.pipeline.traditional_tabular_regression.TraditionalTabularRegressionPipeline
:members:

~~~~~~~~~~~~~~~~~~~~~~~
Time Series Forecasting
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: autoPyTorch.pipeline.time_series_forecasting.TimeSeriesForecastingPipeline
:members:


=================
Steps in Pipeline
=================
18 changes: 17 additions & 1 deletion docs/dev.rst
@@ -60,10 +60,23 @@ handle column-reordering.
Note that column-reordering shifts categorical columns to the earlier indices
and it is activated only if one uses a ColumnTransformer.

Similar procedures can be found for time series forecasting tasks:

#. `Feature Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/imputation>`_
#. `Feature scaling <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/scaling>`_
#. `Feature Encoding <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/encoding>`_
#. `Feature preprocessing <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing>`_
#. `Target Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/imputation>`_
#. `Target Preprocessing <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing>`_
#. `Target Scaling <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/setup/forecasting_target_scaling>`_
#. `Loss Types <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/setup>`_
#. `Algorithm setup <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/setup>`_
#. `Training <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/training>`_

Training of individual models
-----------------------------

**Auto-PyTorch Tabular** can fit 3 types of pipelines:

#. Dummy pipeline: Use sklearn.dummy to construct an estimator that predicts using simple rules such as most frequent class
#. Traditional machine learning pipelines: Use LightGBM, CatBoost, RandomForest, ExtraTrees, K-Nearest-Neighbors, and SupportVectorMachines
@@ -78,6 +91,9 @@ and data loaders required to perform the neural architecture search.
After the training (fitting a pipeline), we use pickle to save it
to disk as stated `here <https://scikit-learn.org/stable/modules/model_persistence.html>`_.
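
A minimal sketch of this persistence step with a plain scikit-learn estimator (illustrative only; Auto-PyTorch manages the file paths and naming itself):

.. code:: python

    import pickle

    import numpy as np
    from sklearn.dummy import DummyClassifier

    # fit an estimator, then persist it to disk with pickle
    fitted = DummyClassifier(strategy="most_frequent").fit(np.zeros((4, 1)), [0, 0, 1, 0])
    with open("pipeline.pkl", "wb") as f:
        pickle.dump(fitted, f)

    # later, restore it for prediction
    with open("pipeline.pkl", "rb") as f:
        restored = pickle.load(f)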

**Auto-PyTorch Time Series Forecasting** currently only allows Dummy pipelines and PyTorch neural networks. Traditional machine learning pipelines
will be introduced in a future iteration.

Optimization of pipeline
------------------------

16 changes: 16 additions & 0 deletions docs/installation.rst
@@ -25,6 +25,12 @@ PyPI Installation
.. code:: bash

    pip install autoPyTorch

Auto-PyTorch for Time Series Forecasting requires additional dependencies:

.. code:: bash

    pip install autoPyTorch[forecasting]


Manual Installation
-------------------

@@ -44,6 +50,16 @@ Manual Installation
cat requirements.txt | xargs -n 1 -L 1 pip install
python setup.py install

Similarly, Auto-PyTorch for time series forecasting requires additional dependencies:

.. code:: bash

    git submodule update --init --recursive

    conda create -n auto-pytorch python=3.8
    conda activate auto-pytorch
    conda install swig
    pip install -e ".[forecasting]"


Docker Image
============
36 changes: 32 additions & 4 deletions docs/manual.rst
@@ -18,23 +18,34 @@ Examples
========
* `Classification <examples/20_basics/example_tabular_classification.html>`_
* `Regression <examples/20_basics/example_tabular_regression.html>`_
* `Forecasting <examples/20_basics/example_time_series_forecasting.html>`_
* `Customizing the search space <examples/40_advanced/example_custom_configuration_space.html>`_
* `Changing the resampling strategy <examples/40_advanced/example_resampling_strategy.html>`_
* `Visualizing the results <examples/40_advanced/example_visualization.html>`_

Data validation
===============
For **tabular tasks**, *Auto-PyTorch* uses a feature and target validator on the input feature set and target set respectively.

The feature validator checks whether the data is supported by *Auto-PyTorch*. Additionally, an sklearn ColumnTransformer
is used, which imputes and ordinally encodes the categorical columns of the dataset. This ensures
that no unseen category is found while fitting the data.

The target validator applies a label encoder on the target column.
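
A minimal sketch of both steps with plain scikit-learn (illustrative only; the actual validators differ in detail, and the column names are hypothetical):

.. code:: python

    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

    # feature validation: impute, then ordinally encode, the categorical columns
    categorical_pipeline = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ])
    column_transformer = ColumnTransformer(
        [("categorical", categorical_pipeline, ["color", "size"])],  # hypothetical columns
        remainder="passthrough",
    )

    # target validation: label-encode the target column
    y_encoded = LabelEncoder().fit_transform(["cat", "dog", "cat"])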

For **time series forecasting tasks**, besides the functions described above, the time series forecasting validator also
checks information specific to forecasting tasks:

* the index of the series that each data point belongs to
* whether the dataset is univariate (only target information is contained in the dataset)
* the sample frequency of the dataset
* the static features in the dataset, i.e., features that contain only one value within each series

The time series forecasting validator then transforms the features and targets into a `pd.DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_
whose index identifies the series that each time step belongs to.
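
For instance, two univariate series stacked into a single frame might look as follows (an illustrative sketch, not the validator's exact output format):

.. code:: python

    import pandas as pd

    # two series identified by the index; each row is one time step
    y = pd.DataFrame(
        {"target": [112.0, 118.0, 132.0, 210.0, 204.0]},
        index=pd.Index([0, 0, 0, 1, 1], name="series_id"),
    )
    print(y.loc[0])  # all time steps belonging to the first series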

Data Preprocessing
==================
The **tabular preprocessing pipeline** in *Auto-PyTorch* consists of

#. `Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/tabular_preprocessing/imputation>`_
#. `Encoding <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/tabular_preprocessing/encoding>`_
@@ -47,7 +58,24 @@ The tabular preprocessing pipeline in *Auto-PyTorch* consists of
Along with the choices, their corresponding hyperparameters are also tuned. A sklearn ColumnTransformer is
created which includes a categorical pipeline and a numerical pipeline. These pipelines are made up of the
relevant preprocessors chosen in the previous steps. The column transformer is compatible with `torchvision transforms <https://pytorch.org/vision/stable/transforms.html>`_
and is therefore passed to the DataLoader.

The **time series forecasting pipeline** has two sorts of setup:

- Univariate models only require target transformations. They include *1:

  #. `Target Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/imputation/>`_
     Choice of `linear`, `nearest`, `constant_zero`, `bfill` and `ffill`

- Multivariate models contain target transformations (see above) and feature transformations. They include:

  #. `Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/imputation>`_
     Choice of `linear`, `nearest`, `constant_zero`, `bfill` and `ffill`
  #. `Scaling <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/scaling>`_
     Choice of `standard`, `min_max`, `max_abs`, `mean_abs`, or no transformation *2
  #. `Encoding <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/encoding>`_
     Choice of `OneHotEncoder` or no encoding.

*1 Target scaling is considered part of `setup <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/setup>`_ and the transform is applied within each batch iteration.

*2 Scaling is applied within each series.
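
The imputation choices correspond roughly to familiar pandas operations, as in this sketch (Auto-PyTorch wraps them in its own components; `nearest` additionally requires scipy):

.. code:: python

    import numpy as np
    import pandas as pd

    s = pd.Series([1.0, np.nan, 3.0, np.nan])
    s.interpolate(method="linear")   # 'linear'  -> 1.0, 2.0, 3.0, 3.0
    s.interpolate(method="nearest")  # 'nearest' (fills interior gaps only)
    s.fillna(0.0)                    # 'constant_zero'
    s.bfill()                        # 'bfill': take the next valid observation
    s.ffill()                        # 'ffill': take the previous valid observation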

Resource Allocation
===================