docs for forecasting task #443

Merged · 7 commits · Jul 6, 2022
97 changes: 95 additions & 2 deletions README.md
@@ -4,8 +4,9 @@ Copyright (C) 2021 [AutoML Groups Freiburg and Hannover](http://www.automl.org/)

While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed **Auto-PyTorch**, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting).
The newest features in Auto-PyTorch for tabular data are described in the paper ["Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"](https://arxiv.org/abs/2006.13799) (see below for bibtex ref).
Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper ["Efficient Automated Deep Learning for Time Series Forecasting"](https://arxiv.org/abs/2205.05511) (also see below for bibtex ref).

Also, find the documentation [here](https://automl.github.io/Auto-PyTorch/master).

@@ -27,7 +28,9 @@ In other words, we evaluate the portfolio on the provided data as initial configurations.
Then the API starts the following procedures:
1. **Validate input data**: Process each data type (e.g., encode categorical data) so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that can be handled in this API with a choice of cross validation or holdout splits.
3. **Evaluate baselines**
   * ***Tabular dataset*** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, plus a dummy model from `sklearn.dummy` that represents the worst possible performance.
   * ***Time Series Forecasting dataset***: Train a dummy predictor that repeats the last observed value in each series (see the sketch after this list).
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
b. Sample a pipeline hyperparameter configuration *2 by SMAC\
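
As a rough illustration of the two baselines above, here is a hedged sketch (`naive_last_value_forecast` is a hypothetical helper for exposition, not Auto-PyTorch's internal API):

```py
import numpy as np
from sklearn.dummy import DummyClassifier

# tabular baseline: a dummy model from sklearn.dummy, e.g. predicting the most frequent class
X_toy, y_toy = np.zeros((4, 1)), np.array([0, 0, 1, 0])
baseline = DummyClassifier(strategy="most_frequent").fit(X_toy, y_toy)
print(baseline.predict(np.zeros((2, 1))))  # -> [0 0]

# forecasting baseline: repeat the last observed value over the whole horizon
def naive_last_value_forecast(y_train: np.ndarray, n_prediction_steps: int) -> np.ndarray:
    return np.full(n_prediction_steps, y_train[-1])

print(naive_last_value_forecast(np.array([1.0, 2.0, 3.0]), 3))  # -> [3. 3. 3.]
```
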
@@ -50,6 +53,14 @@ pip install autoPyTorch

```

Auto-PyTorch for Time Series Forecasting requires additional dependencies:

```sh

pip install autoPyTorch[forecasting]

```

### Manual Installation

We recommend using Anaconda for developing as follows:
@@ -70,6 +81,20 @@ python setup.py install

```

Similarly, to install all the dependencies for Auto-PyTorch-TimeSeriesForecasting:


```sh

git submodule update --init --recursive

conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"

```

## Examples

In a nutshell:
@@ -105,6 +130,63 @@ score = api.score(y_pred, y_test)
print("Accuracy score", score)
```

For time series forecasting tasks:
```py

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# each series represents an element in the list
# we take the last forecasting_horizon points as test targets; the items before them
# serve as training targets

# Normally, the values to be forecast directly follow the training set
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# same for features. For univariate models, X_train and X_test can be omitted
X_train = [features[: -forecasting_horizon]]
# Here X_test indicates the 'known future features': features whose values are known in advance.
# Unknown future features can be replaced with NaN or zeros (they will not be used by our networks).
# If no feature is known beforehand, X_test can also be omitted
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # currently, forecasting models request much more memory than they actually use
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets; only applied to tasks with more than 1000 series
    known_future_features=known_future_features,
)

# the dataset object can directly generate test sequences for new data
test_sets = api.dataset.generate_test_seqs()

# Calculate the test performance
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
```

For more examples, including customising the search space, parallelising the code, etc., check out the `examples` folder

@@ -163,6 +245,17 @@ Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
}
```

```bibtex
@inproceedings{deng-ecml22,
author = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
title = {Efficient Automated Deep Learning for Time Series Forecasting},
year = {2022},
booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track
- European Conference, {ECML} {PKDD} 2022},
url = {https://doi.org/10.48550/arXiv.2205.05511},
}
```

## Contact

Auto-PyTorch is developed by the [AutoML Groups of the University of Freiburg and Hannover](http://www.automl.org/).
17 changes: 17 additions & 0 deletions docs/api.rst
@@ -25,6 +25,15 @@ Regression
:members:
:inherited-members: search, refit, predict, score

~~~~~~~~~~~~~~~~~~~~~~~
Time Series Forecasting
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: autoPyTorch.api.time_series_forecasting.TimeSeriesForecastingTask
:members:
:inherited-members: search, refit, predict, score



=========
Pipelines
@@ -50,6 +59,14 @@ Tabular Regression
.. autoclass:: autoPyTorch.pipeline.traditional_tabular_regression.TraditionalTabularRegressionPipeline
:members:

~~~~~~~~~~~~~~~~~~~~~~~
Time Series Forecasting
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: autoPyTorch.pipeline.time_series_forecasting.TimeSeriesForecastingPipeline
:members:


=================
Steps in Pipeline
=================
18 changes: 17 additions & 1 deletion docs/dev.rst
@@ -60,10 +60,23 @@ handle column-reordering.
Note that column-reordering shifts categorical columns to the earlier indices
and it is activated only if one uses a ColumnTransformer.

Similar procedures can be found for time series forecasting tasks:

#. `Feature Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/imputation>`_
#. `Feature scaling <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/scaling>`_
#. `Feature Encoding <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/encoding>`_
#. `Feature preprocessing <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing>`_
#. `Target Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/imputation>`_
#. `Target Preprocessing <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing>`_
#. `Target Scaling <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/setup/forecasting_target_scaling>`_
#. `Loss Types <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/setup>`_
#. `Algorithm setup <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/setup>`_
#. `Training <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/training>`_

Training of individual models
-----------------------------

**Auto-PyTorch Tabular** can fit 3 types of pipelines:

#. Dummy pipeline: Use sklearn.dummy to construct an estimator that predicts using simple rules such as most frequent class
#. Traditional machine learning pipelines: Use LightGBM, CatBoost, RandomForest, ExtraTrees, K-Nearest-Neighbors, and SupportVectorMachines
@@ -78,6 +91,9 @@ and data loaders required to perform the neural architecture search.
After the training (fitting a pipeline), we use pickle to save it
to disk as stated `here <https://scikit-learn.org/stable/modules/model_persistence.html>`_.
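
A minimal sketch of this persistence step with a plain scikit-learn estimator (illustrative only; Auto-PyTorch manages the file paths and naming itself):

.. code:: python

    import pickle

    import numpy as np
    from sklearn.dummy import DummyClassifier

    # fit an estimator, then persist it to disk with pickle
    fitted = DummyClassifier(strategy="most_frequent").fit(np.zeros((4, 1)), [0, 0, 1, 0])
    with open("pipeline.pkl", "wb") as f:
        pickle.dump(fitted, f)

    # later, restore it for prediction
    with open("pipeline.pkl", "rb") as f:
        restored = pickle.load(f)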

**Auto-PyTorch Time Series Forecasting** currently only allows Dummy pipelines and PyTorch neural networks. Traditional machine learning pipelines
will be introduced in a future iteration.

Optimization of pipeline
------------------------

16 changes: 16 additions & 0 deletions docs/installation.rst
@@ -25,6 +25,12 @@ PyPI Installation
.. code:: bash

    pip install autoPyTorch

Auto-PyTorch for Time Series Forecasting requires additional dependencies:

.. code:: bash

    pip install autoPyTorch[forecasting]


Manual Installation
-------------------

@@ -44,6 +50,16 @@ Manual Installation
cat requirements.txt | xargs -n 1 -L 1 pip install
python setup.py install

Similarly, Auto-PyTorch for time series forecasting requires additional dependencies:

.. code:: bash

    git submodule update --init --recursive

    conda create -n auto-pytorch python=3.8
    conda activate auto-pytorch
    conda install swig
    pip install -e ".[forecasting]"


Docker Image
============
36 changes: 32 additions & 4 deletions docs/manual.rst
@@ -18,23 +18,34 @@ Examples
========
* `Classification <examples/20_basics/example_tabular_classification.html>`_
* `Regression <examples/20_basics/example_tabular_regression.html>`_
* `Forecasting <examples/20_basics/example_time_series_forecasting.html>`_
* `Customizing the search space <examples/40_advanced/example_custom_configuration_space.html>`_
* `Changing the resampling strategy <examples/40_advanced/example_resampling_strategy.html>`_
* `Visualizing the results <examples/40_advanced/example_visualization.html>`_

Data validation
===============
For **tabular tasks**, *Auto-PyTorch* uses a feature and target validator on the input feature set and target set respectively.

The feature validator checks whether the data is supported by *Auto-PyTorch*. Additionally, an sklearn ColumnTransformer
is used, which imputes and ordinally encodes the categorical columns of the dataset. This ensures
that no unseen category is found while fitting the data.

The target validator applies a label encoder on the target column.
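
A minimal sketch of both steps with plain scikit-learn (illustrative only; the actual validators differ in detail, and the column names are hypothetical):

.. code:: python

    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

    # feature validation: impute, then ordinally encode, the categorical columns
    categorical_pipeline = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ])
    column_transformer = ColumnTransformer(
        [("categorical", categorical_pipeline, ["color", "size"])],  # hypothetical columns
        remainder="passthrough",
    )

    # target validation: label-encode the target column
    y_encoded = LabelEncoder().fit_transform(["cat", "dog", "cat"])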

For **time series forecasting tasks**, besides the functions described above, the time series forecasting validator also
checks information specific to forecasting tasks:

* the index of the series that each data point belongs to
* whether the dataset is univariate (only target information is contained in the dataset)
* the sample frequency of the dataset
* the static features in the dataset, i.e., features that contain only one value within each series

The time series forecasting validator then transforms the features and targets into a `pd.DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_
whose index identifies the series that each time step belongs to.
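
For instance, two univariate series stacked into a single frame might look as follows (an illustrative sketch, not the validator's exact output format):

.. code:: python

    import pandas as pd

    # two series identified by the index; each row is one time step
    y = pd.DataFrame(
        {"target": [112.0, 118.0, 132.0, 210.0, 204.0]},
        index=pd.Index([0, 0, 0, 1, 1], name="series_id"),
    )
    print(y.loc[0])  # all time steps belonging to the first series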

Data Preprocessing
==================
The **tabular preprocessing pipeline** in *Auto-PyTorch* consists of

#. `Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/tabular_preprocessing/imputation>`_
#. `Encoding <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/tabular_preprocessing/encoding>`_
@@ -47,7 +58,24 @@ The tabular preprocessing pipeline in *Auto-PyTorch* consists of
Along with the choices, their corresponding hyperparameters are also tuned. A sklearn ColumnTransformer is
created which includes a categorical pipeline and a numerical pipeline. These pipelines are made up of the
relevant preprocessors chosen in the previous steps. The column transformer is compatible with `torchvision transforms <https://pytorch.org/vision/stable/transforms.html>`_
and is therefore passed to the DataLoader.

The **time series forecasting pipeline** has two sorts of setup:

- Univariate models only require target transformations. They include *1:

  #. `Target Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/imputation/>`_
     Choice of `linear`, `nearest`, `constant_zero`, `bfill` and `ffill`

- Multivariate models contain target transformations (see above) and feature transformations. They include:

  #. `Imputation <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/imputation>`_
     Choice of `linear`, `nearest`, `constant_zero`, `bfill` and `ffill`
  #. `Scaling <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/scaling>`_
     Choice of `standard`, `min_max`, `max_abs`, `mean_abs`, or no transformation *2
  #. `Encoding <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/preprocessing/time_series_preprocessing/encoding>`_
     Choice of `OneHotEncoder` or no encoding.

*1 Target scaling is considered part of `setup <https://github.com/automl/Auto-PyTorch/tree/development/autoPyTorch/pipeline/components/setup>`_ and the transform is applied within each batch iteration.

*2 Scaling is applied within each series.
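
The imputation choices correspond roughly to familiar pandas operations, as in this sketch (Auto-PyTorch wraps them in its own components; `nearest` additionally requires scipy):

.. code:: python

    import numpy as np
    import pandas as pd

    s = pd.Series([1.0, np.nan, 3.0, np.nan])
    s.interpolate(method="linear")   # 'linear'  -> 1.0, 2.0, 3.0, 3.0
    s.interpolate(method="nearest")  # 'nearest' (fills interior gaps only)
    s.fillna(0.0)                    # 'constant_zero'
    s.bfill()                        # 'bfill': take the next valid observation
    s.ffill()                        # 'ffill': take the previous valid observation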

Resource Allocation
===================