Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (AAAI'21 Best Paper)

This is the original PyTorch implementation of Informer from the following paper: Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Special thanks to Jieqi Peng@cookieminions for building this repo.

🚩News (Mar 25, 2021): We have updated all experiment results with their hyperparameter settings.

🚩News (Feb 22, 2021): We provide Colab examples for easy usage.

🚩News (Feb 8, 2021): Our Informer paper has received the AAAI'21 Best Paper Award [Official][Beihang][Rutgers]! We will continue this line of research and keep updating this repo. Please star this repo and cite our paper if you find our work helpful.

ProbSparse Attention

The self-attention scores form a long-tail distribution, where the "active" queries lie in the "head" scores and the "lazy" queries lie in the "tail" area. We designed ProbSparse Attention to select the "active" queries rather than the "lazy" ones. ProbSparse Attention with Top-u queries forms a sparse Transformer based on this probability distribution. Why not use Top-u keys? The output of a self-attention layer is a re-representation of its input: a weighted combination of the values, with weights given by the query-key dot-product scores. Keeping the top queries together with all keys gives a complete re-representation of the leading components of the input, which is equivalent to selecting the "head" scores among all dot-product pairs. If we chose Top-u keys instead, the result would only preserve the trivial sum of values within the "long tail" scores and would wreck the re-representation of the leading components.
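The selection of "active" queries can be illustrated in plain Python. This is a minimal sketch, not the implementation in the `models/` folder, and it rests on assumptions: each query is scored with the max-mean sparsity measurement described in the Informer paper (its largest scaled dot-product score minus its mean score), the top `u = factor * ceil(ln L_Q)` queries are kept (`factor` mirrors the `--factor` argument), and every "lazy" query simply receives the mean of the values. For simplicity it scores every key rather than a random sample of keys, and it ignores batches, heads and masking. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sparsity_measure(q, keys):
    """Max-mean measurement: how far the query's top score rises above its mean score."""
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    return max(scores) - sum(scores) / len(scores)

def probsparse_attention(queries, keys, values, factor=5):
    """Full attention for the top-u 'active' queries; 'lazy' queries get mean(V)."""
    n_q = len(queries)
    u = min(n_q, factor * math.ceil(math.log(n_q)))
    measures = [sparsity_measure(q, keys) for q in queries]
    # Indices of the u queries with the largest measurement.
    active = sorted(range(n_q), key=lambda i: measures[i], reverse=True)[:u]

    d_v = len(values[0])
    mean_v = [sum(v[j] for v in values) / len(values) for j in range(d_v)]
    out = [list(mean_v) for _ in range(n_q)]  # lazy queries default to mean(V)
    scale = math.sqrt(len(queries[0]))
    for i in active:
        # Active queries attend over ALL keys, so their "head" scores stay intact.
        weights = softmax([dot(queries[i], k) / scale for k in keys])
        out[i] = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
    return out, sorted(active)

if __name__ == "__main__":
    q = [[0.0, 0.0]] * 5 + [[3.0, 0.0], [0.0, 3.0], [2.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    out, active = probsparse_attention(q, k, v, factor=1)
    print(active)  # -> [5, 6, 7]
    print(out[0])  # -> [0.5, 0.5]
```

In the usage example, queries 5 to 7 have peaked score distributions and are kept (u = 1 * ceil(ln 8) = 3), while the five all-zero queries score every key equally, have a measurement of 0, and are treated as lazy.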

Requirements

Python 3.6
matplotlib == 3.1.1
numpy == 1.19.4
pandas == 0.25.1
scikit_learn == 0.21.3
torch == 1.8.0

Datasets

You can download all the datasets from Autoformer. Put all the CSV files in the ./data folder. For example, to run Informer on the ETT datasets:

```bash
# ETTh1
python -u main_informer.py --model informer --data ETTh1 --attn prob --freq h

# ETTh2
python -u main_informer.py --model informer --data ETTh2 --attn prob --freq h

# ETTm1
python -u main_informer.py --model informer --data ETTm1 --attn prob --freq t
```
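Since `root_path` defaults to `./data/` and `data_path` to `ETTh1.csv`, the data file is presumably read from `os.path.join(root_path, data_path)`. A small, hypothetical pre-flight check before running these commands might look like this; only `ETTh1.csv` is confirmed as a default, and the other file names are assumptions based on the dataset names, so adjust the list to match your download.

```python
import os

# Assumed file names: only ETTh1.csv is documented (as the data_path default).
EXPECTED_CSVS = ["ETTh1.csv", "ETTh2.csv", "ETTm1.csv"]

def missing_datasets(root_path="./data/", names=EXPECTED_CSVS, exists=os.path.isfile):
    """Return the expected CSV files that are not present under root_path."""
    return [name for name in names if not exists(os.path.join(root_path, name))]

if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing from ./data/:", ", ".join(missing))
    else:
        print("All expected CSV files are in place.")
```

The `exists` parameter is injectable so the check can be exercised without touching the file system.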

For more information on the parameters, please refer to main_informer.py.

We provide a more detailed and complete command description for training and testing the model:

```bash
python -u main_informer.py --model <model> --data <data> \
  --root_path <root_path> --data_path <data_path> --features <features> \
  --target <target> --freq <freq> --checkpoints <checkpoints> \
  --seq_len <seq_len> --label_len <label_len> --pred_len <pred_len> \
  --enc_in <enc_in> --dec_in <dec_in> --c_out <c_out> --d_model <d_model> \
  --n_heads <n_heads> --e_layers <e_layers> --d_layers <d_layers> \
  --s_layers <s_layers> --d_ff <d_ff> --factor <factor> --padding <padding> \
  --distil --dropout <dropout> --attn <attn> --embed <embed> --activation <activation> \
  --output_attention --do_predict --mix --cols <cols> --itr <itr> \
  --num_workers <num_workers> --train_epochs <train_epochs> \
  --batch_size <batch_size> --patience <patience> --des <des> \
  --learning_rate <learning_rate> --loss <loss> --lradj <lradj> \
  --use_amp --inverse --use_gpu <use_gpu> --gpu <gpu> --use_multi_gpu --devices <devices>
```
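Most arguments in this command take a value, but several are bare switches. According to the argument descriptions in this README, `--distil` and `--mix` default to `True` and passing them turns the feature off, while `--output_attention`, `--do_predict`, `--use_amp` and `--inverse` default to `False` and passing them turns the feature on. The following reduced `argparse` sketch reproduces that behaviour for a handful of arguments only; it is a hypothetical illustration, not the actual parser in main_informer.py.

```python
import argparse

def build_parser():
    """Hypothetical subset of the Informer options, for illustrating flag semantics."""
    p = argparse.ArgumentParser(description="Subset of Informer options (sketch)")
    p.add_argument("--model", type=str, default="informer")
    p.add_argument("--data", type=str, default="ETTh1")
    p.add_argument("--attn", type=str, default="prob")
    p.add_argument("--freq", type=str, default="h")
    p.add_argument("--seq_len", type=int, default=96)
    p.add_argument("--label_len", type=int, default=48)
    p.add_argument("--pred_len", type=int, default=24)
    # On by default; passing the flag switches the feature OFF.
    p.add_argument("--distil", action="store_false", default=True)
    p.add_argument("--mix", action="store_false", default=True)
    # Off by default; passing the flag switches the feature ON.
    p.add_argument("--output_attention", action="store_true")
    p.add_argument("--do_predict", action="store_true")
    p.add_argument("--use_amp", action="store_true")
    p.add_argument("--inverse", action="store_true")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args(["--distil", "--attn", "full"])
    print(args.distil, args.mix, args.attn)  # -> False True full
```

`store_false` is the standard `argparse` way to express a switch whose presence disables an option that is enabled by default, which is why `--distil` on the command line means "no distilling".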

The arguments are described in detail below:

| Parameter name | Description of parameter |
| --- | --- |
| model | The model of the experiment. This can be set to `informer`, `informerstack` or `informerlight` (TBD) |
| data | The dataset name. This can be set to `ETTh1`, `ETTh2`, `ETTm1`, `ETTm2`, `electricity`, `illness`, `exchange_rate` or `weather` |
| root_path | The root path of the data file (defaults to `./data/`) |
| data_path | The data file name (defaults to `ETTh1.csv`) |
| features | The forecasting task (defaults to `M`). This can be set to `M`, `S` or `MS` (M: multivariate predict multivariate, S: univariate predict univariate, MS: multivariate predict univariate) |
| target | Target feature in the S or MS task (defaults to `OT`) |
| freq | Frequency for time-feature encoding (defaults to `h`). This can be set to `s`, `t`, `h`, `d`, `b`, `w` or `m` (s: secondly, t: minutely, h: hourly, d: daily, b: business days, w: weekly, m: monthly). You can also use a more detailed frequency such as `15min` or `3h` |
| checkpoints | Location of model checkpoints (defaults to `./checkpoints/`) |
| seq_len | Input sequence length of the Informer encoder (defaults to 96) |
| label_len | Start token length of the Informer decoder (defaults to 48) |
| pred_len | Prediction sequence length (defaults to 24) |
| enc_in | Encoder input size (defaults to 7) |
| dec_in | Decoder input size (defaults to 7) |
| c_out | Output size (defaults to 7) |
| d_model | Dimension of the model (defaults to 512) |
| n_heads | Number of attention heads (defaults to 8) |
| e_layers | Number of encoder layers (defaults to 2) |
| d_layers | Number of decoder layers (defaults to 1) |
| s_layers | Number of stacked encoder layers (defaults to `3,2,1`) |
| d_ff | Dimension of the fully connected network (defaults to 2048) |
| factor | ProbSparse attention factor (defaults to 5) |
| padding | Padding type (defaults to 0) |
| distil | Whether to use distilling in the encoder; passing this argument disables distilling (defaults to `True`) |
| dropout | Dropout probability (defaults to 0.05) |
| attn | Attention used in the encoder (defaults to `prob`). This can be set to `prob` (Informer) or `full` (Transformer) |
| embed | Time-feature encoding (defaults to `timeF`). This can be set to `timeF`, `fixed` or `learned` |
| activation | Activation function (defaults to `gelu`) |
| output_attention | Whether to output attention in the encoder; passing this argument outputs attention (defaults to `False`) |
| do_predict | Whether to predict unseen future data; passing this argument makes predictions (defaults to `False`) |
| mix | Whether to use mix attention in the generative decoder; passing this argument disables mix attention (defaults to `True`) |
| cols | Specific columns from the data file to use as input features |
| num_workers | Number of data loader workers (defaults to 0) |
| itr | Number of experiment runs (defaults to 2) |
| train_epochs | Number of training epochs (defaults to 6) |
| batch_size | Batch size of the training input data (defaults to 32) |
| patience | Early stopping patience (defaults to 3) |
| learning_rate | Optimizer learning rate (defaults to 0.0001) |
| des | Experiment description (defaults to `test`) |
| loss | Loss function (defaults to `mse`) |
| lradj | Learning-rate adjustment strategy (defaults to `type1`) |
| use_amp | Whether to use automatic mixed precision training; passing this argument enables AMP (defaults to `False`) |
| inverse | Whether to inverse-transform the output data; passing this argument enables it (defaults to `False`) |
| use_gpu | Whether to use a GPU (defaults to `True`) |
| gpu | The GPU index used for training and inference (defaults to 0) |
| use_multi_gpu | Whether to use multiple GPUs; passing this argument enables multiple GPUs (defaults to `False`) |
| devices | Device IDs of the multiple GPUs (defaults to `0,1,2,3`) |
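The `seq_len`, `label_len`, `pred_len` and `padding` arguments work together when a sample is built for the generative decoder. The following sketch assumes the usual Informer windowing, in which the decoder window starts `label_len` steps before the end of the encoder window, and assumes that `padding` 0 or 1 fills the positions to be predicted with zeros or ones. It uses a plain univariate list and hypothetical function names; the repository's data loaders work on multivariate arrays and tensors.

```python
def make_window(series, start, seq_len=96, label_len=48, pred_len=24):
    """Slice one sample: the encoder input and the decoder-side target window."""
    s_end = start + seq_len
    r_begin = s_end - label_len  # decoder window overlaps the last label_len encoder steps
    r_end = r_begin + label_len + pred_len
    return series[start:s_end], series[r_begin:r_end]

def decoder_input(y_window, label_len=48, pred_len=24, padding=0):
    """Known start token (first label_len steps) followed by a placeholder for the forecast."""
    if padding not in (0, 1):
        raise ValueError("this sketch assumes padding is 0 (zeros) or 1 (ones)")
    return y_window[:label_len] + [float(padding)] * pred_len

if __name__ == "__main__":
    series = list(range(20))
    x, y = make_window(series, start=0, seq_len=8, label_len=4, pred_len=2)
    print(x)                                          # -> [0, 1, 2, 3, 4, 5, 6, 7]
    print(y)                                          # -> [4, 5, 6, 7, 8, 9]
    print(decoder_input(y, label_len=4, pred_len=2))  # -> [4, 5, 6, 7, 0.0, 0.0]
```

With the defaults, the encoder sees 96 steps and the decoder receives 48 known steps plus 24 placeholders; the last `pred_len` decoder outputs form the forecast, which would be compared against the last `pred_len` values of `y_window`.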

FAQ

If you run into an error such as `RuntimeError: The size of tensor a (98) must match the size of tensor b (96) at non-singleton dimension 1`, check your torch version or modify the `Conv1d` of `TokenEmbedding` in `models/embed.py`, because the behavior of the circular padding mode in `Conv1d` differs between torch versions.
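The two sizes in that error can be reproduced with the standard output-length formula for a stride-1 convolution, L_out = L_in + 2 * padding - dilation * (kernel_size - 1). Assuming `TokenEmbedding` uses `kernel_size=3` with circular padding, `padding=2` turns a length-96 input into 98 steps, whereas `padding=1` keeps 96. The helper below picks the padding from the installed torch version; the 1.5 threshold and the function names are assumptions for illustration, so check them against `models/embed.py`.

```python
import re

def conv1d_out_len(l_in, kernel_size, padding, stride=1, dilation=1):
    """Output length of a 1-D convolution (the formula documented for torch.nn.Conv1d)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def version_tuple(version):
    """Turn '1.10.2+cu113' into (1, 10) so versions compare numerically."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

def token_embedding_padding(torch_version, threshold=(1, 5)):
    """Hypothetical rule: padding=1 on newer torch releases, 2 on older ones."""
    return 1 if version_tuple(torch_version) >= threshold else 2

if __name__ == "__main__":
    print(conv1d_out_len(96, kernel_size=3, padding=2))  # -> 98
    print(conv1d_out_len(96, kernel_size=3, padding=1))  # -> 96
    print(token_embedding_padding("1.8.0"))              # -> 1
```

Comparing tuples rather than raw strings matters here: as strings, "1.10.0" sorts before "1.5.0", which would select the old padding on newer torch releases. You could then pass `padding=token_embedding_padding(torch.__version__)` to the `nn.Conv1d` inside `TokenEmbedding`.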

Citation

If you find this repository useful in your research, please consider citing the following paper:

```bibtex
@inproceedings{haoyietal-informer-2021,
  author    = {Haoyi Zhou and
               Shanghang Zhang and
               Jieqi Peng and
               Shuai Zhang and
               Jianxin Li and
               Hui Xiong and
               Wancai Zhang},
  title     = {Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting},
  booktitle = {The Thirty-Fifth {AAAI} Conference on Artificial Intelligence, {AAAI} 2021, Virtual Conference},
  volume    = {35},
  number    = {12},
  pages     = {11106--11115},
  publisher = {{AAAI} Press},
  year      = {2021},
}
```

Contact

If you have any questions, feel free to contact Haoyi Zhou by email or through GitHub issues. Pull requests are very welcome!

Acknowledgments

We thank the Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC) for providing the computing infrastructure. We also thank everyone for their interest in this work!
