
[NeurIPS 2024] Official implementation of the paper "Are Self-Attentions Effective for Time Series Forecasting?"


CATS (NeurIPS 2024)

This repository is an official PyTorch implementation of CATS: Are Self-Attentions Effective for Time Series Forecasting?

Key Design of CATS

⚡ Cross-Attention Only Time Series transformer

CATS removes self-attention and retains only cross-attention in its transformer architecture. This design choice aims to better preserve temporal information in time series forecasting, addressing the potential loss of such information during the embedding process in traditional transformer models.
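As a rough illustration of the cross-attention-only idea, the sketch below computes a single attention head in which a small set of horizon queries attends to the embedded input patches, with no attention among the input patches themselves. It is not the repository's code: the sizes, the random weights, and the names `cross_attention`, `memory`, and `queries` are assumptions made for this example, and NumPy stands in for PyTorch to keep it short.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, memory, w_q, w_k, w_v):
    """Single-head cross-attention.

    queries: (T, D) one vector per forecast query (hypothetical stand-in
             for horizon-dependent queries).
    memory:  (L, D) embedded input patches.
    Returns the (T, D) output and the (T, L) attention score map.
    """
    q = queries @ w_q
    k = memory @ w_k
    v = memory @ w_v
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v, scores


rng = np.random.default_rng(0)
L, T, D = 12, 3, 8  # hypothetical sizes, not taken from the paper
memory = rng.normal(size=(L, D))
queries = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

out, scores = cross_attention(queries, memory, w_q, w_k, w_v)
print(out.shape, scores.shape)
```

Each row of `scores` is a distribution over the input patches for one query, so the score map has shape `(T, L)` rather than `(L, L)`.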

⚡ Time and Memory Efficiency

CATS achieves improved time and memory efficiency compared to traditional self-attention-based transformers. Self-attention complexity grows quadratically with the input length, $O(L^2)$, whereas the cross-attention-only approach of CATS scales as $O(LT)$, which is linear in $L$ for a fixed $T$.
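To make the scaling difference concrete, the following sketch counts the entries of the attention score matrix in each case. It assumes that $L$ is the number of input tokens and $T$ the number of queries; the function names and the sizes are hypothetical and chosen only for illustration.

```python
def self_attention_scores(num_inputs):
    # Every input token attends to every input token: L * L entries.
    return num_inputs * num_inputs


def cross_attention_scores(num_inputs, num_queries):
    # Every query attends to every input token: T * L entries.
    return num_inputs * num_queries


T = 8  # hypothetical number of queries, held fixed
for L in (64, 128, 256):
    print(L, self_attention_scores(L), cross_attention_scores(L, T))
```

Doubling `L` quadruples the self-attention count but only doubles the cross-attention count while `T` stays fixed.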

⚡ Enhanced Parameter Sharing

CATS implements extensive parameter sharing across all layers and dimensions for each horizon-dependent query. This approach, including shared projection layers, significantly reduces parameter count and improves computational efficiency in both training and inference phases.
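One way to see why sharing reduces the parameter count is to compare a single linear projection reused by every query against a separate projection per query. The sketch below is a back-of-the-envelope count under assumed sizes; `d_model`, `patch_len`, and `num_queries` are hypothetical values, not settings from the paper or the repository.

```python
def linear_params(d_in, d_out):
    # Weight matrix plus bias of one dense layer.
    return d_in * d_out + d_out


def shared_projection_params(d_in, d_out, num_queries):
    # One projection reused by all queries: the count ignores num_queries.
    return linear_params(d_in, d_out)


def per_query_projection_params(d_in, d_out, num_queries):
    # A separate projection for every query.
    return num_queries * linear_params(d_in, d_out)


d_model, patch_len, num_queries = 128, 24, 30  # hypothetical sizes
shared = shared_projection_params(d_model, patch_len, num_queries)
separate = per_query_projection_params(d_model, patch_len, num_queries)
print(shared, separate, separate // shared)
```

With a shared projection, the count stays constant as the number of queries grows, while the unshared variant grows linearly with it.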

Efficiency of CATS

We conducted extensive experiments to compare CATS with other state-of-the-art models for long input sequences. Our results demonstrate that CATS outperforms existing models in both efficiency and effectiveness.

Performance Across Various Input Lengths

CATS maintains robust performance as input length increases, unlike some complex models that suffer from increased computational burdens.

[Figure: Efficiency comparison table]

Handling Significantly Longer Sequences

We pushed CATS further by testing it with significantly longer input sequences (2880 time steps) and compared it to other models using shorter inputs (512 time steps). The results were remarkable:

  • CATS demonstrated better efficiency in terms of parameters, running time, and memory usage, even when processing inputs more than 5 times longer (2880 vs. 512 time steps).
  • It achieved this while maintaining superior forecasting performance.

[Figure: Efficiency comparison graph]

Understanding Periodic Patterns with Cross-Attention

To better understand how CATS processes time series data, we visualized its cross-attention mechanisms. We used a simple time series composed of two independent signals with different periodicities (where $\tau = 24$, $S = 8$, and $k = 5$).

The resulting cross-attention score maps reveal CATS' ability to capture both shocks and periodicities in the signal:

  • The left score map shows higher attention scores for patches containing shocks in the same direction.
  • The right score map clearly demonstrates the correlation over 24 steps, reflecting the model's capture of signal periodicity.

This visualization confirms CATS' effectiveness in leveraging periodic information for accurate predictions.

Forecasting Results

CATS performs strongly across most datasets and forecasting horizons, often achieving the best or second-best scores in various time series forecasting tasks.

Getting Started

Requirements

To set up the environment, follow these steps:

  1. Install Python 3.9
  2. Install the required packages:
pip install -r requirements.txt

Data Preparation

To replicate the experiments in our paper, follow these steps:

  1. Download the dataset from Autoformer.
  2. Create a folder named ./dataset in the root directory of this project.
  3. Place all downloaded files and folders within the ./dataset folder.
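Steps 2 and 3 above can be sanity-checked with a small helper like the one below. This is only a convenience sketch and not part of the repository: the function names are hypothetical, and it makes no assumption about which file names the download contains.

```python
from pathlib import Path


def prepare_dataset_dir(project_root="."):
    """Create ./dataset under the project root if it does not exist yet."""
    dataset_dir = Path(project_root) / "dataset"
    dataset_dir.mkdir(parents=True, exist_ok=True)
    return dataset_dir


def list_dataset_files(dataset_dir):
    """Return all files below the dataset folder as sorted relative paths."""
    dataset_dir = Path(dataset_dir)
    return sorted(
        p.relative_to(dataset_dir).as_posix()
        for p in dataset_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    folder = prepare_dataset_dir(".")
    files = list_dataset_files(folder)
    print(f"{len(files)} file(s) found in {folder}")
    for name in files:
        print(" ", name)
```

Run it from the project root after copying the downloaded files; an empty listing means the data has not been placed in `./dataset` yet.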

Scripts

We provide various scripts for different datasets and input lengths. Here are a couple of examples:

  1. For the ETTm1 dataset with an input length of 512:
bash ./scripts/ETTm1_512_input.sh
  2. For the Traffic dataset with a large input length (2880):
bash ./scripts/Traffic_2880_Large_input.sh

You can find more scripts in the ./scripts folder for other datasets and input lengths.

Citation

If you find this repo useful for your research, please cite our paper:

@inproceedings{kim2024self,
  title={Are Self-Attentions Effective for Time Series Forecasting?},
  author={Kim, Dongbin and Park, Jinseong and Lee, Jaewook and Kim, Hoki},
  booktitle={Advances in Neural Information Processing Systems},
  volume={37},
  year={2024}
}

Acknowledgements

We would like to express our appreciation for the following GitHub repositories, which provided valuable code bases and datasets:

Contact

If you have any questions or want to use the code, please contact [email protected]
