Linear Attention #40

Open
JakobEliasWagner opened this issue Feb 15, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@JakobEliasWagner
Collaborator

JakobEliasWagner commented Feb 15, 2024

Description

Current challenges in using neural operators include irregular meshes, multiple inputs, multiple inputs on different meshes, and multi-scale problems [1]. The attention mechanism is promising in this regard, as it can contextualize these different inputs even for differing or irregular input locations. However, common implementations of the attention mechanism possess an overall complexity of O(n²d), i.e. quadratic in the sequence length n [3]. This becomes limiting when applying these networks to very large datasets, as is the case when learning the solution operator of partial differential equations [2]. Therefore, multiple papers propose a linear attention mechanism to tackle this issue (a minimal sketch of the underlying factorization follows the list below):

  • GNOT [1]: Heterogeneous Normalized (linear) Attention (HNA) block.
  • Transformer for partial differential equations' operator learning [2]: Linear Attention.
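
For concreteness, here is a minimal sketch of the kernel-based factorization that these linear attention blocks build on. The feature map (elu + 1), the tensor shapes, and the eps stabilizer are illustrative assumptions, not details taken from either paper:

```python
import torch


def linear_attention(q, k, v, eps=1e-6):
    """Kernel-based linear attention (illustrative sketch).

    q, k, v: tensors of shape (batch, seq_len, dim).
    Replaces softmax(Q K^T / sqrt(d)) V, which costs O(n^2 d), by a
    feature-map factorization that costs O(n d^2) time and O(d^2) memory.
    """
    # Non-negative feature map phi(x) = elu(x) + 1 (an illustrative choice).
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1

    # Aggregate over the sequence first: kv is (batch, dim, dim).
    kv = torch.einsum("bnd,bne->bde", k, v)
    k_sum = k.sum(dim=1)  # (batch, dim)

    # Numerator and row-wise normalizer; the n x n matrix is never formed.
    num = torch.einsum("bnd,bde->bne", q, kv)
    den = torch.einsum("bnd,bd->bn", q, k_sum).unsqueeze(-1)
    return num / (den + eps)


if __name__ == "__main__":
    q, k, v = (torch.randn(2, 4096, 64) for _ in range(3))
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 4096, 64])
```

Because K^T V is only a d × d matrix, the n × n attention matrix is never materialized, which is where the linear scaling in the sequence length comes from.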

Proposed Solution

  1. Researching different proposed linear attention models: As there are many different implementations ([1], [2], and more) as well as related research in the field of NLP [4], a broader look into the proposed methods is beneficial (a sketch of a second candidate follows this list).
  2. Implementing the most promising candidates for linear attention: Compose a list of promising candidates and implement the best of these.
  3. Good example dataset and benchmark: Evaluate the implementation on a benchmark dataset that is well suited to this kind of problem.
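
As a concrete second candidate for step 1, here is a sketch of the softmax-free, normalization-based (Galerkin-type) attention family that operator-learning transformers build on. The exact placement of the normalization is an assumption for illustration, not a reproduction of either paper:

```python
import torch


def galerkin_type_attention(q, k, v):
    """Softmax-free, normalization-based attention (Galerkin-style sketch).

    Instead of a kernel feature map, keys and values are normalized per
    feature and the n x n score matrix is never formed.
    q, k, v: (batch, seq_len, dim); cost is O(n d^2).
    """
    n = q.shape[1]
    # Feature-wise layer normalization of keys and values replaces softmax.
    k = torch.nn.functional.layer_norm(k, k.shape[-1:])
    v = torch.nn.functional.layer_norm(v, v.shape[-1:])
    # K^T V is only (dim, dim); the 1/n factor takes the role of the
    # softmax normalization.
    kv = torch.einsum("bnd,bne->bde", k, v) / n
    return torch.einsum("bnd,bde->bne", q, kv)
```

Comparing this family against the kernel-based sketch above on the same benchmark would cover both directions mentioned in the papers.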

Expected Benefits

  • Complexity: Linear attention reduces the time and memory cost from quadratic to linear in the sequence length, which makes transformer-based operator architectures practical for large discretizations. In addition, attention naturally weights multiple input functions and adapts to irregular meshes.
  • Scalability: Transformer models can be trained at very large scale, and linear attention significantly lowers the associated cost.

Implementation Steps

  1. Implement linear attention as a reusable attention module (a possible interface is sketched below).
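
A possible shape for such a module, as a hedged sketch: the class name, constructor arguments, and multi-head layout are hypothetical and not part of any existing API in this repository; it reuses the `linear_attention` function sketched in the Description above.

```python
import torch
from torch import nn


class LinearAttention(nn.Module):
    """Hypothetical multi-head wrapper around the linear_attention function
    sketched above; the class name and interface are placeholders."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        h = self.num_heads
        # Project to queries, keys, values and split into heads:
        # (batch * heads, seq_len, head_dim).
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (
            t.reshape(b, n, h, d // h).transpose(1, 2).reshape(b * h, n, d // h)
            for t in (q, k, v)
        )
        y = linear_attention(q, k, v)  # the O(n d^2) kernel from above
        # Merge heads back and apply the output projection.
        y = y.reshape(b, h, n, d // h).transpose(1, 2).reshape(b, n, d)
        return self.out(y)
```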

Open Questions

  • Which linear attention implementations are interesting to us?
  • How should this model be tested?
  • What considerations arise when applying this model to physics-constrained problems?
  • What are interesting benchmarks for this problem?

Literature

[1] Hao, Z. et al. GNOT: A general neural operator transformer for operator learning. In International Conference on Machine Learning 12556–12569 (PMLR, 2023).
[2] Li, Z., Meidani, K. & Farimani, A. B. Transformer for partial differential equations’ operator learning. arXiv preprint arXiv:2205.13671 (2022).
[3] Vaswani, A. et al. Attention is all you need. Advances in neural information processing systems 30, (2017).
[4] Wang, Y. & Xiao, Z. LoMA: Lossless Compressed Memory Attention. (2024).

@JakobEliasWagner added the enhancement (New feature or request) label on Feb 15, 2024
@samuelburbulla
Collaborator

What's the status on this? Will this come soon? Otherwise I'll close the issue for now.
