Linear Attention #40

Open
JakobEliasWagner opened this issue Feb 15, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@JakobEliasWagner
Collaborator

JakobEliasWagner commented Feb 15, 2024

Description

Current challenges in using neural operators include irregular meshes, multiple inputs, multiple inputs on different meshes, and multi-scale problems [1]. The attention mechanism is promising in this regard, as it can contextualize these different inputs even for differing or irregular input locations. However, common implementations of the attention mechanism possess an overall complexity of O(n²d), i.e. quadratic in the sequence length n [3]. This becomes limiting when applying these networks to very large datasets, as is the case when learning the solution operator of partial differential equations [2]. Therefore, multiple papers propose a linear attention mechanism to tackle this issue (a minimal sketch of the underlying factorization follows the list below):

  • GNOT [1]: Heterogeneous Normalized (linear) Attention (HNA) block.
  • Transformer for partial differential equations' operator learning [2]: Linear Attention.
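
For concreteness, here is a minimal sketch of the kernel-based factorization that these linear attention blocks build on. The feature map (elu + 1), the tensor shapes, and the eps stabilizer are illustrative assumptions, not details taken from either paper:

```python
import torch


def linear_attention(q, k, v, eps=1e-6):
    """Kernel-based linear attention (illustrative sketch).

    q, k, v: tensors of shape (batch, seq_len, dim).
    Replaces softmax(Q K^T / sqrt(d)) V, which costs O(n^2 d), by a
    feature-map factorization that costs O(n d^2) time and O(d^2) memory.
    """
    # Non-negative feature map phi(x) = elu(x) + 1 (an illustrative choice).
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1

    # Aggregate over the sequence first: kv is (batch, dim, dim).
    kv = torch.einsum("bnd,bne->bde", k, v)
    k_sum = k.sum(dim=1)  # (batch, dim)

    # Numerator and row-wise normalizer; the n x n matrix is never formed.
    num = torch.einsum("bnd,bde->bne", q, kv)
    den = torch.einsum("bnd,bd->bn", q, k_sum).unsqueeze(-1)
    return num / (den + eps)


if __name__ == "__main__":
    q, k, v = (torch.randn(2, 4096, 64) for _ in range(3))
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 4096, 64])
```

Because K^T V is only a d × d matrix, the n × n attention matrix is never materialized, which is where the linear scaling in the sequence length comes from.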

Proposed Solution

  1. Researching different proposed linear attention models: As there are many different implementations ([1], [2], and more) as well as related research in the field of NLP [4], a broader look into the proposed methods is beneficial (a sketch of a second candidate follows this list).
  2. Implementing the most promising candidates for linear attention: Compose a list of promising candidates and implement the best of these.
  3. Good example dataset and benchmark: Evaluate the implementation on a benchmark dataset that is well suited to this kind of problem.
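
As a concrete second candidate for step 1, here is a sketch of the softmax-free, normalization-based (Galerkin-type) attention family that operator-learning transformers build on. The exact placement of the normalization is an assumption for illustration, not a reproduction of either paper:

```python
import torch


def galerkin_type_attention(q, k, v):
    """Softmax-free, normalization-based attention (Galerkin-style sketch).

    Instead of a kernel feature map, keys and values are normalized per
    feature and the n x n score matrix is never formed.
    q, k, v: (batch, seq_len, dim); cost is O(n d^2).
    """
    n = q.shape[1]
    # Feature-wise layer normalization of keys and values replaces softmax.
    k = torch.nn.functional.layer_norm(k, k.shape[-1:])
    v = torch.nn.functional.layer_norm(v, v.shape[-1:])
    # K^T V is only (dim, dim); the 1/n factor takes the role of the
    # softmax normalization.
    kv = torch.einsum("bnd,bne->bde", k, v) / n
    return torch.einsum("bnd,bde->bne", q, kv)
```

Comparing this family against the kernel-based sketch above on the same benchmark would cover both directions mentioned in the papers.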

Expected Benefits

  • Complexity: Linear attention reduces the time and memory cost from quadratic to linear in the sequence length, which makes transformer-based operator architectures practical for large discretizations. In addition, attention naturally weights multiple input functions and adapts to irregular meshes.
  • Scalability: Transformer models can be trained at very large scale, and linear attention significantly lowers the associated cost.

Implementation Steps

  1. Implement linear attention as a reusable attention module (a possible interface is sketched below).
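
A possible shape for such a module, as a hedged sketch: the class name, constructor arguments, and multi-head layout are hypothetical and not part of any existing API in this repository; it reuses the `linear_attention` function sketched in the Description above.

```python
import torch
from torch import nn


class LinearAttention(nn.Module):
    """Hypothetical multi-head wrapper around the linear_attention function
    sketched above; the class name and interface are placeholders."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        h = self.num_heads
        # Project to queries, keys, values and split into heads:
        # (batch * heads, seq_len, head_dim).
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (
            t.reshape(b, n, h, d // h).transpose(1, 2).reshape(b * h, n, d // h)
            for t in (q, k, v)
        )
        y = linear_attention(q, k, v)  # the O(n d^2) kernel from above
        # Merge heads back and apply the output projection.
        y = y.reshape(b, h, n, d // h).transpose(1, 2).reshape(b, n, d)
        return self.out(y)
```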

Open Questions

  • Which linear attention implementations are interesting to us?
  • How should this model be tested?
  • What considerations arise when applying this model to physics-constrained problems?
  • What are interesting benchmarks for this problem?

Literature

[1] Hao, Z. et al. GNOT: A general neural operator transformer for operator learning. In International Conference on Machine Learning 12556–12569 (PMLR, 2023).
[2] Li, Z., Meidani, K. & Farimani, A. B. Transformer for partial differential equations’ operator learning. arXiv preprint arXiv:2205.13671 (2022).
[3] Vaswani, A. et al. Attention is all you need. Advances in neural information processing systems 30, (2017).
[4] Wang, Y. & Xiao, Z. LoMA: Lossless Compressed Memory Attention. (2024).

@JakobEliasWagner added the enhancement (New feature or request) label on Feb 15, 2024
@samuelburbulla
Collaborator

What's the status on this? Will this come soon? Otherwise I'll close the issue for now.
