
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs

Implementation of Mesa-Extrapolation, the method proposed in the paper "Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs" (NeurIPS 2024).

1. Abstract

Large language models (LLMs), although they have revolutionized many fields, still suffer from the challenging extrapolation problem, where their inference ability declines sharply beyond the maximum training length. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, and we examine the power of Position Encoding (PE) in this context. Our findings reveal that, with a meticulously woven position encoding, PE can indeed be extended beyond its effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which uses a chunk-based triangular attention matrix and applies Stair PE to handle the final chunk. This method not only retains competitive performance but also offers substantial benefits such as significantly reduced memory demand and faster inference speed. Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution for extending the reach of LLM applications.
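
For intuition only (this is a toy sketch, not the paper's exact Stair PE; see the paper for the precise weave): the core idea of weaving is to remap position ids so that every relative position a layer sees stays inside the range covered during training. The parameters max_trained and chunk_size below are illustrative.

import torch

def weave_position_ids(seq_len: int, max_trained: int, chunk_size: int) -> torch.Tensor:
    """Toy illustration of weaving: positions up to the trained limit are
    kept as-is; positions beyond it are folded back into the final chunk
    in a stair-like pattern, so no position id exceeds the trained range."""
    pos = torch.arange(seq_len)
    over = pos >= max_trained
    pos[over] = (max_trained - chunk_size) + (pos[over] - max_trained) % chunk_size
    return pos

# Example: with max_trained=8 and chunk_size=4, positions 8..15 reuse ids 4..7.
print(weave_position_ids(16, max_trained=8, chunk_size=4))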

2. Overview

The schematic diagram of our method is shown below:

[Figure: schematic diagram of the Mesa-Extrapolation method]

Compared with baseline extrapolation methods, our approach uses the least memory and achieves the lowest inference latency:

[Figure: memory usage and inference latency comparison]

It extends Phi-3-instruct, which natively supports a 128k sequence length, to at least 192k:

[Figure: Phi-3-instruct extrapolation results up to 192k]

3. Usage

Dependencies

Our current implementation is based on transformers==4.31.0; we will continue to update it to newer versions. For the attention computation, we currently support both flash-attention and a plain PyTorch implementation.
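
A minimal environment setup might look like the following (only the transformers pin comes from this README; the rest is our assumption, and flash-attn is needed only for the flash-attention path):

pip install transformers==4.31.0 torch
pip install flash-attn  # optional, for the flash-attention path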

Passkey Data Generation

python datas/make_passkey_data.py
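
For orientation, a passkey-retrieval sample hides a short numeric key inside long filler text and asks the model to recall it. A minimal sketch of such a sample (the exact format emitted by datas/make_passkey_data.py may differ):

import random

def make_passkey_sample(n_filler: int = 2000) -> tuple[str, str]:
    """Build one passkey-retrieval prompt: a hidden key inside filler text."""
    key = str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is yellow. " * n_filler
    cut = random.randint(0, len(filler))
    prompt = (
        filler[:cut]
        + f"The pass key is {key}. Remember it. {key} is the pass key. "
        + filler[cut:]
        + "\nWhat is the pass key?"
    )
    return prompt, key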

Run

python experiments/evaluate_passkey_retrieval.py

4. TODOs

[1] Release the core code of Mesa-Extrapolation, including support for LLaMA, Pythia, Baichuan, Phi, etc.

[2] Support newer versions of Transformers.

[3] Integrate with open-source inference frameworks such as vLLM.

5. Contributing

We welcome contributions from the research community to improve the efficiency of Mesa-Extrapolation. If you have any ideas or would like to join in, please contact us ([email protected]).

If you find our method useful, please cite our paper:

@misc{xin2024llm,
      title={Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs}, 
      author={Xin Ma and Yang Liu and Jingjing Li and Xiaoxu Ma},
      year={2024},
      eprint={2410.15859},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
