# Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation 🔥 (ICML 2024)

## News

- 🔥 May 2, 2024: BiPE is accepted to ICML 2024!
- 🔥 Apr 11, 2024: Released a 1.6B BiPE-RoPE model pre-trained on 300B tokens, which shows length extrapolation ability consistent with the 151M model.
- 🔥 Apr 4, 2024: Initial commits. More code (YaRN fine-tuning, SCROLLS fine-tuning) is coming soon.

## Overview

This repository contains the source code for *Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation* (ICML 2024).

*Figure: Overview of BiPE.*

## Setup Environment

```bash
conda create -n bipe python=3.9
conda activate bipe
pip3 install -r requirements.txt
```

## Data for Pretraining

We use the Pile for pretraining with all copyrighted data removed.

```bash
cd BiPE
DATA_DIR=./data  # the directory to save the data
python3 download_data.py --dataset-cache-dir $DATA_DIR
```

## Pretraining

The scripts under `script/` cover the commands for training and perplexity evaluation.

For training, the key modification for BiPE is computing the token ids (intra-segment) and position ids (inter-segment) with the `get_bilevel_ids` function. The token ids are then used to obtain absolute positional encodings (`get_ape_embeddings`), and the position ids are used to obtain relative positional encodings; a sketch of this id construction is given after the training command below. For example, you can start training a 151M BiPE-RoPE model with the following command:

```bash
cd BiPE
OUTPUT_DIR=./output  # path to save checkpoints and tensorboard
DATA_DIR=./data  # path to load data
CONFIG_NAME=config/bipe_rope.json
bash script/train.sh
```

You can change `CONFIG_NAME` to select a different positional encoding variant (choose from `config/bipe_rope.json`, `config/bipe_alibi.json`, `config/rope.json`, and `config/alibi.json`).
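As a reading aid for the description above, the following is a minimal sketch of how bilevel ids can be constructed, assuming segments are delimited by a single separator token. The function name, the separator convention, and all details here are illustrative and may differ from the repository's actual `get_bilevel_ids`.

```python
import torch

def get_bilevel_ids_sketch(input_ids: torch.Tensor, sep_id: int):
    """Return (intra-segment ids, inter-segment ids) for a batch of sequences.

    Hypothetical re-implementation for illustration only; the repository's
    `get_bilevel_ids` may handle separators and edge cases differently.
    """
    # input_ids: (batch, seq_len); a segment is assumed to end at each `sep_id` token.
    positions = torch.arange(input_ids.size(-1), device=input_ids.device).expand_as(input_ids)
    is_sep = (input_ids == sep_id).long()

    # Inter-segment id: number of separators strictly before the token,
    # i.e. the index of the segment the token belongs to (fed to RoPE/ALiBi).
    inter_ids = torch.cumsum(is_sep, dim=-1) - is_sep

    # Intra-segment id: offset from the start of the current segment
    # (fed to the absolute positional embeddings via `get_ape_embeddings`).
    sep_positions = torch.where(is_sep.bool(), positions, torch.full_like(positions, -1))
    last_sep = torch.cummax(sep_positions, dim=-1).values
    # Shift by one token so that a separator still closes its own segment.
    last_sep_before = torch.cat(
        [torch.full_like(last_sep[..., :1], -1), last_sep[..., :-1]], dim=-1
    )
    intra_ids = positions - last_sep_before - 1
    return intra_ids, inter_ids

# Example with token id 0 acting as the separator:
ids = torch.tensor([[5, 6, 0, 7, 8, 0, 9]])
intra, inter = get_bilevel_ids_sketch(ids, sep_id=0)
# intra: [[0, 1, 2, 0, 1, 2, 0]]   inter: [[0, 0, 0, 1, 1, 1, 2]]
```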

## Perplexity Evaluation

For perplexity evaluation, you can use the following command:

```bash
cd BiPE
DATA_DIR=./data  # path to load data
MODEL=./bipe_rope  # model checkpoint path
bash script/eval.sh
```

You can also download our pre-trained models (note that the 1.6B model is pre-trained with a batch size of 1024):

| Model | HuggingFace Checkpoint 🤗 |
| --- | --- |
| BiPE_RoPE-151M | link |
| BiPE_RoPE-1.6B | link |
| RoPE-151M | link |
| BiPE_ALiBi-151M | link |
| ALiBi-151M | link |

For example, to evaluate BiPE-RoPE-151M, you can use the following command:

```bash
git lfs install
git clone https://huggingface.co/hzy00/BiPE_RoPE-151M
DATA_DIR=./data  # path to load data
MODEL=./BiPE_RoPE-151M  # model checkpoint path
bash script/eval.sh
```
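For a quick sanity check outside `script/eval.sh`, a downloaded checkpoint can in principle be loaded with Hugging Face Transformers and used to compute perplexity on a short text. This is a minimal sketch, assuming the checkpoint exposes a standard causal-LM interface; the custom BiPE modeling code may require `trust_remote_code=True` (as used below) or the model classes from this repository instead.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path created by the `git clone` command above.
model_path = "./BiPE_RoPE-151M"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
model.eval()

text = "Bilevel positional encoding splits positions into intra- and inter-segment ids."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # The causal-LM loss is the mean token-level cross-entropy;
    # its exponential is the perplexity on this snippet.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```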

## Citations


```bibtex
@inproceedings{he2024two,
  title={Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation},
  author={Zhenyu He and Guhao Feng and Shengjie Luo and Kai Yang and Liwei Wang and Jingjing Xu and Zhi Zhang and Hongxia Yang and Di He},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=luqH1eL4PN}
}
```