A vLLM fork with RelayAttention implemented. See the paper for details: *RelayAttention for Efficient Large Language Model Serving with Long System Prompts*.
- forked from vLLM v0.2.6.
- used to produce *all* tables and figures in the paper: it contains not only the implementation of the idea, but also the scripts for data collection and plotting.
- Follow the vLLM documentation to install from source; see also `_scripts/install.sh`.
- Check the scripts under `_scripts` to reproduce the experiments and collect data. If you are using a Slurm cluster, check the `_cluster` directory instead.
You can use `examples/relay_inference.py` as the entry point for exploring this project. See Figure 9 in the paper for the big picture.
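For orientation, here is a minimal sketch of driving the engine with the stock vLLM v0.2.x offline API (`LLM`, `SamplingParams`). It only illustrates the serving scenario RelayAttention targets: many requests sharing one long system prompt. The model name and prompts are placeholders, and the fork's actual relay-specific interface is defined in `examples/relay_inference.py`, which may differ from this sketch.

```python
# Sketch of the target workload using the standard vLLM v0.2.x offline API.
# This is the baseline formulation (shared prefix prepended per request);
# the fork's relay-specific interface lives in examples/relay_inference.py.
from vllm import LLM, SamplingParams

# A long system prompt shared by every request in the batch. RelayAttention
# lets attention over this shared prefix be computed once per batch instead
# of once per request.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Always answer concisely and cite sources. "
) * 50

user_prompts = [
    "What is the capital of France?",
    "Explain KV caching in one sentence.",
]

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # any supported HF model
sampling = SamplingParams(temperature=0.8, max_tokens=64)

# Baseline: the shared system prompt is simply prepended to each request,
# so its KV cache is duplicated across requests.
outputs = llm.generate([SYSTEM_PROMPT + p for p in user_prompts], sampling)

for out in outputs:
    print(out.outputs[0].text)
```

The point of the fork is to avoid the per-request duplication this baseline incurs by handling the shared system prompt once for the whole batch; see `examples/relay_inference.py` for how that is exposed here.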
If you use this repo for your research, please cite our paper:
```bibtex
@misc{zhu2024relayattention,
  title={RelayAttention for Efficient Large Language Model Serving with Long System Prompts},
  author={Lei Zhu and Xinjiang Wang and Wayne Zhang and Rynson W. H. Lau},
  year={2024},
  eprint={2402.14808},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```