
Radix Attention support #2560

Closed
MichaelJayW opened this issue Jan 23, 2024 · 2 comments

Comments

MichaelJayW commented Jan 23, 2024

RadixAttention is a novel technique for automatic KV cache reuse at runtime. It is also compatible with existing techniques such as continuous batching and paged attention.

Blog: https://lmsys.org/blog/2024-01-17-sglang/
Paper: https://arxiv.org/abs/2312.07104
Code: https://github.com/sgl-project/sglang
[Benchmark figure: llama_7b]
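For illustration, here is a minimal, simplified sketch of the prefix-reuse idea behind RadixAttention: requests that share a token prefix reuse the KV cache entries already computed for that prefix instead of recomputing them. This uses a plain trie rather than a compressed radix tree, and the names (`PrefixCache`, `PrefixCacheNode`) are hypothetical; it is not SGLang's or vLLM's actual implementation.

```python
class PrefixCacheNode:
    """One node per token; children keyed by the next token id."""
    def __init__(self):
        self.children = {}    # token id -> PrefixCacheNode
        self.kv_block = None  # placeholder for the cached KV entry of this token


class PrefixCache:
    """Toy prefix cache: longest-prefix match plus insertion of new KV entries."""
    def __init__(self):
        self.root = PrefixCacheNode()

    def match_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched

    def insert(self, tokens, kv_blocks):
        """Cache KV entries for `tokens` so later requests can reuse them."""
        node = self.root
        for t, kv in zip(tokens, kv_blocks):
            node = node.children.setdefault(t, PrefixCacheNode())
            node.kv_block = kv


if __name__ == "__main__":
    cache = PrefixCache()
    prompt_a = [1, 5, 7, 9]                         # e.g. a shared system prompt
    cache.insert(prompt_a, [f"kv{t}" for t in prompt_a])

    prompt_b = [1, 5, 7, 2, 4]                      # shares the prefix [1, 5, 7]
    reused = cache.match_prefix(prompt_b)
    print(f"reusing cached KV entries for {reused} of {len(prompt_b)} tokens")
```

A production design additionally handles eviction (e.g. LRU over tree leaves) and compresses runs of tokens into single radix-tree edges, but the matching logic above captures the core reuse mechanism.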

irasin (Contributor) commented Jan 25, 2024

LGTM

zhuohan123 (Member) commented:
We have our plan here: #2614. Please take a look!

hmellor closed this as completed Apr 4, 2024