Add document for vllm paged attention kernel. #2978
Conversation
For reviewers, rendered here: https://vllm--2978.org.readthedocs.build/en/2978/dev/kernel/paged_attention.html
Thank you for this great write-up!
This doc is very useful. Hope it gets merged soon.
Looks great! Thanks! Some minor comments.
@pian13131 This is AWESOME! Thanks for your contribution!
Hello, I am currently studying the vLLM paged attention kernel, and I've found that the implementation can be quite complex for newcomers. After thoroughly reviewing the primary implementation of the kernel in `csrc/attention/attention_kernels.cu`, I have composed this document to provide a high-level understanding of the paged attention kernel. The document covers the memory layout, read patterns, and step-by-step calculations, accompanied by diagrams and pseudo-code. It is intended to serve as a useful reference for anyone interested in how the paged attention kernel is implemented.

Since I am still a novice in this subject, the document may contain some misunderstandings. I welcome any comments and advice to improve its accuracy and clarity. Your feedback is highly appreciated, and I hope this document can be merged to help others!
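As a rough illustration of the kind of memory layout and read pattern the document walks through, here is a minimal, hypothetical C++ sketch of how a block table might map a token's logical position to its physical slot in a paged KV cache. The names and constants (`block_table`, `BLOCK_SIZE`, `HEAD_SIZE`) are illustrative assumptions only and do not mirror the actual kernel code; see the rendered document above for the real layout.

```cpp
// Hypothetical sketch: locating one token's key vector in a paged KV cache.
// Layout assumption: the cache is a flat array of physical blocks, each block
// holding BLOCK_SIZE tokens, each token holding HEAD_SIZE elements per head.
#include <cstdio>
#include <vector>

constexpr int BLOCK_SIZE = 16;   // tokens per physical block (illustrative)
constexpr int HEAD_SIZE  = 128;  // elements per head (illustrative)

// Returns the offset (in elements) of token `token_idx` of a sequence, given
// the sequence's block table (logical block index -> physical block index).
size_t key_offset(const std::vector<int>& block_table, int token_idx) {
    int logical_block  = token_idx / BLOCK_SIZE;      // which block the token lives in
    int block_offset   = token_idx % BLOCK_SIZE;      // position inside that block
    int physical_block = block_table[logical_block];  // indirection through the block table
    return (static_cast<size_t>(physical_block) * BLOCK_SIZE + block_offset) * HEAD_SIZE;
}

int main() {
    // A sequence whose logical blocks 0, 1, 2 were allocated to physical blocks 7, 3, 9.
    std::vector<int> block_table = {7, 3, 9};
    // Token 20 falls in logical block 1 (physical block 3), slot 4 within that block.
    std::printf("offset = %zu elements\n", key_offset(block_table, 20));
    return 0;
}
```

The real kernel performs these lookups cooperatively across a thread group and reads the data in a vectorized pattern; the document linked above describes that in detail.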