[Feature Request]: Optimize PagedAttention operation on aarch64 HW #26422

Open
1 task done
dmitry-gorokhov opened this issue Sep 4, 2024 · 6 comments · May be fixed by #26975
Labels: category: CPU (OpenVINO CPU plugin); enhancement (New feature or request); feature (New feature request); good first issue (Good for newcomers); platform: arm (OpenVINO on ARM / ARM64)

Comments

@dmitry-gorokhov
Contributor

dmitry-gorokhov commented Sep 4, 2024

Request Description

The PagedAttention operation is already implemented within the CPU plugin in C++ and optimized for x64 using AVX2/AVX512 intrinsics.
The request is to optimize the PA operation for aarch64 using the NEON/SVE extensions.

Please use the SDPA optimization with NEON as a reference.
How to build OV on ARM: https://github.com/openvinotoolkit/openvino/blob/master/docs/dev/build.md
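
To give a feel for the kind of kernel this request is about, here is a minimal sketch of a NEON-accelerated dot product, the inner loop of the Q·K^T step in attention. The helper name `dot_product_f32` is hypothetical and is not taken from the OpenVINO sources; the sketch falls back to a scalar loop on non-aarch64 builds, so it is an illustration under those assumptions rather than the plugin's actual code.

```cpp
#include <cstddef>

#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from the OpenVINO sources): dot product of one
// query row and one key row, each holding n float32 values.
inline float dot_product_f32(const float* q, const float* k, std::size_t n) {
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON) && defined(__aarch64__)
    // Process 4 floats per iteration with a fused multiply-add.
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        float32x4_t vq = vld1q_f32(q + i);
        float32x4_t vk = vld1q_f32(k + i);
        acc = vfmaq_f32(acc, vq, vk);  // acc += vq * vk
    }
    sum = vaddvq_f32(acc);  // horizontal add of the 4 lanes (AArch64 only)
#endif
    // Scalar tail; also the full loop when NEON is not available.
    for (; i < n; ++i) {
        sum += q[i] * k[i];
    }
    return sum;
}
```

A production kernel would typically use several accumulators and handle f16/bf16 inputs; this sketch only shows the basic load, fused multiply-add, and reduce pattern.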

Feature Use Case

The PagedAttention operation implements the attention algorithm required for workloads like continuous batching or speculative decoding. PagedAttention is used as the basic attention block in the vLLM OpenVINO backend and under the OpenVINO GenAI API (for some use cases). The PA operation can take significant resources to execute (especially for long contexts), so optimizing it is crucial for overall LLM-based workloads.

Issue submission checklist

  • The feature request or improvement must be related to OpenVINO
@dmitry-gorokhov dmitry-gorokhov added enhancement New feature or request feature New feature request platform: arm OpenVINO on ARM / ARM64 labels Sep 4, 2024
@rkazants
Contributor

rkazants commented Sep 5, 2024

@dmitry-gorokhov, can we add the good-first-issue label here?

@wenjiew wenjiew added the good first issue Good for newcomers label Sep 6, 2024
@github-project-automation github-project-automation bot moved this to Contributors Needed in Good first issues Sep 6, 2024
@samkitshah1262

Hi @dmitry-gorokhov @rkazants @wenjiew, is anyone working on this issue? If not, can I take it up?

@mlukasze mlukasze moved this from Contributors Needed to Assigned in Good first issues Sep 30, 2024
@mlukasze
Contributor

It's yours now, have fun :)

@dmitry-gorokhov dmitry-gorokhov added the category: CPU OpenVINO CPU plugin label Sep 30, 2024
@samkitshah1262

Hi, currently the paged attention executor logic depends on brgemm (which only supports x86-64), so to extend support to ARM64 (aarch64), which library should we use? The candidates are: ARM Compute Library (ACL), OpenBLAS, Eigen, TFLite.

@dmitry-gorokhov
Contributor Author

Hi @samkitshah1262. OneDNN has a Brgemm block implemented for aarch64. The problem is that it supports only SVE512/SVE256.
So to cover NEON-powered devices (like Apple silicon) we need to use another backend. I would propose taking a look at the ACL NEGEMM kernel. It was already successfully applied for the same purpose inside the SDPA op: #25183
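
The backend choice described above can be sketched as a small selection function. Everything here (`CpuCaps`, `GemmBackend`, `select_gemm_backend`) is a hypothetical name used for illustration, not part of OneDNN, ACL, or the OpenVINO plugin; the sketch only encodes the rule stated in this comment: Brgemm for SVE256/SVE512, ACL NEGEMM for NEON-only devices.

```cpp
// Hypothetical capability summary; a real plugin would query the CPU at runtime.
struct CpuCaps {
    bool has_neon;      // NEON (Advanced SIMD) is available
    unsigned sve_bits;  // SVE vector length in bits, 0 if SVE is unavailable
};

// Hypothetical set of GEMM backends for the attention executor.
enum class GemmBackend { OneDnnBrgemm, AclNeGemm, Reference };

inline GemmBackend select_gemm_backend(const CpuCaps& caps) {
    // Per the comment above, the OneDNN aarch64 Brgemm covers only SVE256/SVE512.
    if (caps.sve_bits == 256 || caps.sve_bits == 512) {
        return GemmBackend::OneDnnBrgemm;
    }
    // NEON-only devices (e.g. Apple silicon) go through ACL NEGEMM.
    if (caps.has_neon) {
        return GemmBackend::AclNeGemm;
    }
    // Plain C++ fallback when neither extension is usable.
    return GemmBackend::Reference;
}
```

For example, `select_gemm_backend({true, 0})` models a NEON-only device and would pick the ACL path under these assumptions.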

@samkitshah1262

@dmitry-gorokhov The implementation of PA on ARM using NEON and ACL is done, except for the MHAHelper class, as the ACL GEMM kernel does not support strides (yet). Could you please review the current changes and provide direction? Thanks!
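
One possible workaround for a GEMM kernel that expects dense inputs, not necessarily the approach taken in the linked PR, is to pack the strided matrix into a contiguous buffer before the call. A minimal sketch, assuming row-major float data and a hypothetical helper name `pack_rows`:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a rows x cols matrix whose consecutive rows are
// `stride` elements apart into a dense row-major buffer (stride == cols).
inline std::vector<float> pack_rows(const float* src,
                                    std::size_t rows,
                                    std::size_t cols,
                                    std::size_t stride) {
    std::vector<float> dst(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        // Copy only the first `cols` elements of each strided row.
        std::copy_n(src + r * stride, cols, dst.begin() + r * cols);
    }
    return dst;
}
```

The trade-off is an extra copy per GEMM call, which may or may not be acceptable for long contexts.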

Projects
Status: Assigned
5 participants