[Feature Request]: Optimize PagedAttention operation on aarch64 HW #26422
Comments
@dmitry-gorokhov, can we add
Hi @dmitry-gorokhov @rkazants @wenjiew, is anyone working on this issue? If not, can I take it up?
It's yours now, have fun :)
Hi, currently the paged attention executor logic depends on brgemm (which only supports x86-64). To extend support to ARM64 (aarch64), which library should we use? The prospects are: Arm Compute Library (ACL), OpenBLAS, Eigen, TFLite.
Hi @samkitshah1262. oneDNN has a brgemm block implemented for aarch64. The problem is that it supports only SVE512/SVE256.
@dmitry-gorokhov Implementation of PA on ARM using NEON and ACL is done, except for the MHAHelper class, as the ACL GEMM kernel does not support strides (yet). Could you please review the current changes and provide direction? Thanks!
Request Description
The PagedAttention operation is already implemented within the CPU plugin in C++ and optimized for x64 using AVX2/AVX-512 intrinsics.
The request is to optimize the PA operation for aarch64 using NEON/SVE extensions.
Please refer to the existing SDPA optimization using NEON as a reference.
How to build OV on ARM: https://github.com/openvinotoolkit/openvino/blob/master/docs/dev/build.md
Feature Use Case
The PagedAttention operation implements the attention algorithm required for workloads like continuous batching and speculative decoding. PagedAttention is used as the basic attention block in the vLLM OpenVINO backend and under the OpenVINO GenAI API (for some use cases). The PA operation can consume significant execution resources (especially for long contexts), so its optimization is crucial for overall LLM-based workloads.