[NPUW] Port unroll SDPA optimization from GenAI #27891
Conversation
@AsyaPronina please pick titles carefully.
Force-pushed from 772ebef to a4b0b81.
@TolyaTalamanov please review
@TolyaTalamanov gentle reminder
I'm OK with the changes.
Please discuss re-using the SDPA unroll pass with the GPU plugin team.
for (auto tensor : model->inputs()) {
    if (tensor.get_any_name().find("past_key") != std::string::npos) {
        ppp.input(tensor.get_any_name()).tensor().set_element_type(ov::element::Type_t::f16);
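For reference, a minimal self-contained sketch of this kind of cast, assuming a PrePostProcessor built over the LLM model; the helper name cvt_kvcache_inputs_to_f16 is illustrative and not taken from this PR, and the actual code may also handle other KV-cache tensors:

    #include <openvino/core/preprocess/pre_post_process.hpp>
    #include <openvino/openvino.hpp>

    // Force every input whose name contains "past_key" to f16,
    // mirroring the loop shown in the snippet above.
    std::shared_ptr<ov::Model> cvt_kvcache_inputs_to_f16(const std::shared_ptr<ov::Model>& model) {
        ov::preprocess::PrePostProcessor ppp(model);
        for (const auto& tensor : model->inputs()) {
            if (tensor.get_any_name().find("past_key") != std::string::npos) {
                ppp.input(tensor.get_any_name()).tensor().set_element_type(ov::element::Type_t::f16);
            }
        }
        return ppp.build();  // rebuilds the model with the updated input element types
    }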
class ScaledDotProductAttentionDecomposition : public ov::pass::MatcherPass {
The pass was initially taken from the GPU plugin. Should we re-use the already existing pass instead?
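For comparison, re-using the shared transformation would roughly amount to registering it through a pass manager instead of keeping a plugin-local copy. This is a sketch assuming the common ov::pass::ScaledDotProductAttentionDecomposition pass from OpenVINO's shared transformations is the one meant; the include path below follows the common transformations layout and may differ between releases:

    #include <openvino/core/model.hpp>
    #include <openvino/pass/manager.hpp>
    #include <transformations/op_conversions/scaled_dot_product_attention_decomposition.hpp>

    // Decompose ScaledDotProductAttention ops into primitive ops using the
    // shared transformation instead of a copied MatcherPass.
    void decompose_sdpa(const std::shared_ptr<ov::Model>& model) {
        ov::pass::Manager manager;
        manager.register_pass<ov::pass::ScaledDotProductAttentionDecomposition>();
        manager.run_passes(model);
    }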
LOG_DEBUG("3. Creating prefill model as clone of transformed kvcache one."); | ||
LOG_DEBUG("3. Align u4 ZP constants."); | ||
align_u4_zp_constants(kvcache_model); |
I believe it's no longer needed
@AsyaPronina please resolve the conflicts and address @TolyaTalamanov's latest comments here.
Force-pushed from b06e640 to d4b5c4b.
Ready!
Force-pushed from 8964fa3 to 6332248.
Force-pushed from 6332248 to 9049565.
Details:
Tickets: