[NPUW] Port unroll SDPA optimization from GenAI #27891

AsyaPronina
Contributor

Details:

  • Copied the Unroll SDPA implementation from GenAI into NPUW `ov::npuw::LLMCompiledModel`.
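As a rough illustration, assuming "unroll SDPA" means decomposing the fused ScaledDotProductAttention op into primitive operations (as the name of the `ScaledDotProductAttentionDecomposition` pass discussed in the review suggests), the plain C++ sketch below computes single-head attention step by step: MatMul(Q, K^T), scaling by 1/sqrt(d), an additive causal mask, Softmax, and MatMul with V. It does not use the OpenVINO API and is not the code from this PR; `Matrix`, `matmul`, and `sdpa_unrolled` are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Minimal row-major matrix; a hypothetical helper for illustration only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Plays the role of a MatMul node: out = a * b, or a * b^T when transpose_b is set.
Matrix matmul(const Matrix& a, const Matrix& b, bool transpose_b) {
    const std::size_t inner = a.cols;
    const std::size_t out_cols = transpose_b ? b.rows : b.cols;
    assert(inner == (transpose_b ? b.cols : b.rows));
    Matrix out(a.rows, out_cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < out_cols; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < inner; ++k)
                acc += a.at(i, k) * (transpose_b ? b.at(j, k) : b.at(k, j));
            out.at(i, j) = acc;
        }
    return out;
}

// Single-head SDPA written as a chain of primitive steps instead of one fused op.
// With a KV cache, K/V may hold more rows than Q: query row i sits at absolute
// position (k.rows - q.rows + i), so the causal mask hides keys past that position.
Matrix sdpa_unrolled(const Matrix& q, const Matrix& k, const Matrix& v, bool causal) {
    assert(q.cols == k.cols && k.rows == v.rows && k.rows >= q.rows);
    Matrix scores = matmul(q, k, /*transpose_b=*/true);  // Q * K^T
    const float scale = 1.0f / std::sqrt(static_cast<float>(q.cols));
    const float neg_inf = -std::numeric_limits<float>::infinity();
    const std::size_t offset = k.rows - q.rows;
    for (std::size_t i = 0; i < scores.rows; ++i) {
        float row_max = neg_inf;
        for (std::size_t j = 0; j < scores.cols; ++j) {
            float s = scores.at(i, j) * scale;            // Multiply by 1/sqrt(d)
            if (causal && j > offset + i) s = neg_inf;    // Add the causal mask
            scores.at(i, j) = s;
            row_max = std::max(row_max, s);
        }
        float sum = 0.0f;                                 // Softmax over the key axis
        for (std::size_t j = 0; j < scores.cols; ++j) {
            scores.at(i, j) = std::exp(scores.at(i, j) - row_max);
            sum += scores.at(i, j);
        }
        for (std::size_t j = 0; j < scores.cols; ++j) scores.at(i, j) /= sum;
    }
    return matmul(scores, v, /*transpose_b=*/false);      // attention weights * V
}
```

A single-row `q` against a multi-row `k`/`v` models one generation step that reads the whole cache; a `q` with as many rows as `k` models a prefill pass where the mask is fully triangular.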

Tickets:

  • EISW-149347

@AsyaPronina AsyaPronina requested review from a team as code owners December 4, 2024 03:26
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Dec 4, 2024
@dmatveev dmatveev changed the title Copy-pasted Unroll SDPA optimization from GenAI into NPUW [NPUW] Port unroll SDPA optimization from GenAI Dec 5, 2024
@dmatveev
Contributor

dmatveev commented Dec 5, 2024

@AsyaPronina please pick titles carefully.

@AsyaPronina AsyaPronina force-pushed the copy_unroll_sdpa_to_npuw_llm_comp_model branch from 772ebef to a4b0b81 Compare December 12, 2024 01:49
@dmatveev
Contributor

@TolyaTalamanov please review

@dmatveev
Contributor

@TolyaTalamanov gentle reminder

2 similar comments

Contributor

@TolyaTalamanov TolyaTalamanov left a comment

I'm OK with the changes.

Please discuss reusing the SDPA unroll pass with the GPU plugin team.

for (auto tensor : model->inputs()) {
    if (tensor.get_any_name().find("past_key") != std::string::npos) {
        ppp.input(tensor.get_any_name()).tensor().set_element_type(ov::element::Type_t::f16);
class ScaledDotProductAttentionDecomposition : public ov::pass::MatcherPass {
Contributor

The pass was originally taken from the GPU plugin. Should we reuse the existing pass instead?


LOG_DEBUG("3. Creating prefill model as clone of transformed kvcache one.");
LOG_DEBUG("3. Align u4 ZP constants.");
align_u4_zp_constants(kvcache_model);
Contributor

I believe it's no longer needed.

@dmatveev dmatveev added this to the 2025.0 milestone Dec 23, 2024
@dmatveev
Contributor

@AsyaPronina please resolve the conflicts and address @TolyaTalamanov's latest comments here.

@AsyaPronina AsyaPronina force-pushed the copy_unroll_sdpa_to_npuw_llm_comp_model branch from b06e640 to d4b5c4b Compare December 23, 2024 13:32
@AsyaPronina
Contributor Author

Ready!

@AsyaPronina AsyaPronina force-pushed the copy_unroll_sdpa_to_npuw_llm_comp_model branch 2 times, most recently from 8964fa3 to 6332248 Compare December 24, 2024 03:05
@AsyaPronina AsyaPronina force-pushed the copy_unroll_sdpa_to_npuw_llm_comp_model branch from 6332248 to 9049565 Compare December 24, 2024 03:09
@dmatveev dmatveev self-assigned this Dec 24, 2024
@dmatveev dmatveev enabled auto-merge December 24, 2024 13:21
@dmatveev dmatveev added this pull request to the merge queue Dec 24, 2024
Merged via the queue into openvinotoolkit:master with commit 4a450d5 Dec 24, 2024
168 checks passed
@dmatveev dmatveev deleted the copy_unroll_sdpa_to_npuw_llm_comp_model branch December 24, 2024 17:50
3 participants