[NPUW] Port unroll SDPA optimization from GenAI #27891
Conversation
@AsyaPronina please pick titles carefully.
Force-pushed from 772ebef to a4b0b81.
@TolyaTalamanov please review
@TolyaTalamanov gentle reminder
I'm OK with the changes.
Please discuss re-using the SDPA unroll pass with the GPU plugin team.
for (auto tensor : model->inputs()) {
    if (tensor.get_any_name().find("past_key") != std::string::npos) {
        ppp.input(tensor.get_any_name()).tensor().set_element_type(ov::element::Type_t::f16);
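For reference, a minimal self-contained sketch of this kind of cast, assuming a PrePostProcessor built over the LLM model; the helper name cvt_kvcache_inputs_to_f16 is illustrative and not taken from this PR, and the actual code may also handle other KV-cache tensors:

    #include <openvino/core/preprocess/pre_post_process.hpp>
    #include <openvino/openvino.hpp>

    // Force every input whose name contains "past_key" to f16,
    // mirroring the loop shown in the snippet above.
    std::shared_ptr<ov::Model> cvt_kvcache_inputs_to_f16(const std::shared_ptr<ov::Model>& model) {
        ov::preprocess::PrePostProcessor ppp(model);
        for (const auto& tensor : model->inputs()) {
            if (tensor.get_any_name().find("past_key") != std::string::npos) {
                ppp.input(tensor.get_any_name()).tensor().set_element_type(ov::element::Type_t::f16);
            }
        }
        return ppp.build();  // rebuilds the model with the updated input element types
    }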
class ScaledDotProductAttentionDecomposition : public ov::pass::MatcherPass {
The pass was initially taken from the GPU plugin. Should we re-use the already existing pass instead?
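For comparison, re-using the shared transformation would roughly amount to registering it through a pass manager instead of keeping a plugin-local copy. This is a sketch assuming the common ov::pass::ScaledDotProductAttentionDecomposition pass from OpenVINO's shared transformations is the one meant; the include path below follows the common transformations layout and may differ between releases:

    #include <openvino/core/model.hpp>
    #include <openvino/pass/manager.hpp>
    #include <transformations/op_conversions/scaled_dot_product_attention_decomposition.hpp>

    // Decompose ScaledDotProductAttention ops into primitive ops using the
    // shared transformation instead of a copied MatcherPass.
    void decompose_sdpa(const std::shared_ptr<ov::Model>& model) {
        ov::pass::Manager manager;
        manager.register_pass<ov::pass::ScaledDotProductAttentionDecomposition>();
        manager.run_passes(model);
    }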
LOG_DEBUG("3. Creating prefill model as clone of transformed kvcache one."); | ||
LOG_DEBUG("3. Align u4 ZP constants."); | ||
align_u4_zp_constants(kvcache_model); |
I believe it's no longer needed
@AsyaPronina please resolve the conflicts and address @TolyaTalamanov's latest comments here.
Force-pushed from b06e640 to d4b5c4b.
Ready!
Force-pushed from 8964fa3 to 6332248.
Force-pushed from 6332248 to 9049565.
Details:
Tickets: