[CPU]whisper readvalue optimize #26130
Profile each node's execution time. Support static and dynamic inference. Signed-off-by: xipingya <[email protected]>
If reset is not called, these marked nodes also don't need to be executed. Signed-off-by: xipingya <[email protected]>
Signed-off-by: xipingya <[email protected]>
Decoder network: 20 ms -> 5 ms. Signed-off-by: xipingya <[email protected]>
…graph.cpp Co-authored-by: Maksim Kutakov <[email protected]>
…s/stateful_sdpa_fusion.cpp Co-authored-by: Maksim Kutakov <[email protected]>
…ision. So change convert's dst precision to i8. Signed-off-by: xipingya <[email protected]>
LGTM, thanks.
src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/common/stateful_init_graph.cpp
…e_optimize # Conflicts: # src/plugins/intel_cpu/src/nodes/input.cpp # src/plugins/intel_cpu/src/transformations/cpu_opset/convert_to_cpu_specific_opset.hpp
…p16 precision." This reverts commit 1536ece.
…e_optimize # Conflicts: # src/plugins/intel_cpu/src/graph_optimizer.cpp # src/plugins/intel_cpu/src/nodes/input.cpp # src/plugins/intel_cpu/src/nodes/memory.cpp # src/plugins/intel_cpu/src/nodes/memory.hpp # src/plugins/intel_cpu/src/transformations/cpu_opset/convert_to_cpu_specific_opset.hpp
de098e3 to 9dbb1df
Nice work!
Internal performance validation is required.
@dmitry-gorokhov, could you please take a look?
GetOutputNodesMap API was removed. Update my PR based on new API. Signed-off-by: xipingya <[email protected]>
…nfig. Signed-off-by: xipingya <[email protected]>
Internal performance validation passed.
General question: does this optimization support init subgraphs that are fully or partially shared between several ReadValue ops (a common scenario for LLMs)? If not, are there any ideas for how the proposed optimization could be extended to those cases?
No. For past K and V, I skipped it (https://github.com/openvinotoolkit/openvino/pull/26130/files#diff-c12bcfcf5456497adee38ad4362bde01528476b601fae102135a56810dbb70deR273-R279), because the current PR is already a bit complex. Maybe we can postpone implementing this for https://github.com/xipingyan/openvino/blob/17f10b34f1c1a824a366ceed14c51b44163a1d50/src/plugins/intel_cpu/src/nodes/memory.hpp#L263. I think the new extension will share or duplicate init graph code between
### Details:
- *New `ReadValueWithSubgraph` node.*
- *Move `ReadValue`'s initial subgraph nodes to `ReadValueWithSubgraph`.*
- *Mirror `ReadValueWithSubgraph` to `MemoryInput`.*
- *Upgrade `MemoryInput` and `MemoryInputBase` so that they support multiple inputs.*
- *Call the new `Init` and `Activate` interfaces of `ov::intel_cpu::Graph` to avoid memory copies. Refer to openvinotoolkit#25385 ([CPU] Introduce SubModel op and Composite node).*
- *Depends on openvinotoolkit#27189.*

### Tickets:
- *128743*

---------

Signed-off-by: xipingya <[email protected]>
Co-authored-by: Egor Duplensky <[email protected]>
Co-authored-by: Maksim Kutakov <[email protected]>
Co-authored-by: Maksim Kutakov <[email protected]>
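
For readers skimming the thread, the idea behind the details above can be sketched roughly as follows: the initializer subgraph is folded into the state-reading node and is executed only when no state is stored, i.e. on first use or after a reset. This is a minimal standalone illustration, not the plugin code. `LazyStateNode`, `InitSubgraph`, `Tensor`, and `init_runs()` are hypothetical names invented for this sketch and do not exist in OpenVINO.

```cpp
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ReadValue-like node that owns its initializer
// subgraph. Illustration only; this is not the OpenVINO implementation.
class LazyStateNode {
public:
    using Tensor = std::vector<float>;
    using InitSubgraph = std::function<Tensor()>;

    explicit LazyStateNode(InitSubgraph init) : init_(std::move(init)) {}

    // Drop the stored state so that the next read has to recompute it.
    void reset() { state_.reset(); }

    // Assign-like update: later reads return this value without running init.
    void assign(Tensor value) { state_ = std::move(value); }

    // Run the init subgraph only when no state is currently stored.
    const Tensor& read() {
        if (!state_) {
            state_ = init_();
            ++init_runs_;
        }
        return *state_;
    }

    // Number of times the init subgraph was actually executed.
    int init_runs() const { return init_runs_; }

private:
    InitSubgraph init_;
    std::optional<Tensor> state_;
    int init_runs_ = 0;
};
```

In this sketch, repeated `read()` calls after an `assign()` never touch the initializer, which is the effect the PR is after for the decoder: the init subgraph cost is paid only when the state has been reset.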