[CPU] Fix MLP segment fault if a new larger scratch created #25930

Merged

@luo-cheng2021 luo-cheng2021 commented Aug 6, 2024

Details:

  • Fix an MLP segmentation fault that may be caused by either of the following:
    • If a new, larger scratch buffer is created, the cached one becomes invalid.
    • The SiLU injector in master uses r15, but the register is currently not protected. The injector behavior changed in master; releases/2024/3 is not affected.
  • ...

Tickets:

@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Aug 6, 2024
@luo-cheng2021 luo-cheng2021 marked this pull request as ready for review August 7, 2024 01:09
@luo-cheng2021 luo-cheng2021 requested review from a team as code owners August 7, 2024 01:09
yuxu42 commented Aug 7, 2024

Hi @zhangYiIntel, could you please review this? Thanks!

@yuxu42 yuxu42 requested a review from zhangYiIntel August 7, 2024 01:45
@@ -200,7 +201,11 @@ struct LLMMLP::Impl {
}

void setM(int M) {
if (m_M < M) {
uint8_t* cur_scratch_base = nullptr;
Contributor

Comparing memory pointers is ambiguous. The underlying condition is that the scratch buffer isn't big enough. Could you check why the scratch buffer is not big enough?

Contributor Author

The problem here is not that the scratch is too small; it arises in the following situation:
1. Each MLP layer creates a Memory object with the same scratch size, such as 4M, and then uses the scratch pointer to initialize the class members m_actUp and m_tempC. Because the size is the same, the pointer is the same too.
2. Some layers, such as the last MatMul (lm_head), may need a bigger scratch. The scratch is then re-created, and the pointers held in m_actUp and m_tempC become invalid.

The pointer comparison here detects exactly this condition: the scratch has changed.
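
As a rough illustration of the situation described above (not the actual OpenVINO code), the sketch below caches pointers into a shared scratch buffer and rebinds them when either M grows or the scratch base address changes. `Scratch`, `LayerState`, `m_scratch_base`, `rebinds`, and the buffer layout are hypothetical names and assumptions made for this example; only `setM`, `m_M`, `m_actUp`, `m_tempC`, and `cur_scratch_base` echo identifiers from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a shared scratch buffer that may be re-created.
struct Scratch {
    std::vector<uint8_t> storage;
    explicit Scratch(size_t bytes) : storage(bytes) {}
    uint8_t* data() { return storage.data(); }
};

// Hypothetical per-layer state that caches pointers into the scratch.
struct LayerState {
    int m_M = 0;
    uint8_t* m_scratch_base = nullptr;  // base address seen at the last rebind
    uint8_t* m_actUp = nullptr;
    uint8_t* m_tempC = nullptr;
    int rebinds = 0;                    // counts rebinds, for illustration only

    void setM(int M, Scratch& scratch, size_t bytes_per_row) {
        uint8_t* cur_scratch_base = scratch.data();
        // Rebind when M grows OR when the scratch was re-created elsewhere.
        if (m_M < M || cur_scratch_base != m_scratch_base) {
            if (m_M < M) m_M = M;
            const size_t region = static_cast<size_t>(m_M) * bytes_per_row;
            // Assumed layout: two equally sized regions, back to back.
            assert(2 * region <= scratch.storage.size());
            m_actUp = cur_scratch_base;
            m_tempC = cur_scratch_base + region;
            m_scratch_base = cur_scratch_base;
            ++rebinds;
        }
    }
};
```

Calling `setM` twice with an unchanged scratch rebinds once; replacing the scratch with a larger one and calling `setM` again triggers a second rebind even though M did not grow. A pointer comparison can miss a re-creation if the allocator happens to return the same address, so a version counter on the buffer would be a stricter alternative.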

Contributor

Do different MLP layers share the same LLMMLP executor even if they have different M, K, N inside?

Contributor Author

There is no executor. The JIT kernel, which is a global variable, does not use M, N, K to generate the kernel code.

src/plugins/intel_cpu/src/nodes/qkv_proj.cpp (resolved)
@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue Aug 8, 2024
Merged via the queue into openvinotoolkit:master with commit 897079e Aug 8, 2024
132 checks passed
mory91 pushed a commit to mory91/openvino that referenced this pull request Aug 13, 2024
…toolkit#25930)

### Details:
 - *Fix MLP segment fault may be caused by*
   -  if a new larger scratch created, the cached one is invalid
- Silu injector in
[master](https://github.com/openvinotoolkit/oneDNN/blame/6b99866a4531e38a74d1de36d5b366c54c5e6cc3/src/cpu/x64/injectors/jit_uni_eltwise_injector.cpp#L175-L188)
will use r15 but currently not protect. The injector behavior changes in
master, does not affect releases/2024/3.
 - *...*

### Tickets:
 - *[148743](https://jira.devtools.intel.com/browse/CVS-148743)*