[CPU] Fix MLP segment fault if a new larger scratch created #25930
Hi @zhangYiIntel, could you please review this? Thanks!
@@ -200,7 +201,11 @@ struct LLMMLP::Impl {
}

void setM(int M) {
if (m_M < M) {
uint8_t* cur_scratch_base = nullptr;
Comparing the memory pointer is ambiguous. The underlying condition is that the scratch buffer isn't big enough. Could you check why the scratch buffer is not big enough?
The problem here is not that the scratch is too small. It comes from the following situation:
1. Each MLP layer creates a Memory object with the same scratch size, such as 4M, and then uses the scratch pointer to initialize the class members m_actUp and m_tempC. Because the size is the same, the pointer is the same too.
2. Some layers, such as the last Matmul lm_head, may need a bigger scratch. The scratch is then re-created, and the pointers held in m_actUp and m_tempC become invalid.
The pointer is used here to detect that condition: the scratch has changed.
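The situation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from this PR: `Scratch`, `Layer`, `ensure`, and `prepare` are hypothetical names, and only `act_up` and `temp_c` are meant to mirror the roles of `m_actUp` and `m_tempC`. The sketch re-derives the cached pointers whenever the scratch base address differs from the one seen last time.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the shared scratch buffer.
// It is re-created only when a caller asks for more bytes than it holds.
struct Scratch {
    std::unique_ptr<uint8_t[]> data;
    size_t size = 0;

    uint8_t* ensure(size_t bytes) {
        if (bytes > size) {
            // The new buffer is allocated first, then the old one is freed,
            // so every pointer into the old buffer dangles from here on.
            data = std::make_unique<uint8_t[]>(bytes);
            size = bytes;
        }
        return data.get();
    }
};

// Hypothetical layer that caches pointers carved out of the scratch.
struct Layer {
    size_t need;                     // scratch bytes this layer requires
    uint8_t* cached_base = nullptr;  // scratch base seen on the last rebind
    uint8_t* act_up = nullptr;       // plays the role of m_actUp
    uint8_t* temp_c = nullptr;       // plays the role of m_tempC
    int rebinds = 0;                 // how many times the pointers were re-derived

    explicit Layer(size_t need_bytes) : need(need_bytes) {}

    void prepare(Scratch& s) {
        uint8_t* base = s.ensure(need);
        // A changed base means the scratch was re-created by some layer,
        // so the cached pointers must be derived again before use.
        if (base != cached_base) {
            cached_base = base;
            act_up = base;
            temp_c = base + need / 2;
            ++rebinds;
        }
    }
};
```

For example, if a `Layer(4096)` is prepared, then a `Layer(8192)` is prepared on the same `Scratch`, the next `prepare` call on the first layer sees a different base and rebinds instead of writing through stale pointers. An allocator could in principle return the same address for a later buffer, which is one way a pointer comparison can be ambiguous; a size or generation counter kept next to the scratch would be an alternative signal.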
Do different MLP layers share the same LLMMLP executor even if they have different M, K, N inside?
There is no executor. The JIT kernel, which is a global variable, does not use M, N, K to generate the kernel code.
…toolkit#25930)
### Details:
- *The MLP segmentation fault may be caused by:*
  - If a new, larger scratch is created, the cached one becomes invalid.
  - The SiLU injector in [master](https://github.com/openvinotoolkit/oneDNN/blame/6b99866a4531e38a74d1de36d5b366c54c5e6cc3/src/cpu/x64/injectors/jit_uni_eltwise_injector.cpp#L175-L188) uses r15, which is currently not protected. The injector behavior changed in master; releases/2024/3 is not affected.
- *...*
### Tickets:
- *[148743](https://jira.devtools.intel.com/browse/CVS-148743)*