I have searched in the issues and found no similar issues.
What would you like to be improved?
Currently, when a user's engine pod goes into the OOMKilled state, the session fails with "Error operating LaunchEngine". Even if the user tries to reconnect with a new session, Kyuubi connects to the same old engine until the engine times out, so the error persists. This hurts the experience of users who don't have visibility into the cluster.
How should we improve?
Instead of mapping the application to the UNKNOWN state, Kyuubi should map it to KILLED. A KILLED application is eventually treated as failed, which allows the user to reconnect with a new session.
Are you willing to submit PR?
Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
No. I cannot submit a PR at this time.
… failed state
# 🔍 Description
## Issue References 🔗
This pull request fixes #6720
## Describe Your Solution 🔧
If the engine pod goes into the OOMKilled state, the application should be marked as KILLED, which is then identified as a failed state by `isFailed`.
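
The mapping described above can be illustrated with a minimal sketch. This is not Kyuubi's actual code (Kyuubi is written in Scala); the names `ApplicationState`, `is_failed`, and `terminated_reason_to_state` are hypothetical stand-ins, and the only facts assumed are that Kubernetes reports `Completed`, `Error`, and `OOMKilled` as container termination reasons.

```python
from enum import Enum
from typing import Optional


class ApplicationState(Enum):
    """Hypothetical application states, loosely modeled on the issue text."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    KILLED = "KILLED"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"


def is_failed(state: ApplicationState) -> bool:
    """Both FAILED and KILLED count as a failed application."""
    return state in (ApplicationState.FAILED, ApplicationState.KILLED)


def terminated_reason_to_state(reason: Optional[str]) -> ApplicationState:
    """Map a terminated container's reason string to an application state.

    Before the change, "OOMKilled" would fall through to UNKNOWN, which is
    not a failed state, so the dead engine would keep being reused.
    """
    mapping = {
        "Completed": ApplicationState.FINISHED,
        "Error": ApplicationState.FAILED,
        "OOMKilled": ApplicationState.KILLED,  # the case this change adds
    }
    return mapping.get(reason, ApplicationState.UNKNOWN)
```

With this mapping, `is_failed(terminated_reason_to_state("OOMKilled"))` is true, while an unrecognized reason still yields UNKNOWN and is not treated as failed.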
## Types of changes 🔖
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Tested locally; a new session could be launched.
<img width="922" alt="kyuubi_new_session" src="https://github.com/user-attachments/assets/b003c86f-484d-40c5-b173-847374a45b1d">
---
**Be nice. Be informative.**
Closes #6721 from Madhukar525722/OOM.
Closes #6720
cd0bdf6 [madlnu] [KYUUBI #6720] K8s pod OOM Killed should be identified as Application failed state
Authored-by: madlnu <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit 2d64255)
Signed-off-by: Cheng Pan <[email protected]>