[Feature] internal k8s error messages are actionable for general users #675
Comments
@slai thank you for this. I think the aim of Flyte should be to simplify the messaging so that a complicated technology is more approachable. I really appreciate this feedback, but I think this is a feature, not a bug?
@kumare3 good point, I've updated the title and description to a feature request.
Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏
k8s event reporting is now available in the UI as of flyteorg/flytepropeller#600
Motivation: Why do you think this is important?
When an error occurs with k8s starting or running a task, the resulting error message in Flyte Console is often cryptic and unactionable.
Some examples -
- "Pod reported success despite being OOMKilled", when a process in the task pod was OOMKilled but pyflyte was not, so the task still completes. Reporting success despite a bad condition like OOMKilled is confusing - did the task succeed or not? What memory request/limits were set? Which process was OOMKilled?
- "containers with unready status: [execution_id]|context deadline exceeded", generally when a container image cannot be pulled from the registry. This is a combination of an unclear k8s message (why is the container unready?) and a Go-specific error that a general user wouldn't understand (context deadline exceeded, i.e. timed out waiting). What should the user do?
- "[3/3] currentAttempt done. Last Error: SYSTEM::object [execution_id] terminated in the background, manually", generally when the task pod was on an instance that was spot pre-empted or otherwise removed. What does 'currentAttempt done' mean, and what does 'terminated in the background, manually' mean - what manually terminated it? This error is actually benign in most cases because the retry should succeed, but the message gives no indication of that.
Goal: What should the final outcome look like, ideally?
The error message should strike a balance between -
For example, the first error message above could be -
Alternatively, maybe an error code that a user can look up elsewhere with more info would be a better way to keep the necessary k8s detail in one place, and the general user explanation in another.
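As a rough illustration of the error-code idea, here is a minimal Python sketch of a lookup that maps the raw messages quoted above to a code plus a plain-language explanation. Everything in it is hypothetical: the `FriendlyError` type, the `explain` function, the `K8S-...` codes and the wording are assumptions made up for this example, not anything Flyte provides today. Only the matched substrings come from the messages in this issue.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FriendlyError:
    code: str     # hypothetical lookup code, not a real Flyte identifier
    summary: str  # plain-language explanation for a general user
    action: str   # what the user could do next


# Patterns come from the raw messages quoted in this issue; the codes,
# summaries and actions are illustrative assumptions.
_RULES = [
    (
        re.compile(r"OOMKilled"),
        FriendlyError(
            "K8S-OOM",
            "A process in the task pod ran out of memory and was killed.",
            "Check the task's memory request/limit and consider raising it.",
        ),
    ),
    (
        re.compile(r"containers with unready status.*context deadline exceeded"),
        FriendlyError(
            "K8S-UNREADY-TIMEOUT",
            "Timed out waiting for the task's container to become ready.",
            "Check that the container image exists and can be pulled.",
        ),
    ),
    (
        re.compile(r"terminated in the background, manually"),
        FriendlyError(
            "K8S-POD-REMOVED",
            "The task pod was removed, often because its node was pre-empted.",
            "Usually no action is needed; a retry is expected to succeed.",
        ),
    ),
]


def explain(raw_message: str) -> Optional[FriendlyError]:
    """Return the first matching friendly error, or None if nothing matches."""
    for pattern, friendly in _RULES:
        if pattern.search(raw_message):
            return friendly
    return None
```

With something like this, the console could show the code and summary up front and keep the raw k8s message available underneath for anyone who needs the detail.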
Flyte component
Additional context
See https://flyte-org.slack.com/archives/CNMKCU6FR/p1611329101012800 for some further discussion. Also seems like #512 and #535 are similar issues.
Is this a blocker for you to adopt Flyte
Nope, already a user :)