Current behavior
If a REANA cluster does not have kubernetes_jobs_max_user_memory_limit set, users can set any memory limit for their workflows via kubernetes_memory_limit in reana.yaml. If the user sets a value so high (relative to the cluster's memory resources) that no node can accommodate the job, the job pod remains in Pending status and generates a warning event with the reason FailedScheduling.
$ kubectl describe pod reana-run-job...
Events:
  Type     Reason            Age        From  Message
  ----     ------            ----       ----  -------
  Warning  FailedScheduling  <unknown>        0/4 nodes are available: 4 Insufficient memory.
  Warning  FailedScheduling  <unknown>        0/4 nodes are available: 4 Insufficient memory.
However, the workflow remains in a running status forever, and no logs are propagated to the user.
Expected behavior
We should catch this case here and set the workflow status to failed, as well as generate meaningful logs for the user to consult via reana-client logs.
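A minimal sketch of the check proposed above, not REANA's actual implementation. The `PodEvent` class and the `unschedulable_message` function are hypothetical names introduced for illustration. In a real cluster the pod phase and the events (with their `type`, `reason` and `message` fields) would come from the Kubernetes API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PodEvent:
    """Hypothetical stand-in for a Kubernetes event attached to a job pod."""

    type: str     # "Normal" or "Warning"
    reason: str   # e.g. "FailedScheduling"
    message: str  # human-readable explanation from the scheduler


def unschedulable_message(pod_phase: str, events: Iterable[PodEvent]) -> Optional[str]:
    """Return a user-facing log line if a Pending pod failed scheduling, else None."""
    if pod_phase != "Pending":
        return None
    for event in events:
        if event.type == "Warning" and event.reason == "FailedScheduling":
            return f"Job pod could not be scheduled: {event.message}"
    return None


if __name__ == "__main__":
    events = [
        PodEvent(
            "Warning",
            "FailedScheduling",
            "0/4 nodes are available: 4 Insufficient memory.",
        )
    ]
    message = unschedulable_message("Pending", events)
    if message is not None:
        # Here the workflow would be marked as failed and the message stored in its logs.
        print(message)
```

Note that FailedScheduling can be transient, for example while other jobs release memory, so a real implementation would probably wait for a grace period before failing the workflow.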
In addition, we should investigate similar cases of other k8s event reasons to handle them properly.
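As a starting point for that investigation, the sketch below counts warning events per reason for a small set of candidate reasons. The selection in `CANDIDATE_REASONS` is illustrative only and is an assumption, not a decision from this issue. The function name and the `(event_type, reason)` tuple shape are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Illustrative selection only: which reasons should be treated as fatal
# is exactly what remains to be investigated.
CANDIDATE_REASONS = {
    "FailedScheduling": "the pod could not be placed on any node",
    "FailedMount": "a volume could not be mounted",
    "FailedAttachVolume": "a volume could not be attached to the node",
}


def count_candidate_warnings(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count Warning events per reason, keeping only the candidate reasons.

    Each event is an (event_type, reason) pair.
    """
    counts = Counter(
        reason
        for event_type, reason in events
        if event_type == "Warning" and reason in CANDIDATE_REASONS
    )
    return dict(counts)
```

Running this over the events of long-pending job pods would show which reasons actually occur in practice before deciding how each one should be handled.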