v0.4 runtime panic #861
Comments
We merged #860 a few days ago, which may be helpful :)
@k82cn That does not solve my problem. This issue is mainly about the node's Idle resources, and the main cause is inconsistent GPU resources. I merged the code and the problem reappeared.
Thanks for your confirmation :)
/kind bug
@k82cn Maybe. I will confirm your information.
@k82cn
@asifdxtreme, would you help to cherry-pick volcano-retired#26 into kube-batch? :)
Observed a panic: &errors.errorString{s:"Resource is not sufficient to do operation: <cpu 56000.00, memory 270086234112.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 5000.00> sub <cpu 54000.00, memory 268435456000.00, nvidia.com/gpu 8000.00>"} (Resource is not sufficient to do operation: <cpu 56000.00, memory 270086234112.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 5000.00> sub <cpu 54000.00, memory 268435456000.00, nvidia.com/gpu 8000.00>)
This is mainly caused by lost GPUs.
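To make the failure in the message above concrete: the subtraction is rejected because the requested `nvidia.com/gpu` quantity (8000) exceeds what the node reports as idle (5000), even though CPU and memory fit. Below is a minimal sketch of that check in Go. The `Resource` type and the `LessEqual` and `Sub` helpers here are hypothetical, simplified stand-ins written for illustration; they are not kube-batch's actual implementation, and this sketch returns an error where the scheduler panics.

```go
package main

import (
	"fmt"
)

// Resource is a hypothetical, simplified resource vector keyed by
// resource name. It is NOT kube-batch's actual type.
type Resource map[string]float64

// LessEqual reports whether every quantity in r fits within other.
// Resource names missing from other are treated as zero.
func (r Resource) LessEqual(other Resource) bool {
	for name, want := range r {
		if want > other[name] {
			return false
		}
	}
	return true
}

// Sub returns idle minus req, or an error when req does not fit
// into idle. Returning an error (rather than panicking) is a design
// choice of this sketch, not a description of the scheduler.
func Sub(idle, req Resource) (Resource, error) {
	if !req.LessEqual(idle) {
		return nil, fmt.Errorf("resource is not sufficient to do operation: %v sub %v", idle, req)
	}
	out := make(Resource, len(idle))
	for name, have := range idle {
		out[name] = have - req[name]
	}
	return out, nil
}

func main() {
	// Values copied from the panic message in this issue.
	idle := Resource{
		"cpu":            56000,
		"memory":         270086234112,
		"hugepages-1Gi":  0,
		"hugepages-2Mi":  0,
		"nvidia.com/gpu": 5000,
	}
	req := Resource{
		"cpu":            54000,
		"memory":         268435456000,
		"nvidia.com/gpu": 8000,
	}

	// The GPU request (8000) is larger than the idle GPU amount (5000),
	// so the subtraction is rejected.
	if _, err := Sub(idle, req); err != nil {
		fmt.Println("subtraction rejected:", err)
	}
}
```

If the node's idle GPU count drops after a task was already accounted against it (for example, because a GPU disappears from the node), a later subtraction of that task's request can fail in exactly this way.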
Kubernetes version: 1.11
kube-batch version: v0.5
When I start kube-batch and schedule a TF job, kube-batch panics after running for a while.
The panic information is:
The causes of the panic are: