Version of Eraser
v1.3.1

Expected Behavior
The controller should skip nodes that don't have enough free resources for the eraser pods. There's a check implemented in the code for this: https://github.com/eraser-dev/eraser/blob/2ea877ca8ac933cc7b233f3dd123d67754d476f5/controllers/imagejob/imagejob_controller.go#L418C3-L418C49

Actual Behavior
The controller tries to place pods on nodes with insufficient resources. These pods then fail to start and remain in a status such as OutOfcpu.
I have only verified this behavior for CPU so far, but from the code I expect other resources, such as memory, to be affected as well.
Steps To Reproduce
1. Fill up one of the nodes in your Kubernetes cluster so that all of its allocatable CPU is requested by running pods.
2. Deploy and run eraser.
3. You'll see that:
   a. There's an eraser pod on the full node with status OutOfcpu.
   b. The eraser-controller-manager's log does not contain the expected "pod does not fit on node, skipping" message.
Are you willing to submit PRs to contribute to this bug fix?
Yes, I am willing to implement it.
By adding some additional logging, I found out that the field nodeInfo.Requested.MilliCPU, which is used in the check for insufficient resources, always has a zero value.
The reason for this seems to be that the controller creates the nodeInfo object by calling framework.NewNodeInfo() without passing any of the node's pods to it.
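A minimal sketch of that pattern (illustrative only, not the controller's exact code); it shows why Requested never gets populated when no pods are passed in:

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// nodeInfoWithoutPods mirrors the pattern described above: the NodeInfo is
// built without the node's pods, so Requested is never accumulated.
func nodeInfoWithoutPods(node *v1.Node) *framework.NodeInfo {
	nodeInfo := framework.NewNodeInfo() // no pods passed in
	nodeInfo.SetNode(node)              // fills Allocatable from node.Status.Allocatable

	// nodeInfo.Requested is only accumulated from pods added to the NodeInfo,
	// so nodeInfo.Requested.MilliCPU stays 0 here. A check that compares
	// Allocatable minus Requested against the eraser pod's requests will
	// therefore always see the whole node as free.
	return nodeInfo
}
```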
I propose to add some code to list all the pods that belong to each node and then pass this list of pods to framework.NewNodeInfo(). Then, nodeInfo.Requested should be filled correctly and the check should work as expected.
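A rough sketch of the proposed change, assuming a plain client-go clientset just to keep the example self-contained; the helper name and its parameters are placeholders, not the controller's actual code:

```go
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// nodeInfoWithPods lists the pods bound to the node and hands them to
// framework.NewNodeInfo() so that Requested reflects their resource requests.
func nodeInfoWithPods(ctx context.Context, clientset kubernetes.Interface, node *v1.Node) (*framework.NodeInfo, error) {
	podList, err := clientset.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		// Only pods bound to this node; the API server supports this field selector.
		FieldSelector: "spec.nodeName=" + node.Name,
	})
	if err != nil {
		return nil, err
	}

	pods := make([]*v1.Pod, 0, len(podList.Items))
	for i := range podList.Items {
		p := &podList.Items[i]
		// Terminated pods no longer hold resources on the node; skip them so
		// Requested only counts pods that still occupy capacity.
		if p.Status.Phase == v1.PodSucceeded || p.Status.Phase == v1.PodFailed {
			continue
		}
		pods = append(pods, p)
	}

	nodeInfo := framework.NewNodeInfo(pods...) // Requested now reflects the pods' requests
	nodeInfo.SetNode(node)
	return nodeInfo, nil
}
```

If the controller lists pods through a cached controller-runtime client instead, the equivalent query would need a field index on spec.nodeName (or an uncached reader); the clientset above is only there to keep the sketch runnable on its own.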