Nodes stuck in NotReady leaves pods in Pending state (does not autoscale) #37995
Comments
I'm seeing something similar with autoscaling, on GKE / Kubernetes 1.5.1. In my case the new autoscaled node eventually becomes non-responsive and enters a NotReady state. I can't even SSH into the node from the cloud console - it appears to hang. The serial port output shows nothing of interest. I am using PVCs, and I have a suspicion this may be related to attach/detach of PVC disks. If I reset the node in the cloud console, the cluster eventually seems to recover, and I can SSH into the node.
I get the same thing on GKE with nodes on version 1.4.7, but without autoscaling. Every couple of days, as my CI system updates the image on my deployments, I notice my new pods can't be scheduled, my old pods are gone, and 2 of my 3 nodes are NotReady.
/sig node
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
Is this a request for help?: No
(I submitted a help request through Google Support who did some of the research below but stated that "This is an issue that need to be address by Kubernetes engineers.")
What keywords did you search in Kubernetes issues before filing this one? "notready" "autoscale"
#4135 discusses similar problems with out-of-disk errors, but ours were related to out-of-memory, which is configurable on nodes.
#34772 is related to a race condition with scheduling; my issue has to do with node state.
BUG REPORT:
Kubernetes version (use kubectl version):
Environment:
Kernel (e.g. uname -a): Linux report-3312827547-hony0 4.4.21+ #1 SMP Thu Nov 10 21:43:53 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

What happened:
We have nodes that stop posting their node status back to kubernetes.
This leaves the node in a NotReady state, which means that pods cannot be scheduled on it.
Our cluster is set up with two node groups, both of which are configured for autoscaling. However, because the node still exists, the autoscaler won't add a new node (in either group), and because the node is stuck in the NotReady state, Kubernetes can't schedule any pods on it.
This leaves us in a situation where we have pods that are waiting to be scheduled.
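For what it's worth, the Ready condition on an affected node looks roughly like the sketch below (reconstructed from memory rather than captured from our cluster, so treat the exact field values as an approximation):

```yaml
# Abridged node status for a node that has stopped posting status
status:
  conditions:
  - type: Ready
    status: "Unknown"                          # set by the node controller after missed heartbeats
    reason: NodeStatusUnknown
    message: Kubelet stopped posting node status.
```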
Trying to SSH into the node just spins at "Establishing connection to SSH server". I've let it try for over an hour and it won't connect. The only way I've found to resolve this is to reset the node.
In investigating with Google Support we determined that the node had reached an OOM condition that appeared to crash kubelet (or something). The solution Google Support suggested was to set memory limits on every container.
We set memory limits on most of our containers, but continue to see this issue.
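For context, here is a minimal sketch of the kind of per-container memory settings Support was suggesting; the names, image, and values are placeholders for illustration, not our actual manifests:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: report-worker                   # placeholder name
spec:
  containers:
  - name: worker
    image: example/report-worker:1.0    # placeholder image
    resources:
      requests:
        memory: "256Mi"                 # what the scheduler reserves on the node
      limits:
        memory: "512Mi"                 # the container is OOM-killed if it exceeds this
```

With a limit set, the container's own OOM kill should presumably happen before the node itself runs out of memory, which I take to be the reasoning behind the suggestion.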
What you expected to happen:
Setting a memory limit on all containers feels counterproductive to me; if Kubernetes can fail when the system runs out of memory, I would expect it to protect against that (i.e. kill any container that is exceeding the available memory on the node, or something similar).
Additionally, if a node stops responding, I expect that to be a different state than a node that is starting up, so that when nodes are "NotReady" because they have stopped responding, the autoscaler will spin up new nodes to satisfy the "Pending" pod requirements.
(If you want me to split this into two issues, let me know.)
How to reproduce it (as minimally and precisely as possible):
I tried to build a test cluster. It doesn't seem to crash the nodes though, so something more complicated than my "use all the memory" script might be necessary.
Manifests for the steps below are in this gist; a rough sketch of similar manifests also follows the steps.
1. Spin up an autoscaling cluster and load a deployment with enough replicas that the cluster has to grow.
2. Wait for the cluster to add a new node and schedule the pod.
3. Then spin up a pod that will consume all the memory on a node.
4. Wait for the node to run out of memory and crash.
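Since the gist link didn't survive here, this is a rough sketch of the kind of manifests those steps use; the image names, replica count, and memory sizes are placeholders chosen for illustration, not the actual gist contents:

```yaml
# Step 1: a deployment with enough requested memory to force the autoscaler to add a node
apiVersion: extensions/v1beta1      # Deployment API group for 1.4/1.5-era clusters
kind: Deployment
metadata:
  name: filler
spec:
  replicas: 10
  template:
    metadata:
      labels:
        app: filler
    spec:
      containers:
      - name: filler
        image: nginx                # placeholder workload
        resources:
          requests:
            memory: "1Gi"           # sized so the replicas don't fit on the existing nodes
---
# Step 3: a pod with no memory limit that allocates until the node runs out
apiVersion: v1
kind: Pod
metadata:
  name: memory-hog
spec:
  restartPolicy: Never
  containers:
  - name: hog
    image: python:3                 # any image with a Python interpreter would do
    command: ["python", "-c", "x = []\nwhile True: x.append(bytearray(10 * 1024 * 1024))"]
```

The important part is that the filler deployment has memory requests (so the autoscaler has to add a node) while the memory-hog pod has no limit (so it can drive the node itself out of memory).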