
Nodes stuck in NotReady leaves pods in Pending state (does not autoscale) #37995

Closed
jeremywadsack opened this issue Dec 2, 2016 · 6 comments
Labels
area/provider/gcp Issues or PRs related to gcp provider lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@jeremywadsack

Is this a request for help?: No

(I submitted a help request through Google Support, who did some of the research below but stated that "This is an issue that need to be address by Kubernetes engineers.")

What keywords did you search in Kubernetes issues before filing this one? "notready" "autoscale"

#4135 discusses similar problems with out-of-disk errors, but ours were related to out-of-memory, which is configurable on nodes.
#34772 is related to a race condition with scheduling; my issue has to do with node state.


BUG REPORT:

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.4", 
GitCommit:"3b417cc4ccd1b8f38ff9ec96bb50a81ca0ea9d56", GitTreeState:"clean", BuildDate:"2016-10-21T02:48:38Z", GoVersion:"go1.7.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.6", GitCommit:"e569a27d02001e343cb68086bc06d47804f62af6", GitTreeState:"clean", BuildDate:"2016-11-12T05:16:27Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

What happened:

We have nodes that stop posting their node status back to Kubernetes.

$ kubectl describe node gke-keylime-toolbox-highmem-pool-b5e681ff-fyyt
...
Wed, 16 Nov 2016 03:05:57 -0800 NodeStatusUnknown Kubelet stopped posting node status.
...

This leaves the node in a NotReady state, which means that pods cannot be scheduled on it.

$ kubectl get nodes
NAME                                             STATUS     AGE
gke-keylime-toolbox-highmem-pool-b5e681ff-fyyt   NotReady   6d
$ kubectl get pods -a | grep -v Running 
NAME                                                         READY     STATUS    RESTARTS   AGE
report-file-1917812907-609xq          0/1       Pending   0          7h
reprocess-2942535581-pa5gj            0/1       Pending   0          6h
processing-2495744680-liqfk                              0/1       Pending   0          6h
mailers-1884379191-7bgb6               0/1       Pending   0          6h

Our cluster is set up with two node groups, both of which are configured to autoscale. However, because the node still exists, the autoscaler won't add a new node (in either group), and because the node is stuck in the NotReady state, Kubernetes can't schedule any pods on it.

This leaves us in a situation where we have pods that are waiting to be scheduled.
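
To see why these pods stay Pending, the scheduler's view of one of them can be checked directly, for example (using one of the pod names from the listing above):

$ kubectl describe pod report-file-1917812907-609xq

With every node in the pool NotReady, the Events section should show FailedScheduling entries rather than any scale-up activity, which is just a way to confirm the symptom described here.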

Trying to SSH into the node just spins at "Establishing connection to SSH server"; I've let it try for over an hour and it never connects. The only way I have found to resolve this is to reset the node.
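
(For reference, "reset the node" here means resetting the underlying GCE instance, roughly:

$ gcloud compute instances reset gke-keylime-toolbox-highmem-pool-b5e681ff-fyyt

possibly with --zone depending on your gcloud configuration; the instance name is the node from the describe output above.)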

In investigating with Google Support, we determined that the node had reached an OOM condition that appeared to crash the kubelet (or something else). The solution Google Support suggested was to set memory limits on every container.

We set memory limits on most of our containers, but continue to see this issue.
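
For what it's worth, the limits we are adding look roughly like the sketch below (the names, image, and values here are illustrative, not our real manifests):

apiVersion: extensions/v1beta1   # Deployment API group on 1.4-era clusters
kind: Deployment
metadata:
  name: example-worker           # illustrative name
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: example-worker
    spec:
      containers:
      - name: worker
        image: gcr.io/example/worker:latest   # placeholder image
        resources:
          requests:
            memory: 256Mi
          limits:
            memory: 512Mi        # the container is OOM-killed if it exceeds this

Even with that in place, any container without a limit can still push the node itself into an OOM condition.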

What you expected to happen:
Setting a memory limit on all containers feels counterproductive to me. I would expect that if Kubernetes can fail when the system runs out of memory, it would protect against that itself (i.e., kill any container that exceeds the available memory on the node, or something similar).

Additionally, if a node stops responding, I expect that to be a different state than a node that is starting up, so that when nodes are "NotReady" because they have stopped responding, the autoscaler will spin up new nodes to satisfy the "Pending" pod requirements.

(If you want me to split this into two issues, let me know.)

How to reproduce it (as minimally and precisely as possible):
I tried to build a test cluster. It doesn't seem to crash the nodes though, so something more complicated than my "use all the memory" script might be necessary.

Manifests for the below are in this gist.

Spin up an autoscaling cluster and load a deployment with enough replicas that the cluster has to grow.

$ gcloud alpha container clusters create test-not-ready --num-nodes=1 --enable-autoscaling --min-nodes=1 --max-nodes=5 --machine-type g1-small
Creating cluster test-not-ready...done.
$ kubectl apply -f nginx-deployment.yaml 
deployment "nginx-deployment" created
$ kubectl get pods
NAME                                READY     STATUS              RESTARTS   AGE
nginx-deployment-3818977466-3rx74   0/1       ContainerCreating   0          6s
nginx-deployment-3818977466-847me   0/1       Pending             0          6s

Wait for the cluster to add a new node and schedule the pod.

Then spin up a pod that will consume all the memory on a node.
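
Conceptually it is just a deployment with no memory limit that keeps allocating until the node runs out; the sketch below is a simplified illustration, not the exact manifest from the gist (the stress image and arguments are placeholders):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: all-memory-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: all-memory
    spec:
      containers:
      - name: memory-hog
        image: polinux/stress              # illustrative stress image
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "2G", "--vm-hang", "0"]
        # no resources.limits.memory, so nothing caps it before the node itself runs out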

$ kubectl apply -f all-memory-deployment.yaml
$ kubectl top nodes
NAME                                            CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
gke-test-not-ready-default-pool-b04036fd-q9ay   61m          6%        763Mi           44%       
gke-test-not-ready-default-pool-b04036fd-cbi5   986m         98%       856Mi           50%

Wait for the node to run out of memory and crash.
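
While waiting, the node's transition can be watched with something like:

$ kubectl get nodes -w

Once the kubelet stops posting status and the node-monitor grace period passes, the affected node should flip from Ready to NotReady in this output.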

@wstrange
Contributor

wstrange commented Jan 4, 2017

I'm seeing something similar with autoscaling on GKE / Kubernetes 1.5.1.

In my case the new autoscaled node eventually becomes unresponsive and enters a NotReady state.

I can't even SSH into the node from the cloud console - it appears to hang. The serial port output shows nothing of interest.

I am using PVCs, and I have a suspicion this may be related to attach/detach of PVC disks.

If I reset the node in the cloud console, the cluster eventually seems to recover, and I can SSH into the node again.
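
If it is the attach/detach path, something might show up in the cluster events before the node goes dark; a quick way to look, assuming the relevant events haven't already expired:

$ kubectl get events --all-namespaces | grep -iE 'attach|detach|volume'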

@omerzach

I get the same thing on GKE with nodes on version 1.4.7, but without autoscaling. Every couple of days, as my CI system updates the images on my deployments, I notice that my new pods can't be scheduled, my old pods are gone, and 2 of my 3 nodes are NotReady.

@k8s-github-robot added the needs-sig label on May 31, 2017
@0xmichalis
Contributor

/sig node

@k8s-ci-robot added the sig/node label on Jun 25, 2017
@k8s-github-robot removed the needs-sig label on Jun 25, 2017
@0xmichalis added the area/provider/gcp and needs-sig labels on Jun 25, 2017
@k8s-github-robot removed the needs-sig label on Jun 25, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 30, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 29, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
