Node NotReady status with "Kubelet stopped posting node status error" #34314

Closed
axsuul opened this issue Oct 7, 2016 · 25 comments

@axsuul

axsuul commented Oct 7, 2016

On k8s 1.4, using kubeadm to provision the cluster:

I have the node and master on the same server. Suddenly my node is reporting a NotReady status. Running

# kubectl describe node <NODE>

returns

Name:                   operate
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubeadm.alpha.kubernetes.io/role=master
                        kubernetes.io/hostname=operate
Taints:                 <none>
CreationTimestamp:      Thu, 06 Oct 2016 23:57:52 +0000
Phase:
Conditions:
  Type                  Status          LastHeartbeatTime                       LastTransitionTime                      Reason           Message
  ----                  ------          -----------------                       ------------------                      ------           -------
  OutOfDisk             Unknown         Fri, 07 Oct 2016 08:13:50 +0000         Fri, 07 Oct 2016 08:14:30 +0000         NodeStatusUnknown Kubelet stopped posting node status.
  MemoryPressure        False           Fri, 07 Oct 2016 08:13:50 +0000         Thu, 06 Oct 2016 23:57:52 +0000         KubeletHasSufficientMemory        kubelet has sufficient memory available
  DiskPressure          False           Fri, 07 Oct 2016 08:13:50 +0000         Thu, 06 Oct 2016 23:57:52 +0000         KubeletHasNoDiskPressure  kubelet has no disk pressure
  Ready                 Unknown         Fri, 07 Oct 2016 08:13:50 +0000         Fri, 07 Oct 2016 08:14:30 +0000         NodeStatusUnknown Kubelet stopped posting node status.
Addresses:              10.138.0.2,10.138.0.2
Capacity:
 alpha.kubernetes.io/nvidia-gpu:        0
 cpu:                                   1
 memory:                                1737208Ki
 pods:                                  110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:        0
 cpu:                                   1
 memory:                                1737208Ki
 pods:                                  110
System Info:
 Machine ID:                    af77f36e18459f0d0d262ed74e977e59
 System UUID:                   AF77F36E-1845-9F0D-0D26-2ED74E977E59
 Boot ID:                       617db356-a6da-4099-9b63-ad5f993178fd
 Kernel Version:                4.4.0-38-generic
 OS Image:                      Ubuntu 16.04.1 LTS
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.11.2
 Kubelet Version:               v1.4.0
 Kube-Proxy Version:            v1.4.0
ExternalID:                     operate
Non-terminated Pods:            (8 in total)
  Namespace                     Name                                            CPU Requests    CPU Limits      Memory Requests Memory Limits
  ---------                     ----                                            ------------    ----------      --------------- -------------
  kube-system                   etcd-operate                                    200m (20%)      0 (0%)          0 (0%)          0 (0%)
  kube-system                   kube-controller-manager-operate                 200m (20%)      0 (0%)          0 (0%)          0 (0%)
  kube-system                   kube-discovery-982812725-kkarx                  0 (0%)          0 (0%)          0 (0%)          0 (0%)
  kube-system                   kube-dns-2247936740-fse3h                       210m (21%)      210m (21%)      390Mi (22%)     390Mi (22%)
  kube-system                   kube-proxy-amd64-x3x3m                          0 (0%)          0 (0%)          0 (0%)          0 (0%)
  kube-system                   kube-scheduler-operate                          100m (10%)      0 (0%)          0 (0%)          0 (0%)
  kube-system                   kubernetes-dashboard-1655269645-0hzho           0 (0%)          0 (0%)          0 (0%)          0 (0%)
  kube-system                   weave-net-r38tz                                 20m (2%)        0 (0%)          0 (0%)          0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.
  CPU Requests  CPU Limits      Memory Requests Memory Limits
  ------------  ----------      --------------- -------------
  730m (73%)    210m (21%)      390Mi (22%)     390Mi (22%)

I've tried restarting the server with no success. How would I debug this? Thanks

@axsuul
Author

axsuul commented Oct 14, 2016

This issue came up again. I've tried debugging with

$ sudo journalctl -u kubelet

to view the logs. Nothing out of the ordinary. These also look fine:

$ systemctl status kubelet
$ systemctl status docker

How can I debug this?
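
For anyone hitting the same symptom, a minimal sketch of checks to narrow it down; the node name, API server address, and time window below are placeholders:

# On the master: confirm the node object and its conditions
$ kubectl get nodes -o wide
$ kubectl describe node <NODE> | grep -A 8 Conditions

# On the affected node: kubelet and container runtime health
$ systemctl status kubelet docker
$ sudo journalctl -u kubelet --since "1 hour ago" --no-pager | tail -n 100

# Verify the node can still reach the API server
$ curl -k https://<MASTER_IP>:6443/healthz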

@axsuul
Author

axsuul commented Oct 14, 2016

Ok this was related to changing the kubeadm cluster IP... I think.

@axsuul axsuul closed this as completed Oct 14, 2016
@tuananh

tuananh commented Nov 16, 2016

I had the same issue. I'm using GKE (Google Container Engine).

@wstrange
Contributor

wstrange commented Jan 4, 2017

I just ran into this on GKE 1.5.1 with alpha features turned on.

The problem appeared when the cluster auto-scaled. The first node went to status NotReady with:
Kubelet stopped posting node status

The node was non-responsive; I could not ssh into it. Restarting the node cleared the status.

@dev-e

dev-e commented Jan 23, 2017

The same problem on CoreOS, k8s 1.5.2. After recreating the /var/lib/kubelet directory and re-registering the master node, I get this repeating message in the log:

E0123 08:22:50.647822 887 kubelet_node_status.go:302] Error updating node status, will retry: Operation cannot be fulfilled on nodes "z14-0546-amis-c.vesta.ru": the object has been modified; please apply your changes to the latest version and try again

The node status becomes "NotReady", and pods created by ReplicationControllers with a NodeSelector matching this node get status "Pending" with reason "MatchNodeSelector". Rebooting does not help.

@greglearns

I just had the same problem on k8s 1.4.7 stable. Very little was running on my cluster (1 master, 2 workers) other than Deis, running on AWS, launched by Kops. Both workers had the same problems as above. AWS CloudWatch reported everything was fine on all servers.

Name:                   ip-172-20-116-89.us-west-2.compute.internal
Role:
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/instance-type=t2.micro
                        beta.kubernetes.io/os=linux
                        failure-domain.beta.kubernetes.io/region=us-west-2
                        failure-domain.beta.kubernetes.io/zone=us-west-2c
                        kubernetes.io/hostname=ip-172-20-116-89.us-west-2.compute.internal
Taints:                 <none>
CreationTimestamp:      Tue, 24 Jan 2017 20:52:53 -0700
Phase:
Conditions:
  Type                  Status          LastHeartbeatTime                       LastTransitionTime                      Reason                          Message
  ----                  ------          -----------------                       ------------------                      ------                          -------
  OutOfDisk             Unknown         Fri, 27 Jan 2017 10:38:42 -0700         Fri, 27 Jan 2017 10:39:26 -0700         NodeStatusUnknown               Kubelet stopped posting node status.
  MemoryPressure        False           Fri, 27 Jan 2017 10:38:42 -0700         Tue, 24 Jan 2017 20:52:53 -0700         KubeletHasSufficientMemory      kubelet has sufficient memory available
  DiskPressure          False           Fri, 27 Jan 2017 10:38:42 -0700         Tue, 24 Jan 2017 20:52:53 -0700         KubeletHasNoDiskPressure        kubelet has no disk pressure
  Ready                 Unknown         Fri, 27 Jan 2017 10:38:42 -0700         Fri, 27 Jan 2017 10:39:26 -0700         NodeStatusUnknown               Kubelet stopped posting node status.
  NetworkUnavailable    False           Sat, 28 Jan 2017 08:03:04 -0700         Sat, 28 Jan 2017 08:03:04 -0700         RouteCreated                    RouteController created a route

@dev-e

dev-e commented Feb 1, 2017

Problem solved by applying changes to the kubelet configuration (/etc/systemd/system/kubelet.service) according to the latest version of the reference page on CoreOS: https://coreos.com/kubernetes/docs/latest/deploy-master.html
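
For reference, after editing that unit the change is applied with the usual systemd reload/restart sequence (assuming a systemd-based host), for example:

$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
$ systemctl status kubelet --no-pager
$ sudo journalctl -u kubelet -f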

@skylabreddy

I am also facing the same issue.

I see a problem after deploying the app: the deployment is created successfully, but 0 pods become available.

root@kubernetes:~# kubectl run kubernetes-bootcamp --image=docker.io/jocatalin/kubernetes-bootcamp:v1 --port=8080
deployment "kubernetes-bootcamp" created
root@kubernetes:~# kubectl get deployments
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kubernetes-bootcamp   1         1         1            0           15s
skycouch              1         1         1            0           2d
test                  1         1         1            0           3d

Can you suggest how to resolve this?

root@kubernetes:~# kubectl get nodes
NAME         STATUS     ROLES     AGE   VERSION
kubenode1    NotReady             3d    v1.8.3
kubenode2    NotReady             3d    v1.8.3
kubernetes   NotReady   master    3d    v1.8.3

Thanks
Skylab

@viveksinghggits

Ok this was related to changing the kubeadm cluster IP... I think.

@axsuul were you able to resolve the issue? Can you share the details? I also encountered the same issue, where the master and worker are on the same node (a single-node cluster).

@axsuul
Author

axsuul commented Mar 14, 2019

@viveksinghggits Sorry, I ended up moving to Docker Swarm and I don't remember the details anymore.

@SaltedEggIndomee

I'm having the same issue on EKS with Kubernetes 1.12.

Minimal steps to reproduce (a rough command sketch follows below):

  1. Create a deployment with 1 replica, on 2 nodes.
  2. Create an HPA with a 50% CPU target, minpods 1, maxpods 3.
  3. Overload the CPU on the first Pod.
  4. Watch HPA scaling with "kubectl get hpa -w".
  5. After 1 minute, see 1 node go down with NotReady status.
  6. After 30 minutes, the node is still in NotReady status, even after the HPA has scaled back down to 1 Pod.

Rebooting the EC2 instance doesn't help.
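
A rough command sketch of those steps, using a placeholder deployment name and image (not from the original report):

# 1-replica deployment on a 2-node cluster (placeholder name/image)
$ kubectl create deployment cpu-demo --image=nginx
$ kubectl scale deployment cpu-demo --replicas=1

# HPA: 50% CPU target, min 1 pod, max 3 pods
$ kubectl autoscale deployment cpu-demo --cpu-percent=50 --min=1 --max=3

# Overload the pod's CPU, then watch the HPA and the node status
$ kubectl get hpa -w
$ kubectl get nodes -w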

@ghost

ghost commented May 17, 2019

I'm having the same issue. Is the issue resolved? If yes, can anyone provide step-by-step instructions for resolving it?

@JnMik

JnMik commented Jul 4, 2019

Happens to me as well in AWS EKS.
Any hints?

Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                    Message
  ----             ------    -----------------                 ------------------                ------                    -------
  OutOfDisk        Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  MemoryPressure   Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  DiskPressure     Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  PIDPressure      False     Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 08:26:42 -0400   KubeletHasSufficientPID   kubelet has sufficient PID available
  Ready            Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.

Can't log into the instance to inspect kubelet. Seems the instance is frozen or something

Edit: Follow up here awslabs/amazon-eks-ami#79

@ghost

ghost commented Jul 4, 2019 via email

@bobbui

bobbui commented Jul 5, 2019

Happens to me as well in AWS EKS.
Any hint ?

Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                    Message
  ----             ------    -----------------                 ------------------                ------                    -------
  OutOfDisk        Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  MemoryPressure   Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  DiskPressure     Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.
  PIDPressure      False     Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 08:26:42 -0400   KubeletHasSufficientPID   kubelet has sufficient PID available
  Ready            Unknown   Thu, 04 Jul 2019 10:12:19 -0400   Thu, 04 Jul 2019 10:13:04 -0400   NodeStatusUnknown         Kubelet stopped posting node status.

Can't log into the instance to inspect kubelet. Seems the instance is frozen or something

Edit: Follow up here awslabs/amazon-eks-ami#79

Happens to me as well; it started when I was running a stress test against the services running inside the cluster.

@mansurali901

In my case: first find any HPA that is exceeding resources; deleting that HPA worked.
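
A minimal sketch of that check (the HPA name and namespace are placeholders):

# List HPAs in all namespaces and look for ones pinned at their max replicas
$ kubectl get hpa --all-namespaces
# Delete the offending HPA
$ kubectl delete hpa <HPA_NAME> -n <NAMESPACE>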

@bagulm123

Is there any solution to this issue? I observed it when my cluster got autoscaled. The first worker node became NotReady and it's still in that status now (after 8 hours).

@truongtrevor

Same issue here, using minikube.

@truongtrevor

CreationTimestamp: Sun, 26 Jul 2020 18:41:43 +0700
Taints: node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: localhost.localdomain
AcquireTime:
RenewTime: Sun, 26 Jul 2020 19:44:19 +0700
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----             ------    -----------------                 ------------------                ------              -------
  MemoryPressure   Unknown   Sun, 26 Jul 2020 19:42:21 +0700   Sun, 26 Jul 2020 21:26:06 +0700   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   Sun, 26 Jul 2020 19:42:21 +0700   Sun, 26 Jul 2020 21:26:06 +0700   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   Sun, 26 Jul 2020 19:42:21 +0700   Sun, 26 Jul 2020 21:26:06 +0700   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   Sun, 26 Jul 2020 19:42:21 +0700   Sun, 26 Jul 2020 21:26:06 +0700   NodeStatusUnknown   Kubelet stopped posting node status.

@nemo-xue

Hi, the issue is closed, but does anyone have a solution for it?

@immanuelfodor

Maybe this thread helps you; you probably need to reserve resources for host daemons using kubelet args: rancher/rancher#29997 (comment)
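
To illustrate that suggestion, the kubelet has flags for reserving capacity for system and Kubernetes daemons; the values below are placeholders to be tuned per node size:

# Example kubelet arguments (illustrative values only)
--system-reserved=cpu=500m,memory=512Mi
--kube-reserved=cpu=500m,memory=512Mi
--eviction-hard=memory.available<500Mi,nodefs.available<10%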

@nemo-xue

Thanks @immanuelfodor.
I found there were many pending CSRs.
This command helped solve my issue: "oc get csr -o name | xargs oc adm certificate approve"

@thomasresley

thomasresley commented Mar 12, 2021

The problem is likely that the memory and processing resources in the cluster don't match the workload; that is, you have exhausted the cluster's resources and need to deploy more worker nodes. Restart all the instances at once, give them some time to reboot, and restart all the Kubernetes resources on the cluster. This worked for me on AWS.

@dinesh25cs

dinesh25cs commented Feb 11, 2022

I got the same issue. We debugged it using the commands below and it really works.

KUBERNETES:
Deleting node and rejoining it to the cluster:
On MASTER:

  1. kubectl uncordon node_name
  2. kubectl delete node node_name
  3. kubeadm token create --print-join-command (prints the kubeadm join command for the cluster)
    On NODE:
  4. kubeadm reset
  5. kubeadm join 10.87.208.94:6443 --token eah77w.1yfl82ahipkdr1da --discovery-token-ca-cert-hash sha256:15e3637fa73615d30b97c162e610709384c8a395755dd6bba7982cde1a458da8
    [preflight] Running pre-flight checks

[root@cerebro05 etc]# kubeadm join 10.87.208.94:6443 --token eah77w.1yfl82ahipkdr1da --discovery-token-ca-cert-hash sha256:15e3637fa73615d30b97c162e610709384c8a395755dd6bba7982cde1a458da8
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
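
Those preflight errors usually mean state from the previous registration is still on the node. One way to clear it before rejoining (destructive, assuming you really do want to wipe the node's Kubernetes state; the join parameters are the ones printed on the master):

$ sudo systemctl stop kubelet
$ sudo kubeadm reset -f
# If files still linger, remove them explicitly before rejoining
$ sudo rm -f /etc/kubernetes/kubelet.conf /etc/kubernetes/pki/ca.crt
$ sudo kubeadm join <MASTER_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>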

@rajaduraicloud

Check whether swap is on or off: free -m
If swap is on, turn it off: sudo swapoff -a
Now it works!
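
Note that swapoff -a only lasts until the next reboot; recent kubelets also refuse to start with swap enabled unless run with --fail-swap-on=false. To keep swap off permanently, comment out the swap entry in /etc/fstab as well, for example:

# Comment out the swap line so it stays disabled after reboot
$ sudo sed -i '/ swap / s/^/#/' /etc/fstab
$ free -m   # verify that swap shows 0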
