"etcdserver: request timed out" #717

Closed
howardjohn opened this issue Jul 17, 2019 · 21 comments
Labels: kind/bug (Categorizes issue or PR as related to a bug), kind/support (Categorizes issue or PR as a support question)

Comments

@howardjohn (Contributor)

What happened:
We started using KinD in our prow integration tests for Istio, and are occasionally seeing errors like:
Error from server: error when creating "/logs/artifacts/galley-test-a99cf400cb3343eeac7/_suite_context/istio-deployment-577548603/istio-config-only.yaml": etcdserver: request timed out

What you expected to happen:
etcd doesn't time out.

How to reproduce it (as minimally and precisely as possible):
This is the hard part; I am not sure how to reproduce this consistently. I do, however, have a bunch of logs from where it occurred - attached below.

I realize this is probably not a very actionable bug report; my main question is: what info do we need to collect to root-cause this?

Environment:

  • kind version: (use kind version): v0.3.0
  • Kubernetes version: (use kubectl version): I think 1.14? That is what comes with 0.3.0, right?
  • Docker version: (use docker info): 18.06.1-ce
  • OS (e.g. from /etc/os-release): Ubuntu 16.04

Logs:

howardjohn added the kind/bug label on Jul 17, 2019
@BenTheElder (Member)

I would look at resource usage on the host; this is probably a CPU exhaustion or disk I/O issue?

@howardjohn (Contributor, Author)

Around the time this happened, the node was at ~60% CPU utilization, and disk IO was peaking at 108 MB/s and 648 write operations/s. The disk claims to support 122.8 MB/s sustained throughput (256 GB GCP SSD).

So it does seem like it was potentially hitting the limit?
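For reference, one way to watch sustained write throughput on the node while tests run (a sketch; iostat is from the sysstat package, and the device name sda is an assumption - substitute your data disk):

# 5-second samples of extended per-device stats; watch the wMB/s column
iostat -mx sda 5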

@BenTheElder (Member)

That might be it. We had problems with I/O throttling breaking various prowjobs (kind or otherwise) on prow.k8s.io - heavy throttling from greatly exceeding the disk limits. I believe prow.k8s.io switched to appropriately sized pd-ssd on the nodes to avoid throttling (IIRC 250 GB pd-ssd?).

@BenTheElder (Member)

There should be a graph with I/O throttling in the VM details, IIRC, though it averages, so you can miss it if you're looking at broad timespans.

@howardjohn (Contributor, Author)

You are right: heavily throttled, with spikes at 100 MB/s throttled. We just switched to 256 GB SSDs (to match yours) because of semi-related issues where our docker instances were getting killed. The problem is that we build a bunch of docker images at the start of all our tests, which probably uses tons of disk. So if we can optimize that to do less work, this problem may resolve itself.

@BenTheElder (Member)

That may help; another option is to keep disk-heavy jobs from being co-scheduled.

Unfortunately Kubernetes / Prow are not terribly helpful there. Pod anti-affinity could help (or, at least in the past, a cheaper option was identical hostPorts on the pods, e.g. hostPort: 9999 on all pods that shouldn't be co-scheduled - see the sketch below), but currently disk I/O is not really something that is accounted for :/

As far as I know, the standard pattern for disk-heavy workloads is dedicated nodes with taints and labels for those specific workloads, scheduling a limited number of them (say, one per node) to those nodes, with other workloads on other nodes. That may not be the most practical for Prow.. 😞
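To illustrate the hostPort trick, a rough sketch (the pod name, image, and port number are placeholders, not anything Prow actually uses):

apiVersion: v1
kind: Pod
metadata:
  name: disk-heavy-job
spec:
  containers:
  - name: test
    image: example.com/test-image:latest
    ports:
    - containerPort: 9999
      hostPort: 9999   # two pods requesting the same hostPort can never land on the same node

Since a hostPort can be bound only once per node, the scheduler is forced to spread such pods across nodes.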

BenTheElder added the kind/support label on Jul 17, 2019
@howardjohn (Contributor, Author)

We are getting this one a lot too; I suspect the same problem. It looks like kubeadm is health-checking and timing out:

+ kind create cluster --name=e2e-suite --loglevel debug
time="23:24:28" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\\t{{.Label \"io.k8s.sigs.kind.cluster\"}}]"
Creating cluster "e2e-suite" ...
 • Ensuring node image (kindest/node:v1.14.2) 🖼  ...
time="23:24:28" level=debug msg="Running: /usr/bin/docker [docker inspect --type=image kindest/node:v1.14.2]"
time="23:24:28" level=info msg="Pulling image: kindest/node:v1.14.2 ..."
time="23:24:28" level=debug msg="Running: /usr/bin/docker [docker pull kindest/node:v1.14.2]"
 ✓ Ensuring node image (kindest/node:v1.14.2) 🖼
 • Preparing nodes 📦  ...
time="23:24:54" level=debug msg="Running: /usr/bin/docker [docker info --format '{{json .SecurityOptions}}']"
time="23:24:54" level=debug msg="Running: /usr/bin/docker [docker run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname e2e-suite-control-plane --name e2e-suite-control-plane --label io.k8s.sigs.kind.cluster=e2e-suite --label io.k8s.sigs.kind.role=control-plane --expose 41951 -p 127.0.0.1:41951:6443 kindest/node:v1.14.2@sha256:33539d830a6cf20e3e0a75d0c46a4e94730d78c7375435e6b49833d81448c319]"
 ✓ Preparing nodes 📦
time="23:25:32" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\\t{{.Label \"io.k8s.sigs.kind.cluster\"}} --filter label=io.k8s.sigs.kind.cluster=e2e-suite]"
time="23:25:32" level=debug msg="Running: /usr/bin/docker [docker inspect -f {{index .Config.Labels \"io.k8s.sigs.kind.role\"}} e2e-suite-control-plane]"
 • Creating kubeadm config 📜  ...
time="23:25:32" level=debug msg="Running: /usr/bin/docker [docker exec --privileged e2e-suite-control-plane cat /kind/version]"
time="23:25:32" level=debug msg="Using kubeadm config:"
apiServer:
  certSANs:
  - localhost
apiVersion: kubeadm.k8s.io/v1beta1
clusterName: e2e-suite
controllerManager:
  extraArgs:
    enable-hostpath-provisioner: "true"
kind: ClusterConfiguration
kubernetesVersion: v1.14.2
networking:
  podSubnet: 10.244.0.0/16
---
apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- token: abcdef.0123456789abcdef
kind: InitConfiguration
localAPIEndpoint:
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
---
apiVersion: kubelet.config.k8s.io/v1beta1
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
imageGCHighThresholdPercent: 100
kind: KubeletConfiguration
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
time="23:25:32" level=debug msg="Running: /usr/bin/docker [docker exec --privileged e2e-suite-control-plane mkdir -p /kind]"
time="23:25:33" level=debug msg="Running: /usr/bin/docker [docker exec --privileged -i e2e-suite-control-plane cp /dev/stdin /kind/kubeadm.conf]"
 ✓ Creating kubeadm config 📜
 • Starting control-plane 🕹️  ...
time="23:25:33" level=debug msg="Running: /usr/bin/docker [docker exec --privileged e2e-suite-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6]"
time="23:27:34" level=debug msg="kubeadm init output:"
I0718 23:25:34.203518     101 initconfiguration.go:186] loading configuration from "/kind/kubeadm.conf"
[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta1, Kind=JoinConfiguration
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
	[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca", "front-proxy-client", "etcd/ca", "etcd/server", "etcd/healthcheck-client", "apiserver-etcd-client", "etcd/peer", "ca", "apiserver", and "apiserver-kubelet-client" certificates and keys, and the "sa" key pair
[kubeconfig] Writing "admin.conf", "kubelet.conf", "controller-manager.conf", and "scheduler.conf" kubeconfig files
[control-plane] Creating static Pod manifests for "kube-apiserver", "kube-controller-manager", and "kube-scheduler" in "/etc/kubernetes/manifests"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0718 23:25:38.940563     101 round_trippers.go:438] GET https://172.17.0.2:6443/healthz?timeout=32s  in 0 milliseconds
[... the /healthz poll above repeats roughly twice per second until 23:27:33 ...]
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[... the kubelet-check failure above repeats every ~15 seconds ...]

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
	- 'docker ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
 ✗ Starting control-plane 🕹️
time="23:27:34" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\\t{{.Label \"io.k8s.sigs.kind.cluster\"}} --filter label=io.k8s.sigs.kind.cluster=e2e-suite]"
time="23:27:34" level=debug msg="Running: /usr/bin/docker [docker rm -f -v e2e-suite-control-plane]"
Error: failed to create cluster: failed to init node with kubeadm: exit status 1
+ attempts=1
+ echo 'Could not setup KinD environment. Something wrong with KinD setup. Trying again.'
Could not setup KinD environment. Something wrong with KinD setup. Trying again.

@howardjohn (Contributor, Author)

We are seeing this a fair amount, so I added some retries to the setup in istio/istio#15637 (and set loglevel=debug for further debugging). Any other ideas to mitigate this?

@BenTheElder (Member)

😞

Symptomatically, the API server is not coming up healthy; the most common cause is that it was killed or evicted, which could be due to the throttling you're seeing.

kind / kubeadm can't do much there; if the host doesn't have enough capacity to bring up the API server within the time frame kubeadm allows, it's expected that the cluster won't come up. IIRC that timeout was tuned to accommodate even Raspberry Pis.

If you run create with --retain (so it doesn't clean up on failure) and run kind export logs before kind delete cluster (cleaning up in some exit handler), we can see from the kubelet logs whether the kubelet evicted the API server and which eviction threshold was crossed.
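A minimal sketch of that flow in a CI script (the cluster name and the $ARTIFACTS destination are assumptions from this thread, not kind defaults):

# on any exit, dump node logs first, then clean up the cluster
trap 'kind export logs "${ARTIFACTS}/kind" --name e2e-suite; kind delete cluster --name e2e-suite' EXIT

# --retain keeps the node containers around on failure so the logs survive
kind create cluster --name e2e-suite --retain

# ... run tests against the cluster ...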

@BenTheElder (Member)

I would add that we don't see either of these on prow.k8s.io, and I've not seen them locally*; I would still guess a resource management issue on the underlying hosts, i.e. the prow cluster nodes being overloaded / throttled 😬 Are the nodes still being I/O throttled?


Additional notes:

* There used to be an issue where you'd see the more recent error you posted if the host was low on disk, but we avoid that now by disabling disk-based eviction.

  • I've not seen anyone else suggest needing to retry cluster creation with kind; we have CI setups contributed by users on a number of platforms, and this hasn't come up there or in any other issue.

@howardjohn (Contributor, Author) · Jul 19, 2019

OK, so I dug into it a bit more. The retry PR doesn't seem to help: either kind comes up, or it fails all 3 times - it never fails once and then succeeds.

I ran another test now that the cluster is pretty much empty. The node had no pods running on it until the test was scheduled on it. The first step of the test is setting up the kind cluster: https://gubernator.k8s.io/build/istio-prow/pr-logs/pull/istio_istio/15642/integ-galley-k8s-presubmit-tests-master/3435/

Looking at the node metrics, throttled IO write bytes peak at 15 MB/s. So even just the kind create cluster is causing it to get throttled to some extent - I'm not sure how normal that is.

I did the retaining + log dump; hopefully that can help. An example failure is at https://prow.istio.io/view/gcs/istio-prow/pr-logs/pull/istio_istio/15637/integ-telemetry-k8s-presubmit-tests-master/2039; the artifacts will have all the logs.

From these logs, the most obvious-looking error is:

Jul 19 03:43:15 e2e-suite-control-plane containerd[46]: time="2019-07-19T03:43:15.596990905Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

There is also this:

Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: E0719 03:42:12.897914     231 raw.go:146] Failed to watch directory "/sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket": inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: E0719 03:42:12.897963     231 raw.go:146] Failed to watch directory "/sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice": inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: E0719 03:42:12.897975     231 raw.go:146] Failed to watch directory "/sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded": inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: E0719 03:42:12.897988     231 raw.go:146] Failed to watch directory "/sys/fs/cgroup/devices/docker": inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device
Jul 19 03:42:12 e2e-suite-control-plane kubelet[231]: F0719 03:42:12.898008     231 kubelet.go:1344] Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/devices/docker/412c7ad78fb8d9dd964a8c5751f592e4deb419a9fe508576ac8bde2e5309fded/system.slice/systemd-journald-dev-log.socket: no space left on device

We had some other tests where kind came up, but then one of our containers running in kind failed to start a server with "no space left on device". I am pretty sure this is not literal disk space but the inotify limit. The "no space left" error repeats about 10 times before the kubelet finally exits, which leads me to believe this is the root cause.

So it seems like increasing fs.inotify.max_user_watches may resolve this?
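For reference, a quick way to check and raise the limit on a host (524288 is just a generous round number, not an official recommendation):

# current limit (as low as 8192 on some distros)
sysctl fs.inotify.max_user_watches

# raise it on the running kernel
sudo sysctl -w fs.inotify.max_user_watches=524288

# persist it across reboots
echo 'fs.inotify.max_user_watches=524288' | sudo tee /etc/sysctl.d/99-inotify.conf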

@BenTheElder (Member)

Either kind comes up, or it fails all 3 times - it never fails once and then succeeds.

That sounds very suspicious.

Looking at the node metrics, throttled IO write bytes peak at 15 MB/s. So even just the kind create cluster is causing it to get throttled to some extent - I'm not sure how normal that is.

I'm not sure either.

Jul 19 03:43:15 e2e-suite-control-plane containerd[46]: time="2019-07-19T03:43:15.596990905Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

That's normal; there's a period during startup where the CNI config is not written out yet. This error will show up on many, many clusters while the CNI daemonset is still coming up and writing that config out. As long as it eventually gets the CNI config, this is normal and fine 😅

I am pretty sure this is not literal disk space but the inotify limit. The "no space left" error repeats about 10 times before the kubelet finally exits, which leads me to believe this is the root cause.

Yes! Now this error I have seen before, but not much on prow.k8s.io. This is indeed an issue with running out of inotify watches.

So it seems like increasing fs.inotify.max_user_watches may resolve this?

Yes, and we should be able to write a daemonset to do this.

@BenTheElder (Member)

This should go in our known-issues guide, and prow.k8s.io should probably bump its inotify watches as well; IIRC they're relatively low on the cluster nodes by default.

@howardjohn (Contributor, Author)

Great, thanks for all the help! Now that we have the logs being dumped, hopefully we can track down future issues faster too.

@Katharine can you help us get the limit raised? I don't have access to the cluster.

@BenTheElder (Member) · Jul 19, 2019

Adapted to stable APIs from Azure/AKS#772 (comment):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: tune-sysctls
  namespace: kube-system
  labels:
    app: tune-sysctls
spec:
  selector:
    matchLabels:
      name: tune-sysctls
  template:
    metadata:
      labels:
        name: tune-sysctls
    spec:
      hostNetwork: true
      hostPID: true
      hostIPC: true
      initContainers:
      - name: setsysctls
        command:
        - sh
        - -c
        - sysctl -w fs.inotify.max_user_watches=524288;
        image: alpine:3.6
        imagePullPolicy: IfNotPresent
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - name: sys
          mountPath: /sys
      containers:
      - name: sleepforever
        resources:
          requests:
            cpu: 0.01
        image: alpine:3.6
        command: ["tail"]
        args: ["-f", "/dev/null"]
      volumes:
      - name: sys
        hostPath:
          path: /sys

(Note: the watch count is somewhat arbitrary; the default on some distros is around 8k, IIRC.)
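A sketch of rolling this out and spot-checking that it took effect (the filename is a placeholder; the labels match the manifest above):

kubectl apply -f tune-sysctls.yaml
kubectl -n kube-system rollout status ds/tune-sysctls

# read the value back through one of the daemonset pods
POD=$(kubectl -n kube-system get pods -l name=tune-sysctls -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec "$POD" -c sleepforever -- cat /proc/sys/fs/inotify/max_user_watches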

@BenTheElder (Member)

Filed kubernetes/test-infra#13515 to check this in somewhere.

@Katharine (Contributor)

I've applied the above across the Istio test cluster. We should check it in to istio/test-infra so it accurately reflects reality.

@BenTheElder (Member)

We've deployed the daemonset today; the current version for prow is at https://github.com/kubernetes/test-infra/blob/f96fa91682a29c57838d4df17d9ef8d4ecf7260f/prow/cluster/tune-sysctls_daemonset.yaml (more or less the same, minor tweaks).

@howardjohn (Contributor, Author)

With the fix, things are going smoothly - thanks for the help. Feel free to close this.

@BenTheElder (Member)

Excellent! 😄
/close

@k8s-ci-robot (Contributor)

@BenTheElder: Closing this issue.

In response to this:

Excellent! 😄
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
