
Kubeadm fails - kubelet fails to find /etc/kubernetes/bootstrap-kubelet.conf #3769

Closed

servo1x opened this issue Nov 27, 2018 · 21 comments

servo1x commented Nov 27, 2018

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Environment:

  • Cloud provider or hardware configuration: None, 4 vagrant vms.

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 3.10.0-862.14.4.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Version of Ansible (ansible --version):
ansible 2.7.2
  config file = None
  configured module search path = [u'/Users/user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 2.7.15 (default, Aug 17 2018, 22:39:05) [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]

Kubespray version (commit) (git rev-parse --short HEAD):

02169e8

Network plugin used:

Calico

Copy of your inventory file:

[all]
node1 ansible_host=10.10.10.10 ansible_user=vagrant ansible_become=true ansible_become_method=sudo ip=10.10.10.10 etcd_member_name=etcd1
node2 ansible_host=10.10.10.2  ansible_user=vagrant ansible_become=true ansible_become_method=sudo ip=10.10.10.2  etcd_member_name=etcd2
node3 ansible_host=10.10.10.3  ansible_user=vagrant ansible_become=true ansible_become_method=sudo ip=10.10.10.3  etcd_member_name=etcd3
node4 ansible_host=10.10.10.4  ansible_user=vagrant ansible_become=true ansible_become_method=sudo ip=10.10.10.4

[kube-master]
node1
node2
node3

[etcd]
node1
node2
node3

[kube-node]
node1
node2
node3
node4

[k8s-cluster:children]
kube-master
kube-node

[localhost]
127.0.0.1 ansible_connection=local kubeadm_enabled=true skip_non_kubeadm_warning=false ansible_become=false

Kubespray config:

---
bootstrap_os: centos

kernel_upgrade: false

nginx_config_dir: /data/nginx

etcd_data_dir: /data/etcd

cluster_name: "vagrant"
dns_domain: "{{ cluster_name }}.local"

dns_mode: coredns

deploy_netchecker: false

kube_config_dir: /data/kubernetes

kube_api_pwd: "{{ secret_kube_api_pwd }}"

kube_users:
  kube:
    pass: "{{ kube_api_pwd }}"
    role: admin
    groups:
      - system:masters

kube_network_plugin: calico
kubeadm_enabled: true
kube_proxy_mode: ipvs

docker_daemon_graph: "/data/docker"

dashboard_enabled: false

vault_base_dir: /data/vault

kubelet_load_modules: true
kubernetes_audit: true

docker_version: "18.06"
kube_version: v1.12.3
kubeadm_version: "{{ kube_version }}"
etcd_version: v3.2.24
coredns_version: "1.2.6"

kubeconfig_localhost: true

docker_dns_servers_strict: false
docker_storage_options: -s overlay2

kubelet_authentication_token_webhook: true
kubelet_authorization_mode_webhook: true

calico_felix_prometheusmetricsenabled: true
etcd_metrics: extensive
kube_read_only_port: 10255
kube_apiserver_insecure_port: 0
kube_api_anonymous_auth: true

Command used to invoke ansible:

ansible-playbook -i inventories/vagrant playbooks/kubespray_cluster.yml -vv --flush-cache -k --become --become-user=root -K --user=user

Output of ansible run:

TASK [kubernetes/master : kubeadm | Initialize first master] *********************************************************************************************************************************************
task path: /Users/user/workspace/ops/ansible/vendor/kubespray/roles/kubernetes/master/tasks/kubeadm-setup.yml:117
Tuesday 27 November 2018  00:58:30 -0800 (0:00:02.496)       0:19:14.100 ******
skipping: [node2] => changed=false
  skip_reason: Conditional result was False
skipping: [node3] => changed=false
  skip_reason: Conditional result was False
fatal: [node1]: FAILED! => changed=true
  cmd:
  - timeout
  - -k
  - 600s
  - 600s
  - /usr/local/bin/kubeadm
  - init
  - --config=/data/kubernetes/kubeadm-config.v1alpha3.yaml
  - --ignore-preflight-errors=all
  delta: '0:03:06.063868'
  end: '2018-11-27 09:01:37.417495'
  failed_when_result: true
  msg: non-zero return code
  rc: 1
  start: '2018-11-27 08:58:31.353627'
  stderr: |2-
            [WARNING KubeletVersion]: couldn't get kubelet version: executable file not found in $PATH
    couldn't initialize a Kubernetes cluster
  stderr_lines:
  - "\t[WARNING KubeletVersion]: couldn't get kubelet version: executable file not found in $PATH"
  - couldn't initialize a Kubernetes cluster
  stdout: |-
    [init] using Kubernetes version: v1.12.3
    [preflight] running pre-flight checks
    [preflight/images] Pulling images required for setting up a Kubernetes cluster
    [preflight/images] This might take a minute or two, depending on the speed of your internet connection
    [preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
    [kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [preflight] Activating the kubelet service
    [certificates] Generated front-proxy-ca certificate and key.
    [certificates] Generated front-proxy-client certificate and key.
    [certificates] Generated ca certificate and key.
    [certificates] Generated apiserver certificate and key.
    [certificates] apiserver serving cert is signed for DNS names [node1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.dt-vagrant.local kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.dt-vagrant.local localhost node1 node2 node3] and IPs [10.233.0.1 10.10.10.10 10.10.10.10 10.233.0.1 127.0.0.1 10.10.10.10 10.10.10.2 10.10.10.3]
    [certificates] Generated apiserver-kubelet-client certificate and key.
    [certificates] valid certificates and keys now exist in "/data/kubernetes/ssl"
    [certificates] Generated sa key and public key.
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
    [controlplane] Adding extra host path mount "audit-policy" to "kube-apiserver"
    [controlplane] Adding extra host path mount "audit-logs" to "kube-apiserver"
    [controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
    [controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
    [controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
    [init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
    [init] this might take a minute or longer if the control plane images have to be pulled
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

    Unfortunately, an error has occurred:
            timed out waiting for the condition

    This error is likely caused by:
            - The kubelet is not running
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
            - 'systemctl status kubelet'
            - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
    Here is one example how you may list all Kubernetes containers running in docker:
            - 'docker ps -a | grep kube | grep -v pause'
            Once you have found the failing container, you can inspect its logs with:
            - 'docker logs CONTAINERID'
  stdout_lines: <omitted>

Anything else we need to know:

[vagrant@node1 ~]$ sudo systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2018-11-27 09:00:16 UTC; 1s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 27284 ExecStart=/usr/local/bin/kubelet $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $KUBELET_API_SERVER $KUBELET_ADDRESS $KUBELET_PORT $KUBELET_HOSTNAME $KUBE_ALLOW_PRIV $KUBELET_ARGS $DOCKER_SOCKET $KUBELET_NETWORK_PLUGIN $KUBELET_VOLUME_PLUGIN $KUBELET_CLOUDPROVIDER (code=exited, status=255)
  Process: 27283 ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volume-plugins (code=exited, status=0/SUCCESS)
 Main PID: 27284 (code=exited, status=255)

Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.479833   27284 feature_gate.go:206] feature gates: &{map[]}
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.479967   27284 feature_gate.go:206] feature gates: &{map[]}
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.565929   27284 mount_linux.go:179] Detected OS with systemd
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566080   27284 server.go:408] Version: v1.12.3
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566178   27284 feature_gate.go:206] feature gates: &{map[]}
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566350   27284 feature_gate.go:206] feature gates: &{map[]}
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566452   27284 plugins.go:99] No cloud provider specified.
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566463   27284 server.go:524] No cloud provider specified: "" from the config file: ""
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566483   27284 bootstrap.go:61] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
Nov 27 09:00:16 node1 kubelet[27284]: F1127 09:00:16.566510   27284 server.go:262] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory

Kubespray is successful if I disable kubeadm... any thoughts?

@gongzili456

Same problem, at the "Upgrade first master" step.
The API server is unhealthy: dial tcp ip:6443: connect: connection refused.

TASK [kubernetes/master : sets kubeadm api version to v1alpha3] ******************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:35 +0000 (0:00:00.118)       0:07:09.368 ****
ok: [node1] => {"ansible_facts": {"kubeadmConfig_api_version": "v1alpha3"}, "changed": false}
ok: [node2] => {"ansible_facts": {"kubeadmConfig_api_version": "v1alpha3"}, "changed": false}
ok: [node3] => {"ansible_facts": {"kubeadmConfig_api_version": "v1alpha3"}, "changed": false}

TASK [kubernetes/master : set kubeadm_config_api_fqdn define] ********************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:36 +0000 (0:00:00.604)       0:07:09.972 ****

TASK [kubernetes/master : kubeadm | Create kubeadm config] ***********************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:36 +0000 (0:00:00.117)       0:07:10.090 ****
changed: [node1] => {"changed": true, "checksum": "beba3df2670ac508cb590fc1ab7a98773271ddd8", "dest": "/etc/kubernetes/kubeadm-config.v1alpha3.yaml", "gid": 0, "group": "root", "md5sum": "3d3ec8be0b1276978b07a097b4eb2773", "mode": "0644", "owner": "root", "size": 2488, "src": "/home/ubuntu/.ansible/tmp/ansible-tmp-1543377097.1-22799602472481/source", "state": "file", "uid": 0}
changed: [node2] => {"changed": true, "checksum": "9e70ec6c4c7ce90f12ef6093afe36fdf0f3a5e1b", "dest": "/etc/kubernetes/kubeadm-config.v1alpha3.yaml", "gid": 0, "group": "root", "md5sum": "2c018642cba08c6550073d8878f0348c", "mode": "0644", "owner": "root", "size": 2486, "src": "/home/ubuntu/.ansible/tmp/ansible-tmp-1543377097.15-125800389592253/source", "state": "file", "uid": 0}
changed: [node3] => {"changed": true, "checksum": "ec973b4b905d41f3eb181eaec6068ac2a85dfb7b", "dest": "/etc/kubernetes/kubeadm-config.v1alpha3.yaml", "gid": 0, "group": "root", "md5sum": "97dee72ceefe26c7242e0c1206572806", "mode": "0644", "owner": "root", "size": 2488, "src": "/home/ubuntu/.ansible/tmp/ansible-tmp-1543377097.22-142786641515500/source", "state": "file", "uid": 0}

TASK [kubernetes/master : kubeadm | Initialize first master] *********************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:38 +0000 (0:00:01.867)       0:07:11.958 ****

TASK [kubernetes/master : kubeadm | Upgrade first master] ************************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:38 +0000 (0:00:00.113)       0:07:12.071 ****
fatal: [node1]: FAILED! => {"changed": true, "cmd": ["timeout", "-k", "600s", "600s", "/usr/local/bin/kubeadm", "upgrade", "apply", "-y", "v1.12.3", "--config=/etc/kubernetes/kubeadm-config.v1alpha3.yaml", "--ignore-preflight-errors=all", "--allow-experimental-upgrades", "--allow-release-candidate-upgrades", "--etcd-upgrade=false", "--force"], "delta": "0:00:00.035118", "end": "2018-11-28 03:51:38.926230", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2018-11-28 03:51:38.891112", "stderr": "\t[WARNING APIServerHealth]: the API Server is unhealthy; /healthz didn't return \"ok\"\n\t[WARNING MasterNodesReady]: couldn't list masters in cluster: Get https://172.31.9.146:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D: dial tcp 172.31.9.146:6443: connect: connection refused\n[upgrade/version] FATAL: The --version argument is invalid due to these fatal errors:\n\n\t- Unable to fetch cluster version: Couldn't fetch cluster version from the API Server: Get https://172.31.9.146:6443/version?timeout=32s: dial tcp 172.31.9.146:6443: connect: connection refused\n\nPlease fix the misalignments highlighted above and try upgrading again", "stderr_lines": ["\t[WARNING APIServerHealth]: the API Server is unhealthy; /healthz didn't return \"ok\"", "\t[WARNING MasterNodesReady]: couldn't list masters in cluster: Get https://172.31.9.146:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D: dial tcp 172.31.9.146:6443: connect: connection refused", "[upgrade/version] FATAL: The --version argument is invalid due to these fatal errors:", "", "\t- Unable to fetch cluster version: Couldn't fetch cluster version from the API Server: Get https://172.31.9.146:6443/version?timeout=32s: dial tcp 172.31.9.146:6443: connect: connection refused", "", "Please fix the misalignments highlighted above and try upgrading again"], "stdout": "[preflight] Running pre-flight checks.\n[upgrade] Making sure the cluster is healthy:\n[upgrade/config] Making sure the configuration is correct:\n[upgrade/config] Reading configuration options from a file: /etc/kubernetes/kubeadm-config.v1alpha3.yaml\n[upgrade/apply] Respecting the --cri-socket flag that is set with higher priority than the config file.\n[upgrade/version] You have chosen to change the cluster version to \"v1.12.3\"", "stdout_lines": ["[preflight] Running pre-flight checks.", "[upgrade] Making sure the cluster is healthy:", "[upgrade/config] Making sure the configuration is correct:", "[upgrade/config] Reading configuration options from a file: /etc/kubernetes/kubeadm-config.v1alpha3.yaml", "[upgrade/apply] Respecting the --cri-socket flag that is set with higher priority than the config file.", "[upgrade/version] You have chosen to change the cluster version to \"v1.12.3\""]}

NO MORE HOSTS LEFT ***************************************************************************************************************************************************************************************************************************
	to retry, use: --limit @/home/ubuntu/kkkkube/kubespray-settings/cluster.retry

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0
node1                      : ok=266  changed=19   unreachable=0    failed=1
node2                      : ok=244  changed=18   unreachable=0    failed=0
node3                      : ok=244  changed=18   unreachable=0    failed=0
node4                      : ok=200  changed=11   unreachable=0    failed=0
node5                      : ok=200  changed=11   unreachable=0    failed=0
node6                      : ok=200  changed=11   unreachable=0    failed=0


servo1x commented Nov 30, 2018

@gongzili456 can you share the output of sudo systemctl status kubelet -l from the master as well?

I suspect the kubelet is failing to start up because it can't find bootstrap-kubelet.conf, right?

@lianghuiyuan

Yes, the kubelet is failing to start up. How do I resolve it?
@servo1x

kubeadm join:

[i1987@k8s-node01 ~]$ sudo kubeadm join 172.16.18.53:6443 --token 3cxl4o.npf352g4ryvdl89i --discovery-token-ca-cert-hash sha256:88ddf380ab354067b0bb830ad6e76484f79073b3edbe3702ac1537d850f35cd4 --ignore-preflight-errors=all
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
	[WARNING Hostname]: hostname "k8s-node01" could not be reached
	[WARNING Hostname]: hostname "k8s-node01": lookup k8s-node01 on 100.100.2.136:53: no such host
[discovery] Trying to connect to API Server "172.16.18.53:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.18.53:6443"
[discovery] Requesting info from "https://172.16.18.53:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.18.53:6443"
[discovery] Successfully established connection with API Server "172.16.18.53:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized
[i1987@k8s-node01 ~]$ 
[i1987@k8s-node01 ~]$ kubeadm join 172.16.18.53:6443 --token t4dhp1.6c132knx4hh8oroz --discovery-token-ca-cert-hash sha256:88ddf380ab354067b0bb830ad6e76484f79073b3edbe3702ac1537d850f35cd4
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
	[ERROR IsPrivilegedUser]: user is not running as root
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
[i1987@k8s-node01 ~]$ kubeadm join 172.16.18.53:6443 --token t4dhp1.6c132knx4hh8oroz --discovery-token-ca-cert-hash sha256:88ddf380ab354067b0bb830ad6e76484f79073b3edbe3702ac1537d850f35cd4 --ignore-preflight-errors=all
[preflight] Running pre-flight checks
	[WARNING IsPrivilegedUser]: user is not running as root
	[WARNING CRI]: container runtime is not running: output: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.39/info: dial unix /var/run/docker.sock: connect: permission denied
, error: exit status 1
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 3.10.0-693.2.2.el7.x86_64
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
CONFIG_INET: enabled
CONFIG_EXT4_FS: enabled (as module)
CONFIG_PROC_FS: enabled
CONFIG_NETFILTER_XT_TARGET_REDIRECT: enabled (as module)
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
CONFIG_OVERLAY_FS: enabled (as module)
CONFIG_AUFS_FS: not set - Required for aufs.
CONFIG_BLK_DEV_DM: enabled (as module)
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
	[WARNING SystemVerification]: failed to get docker info: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/info: dial unix /var/run/docker.sock: connect: permission denied
	[WARNING Hostname]: hostname "k8s-node01" could not be reached
	[WARNING Hostname]: hostname "k8s-node01": lookup k8s-node01 on 100.100.2.136:53: no such host
[discovery] Trying to connect to API Server "172.16.18.53:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.18.53:6443"
[discovery] Requesting info from "https://172.16.18.53:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.18.53:6443"
[discovery] Successfully established connection with API Server "172.16.18.53:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
couldn't save bootstrap-kubelet.conf to disk: open /etc/kubernetes/bootstrap-kubelet.conf: permission denied

kubelet status:

[i1987@k8s-node01 ~]$ sudo systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: inactive (dead) (Result: exit-code) since Wed 2018-12-05 18:45:28 CST; 42min ago
     Docs: https://kubernetes.io/docs/
  Process: 24839 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 24839 (code=exited, status=255)

Dec 05 18:45:24 k8s-node01 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Dec 05 18:45:24 k8s-node01 systemd[1]: Unit kubelet.service entered failed state.
Dec 05 18:45:24 k8s-node01 systemd[1]: kubelet.service failed.
Dec 05 18:45:28 k8s-node01 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.

@lianghuiyuan

I resolved it.
It was because k8s released v1.13.0 yesterday.
My k8s master version is v1.12.3, and today I added a node (which downloaded the newest k8s version, v1.13.0) to my cluster.
Simply updating the k8s master version resolved the problem (or just keep the master and nodes on the same version).

How to update k8s version: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-13/
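
A minimal sketch of that version check, assuming a CentOS 7 host with the Kubernetes yum repo configured; the 1.12.3 pin and package names below are illustrative, not taken from this thread:

# Compare the control-plane and node versions before joining a new node.
kubeadm version -o short
kubelet --version

# Keep the packages in lockstep on the new node, e.g. pin them explicitly:
sudo yum install -y kubeadm-1.12.3 kubelet-1.12.3 kubectl-1.12.3 --disableexcludes=kubernetes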


servo1x commented Dec 7, 2018

@lianghuiyuan Still having this issue on v1.13.0...

I'm either setting up an entirely new cluster, or trying to go from a non-kubeadm Kubespray config to a kubeadm one.

@justfortooltest

I got the same problem when upgrading a cluster from 1.8.10 to v1.12.3; a fresh v1.12.3 setup is fine.


servo1x commented Dec 18, 2018

Anyone else having any luck with this? I see non-kubeadm deploys have been completely removed from new releases as well... 😞


servo1x commented Dec 25, 2018

Any thoughts @riverzhang ?


servo1x commented Dec 26, 2018

Moving all the content from /data/kubernetes to /etc/kubernetes allows the switch to kubeadm.
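
A rough sketch of that workaround, assuming kube_config_dir was set to /data/kubernetes as in this issue; back up both directories first:

sudo systemctl stop kubelet
sudo mkdir -p /etc/kubernetes
# Copy everything Kubespray wrote under /data/kubernetes to the path kubeadm expects.
sudo cp -a /data/kubernetes/. /etc/kubernetes/
sudo systemctl restart kubelet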

@servo1x servo1x closed this as completed Dec 26, 2018

soulmz commented Jan 16, 2019

I also ran into this problem. I tried:

kubeadm reset -f
kubeadm init --config /etc/kubernetes/kubeadm-config.yaml

result

[root@k8s-m1 ~]# kubeadm init --config /etc/kubernetes/kubeadm-config.yaml
[init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/ssl"
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate authority generation
[certs] External etcd mode: Skipping etcd/peer certificate authority generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation
[certs] Using existing ca certificate authority
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing apiserver certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 5m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
	- 'docker ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

I checked the kubelet startup logs and found that the kubelet still looks for certificates under /etc/kubernetes/pki:

[root@k8s-m1 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since 三 2019-01-16 08:46:55 CST; 7s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 40602 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
  Process: 40599 ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volume-plugins (code=exited, status=0/SUCCESS)
 Main PID: 40602 (code=exited, status=255)

1月 16 08:46:55 k8s-m1 systemd[1]: Unit kubelet.service entered failed state.
1月 16 08:46:55 k8s-m1 systemd[1]: kubelet.service failed.
[root@k8s-m1 ~]# journalctl -xeu kubelet
1月 16 08:52:02 k8s-m1 kubelet[945]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by
1月 16 08:52:02 k8s-m1 kubelet[945]: F0116 08:52:02.623031     945 server.go:244] unable to load client CA file /etc/kubernetes/pki/ca.crt: o
1月 16 08:52:02 k8s-m1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
1月 16 08:52:02 k8s-m1 systemd[1]: Unit kubelet.service entered failed state.
1月 16 08:52:02 k8s-m1 systemd[1]: kubelet.service failed.
1月 16 08:52:12 k8s-m1 systemd[1]: kubelet.service holdoff time over, scheduling restart.
1月 16 08:52:12 k8s-m1 systemd[1]: Stopped Kubernetes Kubelet Server.
-- Subject: Unit kubelet.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished shutting down.
1月 16 08:52:12 k8s-m1 systemd[1]: Starting Kubernetes Kubelet Server...
-- Subject: Unit kubelet.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has begun starting up.
1月 16 08:52:12 k8s-m1 systemd[1]: Started Kubernetes Kubelet Server.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
1月 16 08:52:12 k8s-m1 kubelet[979]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by
1月 16 08:52:12 k8s-m1 kubelet[979]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by
1月 16 08:52:12 k8s-m1 kubelet[979]: F0116 08:52:12.873068     979 server.go:244]unable to load client CA file /etc/kubernetes/pki/ca.crt: o
1月 16 08:52:12 k8s-m1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
1月 16 08:52:12 k8s-m1 systemd[1]: Unit kubelet.service entered failed state.
1月 16 08:52:12 k8s-m1 systemd[1]: kubelet.service failed.

1月 16 08:52:12 k8s-m1 kubelet[979]: F0116 08:52:12.873068 979 server.go:244]unable to load client CA file /etc/kubernetes/pki/ca.crt: o

I tried changing the certificate directory from "/etc/kubernetes/ssl" (the [certs] Using certificateDir folder line above) to /etc/kubernetes/pki.

success

[root@k8s-m1 ~]# kubeadm init --config /etc/kubernetes/kubeadm-config.yaml
[init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-m50 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost k8s-m1 k8s-m2 k8s-m3] and IPs [10.233.0.1 10.2.1.50 10.2.1.50 10.233.0.1 127.0.0.1 10.2.1.50 10.2.2.51 10.2.3.52]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping etcd/peer certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate authority generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 5m0s
[apiclient] All control plane components are healthy after 21.003030 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-m1" as an annotation
[mark-control-plane] Marking the node k8s-m1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-m1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: suyxa7.x25v9cltnmjvjewb
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 10.2.1.50:6443 --token suyxa7.x25v9cltnmjvjewb --discovery-token-ca-cert-hash sha256:37156c269fbef1ea58772b69c2297c8981c494f6db397bc9c2403ac62bfa42f4

However, the same run works on my local VMs; this is a problem I hit with an offline deployment on our servers.

Forgive me for not being able to write this in English.


mrochcn commented Feb 14, 2019

@zhangmz0223 Same problem, but I can't figure out how to change [certs] Using certificateDir folder "/etc/kubernetes/ssl" -> /etc/kubernetes/pki.


soulmz commented Feb 14, 2019

@Mroch-Cn You can only change this in the concrete YAML config; the path is hard-coded there.

If you print the step output, you will see a YAML config file that hard-codes /etc/kubernetes/ssl when kubeadm init runs; what I did was change it to /etc/kubernetes/pki.

You can also just search the whole repo for it.

I just looked it up: it is roles/kubespray-default/defaults/main.yaml, around line 93.

# This is where all the cert scripts and certs will be located
kube_cert_dir: "{{ kube_config_dir }}/ssl"

After I changed it, it worked.
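
For anyone following along, a rough sketch of the two places mentioned above; the inventory path and the rendered config file name are assumptions about a typical Kubespray layout, not taken from this thread:

# Option 1: override the Kubespray variable so certs land where kubeadm expects them.
echo 'kube_cert_dir: "{{ kube_config_dir }}/pki"' >> inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml

# Option 2: patch an already-rendered kubeadm config on the master before re-running kubeadm init.
sudo sed -i 's|certificatesDir: .*|certificatesDir: /etc/kubernetes/pki|' /etc/kubernetes/kubeadm-config.yaml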


mrochcn commented Feb 14, 2019 via email

@bsakweson

I just ran into this issue and saw that it has been closed, but I cannot clearly understand what the fix is and whether there is any branch that carries a fix. Can someone please point me to one? I believe I started seeing this after I updated my kernel from 3.10 to 4.20, which I needed to do in order to take advantage of some features provided by rook-ceph on my bare-metal cluster.

@bsakweson

So, I will try not to be long-winded on this one; my hope is that this saves someone some pain. I had everything working fine on my on-premises 7-node cluster, then realized some Rook features, like filesystem, could not be used because of my kernel version. My hosts run on CentOS 7, which comes with kernel version 3.10, so I decided to tear everything down and update my kernel, ending up with version 4.20 on all my hosts. Then I realized I had run into this 3986 issue. Apparently there was a bug in everything 2.8.2 and below causing it to fail on systems with kernel version >= 4.19. That bug supposedly came from upstream Kubernetes. It was fixed in version 1.13.0 and also accommodated here in Kubespray, but on the master branch only.

I traced the fix down to the master branch and read more about it in 3986. Long story short, even after upgrading to the master branch (and I know I am living on the edge), I still ran into 4008. It turns out there are some significant changes on the master branch that also require a complete change of the inventory folder. The easiest way to do that is to copy the sample folder and then make the necessary changes to your hosts.ini or any additional settings that are specific to your environment.
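
For reference, the inventory refresh described above typically looks something like this; the cluster name "mycluster" is just a placeholder:

# Start from the sample inventory shipped with Kubespray, then customize it.
cp -rfp inventory/sample inventory/mycluster
# Edit inventory/mycluster/hosts.ini (or hosts.yaml on newer branches) plus your group_vars,
# then point ansible-playbook at the new inventory:
ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml -b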


rijub2019 commented Feb 20, 2019

Thanks @zhangmz0223!! Changing "{{ kube_config_dir }}/ssl" to "{{ kube_config_dir }}/pki" worked for me.

@Panoptik

Watching the logs with journalctl -xe showed me the reason for the problem:

Part of the existing bootstrap client certificate is expired.

Found the solution in this SO answer:
https://stackoverflow.com/a/56334732/2110663
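
A quick way to confirm the expiry, as a sketch; newer kubelets may reference /var/lib/kubelet/pki/kubelet-client-current.pem instead of embedding the certificate data:

# Decode the client cert embedded in the kubeconfig and print its expiry date.
sudo grep client-certificate-data /etc/kubernetes/bootstrap-kubelet.conf \
  | awk '{print $2}' | base64 -d | openssl x509 -noout -enddate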


ykfq commented Jul 13, 2020

As the documentation says here, under "The kubelet drop-in file for systemd":

The KubeConfig file to use for the TLS Bootstrap is /etc/kubernetes/bootstrap-kubelet.conf, 
but it is only used if /etc/kubernetes/kubelet.conf does not exist.

/etc/kubernetes/bootstrap-kubelet.conf is only used when /etc/kubernetes/kubelet.conf does not exist, so you can fix it in one of these ways:

  • copy a bootstrap-kubelet.conf from another node, just make sure it exists (see the sketch after this list);

  • renew a bootstrap token and replace the old one in the bootstrap file:

    new_token=$(kubeadm token create)
    sed -i "s/token: .*/token: $new_token/" /etc/kubernetes/bootstrap-kubelet.conf
    

    restart kubelet and it will generate a new kubelet.conf file.

    OR

  • generate kubelet.conf as described here: renew-kubernetes-pki-after-expired/56334732#56334732
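
A minimal sketch of the first option, assuming "node2" is a healthy node in the same cluster (the file is root-owned, so you may need root on the source node to read it):

# Copy the bootstrap kubeconfig from a healthy node, then let the kubelet re-bootstrap.
scp node2:/etc/kubernetes/bootstrap-kubelet.conf /tmp/bootstrap-kubelet.conf
sudo mv /tmp/bootstrap-kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf
sudo systemctl restart kubelet   # regenerates /etc/kubernetes/kubelet.conf on success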


KeithTt commented Oct 18, 2021

kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:$(hostname) >/etc/kubernetes/kubelet.conf

From here: kubernetes/kubernetes#84252

@bdaoudtdc

Hello,
I previously used this command to solve my issue with "Kubelet client certificate rotation fails":
kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:$(hostname) > kubelet.conf

However, when I try to use it now I see this error:
required flag(s) "config" not set
I am running version v1.20.12.

I am not sure what should be in that config file.
Any ideas?


swapnildhakne00 commented May 11, 2023

This is basically related to the cluster certificates.
Check your cluster certificates, renew them, and then restart the kubelet and the container runtime; in most cases this solves the issue.

Steps I followed to solve the issue:
ENV:
Kubernetes Version : 1.16.7-0
Docker Version : 18.09.2-3.el7

Take a backup of /etc/kubernetes/*

kubeadm alpha certs check-expiration
kubeadm alpha certs renew all

systemctl restart docker
systemctl restart kubelet

If the kubelet certs are not renewed automatically, renew them manually:

cd /etc/kubernetes/
kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:$(hostname) > kubelet.conf

systemctl restart kubelet
cp /etc/kubernetes/admin.conf /root/.kube/config
