
[xfs?] kind create cluster fails on centos8 docker 20.10.3 with timeout #2050

Closed
maxant opened this issue Feb 3, 2021 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
maxant commented Feb 3, 2021

What happened:
I installed kind v0.10.0 (go1.15.7, linux/amd64) on a CentOS 8 server running containerd and the Docker daemon.
I ran ./kind -v 9 create cluster --wait 10m, but after about 2 minutes I got a timeout error; see below.

What you expected to happen:
No timeout error.

How to reproduce it (as minimally and precisely as possible):

./kind -v 9 create cluster --wait 10m

Wait a while, and then:

Creating cluster "kind" ...
DEBUG: docker/images.go:58] Image: 
kindest/node:v1.20.2@sha256:8f7ea6e7642c0da54f04a7ee10431549c0257315b3a634f6ef2fecaaedb19bab present locally
 ✓ Ensuring node image (kindest/node:v1.20.2) 🖼 
 ✓ Preparing nodes 📦  
DEBUG: config/config.go:90] Using the following kubeadm config for node kind-control-plane:

...

I0203 21:59:19.491111     212 loader.go:379] Config loaded from file:  /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". 
This can take up to 4m0s

... the following is then repeated many times:

I0203 22:01:13.999076     212 round_trippers.go:445] GET https://kind-control-plane:6443/healthz?timeout=10s  in 2 milliseconds
I0203 22:01:14.498120     212 round_trippers.go:445] GET https://kind-control-plane:6443/healthz?timeout=10s  in 1 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get 
"http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.

Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
	- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'

couldn't initialize a Kubernetes cluster

Anything else we need to know?:

While kind is trying to start up, I run:

docker logs -f kind-control-plane

No errors are displayed, but near the bottom it says:

[  OK  ] Reached target Timers.
     Starting containerd container runtime...
     Starting kubelet: The Kubernetes Node Agent...
[  OK  ] Started containerd container runtime.
[  OK  ] Started kubelet: The Kubernetes Node Agent.

I then run docker exec kind-control-plane journalctl -u kubelet and get the following error:

Feb 03 21:59:52 kind-control-plane systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 28.
Feb 03 21:59:52 kind-control-plane systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Feb 03 21:59:52 kind-control-plane systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Feb 03 21:59:52 kind-control-plane systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 03 21:59:52 kind-control-plane kubelet[692]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 03 21:59:52 kind-control-plane kubelet[692]: Flag --provider-id has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 03 21:59:52 kind-control-plane kubelet[692]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 03 21:59:52 kind-control-plane kubelet[692]: Flag --cgroup-root has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 03 21:59:52 kind-control-plane kubelet[692]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 03 21:59:52 kind-control-plane kubelet[692]: Flag --provider-id has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 03 21:59:52 kind-control-plane kubelet[692]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 03 21:59:52 kind-control-plane kubelet[692]: Flag --cgroup-root has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.759672     692 server.go:416] Version: v1.20.2
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.760132     692 server.go:837] Client rotation is on, will bootstrap in background
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.764984     692 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.767662     692 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.795945     692 container_manager_linux.go:274] container manager verified user specified cgroup-root exists: [kubelet]
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.795999     692 container_manager_linux.go:279] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/kubelet CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796086     692 topology_manager.go:120] [topologymanager] Creating topology manager with none policy per container scope
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796101     692 container_manager_linux.go:310] [topologymanager] Initializing Topology Manager with none policy and container-level scope
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796111     692 container_manager_linux.go:315] Creating device plugin manager: true
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796361     692 remote_runtime.go:62] parsed scheme: ""
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796374     692 remote_runtime.go:62] scheme "" not registered, fallback to default scheme
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796412     692 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796425     692 clientconn.go:948] ClientConn switching balancer to "pick_first"
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796477     692 remote_image.go:50] parsed scheme: ""
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796490     692 remote_image.go:50] scheme "" not registered, fallback to default scheme
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796505     692 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796515     692 clientconn.go:948] ClientConn switching balancer to "pick_first"
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796565     692 kubelet.go:262] Adding pod path: /etc/kubernetes/manifests
Feb 03 21:59:52 kind-control-plane kubelet[692]: I0203 21:59:52.796601     692 kubelet.go:273] Watching apiserver
Feb 03 21:59:52 kind-control-plane kubelet[692]: E0203 21:59:52.798744     692 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://kind-control-plane:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 172.19.0.2:6443: connect: connection refused
Feb 03 21:59:52 kind-control-plane kubelet[692]: E0203 21:59:52.798858     692 reflector.go:138] k8s.io/kubernetes/pkg/kubelet/kubelet.go:438: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://kind-control-plane:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkind-control-plane&limit=500&resourceVersion=0": dial tcp 172.19.0.2:6443: connect: connection refused
Feb 03 21:59:52 kind-control-plane kubelet[692]: E0203 21:59:52.799006     692 reflector.go:138] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://kind-control-plane:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkind-control-plane&limit=500&resourceVersion=0": dial tcp 172.19.0.2:6443: connect: connection refused
Feb 03 21:59:52 kind-control-plane kubelet[692]: E0203 21:59:52.799498     692 remote_runtime.go:86] Version from runtime service failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
Feb 03 21:59:52 kind-control-plane kubelet[692]: E0203 21:59:52.799572     692 kuberuntime_manager.go:202] Get runtime version failed: get remote runtime typed version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
Feb 03 21:59:52 kind-control-plane kubelet[692]: F0203 21:59:52.799629     692 server.go:269] failed to run Kubelet: failed to create kubelet: get remote runtime typed version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
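
The containerd logs inside the node can be inspected the same way (the missing runtime.v1alpha2.RuntimeService is served by containerd's CRI plugin); for example:

docker exec kind-control-plane journalctl -u containerd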

Environment:

  • kind version: (use kind version):

kind v0.10.0 go1.15.7 linux/amd64

  • Kubernetes version: (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Error from server (NotFound): the server could not find the requested resource

  • Docker version: (use docker info):

Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
Containers: 26
Running: 20
Paused: 0
Stopped: 6
Images: 170
Server Version: 20.10.3
Storage Driver: devicemapper
Pool Name: docker-9:1-2150461680-pool
Pool Blocksize: 65.54kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Udev Sync Supported: true
Data file: /dev/loop0
Metadata file: /dev/loop1
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 25.45GB
Data Space Total: 107.4GB
Data Space Available: 81.93GB
Metadata Space Used: 46.29MB
Metadata Space Total: 2.147GB
Metadata Space Available: 2.101GB
Thin Pool Minimum Free Space: 10.74GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.171-RHEL8 (2020-05-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 4.18.0-240.10.1.el8_3.x86_64
Operating System: CentOS Linux 8
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 30.96GiB
ID: 4USB:BZXG:XFXM:COII:MMPX:5FMC:Y6L2:OVPY:JKRO:BKB7:KCCK:HU2N
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
localhost:32000
127.0.0.0/8
Live Restore Enabled: false

WARNING: No blkio weight support
WARNING: No blkio weight_device support
WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
Use --storage-opt dm.thinpooldev to specify a custom block storage device.

  • OS (e.g. from /etc/os-release):

NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"

maxant added the kind/bug label on Feb 3, 2021
BenTheElder (Member) commented:

Feb 03 21:59:52 kind-control-plane kubelet[692]: F0203 21:59:52.799629 692 server.go:269] failed to run Kubelet: failed to create kubelet: get remote runtime typed version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService

Sounds like containerd is failing to run; I bet overlayfs is not working on top of xfs/devicemapper. We don't commonly see people using xfs for Kubernetes.
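
One quick check (a sketch; this assumes /var/lib/docker sits on the xfs filesystem in question): overlayfs needs d_type support, which on xfs means the filesystem must have been created with ftype=1:

xfs_info /var/lib/docker | grep ftype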

On a hunch, you can try:

cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "native"
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /dev/mapper
    containerPath: /dev/mapper
EOF

Either way, the containerd logs would be helpful. You can run kind export logs after kind create cluster --retain and upload the resulting directory here for much more debug info.
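
A minimal sketch of that workflow (the ./kind-logs directory name is just an example):

kind create cluster --retain      # keep the node containers around after a failed create
kind export logs ./kind-logs      # collect node, kubelet, and containerd logs into ./kind-logs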

BenTheElder changed the title from "kind create cluster fails on centos8 docker 20.10.3 with timeout" to "[xfs?] kind create cluster fails on centos8 docker 20.10.3 with timeout" on Feb 3, 2021
BenTheElder (Member) commented:

See also the warning from Docker re: xfs + devicemapper:

WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.

You should probably consider moving to a supported Docker configuration, e.g. the overlay2 storage driver.
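
A minimal sketch of that change, assuming no other settings in daemon.json (note: switching storage drivers makes existing images and containers invisible until you switch back):

# back up any existing /etc/docker/daemon.json before overwriting it
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}
EOF
sudo systemctl restart docker

Note that overlay2 on an xfs backing filesystem still requires ftype=1.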

ssinger commented Mar 21, 2021

I was getting the same 'unknown service runtime.v1alpha2.RuntimeService' error. I am also using xfs.

Running with the config from #2050 (comment) fixes/avoids the issue.

BenTheElder (Member) commented:

Thanks for confirming!
This should be relatively straightforward to fix so that kind automatically applies the equivalent of that config; we already do this for certain other filesystems.
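
In the meantime, you can check which snapshotter a node actually ended up with (a sketch, assuming the default node container name):

docker exec kind-control-plane containerd config dump | grep snapshotter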

BenTheElder (Member) commented:

#2149 by @aojea fixes this.
