'failed to reserve sandbox name' error after hard reboot #1014
The problem is that you have 2 instances of kube-scheduler which have "Attempt == 2". Do you have the kubelet and containerd logs from before the reboot? We need to figure out why that happened. There are 2 possibilities:
|
We need to figure out why it happened, and how to recover from it, given that instance 0 of the scheduler is also sitting there in the list of sandbox containers. We may need to add the same kind of aggressive recovery logic we have for regular containers to the sandbox containers, to address orphaned metadata entries and/or orphaned sandbox containers. |
@mikebrow Kubelet should handle everything, and we should only report the truth of the current state. However, in this case something must be wrong: either we are not reporting the truth, or kubelet is not behaving correctly. |
Thanks for the responses. Unfortunately, I didn't have the kubelet or containerd log level set to debug at the time the issue happened. As you can see, the VM was in such a bad state (I think due to high CPU and memory utilization) that kubelet doesn't even appear to have printed logs for a while before the reboot. Here are the kubelet logs at the time of reboot:
Jan 02 05:59:45 node1 kubelet[250135]: W0102 05:44:26.511266 250135 container.go:507] Failed to update stats for container "/system.slice/systemd-journald.service": failed to parse memory.usage_in_bytes - read /sys/fs/cgroup/memory/system.slice/systemd-journald.service/memory.usage_in_bytes: no such device, continuing to push stats
Jan 02 06:02:50 node1 kubelet[250135]: I0102 05:58:15.625938 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 65h28m34.260464164s ago; threshold is 3m0s]
Jan 02 06:03:35 node1 kubelet[250135]: E0102 05:52:05.467883 250135 kuberuntime_sandbox.go:198] ListPodSandbox failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 06:05:34 node1 kubelet[250135]: E0102 06:03:59.681519 250135 container_log_manager.go:174] Failed to rotate container logs: failed to list containers: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 06:06:09 node1 kubelet[250135]: E0102 06:03:13.975741 250135 remote_image.go:67] ListImages with filter nil from image service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:06:59 node1 kubelet[250135]: E0102 06:02:00.945335 250135 kubelet.go:2093] Container runtime sanity check failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 06:07:12 node1 kubelet[250135]: E0102 06:04:39.453068 250135 kuberuntime_container.go:329] getKubeletContainers failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:08:18 node1 kubelet[250135]: E0102 06:01:24.763178 250135 remote_image.go:83] ImageStatus "k8s.gcr.io/pause:3.1" from image service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:08:54 node1 kubelet[250135]: E0102 06:04:16.730867 250135 generic.go:197] GenericPLEG: Unable to retrieve pods: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 06:09:27 node1 kubelet[250135]: I0102 06:06:09.937213 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 65h52m25.104780642s ago; threshold is 3m0s]
Jan 02 06:09:51 node1 kubelet[250135]: E0102 06:07:00.448259 250135 remote_image.go:146] ImageFsInfo from image service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 06:13:47 node1 kubelet[250135]: E0102 06:08:25.495647 250135 kubelet.go:1224] Container garbage collection failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:16:45 node1 kubelet[250135]: E0102 06:11:04.949897 250135 kubelet.go:1248] Image garbage collection failed multiple times in a row: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 06:18:31 node1 kubelet[250135]: E0102 06:10:47.436995 250135 remote_runtime.go:332] ExecSync eded36f67fb9fb5d4b53c2b13031b005488e4299b4525f19f067d5e4a28ec15b '/bin/sh -ec ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo' from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:19:27 node1 kubelet[250135]: I0102 06:17:01.793096 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 65h59m22.909790739s ago; threshold is 3m0s]
Jan 02 06:19:43 node1 kubelet[250135]: W0102 06:19:43.265573 250135 container.go:507] Failed to update stats for container "/system.slice/rsyslog.service": read /sys/fs/cgroup/cpu,cpuacct/system.slice/rsyslog.service/cpuacct.usage_percpu: no such device, continuing to push stats
Jan 02 06:20:35 node1 kubelet[250135]: E0102 06:19:53.091800 250135 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Get https://10.0.22.191:6443/api/v1/nodes?fieldSelector=metadata.name%3Dnode1&limit=500&resourceVersion=0: dial tcp 10.0.22.191:6443: i/o timeout
Jan 02 06:24:18 node1 kubelet[250135]: E0102 06:12:45.820773 250135 remote_runtime.go:262] ListContainers with filter &ContainerFilter{Id:,State:nil,PodSandboxId:,LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 06:26:04 node1 kubelet[250135]: E0102 06:19:43.769109 250135 kuberuntime_image.go:102] ListImages failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:49:56 node1 kubelet[250135]: I0102 06:25:12.597611 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 66h11m46.185165456s ago; threshold is 3m0s]
Jan 02 06:49:56 node1 kubelet[250135]: E0102 06:26:48.997372 250135 container_log_manager.go:174] Failed to rotate container logs: failed to list containers: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 06:49:56 node1 kubelet[250135]: E0102 06:24:31.441784 250135 kuberuntime_image.go:87] ImageStatus for image {"k8s.gcr.io/pause:3.1"} failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:49:56 node1 kubelet[250135]: W0102 06:28:13.485985 250135 image_gc_manager.go:192] [imageGCManager] Failed to update image list: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:49:56 node1 kubelet[250135]: E0102 06:32:23.862407 250135 remote_runtime.go:434] Status from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 06:50:10 node1 kubelet[250135]: I0102 06:42:00.569834 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 66h20m49.039936615s ago; threshold is 3m0s]
Jan 02 06:51:09 node1 kubelet[250135]: E0102 06:43:59.285805 250135 kubelet_network.go:106] Failed to ensure marking rule for KUBE-MARK-DROP: timed out while checking rules
Jan 02 06:51:37 node1 kubelet[250135]: W0102 06:48:09.771347 250135 container.go:507] Failed to update stats for container "/system.slice/systemd-journald.service": failed to parse memory.usage_in_bytes - read /sys/fs/cgroup/memory/system.slice/systemd-journald.service/memory.usage_in_bytes: no such device, continuing to push stats
Jan 02 06:51:43 node1 kubelet[250135]: E0102 06:43:18.110305 250135 remote_runtime.go:169] ListPodSandbox with filter nil from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:41:35 node1 kubelet[250135]: I0102 06:53:03.432801 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 66h40m8.970872531s ago; threshold is 3m0s]
Jan 02 07:49:13 node1 kubelet[250135]: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:49:13 node1 kubelet[250135]: E0102 07:14:31.703814 250135 remote_image.go:67] ListImages with filter nil from image service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:49:13 node1 kubelet[250135]: E0102 07:27:16.923135 250135 container_log_manager.go:174] Failed to rotate container logs: failed to list containers: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:49:13 node1 kubelet[250135]: E0102 07:33:29.187245 250135 generic.go:197] GenericPLEG: Unable to retrieve pods: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:49:13 node1 kubelet[250135]: E0102 07:33:00.805360 250135 kuberuntime_image.go:102] ListImages failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:49:13 node1 kubelet[250135]: I0102 07:34:07.293984 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 67h16m46.680280374s ago; threshold is 3m0s]
Jan 02 07:49:13 node1 kubelet[250135]: W0102 07:40:12.294121 250135 container.go:507] Failed to update stats for container "/system.slice/systemd-journald.service": read /sys/fs/cgroup/cpu,cpuacct/system.slice/systemd-journald.service/cpuacct.stat: no such device, continuing to push stats
Jan 02 07:49:13 node1 kubelet[250135]: E0102 07:36:32.089071 250135 event.go:212] Unable to write event: 'Patch https://10.0.22.191:6443/api/v1/namespaces/production/events/edge-rabbitmq-0.1574dcf4da048fa4: dial tcp 10.0.22.191:6443: i/o timeout' (may retry after sleeping)
Jan 02 07:52:12 node1 kubelet[250135]: E0102 07:46:56.824013 250135 remote_image.go:146] ImageFsInfo from image service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 07:52:12 node1 kubelet[250135]: W0102 07:43:01.909734 250135 image_gc_manager.go:181] [imageGCManager] Failed to monitor images: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:54:22 node1 kubelet[250135]: E0102 07:44:32.112734 250135 remote_runtime.go:434] Status from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:54:22 node1 kubelet[250135]: E0102 07:44:12.374300 250135 kuberuntime_container.go:329] getKubeletContainers failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:54:46 node1 kubelet[250135]: E0102 07:48:54.837475 250135 kuberuntime_image.go:102] ListImages failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:55:52 node1 kubelet[250135]: I0102 07:49:18.584202 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 67h31m0.218586782s ago; threshold is 3m0s]
Jan 02 07:58:00 node1 kubelet[250135]: E0102 07:57:03.467992 250135 kubelet.go:1224] Container garbage collection failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 07:59:18 node1 kubelet[250135]: W0102 07:57:45.964650 250135 image_gc_manager.go:192] [imageGCManager] Failed to update image list: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 02 08:02:52 node1 kubelet[250135]: E0102 07:54:29.637241 250135 kubelet.go:1248] Image garbage collection failed multiple times in a row: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Jan 02 08:03:43 node1 kubelet[250135]: I0102 08:03:31.886928 250135 kubelet.go:1771] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 67h46m6.515419587s ago; threshold is 3m0s]
-- Reboot --
Jan 02 16:12:10 node1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 02 16:12:52 node1 kubelet[1014]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 02 16:12:52 node1 kubelet[1014]: Flag --container-log-max-files has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 02 16:12:52 node1 kubelet[1014]: Flag --container-log-max-size has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 02 16:12:52 node1 kubelet[1014]: Flag --runtime-request-timeout has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 02 16:12:53 node1 kubelet[1014]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 02 16:12:53 node1 kubelet[1014]: Flag --container-log-max-files has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 02 16:12:53 node1 kubelet[1014]: Flag --container-log-max-size has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 02 16:12:53 node1 kubelet[1014]: Flag --runtime-request-timeout has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 02 16:12:53 node1 kubelet[1014]: I0102 16:12:53.516342 1014 server.go:408] Version: v1.11.6
Jan 02 16:12:53 node1 kubelet[1014]: I0102 16:12:53.618406 1014 plugins.go:97] No cloud provider specified.
Jan 02 16:12:53 node1 kubelet[1014]: I0102 16:12:53.669199 1014 certificate_store.go:131] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.477425 1014 server.go:648] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.505471 1014 container_manager_linux.go:243] container manager verified user specified cgroup-root exists: []
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.505562 1014 container_manager_linux.go:248] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true}
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.506035 1014 container_manager_linux.go:267] Creating device plugin manager: true
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.506177 1014 state_mem.go:36] [cpumanager] initializing new in-memory state store
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.663586 1014 state_mem.go:84] [cpumanager] updated default cpuset: ""
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.663655 1014 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.677989 1014 kubelet.go:274] Adding pod path: /etc/kubernetes/manifests
Jan 02 16:12:54 node1 kubelet[1014]: I0102 16:12:54.692507 1014 kubelet.go:299] Watching apiserver
Jan 02 16:12:55 node1 kubelet[1014]: E0102 16:12:55.245453 1014 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Get https://10.0.22.191:6443/api/v1/nodes?fieldSelector=metadata.name%3Dnode1&limit=500&resourceVersion=0: dial tcp 10.0.22.191:6443: connect: connection refused
Jan 02 16:12:55 node1 kubelet[1014]: E0102 16:12:55.245533 1014 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Get https://10.0.22.191:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.22.191:6443: connect: connection refused
Jan 02 16:12:55 node1 kubelet[1014]: E0102 16:12:55.245577 1014 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.22.191:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dnode1&limit=500&resourceVersion=0: dial tcp 10.0.22.191:6443: connect: connection refused
Jan 02 16:12:56 node1 kubelet[1014]: E0102 16:12:56.246381 1014 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list *v1.Node: Get https://10.0.22.191:6443/api/v1/nodes?fieldSelector=metadata.name%3Dnode1&limit=500&resourceVersion=0: dial tcp 10.0.22.191:6443: connect: connection refused
Jan 02 16:12:56 node1 kubelet[1014]: E0102 16:12:56.247557 1014 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list *v1.Service: Get https://10.0.22.191:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.22.191:6443: connect: connection refused
Jan 02 16:12:56 node1 kubelet[1014]: E0102 16:12:56.248821 1014 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.22.191:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dnode1&limit=500&resourceVersion=0: dial tcp 10.0.22.191:6443: connect: connection refused
Jan 02 16:12:56 node1 kubelet[1014]: E0102 16:12:56.680498 1014 remote_runtime.go:69] Version from runtime service failed: rpc error: code = Unknown desc = server is not initialized yet
Jan 02 16:12:56 node1 kubelet[1014]: E0102 16:12:56.680840 1014 kuberuntime_manager.go:172] Get runtime version failed: rpc error: code = Unknown desc = server is not initialized yet
Jan 02 16:12:56 node1 kubelet[1014]: F0102 16:12:56.680888 1014 server.go:262] failed to run Kubelet: failed to create kubelet: rpc error: code = Unknown desc = server is not initialized yet
Jan 02 16:12:56 node1 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 02 16:12:56 node1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 02 16:13:06 node1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 02 16:13:06 node1 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 1.
Jan 02 16:13:06 node1 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
And here are the containerd logs at the time of reboot:
Jan 02 04:41:45 node1 containerd[29802]: time="2019-01-02T04:41:39.673899977Z" level=info msg="starting containerd" revision=9b32062dc1f5a7c2564315c269b5059754f12b9d version=v1.2.1
Jan 02 04:41:45 node1 containerd[29802]: time="2019-01-02T04:41:46.054889862Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1
Jan 02 04:42:06 node1 containerd[29802]: time="2019-01-02T04:41:56.968366655Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1
Jan 02 04:43:11 node1 containerd[29802]: time="2019-01-02T04:42:54.968807606Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jan 02 04:43:17 node1 containerd[29802]: time="2019-01-02T04:43:16.423999810Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1
Jan 02 04:43:20 node1 containerd[29802]: time="2019-01-02T04:43:20.467404089Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1
Jan 02 04:43:20 node1 containerd[29802]: time="2019-01-02T04:43:20.481799661Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1
Jan 02 04:43:20 node1 containerd[29802]: time="2019-01-02T04:43:20.560129408Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
Jan 02 04:43:22 node1 containerd[29802]: time="2019-01-02T04:43:20.930865651Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the z
fs snapshotter"
Jan 02 04:43:22 node1 containerd[29802]: time="2019-01-02T04:43:20.931327785Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1
Jan 02 04:43:22 node1 containerd[29802]: time="2019-01-02T04:43:20.932160282Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the b
trfs snapshotter"
Jan 02 04:43:22 node1 containerd[29802]: time="2019-01-02T04:43:20.932200870Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs sna
pshotter"
Jan 02 04:43:23 node1 containerd[29802]: time="2019-01-02T04:43:23.473684376Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." type=io.containerd.differ.v1
Jan 02 04:43:23 node1 containerd[29802]: time="2019-01-02T04:43:23.474422488Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." type=io.containerd.gc.v1
Jan 02 04:43:23 node1 containerd[29802]: time="2019-01-02T04:43:23.753873589Z" level=info msg="loading plugin "io.containerd.service.v1.containers-service"..." type=io.containerd.service.v1
Jan 02 04:43:30 node1 containerd[29802]: time="2019-01-02T04:43:23.920711116Z" level=info msg="loading plugin "io.containerd.service.v1.content-service"..." type=io.containerd.service.v1
Jan 02 04:43:30 node1 containerd[29802]: time="2019-01-02T04:43:23.920802019Z" level=info msg="loading plugin "io.containerd.service.v1.diff-service"..." type=io.containerd.service.v1
Jan 02 04:43:30 node1 containerd[29802]: time="2019-01-02T04:43:24.190301964Z" level=info msg="loading plugin "io.containerd.service.v1.images-service"..." type=io.containerd.service.v1
Jan 02 04:43:30 node1 containerd[29802]: time="2019-01-02T04:43:24.490346644Z" level=info msg="loading plugin "io.containerd.service.v1.leases-service"..." type=io.containerd.service.v1
Jan 02 04:43:30 node1 containerd[29802]: time="2019-01-02T04:43:24.490428291Z" level=info msg="loading plugin "io.containerd.service.v1.namespaces-service"..." type=io.containerd.service.v1
Jan 02 04:43:30 node1 containerd[29802]: time="2019-01-02T04:43:24.490459716Z" level=info msg="loading plugin "io.containerd.service.v1.snapshots-service"..." type=io.containerd.service.v1
Jan 02 04:43:30 node1 containerd[29802]: time="2019-01-02T04:43:24.490492829Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." type=io.containerd.runtime.v1
-- Reboot --
Jan 02 16:12:13 node1 systemd[1]: Starting containerd container runtime...
Jan 02 16:12:14 node1 systemd[1]: Started containerd container runtime.
Jan 02 16:12:40 node1 containerd[1119]: time="2019-01-02T16:12:40.703410994Z" level=info msg="starting containerd" revision=9b32062dc1f5a7c2564315c269b5059754f12b9d version=v1.2.1
Jan 02 16:12:40 node1 containerd[1119]: time="2019-01-02T16:12:40.704577289Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1
Jan 02 16:12:40 node1 containerd[1119]: time="2019-01-02T16:12:40.717921482Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1
Jan 02 16:12:40 node1 containerd[1119]: time="2019-01-02T16:12:40.718634192Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jan 02 16:12:40 node1 containerd[1119]: time="2019-01-02T16:12:40.718821163Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.311105860Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.311671703Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.323692741Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.324241507Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.324273154Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.324322551Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/containerd/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.324346307Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.809974892Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." type=io.containerd.differ.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810046364Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." type=io.containerd.gc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810146257Z" level=info msg="loading plugin "io.containerd.service.v1.containers-service"..." type=io.containerd.service.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810194589Z" level=info msg="loading plugin "io.containerd.service.v1.content-service"..." type=io.containerd.service.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810233620Z" level=info msg="loading plugin "io.containerd.service.v1.diff-service"..." type=io.containerd.service.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810265645Z" level=info msg="loading plugin "io.containerd.service.v1.images-service"..." type=io.containerd.service.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810313526Z" level=info msg="loading plugin "io.containerd.service.v1.leases-service"..." type=io.containerd.service.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810345343Z" level=info msg="loading plugin "io.containerd.service.v1.namespaces-service"..." type=io.containerd.service.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810374493Z" level=info msg="loading plugin "io.containerd.service.v1.snapshots-service"..." type=io.containerd.service.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810405283Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." type=io.containerd.runtime.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810598859Z" level=info msg="loading plugin "io.containerd.runtime.v2.task"..." type=io.containerd.runtime.v2
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.810743549Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." type=io.containerd.monitor.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811431890Z" level=info msg="loading plugin "io.containerd.service.v1.tasks-service"..." type=io.containerd.service.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811478582Z" level=info msg="loading plugin "io.containerd.internal.v1.restart"..." type=io.containerd.internal.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811573944Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811608875Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811638061Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811666848Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811694655Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811723269Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811752069Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811780686Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.811809390Z" level=info msg="loading plugin "io.containerd.internal.v1.opt"..." type=io.containerd.internal.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.825527132Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.825594559Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:41.825627273Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:43.618229638Z" level=info msg="loading plugin "io.containerd.grpc.v1.cri"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:43.618524729Z" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntime:{Type:io.containerd.runtime.v1.linux Engine: Root: Options:<nil>} UntrustedWorkloadRuntime:{Type: Engine: Root: Options:<nil>} Runtimes:map[] NoPivot:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginConfTemplate:} Registry:{Mirrors:map[docker.io:{Endpoints:[https://registry-1.docker.io]}] Auths:map[]} StreamServerAddress:127.0.0.1 StreamServerPort:0 EnableSelinux:false SandboxImage:k8s.gcr.io/pause:3.1 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:43.618663252Z" level=info msg="Connect containerd service"
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:43.618975085Z" level=info msg="Get image filesystem path "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs""
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:44.568049691Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." type=io.containerd.grpc.v1
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:44.568153282Z" level=info msg="Start subscribing containerd event"
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:44.568232670Z" level=info msg="Start recovering state"
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:44.575313027Z" level=info msg=serving... address="/run/containerd/containerd.sock"
Jan 02 16:12:45 node1 containerd[1119]: time="2019-01-02T16:12:44.575369347Z" level=info msg="containerd successfully booted in 4.100130s"
Jan 02 16:12:56 node1 containerd[1119]: time="2019-01-02T16:12:56.662098389Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name "kube-scheduler-node1_kube-system_705e7ce1217a37349a5567101e60165d_2": name "kube-scheduler-node1_kube-system_705e7ce1217a37349a5567101e60165d_2" is reserved for "139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2""
Jan 02 16:12:56 node1 systemd[1]: containerd.service: Main process exited, code=exited, status=1/FAILURE
Jan 02 16:12:56 node1 systemd[1]: containerd.service: Failed with result 'exit-code'. |
@steven-sheehy Can you search the kubelet log for kube-scheduler? Let's see whether we can find something useful, maybe not. |
Unfortunately, I don't see any mention of kube-scheduler in the kubelet logs. I do see that the system restarted on its own yesterday, but before that it hadn't rebooted since May. And I don't think there's any way I can reproduce this either.
$ sudo journalctl --list-boots
-3 325b4653bae24ff5a2271c54f8017ee2 Thu 2018-05-24 20:35:46 UTC—Thu 2018-05-24 20:42:30 UTC
-2 64c780b5ccfe4835a38c263471e4765a Thu 2018-05-24 20:43:12 UTC—Thu 2018-05-24 20:47:51 UTC
-1 fe79afade444401e8fcaca3a0efba8a3 Tue 2019-01-01 19:52:53 UTC—Wed 2019-01-02 15:42:13 UTC
0 48bf6b1af91f46a1aa89907ab7cd4517 Wed 2019-01-02 16:11:57 UTC—Wed 2019-01-02 19:38:42 UTC
$ sudo journalctl -u kubelet | grep -i "computePodActions got.*kube-scheduler"
$ sudo journalctl -u kubelet | grep -i kube-scheduler
$ |
If I can figure out how to start containerd, will there be any kubernetes audit events that are useful? |
If kubelet creates different kube-scheduler instances with the same attempt, the duplicate should be rejected by containerd. So this is more likely an issue on the containerd side: either metadata corruption, or some bad logic. Can you run ctr containers info against the kube-scheduler sandbox IDs and post the result here?
We can check the created and updated timestamps to see whether the metadata was touched after creation. And then let's see whether we can find something related in the containerd/kubelet log around that time. |
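As a rough sketch of that check (assuming jq is available and that ctr can reach containerd, which per the next comments first requires the CRI-disable workaround below), something like this prints the ID, CreatedAt and UpdatedAt of every kube-scheduler sandbox record:

# list every container record in the k8s.io namespace, keep only the sandbox records that
# belong to the kube-scheduler-node1 pod, and print their ID and timestamps
for id in $(sudo ctr -n=k8s.io containers ls -q); do
  sudo ctr -n=k8s.io containers info "$id" \
    | jq -r 'select(.Labels."io.kubernetes.pod.name" == "kube-scheduler-node1"
                    and .Labels."io.cri-containerd.kind" == "sandbox")
             | [.ID[0:13], .CreatedAt, .UpdatedAt] | @tsv'
done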
containerd is currently crashing with the aforementioned error, and it looks like ctr needs containerd to be running:
$ sudo ctr -n=k8s.io containers info 095fa5bfdd75e411290d11e90ec84bc2dec87b4e15d5c6215761b2518c5f8683
ctr: failed to dial "/run/containerd/containerd.sock": context deadline exceeded
$ ps -aef | grep containerd
user+ 158624 148032 0 21:33 pts/1 00:00:00 grep --color=auto containerd
$
I think I could probably get containerd running by deleting /var/lib/containerd and /run/containerd, but I was holding off in case you wanted me to debug anything further. |
You can start containerd with disabled_plugins = ["cri"] set in /etc/containerd/config.toml.
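A minimal sketch of that workaround (edit the config by hand; the key must be a top-level entry, and remember to revert it once the inspection is done):

# add this top-level key to /etc/containerd/config.toml, before any [section] headers,
# so the CRI plugin is skipped and containerd can start for inspection:
#
#   disabled_plugins = ["cri"]
#
sudo systemctl restart containerd
sudo ctr -n=k8s.io containers ls -q | head    # sanity check: ctr can reach the daemon again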
|
Collapsed output below:
ctr -n=k8s.io containers info 095fa5bfdd75e411290d11e90ec84bc2dec87b4e15d5c6215761b2518c5f8683
{
"ID": "095fa5bfdd75e411290d11e90ec84bc2dec87b4e15d5c6215761b2518c5f8683",
"Labels": {
"component": "kube-scheduler",
"io.cri-containerd.kind": "sandbox",
"io.kubernetes.pod.name": "kube-scheduler-node1",
"io.kubernetes.pod.namespace": "kube-system",
"io.kubernetes.pod.uid": "705e7ce1217a37349a5567101e60165d",
"tier": "control-plane"
},
"Image": "sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e",
"Runtime": {
"Name": "io.containerd.runtime.v1.linux",
"Options": {
"type_url": "containerd.linux.runc.RuncOptions"
}
},
"SnapshotKey": "095fa5bfdd75e411290d11e90ec84bc2dec87b4e15d5c6215761b2518c5f8683",
"Snapshotter": "overlayfs",
"CreatedAt": "2018-12-19T21:19:08.682812746Z",
"UpdatedAt": "2018-12-19T21:19:08.682812746Z",
"Extensions": {
"io.cri-containerd.sandbox.metadata": {
"type_url": "github.com/containerd/cri/pkg/store/sandbox/Metadata",
"value": "eyJWZXJzaW9uIjoidjEiLCJNZXRhZGF0YSI6eyJJRCI6IjA5NWZhNWJmZGQ3NWU0MTEyOTBkMTFlOTBlYzg0YmMyZGVjODdiNGUxNWQ1YzYyMTU3NjFiMjUxOGM1Zjg2ODMiLCJOYW1lIjoia3ViZS1zY2hlZHVsZXItZmlyZXNjb3BlMV9rdWJlLXN5c3RlbV83MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZF8wIiwiQ29uZmlnIjp7Im1ldGFkYXRhIjp7Im5hbWUiOiJrdWJlLXNjaGVkdWxlci1maXJlc2NvcGUxIiwidWlkIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSJ9LCJsb2dfZGlyZWN0b3J5IjoiL3Zhci9sb2cvcG9kcy83MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZCIsImRuc19jb25maWciOnsic2VydmVycyI6WyIxMC4wLjIyLjQ1IiwiMS4xLjEuMSJdLCJzZWFyY2hlcyI6WyJmaXJlc2NvcGUuaW50Il19LCJsYWJlbHMiOnsiY29tcG9uZW50Ijoia3ViZS1zY2hlZHVsZXIiLCJpby5rdWJlcm5ldGVzLnBvZC5uYW1lIjoia3ViZS1zY2hlZHVsZXItZmlyZXNjb3BlMSIsImlvLmt1YmVybmV0ZXMucG9kLm5hbWVzcGFjZSI6Imt1YmUtc3lzdGVtIiwiaW8ua3ViZXJuZXRlcy5wb2QudWlkIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJ0aWVyIjoiY29udHJvbC1wbGFuZSJ9LCJhbm5vdGF0aW9ucyI6eyJrdWJlcm5ldGVzLmlvL2NvbmZpZy5oYXNoIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJrdWJlcm5ldGVzLmlvL2NvbmZpZy5zZWVuIjoiMjAxOC0xMi0xOVQyMToxODo1NS41MjEwNjE5NzFaIiwia3ViZXJuZXRlcy5pby9jb25maWcuc291cmNlIjoiZmlsZSIsInNjaGVkdWxlci5hbHBoYS5rdWJlcm5ldGVzLmlvL2NyaXRpY2FsLXBvZCI6IiJ9LCJsaW51eCI6eyJjZ3JvdXBfcGFyZW50IjoiL2t1YmVwb2RzL2J1cnN0YWJsZS9wb2Q3MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZCIsInNlY3VyaXR5X2NvbnRleHQiOnsibmFtZXNwYWNlX29wdGlvbnMiOnsibmV0d29yayI6MiwicGlkIjoxfX19fSwiTmV0TlNQYXRoIjoiIiwiSVAiOiIiLCJSdW50aW1lSGFuZGxlciI6IiJ9fQ=="
}
},
"Spec": {
"ociVersion": "1.0.1-dev",
"process": {
"user": {
"uid": 0,
"gid": 0
},
"args": [
"/pause"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"effective": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"inheritable": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"permitted": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
]
},
"noNewPrivileges": true,
"oomScoreAdj": -998
},
"root": {
"path": "rootfs",
"readonly": true
},
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/dev/shm",
"type": "bind",
"source": "/run/containerd/io.containerd.grpc.v1.cri/sandboxes/095fa5bfdd75e411290d11e90ec84bc2dec87b4e15d5c6215761b2518c5f8683/shm",
"options": [
"rbind",
"ro"
]
}
],
"annotations": {
"io.kubernetes.cri.container-type": "sandbox",
"io.kubernetes.cri.sandbox-id": "095fa5bfdd75e411290d11e90ec84bc2dec87b4e15d5c6215761b2518c5f8683"
},
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
],
"cpu": {
"shares": 2
}
},
"cgroupsPath": "/kubepods/burstable/pod705e7ce1217a37349a5567101e60165d/095fa5bfdd75e411290d11e90ec84bc2dec87b4e15d5c6215761b2518c5f8683",
"namespaces": [
{
"type": "pid"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
}
],
"maskedPaths": [
"/proc/acpi",
"/proc/asound",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
}
ctr -n=k8s.io containers info 139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2
{
"ID": "139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2",
"Labels": {
"component": "kube-scheduler",
"io.cri-containerd.kind": "sandbox",
"io.kubernetes.pod.name": "kube-scheduler-node1",
"io.kubernetes.pod.namespace": "kube-system",
"io.kubernetes.pod.uid": "705e7ce1217a37349a5567101e60165d",
"tier": "control-plane"
},
"Image": "k8s.gcr.io/pause:3.1",
"Runtime": {
"Name": "io.containerd.runtime.v1.linux",
"Options": {
"type_url": "containerd.linux.runc.RuncOptions"
}
},
"SnapshotKey": "139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2",
"Snapshotter": "overlayfs",
"CreatedAt": "2018-12-30T09:53:52.272719082Z",
"UpdatedAt": "2018-12-30T09:53:52.272719082Z",
"Extensions": {
"io.cri-containerd.sandbox.metadata": {
"type_url": "github.com/containerd/cri/pkg/store/sandbox/Metadata",
"value": "eyJWZXJzaW9uIjoidjEiLCJNZXRhZGF0YSI6eyJJRCI6IjEzOWJiMGFjN2UwNTBlOWUyOGI5OTRlNzhmNjUxYTg2MDlmNDI2ZjFiNWJiZmM4ODdhMGQ0YTMzNTBiNGVlZTIiLCJOYW1lIjoia3ViZS1zY2hlZHVsZXItZmlyZXNjb3BlMV9rdWJlLXN5c3RlbV83MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZF8yIiwiQ29uZmlnIjp7Im1ldGFkYXRhIjp7Im5hbWUiOiJrdWJlLXNjaGVkdWxlci1maXJlc2NvcGUxIiwidWlkIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImF0dGVtcHQiOjJ9LCJsb2dfZGlyZWN0b3J5IjoiL3Zhci9sb2cvcG9kcy83MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZCIsImRuc19jb25maWciOnsic2VydmVycyI6WyIxMC4wLjIyLjQ1IiwiMS4xLjEuMSJdLCJzZWFyY2hlcyI6WyJmaXJlc2NvcGUuaW50Il19LCJsYWJlbHMiOnsiY29tcG9uZW50Ijoia3ViZS1zY2hlZHVsZXIiLCJpby5rdWJlcm5ldGVzLnBvZC5uYW1lIjoia3ViZS1zY2hlZHVsZXItZmlyZXNjb3BlMSIsImlvLmt1YmVybmV0ZXMucG9kLm5hbWVzcGFjZSI6Imt1YmUtc3lzdGVtIiwiaW8ua3ViZXJuZXRlcy5wb2QudWlkIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJ0aWVyIjoiY29udHJvbC1wbGFuZSJ9LCJhbm5vdGF0aW9ucyI6eyJrdWJlcm5ldGVzLmlvL2NvbmZpZy5oYXNoIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJrdWJlcm5ldGVzLmlvL2NvbmZpZy5zZWVuIjoiMjAxOC0xMi0xOVQyMToxODo1NS41MjEwNjE5NzFaIiwia3ViZXJuZXRlcy5pby9jb25maWcuc291cmNlIjoiZmlsZSIsInNjaGVkdWxlci5hbHBoYS5rdWJlcm5ldGVzLmlvL2NyaXRpY2FsLXBvZCI6IiJ9LCJsaW51eCI6eyJjZ3JvdXBfcGFyZW50IjoiL2t1YmVwb2RzL2J1cnN0YWJsZS9wb2Q3MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZCIsInNlY3VyaXR5X2NvbnRleHQiOnsibmFtZXNwYWNlX29wdGlvbnMiOnsibmV0d29yayI6MiwicGlkIjoxfX19fSwiTmV0TlNQYXRoIjoiIiwiSVAiOiIiLCJSdW50aW1lSGFuZGxlciI6IiJ9fQ=="
}
},
"Spec": {
"ociVersion": "1.0.1-dev",
"process": {
"user": {
"uid": 0,
"gid": 0
},
"args": [
"/pause"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"effective": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"inheritable": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"permitted": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
]
},
"noNewPrivileges": true,
"oomScoreAdj": -998
},
"root": {
"path": "rootfs",
"readonly": true
},
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/dev/shm",
"type": "bind",
"source": "/run/containerd/io.containerd.grpc.v1.cri/sandboxes/139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2/shm",
"options": [
"rbind",
"ro"
]
}
],
"annotations": {
"io.kubernetes.cri.container-type": "sandbox",
"io.kubernetes.cri.sandbox-id": "139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2"
},
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
],
"cpu": {
"shares": 2
}
},
"cgroupsPath": "/kubepods/burstable/pod705e7ce1217a37349a5567101e60165d/139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2",
"namespaces": [
{
"type": "pid"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
}
],
"maskedPaths": [
"/proc/acpi",
"/proc/asound",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
}
ctr -n=k8s.io containers info 2428da7afb7fe092edb0a924c2a83b0aa1c37b71a0b572f47e064757e8f0e7c9
{
"ID": "2428da7afb7fe092edb0a924c2a83b0aa1c37b71a0b572f47e064757e8f0e7c9",
"Labels": {
"component": "kube-scheduler",
"io.cri-containerd.kind": "sandbox",
"io.kubernetes.pod.name": "kube-scheduler-node1",
"io.kubernetes.pod.namespace": "kube-system",
"io.kubernetes.pod.uid": "705e7ce1217a37349a5567101e60165d",
"tier": "control-plane"
},
"Image": "k8s.gcr.io/pause:3.1",
"Runtime": {
"Name": "io.containerd.runtime.v1.linux",
"Options": {
"type_url": "containerd.linux.runc.RuncOptions"
}
},
"SnapshotKey": "2428da7afb7fe092edb0a924c2a83b0aa1c37b71a0b572f47e064757e8f0e7c9",
"Snapshotter": "overlayfs",
"CreatedAt": "2018-12-30T10:23:41.224499667Z",
"UpdatedAt": "2018-12-30T10:23:41.224499667Z",
"Extensions": {
"io.cri-containerd.sandbox.metadata": {
"type_url": "github.com/containerd/cri/pkg/store/sandbox/Metadata",
"value": "eyJWZXJzaW9uIjoidjEiLCJNZXRhZGF0YSI6eyJJRCI6IjI0MjhkYTdhZmI3ZmUwOTJlZGIwYTkyNGMyYTgzYjBhYTFjMzdiNzFhMGI1NzJmNDdlMDY0NzU3ZThmMGU3YzkiLCJOYW1lIjoia3ViZS1zY2hlZHVsZXItZmlyZXNjb3BlMV9rdWJlLXN5c3RlbV83MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZF8yIiwiQ29uZmlnIjp7Im1ldGFkYXRhIjp7Im5hbWUiOiJrdWJlLXNjaGVkdWxlci1maXJlc2NvcGUxIiwidWlkIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImF0dGVtcHQiOjJ9LCJsb2dfZGlyZWN0b3J5IjoiL3Zhci9sb2cvcG9kcy83MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZCIsImRuc19jb25maWciOnsic2VydmVycyI6WyIxMC4wLjIyLjQ1IiwiMS4xLjEuMSJdLCJzZWFyY2hlcyI6WyJmaXJlc2NvcGUuaW50Il19LCJsYWJlbHMiOnsiY29tcG9uZW50Ijoia3ViZS1zY2hlZHVsZXIiLCJpby5rdWJlcm5ldGVzLnBvZC5uYW1lIjoia3ViZS1zY2hlZHVsZXItZmlyZXNjb3BlMSIsImlvLmt1YmVybmV0ZXMucG9kLm5hbWVzcGFjZSI6Imt1YmUtc3lzdGVtIiwiaW8ua3ViZXJuZXRlcy5wb2QudWlkIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJ0aWVyIjoiY29udHJvbC1wbGFuZSJ9LCJhbm5vdGF0aW9ucyI6eyJrdWJlcm5ldGVzLmlvL2NvbmZpZy5oYXNoIjoiNzA1ZTdjZTEyMTdhMzczNDlhNTU2NzEwMWU2MDE2NWQiLCJrdWJlcm5ldGVzLmlvL2NvbmZpZy5zZWVuIjoiMjAxOC0xMi0xOVQyMToxODo1NS41MjEwNjE5NzFaIiwia3ViZXJuZXRlcy5pby9jb25maWcuc291cmNlIjoiZmlsZSIsInNjaGVkdWxlci5hbHBoYS5rdWJlcm5ldGVzLmlvL2NyaXRpY2FsLXBvZCI6IiJ9LCJsaW51eCI6eyJjZ3JvdXBfcGFyZW50IjoiL2t1YmVwb2RzL2J1cnN0YWJsZS9wb2Q3MDVlN2NlMTIxN2EzNzM0OWE1NTY3MTAxZTYwMTY1ZCIsInNlY3VyaXR5X2NvbnRleHQiOnsibmFtZXNwYWNlX29wdGlvbnMiOnsibmV0d29yayI6MiwicGlkIjoxfX19fSwiTmV0TlNQYXRoIjoiIiwiSVAiOiIiLCJSdW50aW1lSGFuZGxlciI6IiJ9fQ=="
}
},
"Spec": {
"ociVersion": "1.0.1-dev",
"process": {
"user": {
"uid": 0,
"gid": 0
},
"args": [
"/pause"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"effective": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"inheritable": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"permitted": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
]
},
"noNewPrivileges": true,
"oomScoreAdj": -998
},
"root": {
"path": "rootfs",
"readonly": true
},
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/dev/shm",
"type": "bind",
"source": "/run/containerd/io.containerd.grpc.v1.cri/sandboxes/2428da7afb7fe092edb0a924c2a83b0aa1c37b71a0b572f47e064757e8f0e7c9/shm",
"options": [
"rbind",
"ro"
]
}
],
"annotations": {
"io.kubernetes.cri.container-type": "sandbox",
"io.kubernetes.cri.sandbox-id": "2428da7afb7fe092edb0a924c2a83b0aa1c37b71a0b572f47e064757e8f0e7c9"
},
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
],
"cpu": {
"shares": 2
}
},
"cgroupsPath": "/kubepods/burstable/pod705e7ce1217a37349a5567101e60165d/2428da7afb7fe092edb0a924c2a83b0aa1c37b71a0b572f47e064757e8f0e7c9",
"namespaces": [
{
"type": "pid"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
}
],
"maskedPaths": [
"/proc/acpi",
"/proc/asound",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
} |
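For what it's worth, the io.cri-containerd.sandbox.metadata extension in those dumps is base64-encoded JSON, so the reserved name and attempt number can be read straight out of it. A small sketch (assumes jq and GNU base64; the jq path follows the structure of the output above):

# decode the sandbox metadata of one of the conflicting records and show its reserved Name and attempt
sudo ctr -n=k8s.io containers info 139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2 \
  | jq -r '.Extensions."io.cri-containerd.sandbox.metadata".value' \
  | base64 -d \
  | jq '{Name: .Metadata.Name, Attempt: .Metadata.Config.metadata.attempt}'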
I don't understand why this could happen... The only possibility I can imagine is that the previous sandbox failed to be loaded for some reason, so another sandbox with the same attempt got created. Later the old sandbox could be loaded again, so we get the naming conflict. Can you search for "Failed to load sandbox" in the containerd log? |
The farthest back my kubelet and containerd logs go is Jan 02 03:22:26, and it looks like the containers above were created at 2018-12-30T10:23:41.224499667Z. journalctl is set to only allow 1 GB of logs, so that's why it doesn't go far back. I didn't notice the issue until I was back from holidays. At the time this occurred, the system was unusable and I couldn't even log into it. So it's possible that kubelet created a sandbox and then failed right after even loading/using that same sandbox. Searching for "Failed to load sandbox" doesn't return anything. Is there any other place such events are stored? It looks like Kubernetes events only have a 1-hour TTL. |
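As an aside, if journald retention is the limiting factor, a persistent journal with a larger cap keeps more history for the next incident. A sketch using standard journald drop-in configuration (the sizes are only examples):

# keep the journal on disk and raise its size cap so logs survive longer than 1 GB worth of churn
sudo mkdir -p /etc/systemd/journald.conf.d
sudo tee /etc/systemd/journald.conf.d/retention.conf >/dev/null <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=4G
EOF
sudo systemctl restart systemd-journald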
@steven-sheehy OK. Then just remove the bad container with ctr. Let's leave the issue open to see whether anyone else hits it. |
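A sketch of that cleanup, assuming containerd is running with the CRI plugin disabled as above and that the record to drop is the one named in the fatal error (which of the two attempt-2 duplicates to remove is a judgment call; this only deletes the metadata record, any leftover snapshot can be removed separately):

# delete the stale sandbox record that is holding the reserved name
sudo ctr -n=k8s.io containers delete 139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2
# then remove the disabled_plugins line from /etc/containerd/config.toml and restart
sudo systemctl restart containerd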
I highly suspect that the problem happened because of the scenario described above: a sandbox failed to be loaded, so a new one with the same attempt number was created, and the old one later loaded again.
The issue is that we don't currently have a good way to handle a sandbox/container which fails to be loaded. The same issue was discussed in #884 (comment) and kubernetes/kubernetes#69060. We should probably keep the fail-to-load sandbox/container during recovery and represent it in a special state, e.g. an unknown state. Kubelet should be aware of the unknown state and clean it up before starting new sandboxes/containers. |
Here are some lines from @steven-sheehy's containerd log:
This proves my theory in #1014 (comment). Apparently your node is in a bad state @steven-sheehy; fixing that bad condition should help you eliminate this issue. However, we should better handle the case where a sandbox/container fails to be loaded: we should still load it and keep it in an unknown state, and kubelet should try to stop the sandbox/container before creating and starting new ones. |
@steven-sheehy Sorry for the delay. It looks like your node ran out of PIDs, which is why you see the errors above. #1037 should fix the issue for you. And I'll fix the Kubernetes issue kubernetes/kubernetes#69060, which should completely fix the issue. |
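For anyone wanting to confirm PID/thread exhaustion on a node, a few quick checks (a sketch; the limits and cgroup paths vary by distro and kernel configuration):

sysctl kernel.pid_max kernel.threads-max                 # configured kernel limits
ps -eL | wc -l                                           # rough count of threads currently in use
cat /sys/fs/cgroup/pids/kubepods/pids.max 2>/dev/null    # per-cgroup pid limit, if the pids controller is mounted here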
Thanks @Random-Liu. Do you think containerd itself was causing the system to run out of pids or something else on the system? My system has open file descriptors set to 1048576. |
@steven-sheehy I don't think it is containerd itself. The issue is that containerd skips containers after load failure, which is the containerd problem. But pid exhaustion should be caused by something else on your node, e.g. your workload. |
PS: the config option for the workaround is |
In case anyone hits this issue in 2021 or beyond, it's worth noting that the plugin name is no longer "just" plain ole cri but the full io.containerd.grpc.v1.cri. Therefore, one needs to disable it by that full name in /etc/containerd/config.toml. I'm using a newer containerd release.
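A sketch of the same workaround on such a newer release, using the full plugin ID that also appears in the containerd log earlier in this thread:

# top-level key in /etc/containerd/config.toml (before any [section] headers):
#
#   disabled_plugins = ["io.containerd.grpc.v1.cri"]
#
sudo systemctl restart containerd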
After a VM running Kubernetes became completely unresponsive, I had to forcefully restart the VM. Upon reboot, containerd fails to start due to the below error:
failed to recover state: failed to reserve sandbox name "kube-scheduler-node1_kube-system_705e7ce1217a37349a5567101e60165d_2": name "kube-scheduler-node1_kube-system_705e7ce1217a37349a5567101e60165d_2" is reserved for "139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2"
Most likely the abrupt shutdown caused the containerd database and filesystem to become out of sync, but I would hope containerd could be more aggressive in recovering from such an error. The container is stateless and should just be forcibly removed and re-added.
At minimum, is there any workaround to recover from the above? I've already tried deleting /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/139bb0ac7e050e9e28b994e78f651a8609f426f1b5bbfc887a0d4a3350b4eee2 and it didn't change anything.