
Failed to Reserve Sandbox Name #1666

Closed
ponderMuse opened this issue Oct 21, 2020 · 9 comments

@ponderMuse

inspection-report-20201019_090646.tar.gz

My microk8s master node is no longer in the Ready state. Looking in the logs, I can see:

$ sudo journalctl -u snap.microk8s.daemon-containerd
...
Oct 17 10:42:33 pi-k8s-00 microk8s.daemon-containerd[44363]: time="2020-10-17T10:42:33.848409047Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"cert-manager-webhook>
Oct 17 10:42:33 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 10:42:33 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Scheduled restart job, restart counter is at 5.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: Stopped Service for snap application microk8s.daemon-containerd.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Start request repeated too quickly.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: Failed to start Service for snap application microk8s.daemon-containerd.
$ less /var/snap/microk8s/current/inspection-report/snap.microk8s.daemon-containerd/journal.log
Oct 18 14:48:03 pi-k8s-00 microk8s.daemon-containerd[239043]: time="2020-10-18T14:48:03.936439781Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"cert-manager-webhook-64b9b4fdfd-9d6tm_cert-manager_81fb08ac-7e87-42bd-9123-b0b8b098fe50_3\": name \"cert-manager-webhook-64b9b4fdfd-9d6tm_cert-manager_81fb08ac-7e87-42bd-9123-b0b8b098fe50_3\" is reserved for \"149b0aa92e3eb042f87353ead44a7247e756c8071f804bfbec3b781a5565e52c\""

I don't know where else to look to get to the root of the problem and fix it.

The master node that won't start is running microk8s v1.18.

$ ctr -v
ctr github.com/containerd/containerd 1.3.3-0ubuntu2
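
(Side note: I believe the ctr shown above is the Ubuntu-packaged containerd rather than the one bundled inside the microk8s snap; if the bundled version matters, it should be queryable with something like:)

$ microk8s ctr version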

Thanks in advance,
PM.

@balchua
Collaborator

balchua commented Oct 22, 2020

Hi @ponderMuse
It looks like you are using btrfs:

Oct 19 09:05:52 pi-k8s-00 microk8s.daemon-containerd[1999169]: time="2020-10-19T09:05:52.188608334Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
Oct 19 09:05:52 pi-k8s-00 microk8s.daemon-containerd[1999169]: time="2020-10-19T09:05:52.189766852Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.btrfs (ext4) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1

Can you try these?
#1587 (comment)

@ponderMuse
Author

Hi @balchua ,

From what I gathered in the comments in the link you provided, I did the following on my setup:

Edited the args file under /var/snap/microk8s/current/args and changed the line:

--feature-gates=DevicePlugins=true

To

--feature-gates="LocalStorageCapacityIsolation=false"

I then restarted microk8s:

$ microk8s stop
$ microk8s start
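
(For reference, a quick way to double-check that the edit took effect; I'm assuming here that the flag lives in the kubelet args file, as in the linked comment:)

$ grep feature-gates /var/snap/microk8s/current/args/kubelet
--feature-gates="LocalStorageCapacityIsolation=false"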

But microk8s inspect still shows containerd as not running with the same error you pointed out earlier:

Oct 22 13:37:16 pi-k8s-00 microk8s.daemon-containerd[182674]: time="2020-10-22T13:37:16.219161858Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.btrfs (ext4) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1

I've checked the filesystem too:

$ lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
NAME        FSTYPE    SIZE MOUNTPOINT          LABEL
loop0       squashfs 86.5M /snap/core/10131    
loop1       squashfs 60.8M /snap/lxd/17888     
loop2       squashfs 48.5M /snap/core18/1883   
loop3       squashfs 48.8M /snap/core18/1888   
loop4       squashfs   61M /snap/lxd/17938     
loop5       squashfs  174M /snap/microk8s/1711 
loop6       squashfs  174M /snap/microk8s/1670 
loop8       squashfs 26.9M /snap/snapd/9611    
loop10      squashfs 26.9M /snap/snapd/9730    
loop11      squashfs 86.5M /snap/core/10188    
mmcblk0              29.7G                     
├─mmcblk0p1 vfat      256M /boot/firmware      system-boot
└─mmcblk0p2 ext4     29.5G /                   writable
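
(A more direct check of the exact path containerd complains about, in case it were on a different mount, would be something like:)

$ stat -f -c %T /var/snap/microk8s/common/var/lib/containerd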

And indeed, the root partition is ext4 and not btrfs. Note, however, that the master node did run as Ready until a certain point in time, but I don't know when or what broke it. The most recent change, I believe, would have been the automated attempt to update Kubernetes from 1.18 to 1.19 (but I don't know whether the master node was already broken by then or not).

$ kubectl get node
NAME        STATUS     ROLES    AGE   VERSION
pi-k8s-00   NotReady   <none>   82d   v1.18.6-1+b4f4cb0b7fe3c1
pi-k8s-01   Ready      <none>   82d   v1.19.2-34+37bbd8cebecb60

@balchua
Collaborator

balchua commented Oct 22, 2020

@ponderMuse Another thing I noticed: /var/snap/microk8s/current/args/containerd-template.toml has this configuration at the end: disabled_plugins = ["cri"], which doesn't exist in the MicroK8s source.

For the 1.18 MicroK8s version, containerd-template.toml looks like this: https://github.com/ubuntu/microk8s/blob/1.18/microk8s-resources/default-args/containerd-template.toml

The logs you provided seem to be coming from a 1.19 version.
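
A quick way to check for the stray line would be something along these lines:

$ grep -n disabled_plugins /var/snap/microk8s/current/args/containerd-template.toml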

@ponderMuse
Author

Hi @balchua

I had added disabled_plugins = ["cri"] myself recently in /var/snap/microk8s/current/args/containerd-template.toml in an attempt to fix this issue. I thought it might help after reading through the comments in 'failed to reserve sandbox name' error after hard reboot #1014. I will remove it if this argument doesn't apply to microk8s v1.19, which, you are right, is what's currently installed:

$ snap info microk8s
name:      microk8s
summary:   Lightweight Kubernetes for workstations and appliances
publisher: Canonical✓
store-url: https://snapcraft.io/microk8s
contact:   https://github.com/ubuntu/microk8s
...
installed:          v1.19.2             (1711) 182MB classic

So it looks like:

On the master node, the snap package updated okay to v1.19.2 but its k8s node remains NotReady and at version 1.18.6.
On the worker node, the snap package updated okay to v1.19.2 and its k8s node is Ready and now at version 1.19.2.

I have now restored containerd-template.toml by removing the line disabled_plugins = ["cri"] and restarted microk8s but the error remains.
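
(For the record, the removal and restart amounted to roughly the following; the sed expression is just one way of dropping that line:)

$ sudo sed -i '/disabled_plugins = \["cri"\]/d' /var/snap/microk8s/current/args/containerd-template.toml
$ microk8s stop
$ microk8s start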

@balchua
Collaborator

balchua commented Oct 23, 2020

Same error as before? I don't know if it's too much to ask for the inspect tarball?

@ponderMuse
Author

inspection-report-20201023_123733.tar.gz

Hi @balchua

Latest inspection tarball attached.

Looking at the tarball's contents, one of the changes I previously made (a while back), which I suppose is associated with containerd itself, can be seen in args/containerd-template.toml, where I added my own Docker registry (which is where my k8s pods' images get pulled from):

    # 'plugins."io.containerd.grpc.v1.cri".registry.mirrors' are namespace to mirror mapping for all namespaces.
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
        endpoint = ["https://registry-1.docker.io", ]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry:50001"]
        endpoint = ["http://registry:50001"]

registry:50001 is my own private Docker registry, where registry happens to be an alias for the same host that the k8s master node is on. Thought I'd mention this since I've just come across this config while looking inside containerd-template.toml.
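
(If it's useful, reachability of that registry can be sanity-checked from the node with a plain v2 API call, e.g.:)

$ curl http://registry:50001/v2/_catalog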

@balchua
Collaborator

balchua commented Oct 24, 2020

Thanks @ponderMuse. The containerd data seems to be corrupted.

Check out this issue: #508 (comment)
Perhaps it can help bring containerd back online.

Btw, just letting you know that if you want to run a long-lasting cluster and prefer to be on a more stable Kubernetes version, I suggest you stick to a particular channel, for example 1.19/stable or 1.18/stable.

@ponderMuse
Author

Hi @balchua,

The #508 (comment) you pointed me to did the trick.

I did:

$ microk8s.stop
$ mv /var/snap/microk8s/common/var/lib/containerd /var/snap/microk8s/common/var/lib/_containerd
$ microk8s.start

And indeed, containerd recreated its state from scratch and is now running okay again.

$ microk8s inspect
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-flanneld is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
...

The Kubernetes master node is now once again showing status Ready.

$ kubectl get node
NAME        STATUS   ROLES    AGE   VERSION
pi-k8s-00   Ready    <none>   84d   v1.19.2-34+37bbd8cebecb60
pi-k8s-01   Ready    <none>   84d   v1.19.2-34+37bbd8cebecb60

I have also now changed the microk8s channel to 1.19/stable on both nodes, as you suggested (I think I was on latest/stable before).

$ sudo snap refresh microk8s --channel=1.19/stable
$ kubectl get node
NAME        STATUS   ROLES    AGE   VERSION
pi-k8s-00   Ready    <none>   84d   v1.19.0-34+ff9309c628eb68
pi-k8s-01   Ready    <none>   84d   v1.19.0-34+ff9309c628eb68
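
(For anyone hitting the same problem: once the node has been healthy for a while, the directory renamed earlier can presumably be removed to reclaim disk space, e.g.:)

$ sudo rm -rf /var/snap/microk8s/common/var/lib/_containerd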

Thanks for all your help balchua, much appreciated!
PM

@balchua
Collaborator

balchua commented Oct 24, 2020

Great to hear that it fixed your issue. 👍
I'm going to close this one then! Thanks for using MicroK8s.
