[BUG] Cluster fails to start on cgroup v2 #493
Comments
I think I'm seeing the same or similar issue. When I rollback to
|
Same issue on Fedora 33:
Logs:
|
|
Seems like
started but keeps restarting for some reason.
|
Hi @derricms, thanks for opening this issue and Hi to the others who joined in 👋
@mj41-gdc you also saw an issue I guess with a filesystem like zfs or btrfs, right (/dev/mapper)? |
Yes, btrfs is the default filesystem on Fedora 33. My disk is also encrypted (LUKS), and SELinux is enabled. I tried |
@mj41-gdc wait...
So k3s v1.20.4 works without issues but in k3d it throws the cgroup v2 incompatibility error? 🤔 Actually, on Fedora, you may as well have issues with |
@iwilltry42 My cluster started, but the container was restarting every few seconds, and I was not able to debug the root cause. I'm a newbie here. |
That's weird... @mj41-gdc , can you try |
Hi @iwilltry42, I did this (in short)
Full details of what I did:
# setup
mkdir -p ~/devel/k3d
cd ~/devel/k3d
pwd
ls -al
# console 1
# check logs and send them to github if all seems ok
ls -als log*.txt
echo '```' ; head -n 10 log*.txt ; echo '```'
# cleanup previous run
k3d cluster delete default
rm ~/devel/k3d/log-*.txt
ls -al
cat README.md
# console 2
echo "#Start: `date --rfc-3339=ns`" > log-docker-events.txt ; docker events | tee -a log-docker-events.txt
# console 3
echo "#Start: `date --rfc-3339=ns`" > log-start-k3d.txt ; k3d cluster create default --image rancher/k3s:v1.20.4-k3s1 2>&1 | tee -a log-start-k3d.txt ; echo "#End: `date --rfc-3339=ns`" >> log-start-k3d.txt
# console 4
# a few times run these till you see the first restart
echo "#Start_ps: `date --rfc-3339=ns`" | tee -a log-docker-ps.txt ; docker ps | tee -a log-docker-ps.txt
# console 5
echo "#Start_logs: `date --rfc-3339=ns`" | tee -a log-docker-logs.txt ; docker logs --timestamps --details k3d-default-server-0 2>&1 | tee -a log-docker-logs.txt
# console 2
# Press ctrl+c
# check logs, repeat if needed
Full logs are a few MB, so I put them here. A few lines from each:
Let me know what I should try next. And thank you very much for your time. |
Thanks for the input @mj41-gdc !
So the issue is with your filesystem. |
One step at a time :-). Now I got
Full logs here |
That's a warning that you can ignore for now 👍 |
I tried to switch docker to cgroupfs
per kubernetes/kubeadm#1394 (comment)
But still the same
Per https://unix.stackexchange.com/questions/480747/how-to-find-out-if-systemd-uses-legacy-hybrid-or-unified-mode-cgroupsv1-vs-cgr and great
|
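For readers who want to check which cgroup mode their host is in: a minimal sketch, assuming GNU coreutils `stat` is available. The filesystem type mounted at `/sys/fs/cgroup` is `cgroup2fs` on a unified (v2) host and `tmpfs` on a legacy or hybrid (v1) host. The function name `cgroup_mode_from_fstype` is made up for this illustration and is not part of k3d or k3s.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a cgroup mode.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "cgroup v2 (unified)" ;;           # pure v2 hierarchy
        tmpfs)     echo "cgroup v1 (legacy or hybrid)" ;;  # v1 controllers mounted below a tmpfs
        *)         echo "unknown ($1)" ;;
    esac
}

# Usage on a real host (GNU stat; %T prints the filesystem type name):
#   cgroup_mode_from_fstype "$(stat -fc %T /sys/fs/cgroup/)"
```

This only distinguishes unified from non-unified; telling legacy and hybrid apart needs a further look at the mounts below `/sys/fs/cgroup`.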
@iwilltry42 I found some time and motivation yesterday and traced the error to https://github.com/kubernetes/kubernetes/blob/v1.20.4/pkg/kubelet/kubelet.go#L1366-L1369 ( Changes in master: |
I'm really lost here and have no idea at the moment what could be the issue 🤔 |
Hi,
Regards |
Hi @fr33ky , thanks for your input. |
Looks like @AkihiroSuda did a good job fixing issues with cgroup v2 in kind (see kubernetes-sigs/kind#2014). This could be a good starting point for us as well, even though our issue seems to be slightly different. |
Hello, I have the same issue on Arch Linux. I also have cgroup v2
|
I get exactly the same errors with cgroup v2. Any hints on how to fix it? |
Using Debian Sid, in the meantime, I personally switched back to |
Works for me on Arch by executing |
For anyone running into this on NixOS, setting |
Thanks. Switching back to cgroup v1 works. |
For Arch Linux users who now run systemd v248+ and use systemd-boot, here's how I fixed it on my system:
...
-options root=/dev/mapper/root
+options root=/dev/mapper/root systemd.unified_cgroup_hierarchy=0
Then verified with |
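One possible way to confirm that the boot parameter from the diff above is actually active (this is an illustration, not necessarily the command the poster used) is to look for it on the kernel command line. The function name `cmdline_requests_cgroup_v1` is hypothetical, and the sketch only matches the literal `=0` form shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the kernel command line asks systemd
# for the legacy cgroup v1 hierarchy via systemd.unified_cgroup_hierarchy=0.
cmdline_requests_cgroup_v1() {
    # $1: a kernel command line string; defaults to the running kernel's /proc/cmdline
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Pad with spaces so the parameter is matched as a whole word.
    case " $cmdline " in
        *" systemd.unified_cgroup_hierarchy=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after a reboot:
#   cmdline_requests_cgroup_v1 && echo "booted with cgroup v1 requested"
```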
On Arch, the latest rancher-k3d-bin (v4.2.0) would just loop trying to start the servers... I followed what @wdv4758h suggested above. By executing This reverted me back to cgroup v1, verified by |
The issue still persists on k3d v4.4.2 with k3s v1.20.6-k3s1. It would be good if the docs listed that |
I have the same issue on NixOS (unstable channel).
Also using cgroups v2. Just figuring out how to switch it to v1 with NixOS and I'll report back if it works. |
For cgroup v2, k3s/k3d needs logic to evacuate the init process from the top-level cgroup to somewhere else, like this: https://github.com/moby/moby/blob/e0170da0dc6e660594f98bc66e7a98ce9c2abb46/hack/dind#L28-L37 |
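The evacuation described above can be sketched roughly as follows. This is modeled on the approach in the linked moby `hack/dind` script, not copied from k3d's eventual fix: move every process out of the root cgroup into a child group, then enable the controllers for child groups (writing `cgroup.subtree_control` fails while processes still sit in the root group). The function name `enable_cgroup_v2_nesting` and its optional root-path argument are assumptions added for illustration and testing.

```shell
#!/bin/sh
# Sketch: make a cgroup v2 hierarchy usable for nested containers.
# $1: cgroup root to operate on (hypothetical parameter; defaults to /sys/fs/cgroup)
enable_cgroup_v2_nesting() {
    root="${1:-/sys/fs/cgroup}"

    # cgroup.controllers exists only at the root of a cgroup v2 hierarchy;
    # on cgroup v1 there is nothing to do.
    [ -f "$root/cgroup.controllers" ] || return 0

    # Move all processes from the root group into a child group called "init".
    # PIDs must be written one at a time; errors for processes that have
    # already exited are ignored.
    mkdir -p "$root/init"
    xargs -rn1 < "$root/cgroup.procs" > "$root/init/cgroup.procs" || :

    # Turn "cpu memory pids" into "+cpu +memory +pids" and enable those
    # controllers for child groups.
    sed -e 's/ / +/g' -e 's/^/+/' < "$root/cgroup.controllers" \
        > "$root/cgroup.subtree_control"
}

# Usage inside a privileged container entrypoint, before starting the real init:
#   enable_cgroup_v2_nesting
```

Taking the root path as an argument lets the function be exercised against a scratch directory without touching the real cgroup filesystem.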
Thanks for the hint @AkihiroSuda , I actually found a way to re-use your linked moby source and the changes you did in |
@iwilltry42 Yes, thanks |
#493 (comment) is most likely what you're after |
You can give this a try now on cgroupv2: I tested it without issues on Ubuntu 20.10 with cgroupv1 and cgroupv2 (systemd). |
@iwilltry42 I confirm that image works correctly with EDIT: I also confirm it works correctly with |
I just created a (temporary) fix/workaround using the entrypoint script that we can use until it is fixed upstream (in k3s). See PR #579 . |
@iwilltry42 i'm able to confirm that this adds v2 support on my system. thank you! |
Fixed by #579 (should not interfere with k3s-io/k3s#3242 later) |
works on Fedora 33 with Docker and cgroupv2. Great work. Thank you @iwilltry42 , @AkihiroSuda and others.
Detailed logs https://github.com/mj41-gdc/k3d-debug/tree/k3d-issues-493-mj7 |
What did you do
Start a minimal cluster on Kali Linux 2020.4
* How was the cluster created?
  * k3d cluster create
What did you do afterwards?
What did you expect to happen
That a minimal cluster would start
Screenshots or terminal output
Which OS & Architecture
Which version of k3d
Which version of docker
Server:
 Engine:
  Version: 20.10.2+dfsg1
  API version: 1.41 (minimum version 1.12)
  Go version: go1.15.6
  Git commit: 8891c58
  Built: Fri Jan 8 07:08:51 2021
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.4.3~ds1
  GitCommit: 1.4.3~ds1-1+b1
 runc:
  Version: 1.0.0-rc92+dfsgl
  GitCommit: 1.0.0-rc92+dfsgl-5+b1
 docker-init:
  Version: 0.19.0
  GitCommit: