
Use "systemd" cgroup driver as default instead of dockers' "cgroupfs" #490

Closed
jjjms opened this issue Jun 9, 2020 · 7 comments · Fixed by #521

Comments

@jjjms commented Jun 9, 2020

What would you like to be added:
The EKS AMI should use the "systemd" cgroup driver by default for both kubelet and Docker.

Why is this needed:
AL2 uses systemd as its init system, and systemd manages cgroups with its own driver. When kubelet and Docker use cgroupfs, systemd is unaware of the resources those cgroups allocate, which can destabilize or crash the node in certain cases.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

I have tested this by making the changes below to the config files and re-joining the node to the cluster. In my testing the node was marked Ready and I was able to create pods on it.

### Drained a worker node:
k drain ip-192-168-0-171.us-west-2.compute.internal --ignore-daemonsets
node/ip-192-168-0-171.us-west-2.compute.internal already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/aws-node-jfnzw, kube-system/kube-proxy-sdx2q
node/ip-192-168-0-171.us-west-2.compute.internal drained


### Deleted the node object so that the node rejoins the cluster as a new entity:
k delete no ip-192-168-0-171.us-west-2.compute.internal


### Stopped kubelet and docker:
[ec2-user@ip-192-168-0-171 ~]$ sudo systemctl stop kubelet docker

### Edited the docker and kubelet config files to set systemd as the cgroup driver:
[ec2-user@ip-192-168-0-171 ~]$ cat /etc/docker/daemon.json
{
  "bridge": "none",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10
}
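
Once docker is started again, a quick way to confirm it picked up the new driver (a sketch using docker's standard CgroupDriver info field):

# docker re-reads /etc/docker/daemon.json on restart; the driver should now be systemd
docker info --format '{{.CgroupDriver}}'   # expected to print "systemd"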

### Since I'm using eksctl to create my cluster and nodegroup, I modified the following file:
[root@ip-192-168-66-171 ec2-user]# cat /etc/eksctl/kubelet.yaml
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/eksctl/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
clusterDNS:
- 10.100.0.10
clusterDomain: cluster.local
featureGates:
  RotateKubeletServerCertificate: true
kind: KubeletConfiguration
kubeReserved:
  cpu: 70m
  ephemeral-storage: 1Gi
  memory: 200Mi
systemReserved:
  cpu: 1000m
  ephemeral-storage: 1Gi
  memory: 2Gi
serverTLSBootstrap: true

### Ran bootstrap.sh so the node rejoins the cluster:
sudo /etc/eks/bootstrap.sh myclustername
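
To confirm on the node that both daemons came back up with the systemd driver after bootstrap, roughly the following can be used (a sketch; the paths match the eksctl layout shown above):

# both services should be active again after bootstrap.sh
sudo systemctl is-active docker kubelet

# the kubelet config rendered by eksctl should still carry the systemd driver
grep cgroupDriver /etc/eksctl/kubelet.yaml   # expected: cgroupDriver: systemd

# with the systemd driver, pod cgroups show up as slices under systemd
systemd-cgls --no-pager | grep -m1 kubepods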

### Found that the new node came up healthy and could successfully run some nginx test pods:
k get no
NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-0-171.us-west-2.compute.internal   Ready    <none>   10m   v1.15.11-eks-af3caf
ip-192-168-35-63.us-west-2.compute.internal   Ready    <none>   99m   v1.15.11-eks-af3caf

k get po -owide | grep 171 -c
8

Can we move to the "systemd" driver for the EKS-optimized AMIs?

Note: I found the following GitHub issue where the kube-reserved/system-reserved memory settings were not taken into account when calculating the kubepods.slice "MemoryLimit"; the node's total memory was used instead.
kubernetes/kubernetes#88197
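
For reference, under the systemd driver the limit the kubelet applies to the pod cgroup can be inspected directly on the node, which makes it easy to check whether kube-reserved/system-reserved are actually being subtracted (a sketch assuming the default kubepods.slice name):

# memory limit enforced on the pod cgroup by the kubelet (systemd driver)
systemctl show kubepods.slice --property=MemoryLimit

# compare against the node's total memory
grep MemTotal /proc/meminfo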

@reegnz (Contributor) commented Aug 7, 2020

I would also highlight this part of https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

A single cgroup manager will simplify the view of what resources are being allocated and will by default have a more consistent view of the available and in-use resources. When we have two managers we end up with two views of those resources. We have seen cases in the field where nodes that are configured to use cgroupfs for the kubelet and Docker, and systemd for the rest of the processes running on the node becomes unstable under resource pressure.

Changing the settings such that your container runtime and kubelet use systemd as the cgroup driver stabilized the system.

That leads me to believe that the current config in the EKS AMIs needs this improvement, since the current config might lead to unstable nodes.

@reegnz (Contributor) commented Aug 10, 2020

@jjjms I have created a PR with the changes you have described. Now we just need to get that merged!

@reegnz (Contributor) commented Aug 10, 2020

I also found that kubeadm now actively checks that the cgroup driver is systemd: kubernetes/kubernetes#73837
They also had a discussion about it here: kubernetes/kubeadm#1394

@reegnz (Contributor) commented Aug 10, 2020

Another interesting issue that might block this change is this one: kubernetes-sigs/kubespray#5134 (comment)

systemd had some dbus issues that were only fixed in systemd 242. Apparently Red Hat has backported the fix to 219, so if AWS has backported it to systemd in Amazon Linux 2 as well, then it's fine to make the switch.

Checking my own cluster, the systemd version seems to be 219 and I'm on the latest AMI for 1.15:

[root@ip-x-x-x-x /]# curl 169.254.169.254/latest/meta-data/placement/region
us-west-2[root@ip-x-x-x-x /]# curl 169.254.169.254/latest/meta-data/ami-id
ami-0b4f1df0761911a2a[root@ip-x-x-x-x /]# systemctl --version
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN
[root@ip-x-x-x-x /]#
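
Since the Amazon Linux 2 package carries Red Hat style backports, the rpm changelog is probably the quickest way to see whether the dbus fix landed in the 219 build (a sketch; the exact changelog wording is only a guess):

# which systemd build the AMI ships
rpm -q systemd

# scan the package changelog for backported dbus fixes
rpm -q --changelog systemd | grep -i dbus | head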

abeer91 pushed a commit that referenced this issue Dec 16, 2020
Kubernetes documentation indicates that for stability reasons
one should run kubernetes with the systemd cgroup driver if the
init system itself is systemd.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

Fixes #490
@reegnz (Contributor) commented Dec 17, 2020

@abeer91 could you please reopen this ticket? Since #521 got reverted in #587, this ticket should stay open until that change is applied again.

@mmerkes (Member) commented Nov 2, 2021

FYI, we're picking this issue back up. I've asked @reegnz to post a new PR; otherwise, I can post one with the same changes.

@cartermckinnon (Member) commented:

We made this change for containerd in #717. We don't plan to switch Docker to systemd before we remove it from the AMI.
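
For anyone double-checking a containerd-based node, the driver setting from #717 lives in the containerd CRI configuration and can be verified roughly like this (a sketch assuming the default /etc/containerd/config.toml path; the kubelet config path below is the one used by recent EKS AMIs and may differ):

# the runc runtime options in the CRI plugin config should enable the systemd driver
grep -n 'SystemdCgroup' /etc/containerd/config.toml   # expected: SystemdCgroup = true

# the kubelet should be configured with the matching driver
grep cgroupDriver /etc/kubernetes/kubelet/kubelet-config.json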
