
Use "systemd" cgroup driver as default instead of dockers' "cgroupfs" #490

Closed
jjjms opened this issue Jun 9, 2020 · 7 comments · Fixed by #521

Comments

@jjjms commented Jun 9, 2020

What would you like to be added:
The EKS AMI should use the "systemd" cgroup driver by default for both kubelet and Docker.

Why is this needed:
AL2 uses systemd as its init system, and systemd manages cgroups with its own driver. When kubelet and Docker use cgroupfs, systemd is unaware of the resources those cgroups allocate, which can destabilize or crash the node in certain cases.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

I have tested this by making the changes below to the config files and re-joining the node to the cluster. In my testing the node was marked Ready and I was able to create pods on it.

### Drained a worker node:
k drain ip-192-168-0-171.us-west-2.compute.internal --ignore-daemonsets
node/ip-192-168-0-171.us-west-2.compute.internal already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/aws-node-jfnzw, kube-system/kube-proxy-sdx2q
node/ip-192-168-0-171.us-west-2.compute.internal drained


### Deleted the node object so that the node rejoins the cluster as a new entity:
k delete no ip-192-168-0-171.us-west-2.compute.internal


### Stopped kubelet and docker:
[ec2-user@ip-192-168-0-171 ~]$ sudo systemctl stop kubelet docker

### Edited the docker and kubelet config files to set systemd as the cgroup driver:
[ec2-user@ip-192-168-0-171 ~]$ cat /etc/docker/daemon.json
{
  "bridge": "none",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10
}
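
Once docker is started again, a quick way to confirm it picked up the new driver (a sketch using docker's standard CgroupDriver info field):

# docker re-reads /etc/docker/daemon.json on restart; the driver should now be systemd
docker info --format '{{.CgroupDriver}}'   # expected to print "systemd"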

### Since I'm using eksctl to create my cluster and nodegroup, I modified the following file:
[root@ip-192-168-66-171 ec2-user]# cat /etc/eksctl/kubelet.yaml
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/eksctl/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
clusterDNS:
- 10.100.0.10
clusterDomain: cluster.local
featureGates:
  RotateKubeletServerCertificate: true
kind: KubeletConfiguration
kubeReserved:
  cpu: 70m
  ephemeral-storage: 1Gi
  memory: 200Mi
systemReserved:
  cpu: 1000m
  ephemeral-storage: 1Gi
  memory: 2Gi
serverTLSBootstrap: true

### Ran bootstrap.sh so the node rejoins the cluster:
sudo /etc/eks/bootstrap.sh myclustername
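
To confirm on the node that both daemons came back up with the systemd driver after bootstrap, roughly the following can be used (a sketch; the paths match the eksctl layout shown above):

# both services should be active again after bootstrap.sh
sudo systemctl is-active docker kubelet

# the kubelet config rendered by eksctl should still carry the systemd driver
grep cgroupDriver /etc/eksctl/kubelet.yaml   # expected: cgroupDriver: systemd

# with the systemd driver, pod cgroups show up as slices under systemd
systemd-cgls --no-pager | grep -m1 kubepods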

### Found that the new node came up healthy and could successfully run some nginx test pods:
k get no
NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-0-171.us-west-2.compute.internal   Ready    <none>   10m   v1.15.11-eks-af3caf
ip-192-168-35-63.us-west-2.compute.internal   Ready    <none>   99m   v1.15.11-eks-af3caf

k get po -owide | grep 171 -c
8

Can we move to the "systemd" driver for the EKS-optimized AMIs?

Note: I found the following GitHub issue where the kube-reserved/system-reserved memory settings were not taken into account when calculating the kubepods.slice "MemoryLimit"; the node's total memory was used instead.
kubernetes/kubernetes#88197
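
For reference, under the systemd driver the limit the kubelet applies to the pod cgroup can be inspected directly on the node, which makes it easy to check whether kube-reserved/system-reserved are actually being subtracted (a sketch assuming the default kubepods.slice name):

# memory limit enforced on the pod cgroup by the kubelet (systemd driver)
systemctl show kubepods.slice --property=MemoryLimit

# compare against the node's total memory
grep MemTotal /proc/meminfo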

@reegnz (Contributor) commented Aug 7, 2020

I would also highlight this part of https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

A single cgroup manager will simplify the view of what resources are being allocated and will by default have a more consistent view of the available and in-use resources. When we have two managers we end up with two views of those resources. We have seen cases in the field where nodes that are configured to use cgroupfs for the kubelet and Docker, and systemd for the rest of the processes running on the node becomes unstable under resource pressure.

Changing the settings such that your container runtime and kubelet use systemd as the cgroup driver stabilized the system.

That leads me to believe that the current config in the EKS AMIs needs this improvement, since the current config might lead to unstable nodes.

@reegnz (Contributor) commented Aug 10, 2020

@jjjms I have created a PR with the changes you have described. Now we just need to get that merged!

@reegnz (Contributor) commented Aug 10, 2020

I also found that kubeadm now actively checks that the cgroup driver is systemd: kubernetes/kubernetes#73837
They also had a discussion about it here: kubernetes/kubeadm#1394

@reegnz (Contributor) commented Aug 10, 2020

Another interesting issue that might block this change is this one: kubernetes-sigs/kubespray#5134 (comment)

systemd had some dbus issues that were only fixed in systemd 242. Apparently Red Hat has backported the fix to 219, so if AWS has backported it to systemd in Amazon Linux 2 as well, then it's fine to make the switch.

Checking my own cluster, the systemd version seems to be 219 and I'm on the latest AMI for 1.15:

[root@ip-x-x-x-x /]# curl 169.254.169.254/latest/meta-data/placement/region
us-west-2[root@ip-x-x-x-x /]# curl 169.254.169.254/latest/meta-data/ami-id
ami-0b4f1df0761911a2a[root@ip-x-x-x-x /]# systemctl --version
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN
[root@ip-x-x-x-x /]#
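
Since the Amazon Linux 2 package carries Red Hat style backports, the rpm changelog is probably the quickest way to see whether the dbus fix landed in the 219 build (a sketch; the exact changelog wording is only a guess):

# which systemd build the AMI ships
rpm -q systemd

# scan the package changelog for backported dbus fixes
rpm -q --changelog systemd | grep -i dbus | head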

abeer91 pushed a commit that referenced this issue Dec 16, 2020
Kubernetes documentation indicates that for stability reasons
one should run kubernetes with the systemd cgroup driver if the
init system itself is systemd.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

Fixes #490
@reegnz (Contributor) commented Dec 17, 2020

@abeer91 could you please reopen this ticket? Since #521 got reverted in #587, this ticket should stay open until that change is applied again.

@mmerkes (Member) commented Nov 2, 2021

FYI, we're picking this issue back up. I've asked @reegnz to post a new PR; otherwise, I can post one with the same changes.

@cartermckinnon (Member) commented:

We made this change for containerd in #717. We don't plan to switch Docker to systemd before we remove it from the AMI.
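
For anyone double-checking a containerd-based node, the driver setting from #717 lives in the containerd CRI configuration and can be verified roughly like this (a sketch assuming the default /etc/containerd/config.toml path; the kubelet config path below is the one used by recent EKS AMIs and may differ):

# the runc runtime options in the CRI plugin config should enable the systemd driver
grep -n 'SystemdCgroup' /etc/containerd/config.toml   # expected: SystemdCgroup = true

# the kubelet should be configured with the matching driver
grep cgroupDriver /etc/kubernetes/kubelet/kubelet-config.json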
