[BUG] RKE2 hardened cluster fails to provision for k8s 1.25.5 #39148

Closed
rishabhmsra opened this issue Sep 28, 2022 · 13 comments
Assignees
Labels
kind/bug Issues that are defects reported by users or that we know have reached a real release priority/0 release-note Note this issue in the milestone's release notes team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support team/infracloud
Milestone

Comments

@rishabhmsra
Contributor

rishabhmsra commented Sep 28, 2022

Rancher Server Setup

  • Rancher version: v2.7-head(8062f1e)
  • Installation option (Docker install/Helm Chart): Docker
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): NA

Information about the Cluster

  • Kubernetes version: v1.25.5+rke2r1
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Custom hardened(1-cp, 1-etcd, 1-w)

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) Admin
    • If custom, define the set of permissions:

Describe the bug

  • RKE2 hardened cluster fails to come into Active state if the CIS profile is set to 1.23.

To Reproduce

For cis profile 1.23 ->

Followed the hardening steps from here #36629 (comment)

  • Create 3 VMs and, on each VM, configure the following sysctl settings in /etc/sysctl.d/90-kubelet.conf:
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
  • Run sudo sysctl -p /etc/sysctl.d/90-kubelet.conf to apply the settings
  • Create the etcd user: sudo useradd -r -c "etcd user" -s /sbin/nologin -M etcd -U
  • Create a custom RKE2 cluster with the profile set to cis-1.23 in the cluster YAML:
spec:
  rkeConfig:
    machineSelectorConfig:
      - config:
          profile: cis-1.23
          protect-kernel-defaults: true
  • Registered the nodes in the custom cluster
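
Before registering nodes, it can help to confirm that the sysctl values on each VM match the file above. The following is a minimal sketch, not part of Rancher or RKE2: the function names are hypothetical, and the `live` dictionary is made-up sample data standing in for values you would collect on a node (for example with `sysctl -n <key>`).

```python
def parse_sysctl_conf(text):
    """Parse 'key=value' lines from a sysctl.d-style file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def find_mismatches(expected, actual):
    """Return {key: (expected, actual)} for every key whose value differs."""
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }


# Contents of /etc/sysctl.d/90-kubelet.conf as given in the steps above.
KUBELET_CONF = """\
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
"""

expected = parse_sysctl_conf(KUBELET_CONF)

# Hypothetical live values for illustration only (one setting deliberately wrong).
live = {
    "vm.panic_on_oom": "0",
    "vm.overcommit_memory": "0",
    "kernel.panic": "10",
    "kernel.panic_on_oops": "1",
}

print(find_mismatches(expected, live))
```

An empty result means every setting in the file matches; with protect-kernel-defaults: true, a mismatch is worth fixing before the node is registered.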

Result

  • Cluster gets stuck in Updating state.
Provisioning logs:
[INFO ] waiting for viable init node
[INFO ] configuring bootstrap node(s) custom-da8b2be1a942: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) custom-da8b2be1a942: waiting for probes: etcd, kubelet
[INFO ] configuring bootstrap node(s) custom-da8b2be1a942: waiting for probes: etcd
[INFO ] non-ready bootstrap machine(s) custom-da8b2be1a942 and join url to be available on bootstrap node
[INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for agent to check in and apply initial plan
[INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for probes: calico
[INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for cluster agent to connect
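
To see which probes each node was last reported as waiting on, the provisioning log above can be summarized with a short script. This is a minimal sketch that assumes the line format shown in this issue; `PROBE_LINE` and `pending_probes` are hypothetical names, not Rancher tooling. It only records the most recent "waiting for probes" line per node, so it does not capture later states such as "waiting for cluster agent to connect".

```python
import re

# Matches lines such as:
# [INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for probes: calico
PROBE_LINE = re.compile(
    r"configuring (?P<role>.+?) node\(s\) (?P<node>\S+): "
    r"waiting for probes: (?P<probes>.+)$"
)


def pending_probes(log_text):
    """Return {node: [probes]} from the last 'waiting for probes' line per node."""
    latest = {}
    for line in log_text.splitlines():
        match = PROBE_LINE.search(line)
        if match:
            probes = [p.strip() for p in match.group("probes").split(",")]
            latest[match.group("node")] = probes
    return latest


# Sample lines copied from the provisioning log in this issue.
SAMPLE_LOG = """\
[INFO ] configuring bootstrap node(s) custom-da8b2be1a942: waiting for probes: etcd, kubelet
[INFO ] configuring bootstrap node(s) custom-da8b2be1a942: waiting for probes: etcd
[INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for probes: calico
[INFO ] configuring control plane node(s) custom-e02f1540307d: waiting for cluster agent to connect
"""

print(pending_probes(SAMPLE_LOG))
```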

Below are the logs from the etcd node:

rancher-system-agent:
systemd[1]: Started Rancher System Agent.
rancher-system-agent[16137]: level=info msg="Rancher System Agent version v0.2.13 (4fa9427) is starting"
rancher-system-agent[16137]: level=info msg="Using directory /var/lib/rancher/agent/work for work"
rancher-system-agent[16137]: level=info msg="Starting remote watch of plans"
rancher-system-agent[16137]: level=info msg="Starting /v1, Kind=Secret controller"
rancher-system-agent[16137]: level=info msg="Detected first start, force-applying one-time instruction set"
rancher-system-agent[16137]: level=info msg="[Applyinator] Applying one-time instructions for plan with checksum 9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7"
rancher-system-agent[16137]: level=info msg="[Applyinator] Extracting image rancher/system-agent-installer-rke2:v1.25.5-rke2r2 to directory /var/lib/rancher/agent/work/20230113-054527/9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0"
rancher-system-agent[16137]: level=info msg="Using private registry config file at /etc/rancher/agent/registries.yaml"
rancher-system-agent[16137]: level=info msg="Pulling image index.docker.io/rancher/system-agent-installer-rke2:v1.25.5-rke2r2"
rancher-system-agent[16137]: level=info msg="Extracting file installer.sh to /var/lib/rancher/agent/work/20230113-054527/9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0/installer.sh"
rancher-system-agent[16137]: level=info msg="Extracting file rke2.linux-amd64.tar.gz to /var/lib/rancher/agent/work/20230113-054527/9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0/rke2.linux-amd64.tar.gz"
rancher-system-agent[16137]: level=info msg="Extracting file sha256sum-amd64.txt to /var/lib/rancher/agent/work/20230113-054527/9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0/sha256sum-amd64.txt"
rancher-system-agent[16137]: level=info msg="Extracting file run.sh to /var/lib/rancher/agent/work/20230113-054527/9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0/run.sh"
rancher-system-agent[16137]: level=info msg="[Applyinator] Running command: sh [-c run.sh]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + SA_INSTALL_PREFIX=/usr/local"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + mkdir -p /var/lib/rancher/rke2"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + SAI_FILE_DIR=/var/lib/rancher/rke2/system-agent-installer"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + RESTART_STAMP_FILE=/var/lib/rancher/rke2/system-agent-installer/rke2_restart_stamp"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + RKE2_SA_ENV_FILE_NAME=rke2-sa.env"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ ! -d /var/lib/rancher/rke2/system-agent-installer ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + mkdir -p /var/lib/rancher/rke2/system-agent-installer"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + check_target_mountpoint"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + mountpoint -q /usr/local"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + check_target_ro"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + touch /usr/local/.rke2-ro-test"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + rm -rf /usr/local/.rke2-ro-test"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + test 0 -ne 0"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + SYSTEMD_BASE_PATH=/usr/local/lib/systemd/system"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + RKE2_SA_ENV_FILE_PATH=/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + RKE2_SA_ENV_SRV_REF=EnvironmentFile=-/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ -f /var/lib/rancher/rke2/system-agent-installer/rke2_restart_stamp ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ -n d9a697e49e2d969a22b1c18cfd103b2aea74de2090116706322459cd6804fb02 ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [  != d9a697e49e2d969a22b1c18cfd103b2aea74de2090116706322459cd6804fb02 ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + RESTART=true"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + env INSTALL_RKE2_ARTIFACT_PATH=/var/lib/rancher/agent/work/20230113-054527/9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5>
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stdout]: [INFO]  staging local checksums from /var/lib/rancher/agent/work/20230113-054527/9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199>
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stdout]: [INFO]  staging tarball from /var/lib/rancher/agent/work/20230113-054527/9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150>
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stdout]: [INFO]  verifying tarball"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stdout]: [INFO]  unpacking tarball file to /usr/local"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ -f /var/lib/rancher/rke2/system-agent-installer/rke2-sa.env ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + OLD_ENV_FILE_PATH_HASH=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + install -m 600 /dev/null /var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + env"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + grep ^RKE2_"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + true"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + RKE2_ENV="
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ -n  ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + + grep -Ei ^(NO|HTTP|HTTPS)_PROXY"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: env"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + true"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + PROXY_ENV_INFO="
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ -n  ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + sha256sum /var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + awk {print $1}"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + NEW_ENV_FILE_PATH_HASH=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 != e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b85>
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ -z  ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + INSTALL_RKE2_TYPE=server"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + grep -q EnvironmentFile=-/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env /usr/local/lib/systemd/system/rke2-server.service"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + echo EnvironmentFile=-/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ -n d9a697e49e2d969a22b1c18cfd103b2aea74de2090116706322459cd6804fb02 ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + echo d9a697e49e2d969a22b1c18cfd103b2aea74de2090116706322459cd6804fb02"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + systemctl daemon-reload"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [  = true ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ server = server ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + systemctl is-active --quiet rke2-agent"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + systemctl enable rke2-server"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: Created symlink /etc/systemd/system/multi-user.target.wants/rke2-server.service → /usr/local/lib/systemd/system/rke2-server.service."
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [  = true ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + [ true = true ]"
rancher-system-agent[16137]: level=info msg="[9bf8dea46c298a41721a645bbc80a05baee098c5778160d7e06199e4c5b150c7_0:stderr]: + systemctl --no-block restart rke2-server"
rke2-server.service:
systemd[1]: Starting Rancher Kubernetes Engine v2 (server)...
sh[1936]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
sh[1940]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
rke2[1949]: level=info msg="Applying Pod Security Admission Configuration"
rke2[1949]: level=info msg="Starting rke2 v1.25.5+rke2r2 (2ec773e1225b12144dbfeba7188575ab075dcd0d)"
rke2[1949]: level=info msg="Managed etcd cluster initializing"
rke2[1949]: level=info msg="generated self-signed CA certificate CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32.228988377 +0000 UTC notAfter=2033-01-10 06:19:32.228988377 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=system:admin,O=system:masters signed by CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=system:kube-controller-manager signed by CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=system:kube-scheduler signed by CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=system:apiserver,O=system:masters signed by CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=system:kube-proxy signed by CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=system:rke2-controller signed by CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=rke2-cloud-controller-manager signed by CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="generated self-signed CA certificate CN=rke2-server-ca@1673590772: notBefore=2023-01-13 06:19:32.234643589 +0000 UTC notAfter=2033-01-10 06:19:32.234643589 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=kube-apiserver signed by CN=rke2-server-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="generated self-signed CA certificate CN=rke2-request-header-ca@1673590772: notBefore=2023-01-13 06:19:32.236094334 +0000 UTC notAfter=2033-01-10 06:19:32.236094334 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=system:auth-proxy signed by CN=rke2-request-header-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="generated self-signed CA certificate CN=etcd-server-ca@1673590772: notBefore=2023-01-13 06:19:32.237349861 +0000 UTC notAfter=2033-01-10 06:19:32.237349861 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=etcd-server signed by CN=etcd-server-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=etcd-client signed by CN=etcd-server-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="generated self-signed CA certificate CN=etcd-peer-ca@1673590772: notBefore=2023-01-13 06:19:32.239242896 +0000 UTC notAfter=2033-01-10 06:19:32.239242896 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=etcd-peer signed by CN=etcd-peer-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="Starting etcd for new cluster"
rke2[1949]: level=info msg="certificate CN=rke2,O=rke2 signed by CN=rke2-server-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=warning msg="dynamiclistener [::]:9345: no cached certificate available for preload - deferring certificate load until storage initialization or first client request"
rke2[1949]: level=info msg="Active TLS secret / (ver=) (count 10): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.31.3.63:172.31.3.63 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/>
rke2[1949]: level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.k>
rke2[1949]: level=info msg="Tunnel server egress proxy mode: agent"
rke2[1949]: level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
rke2[1949]: level=info msg="Applying network policies..."
rke2[1949]: level=info msg="Restricting automount..."
rke2[1949]: level=info msg="Server node token is available at /var/lib/rancher/rke2/server/token"
rke2[1949]: level=info msg="To join server node to cluster: rke2 server -s https://172.31.3.63:9345 -t ${SERVER_NODE_TOKEN}"
rke2[1949]: level=info msg="Agent node token is available at /var/lib/rancher/rke2/server/agent-token"
rke2[1949]: level=info msg="To join agent node to cluster: rke2 agent -s https://172.31.3.63:9345 -t ${AGENT_NODE_TOKEN}"
rke2[1949]: level=info msg="Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory\""
rke2[1949]: level=info msg="Wrote kubeconfig /etc/rancher/rke2/rke2.yaml"
rke2[1949]: level=info msg="Run: rke2 kubectl"
rke2[1949]: level=warning msg="remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory"
rke2[1949]: level=info msg="Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [127.0.0.1:9345]"
rke2[1949]: level=info msg="Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> []"
rke2[1949]: level=info msg="certificate CN=ip-172-31-3-63 signed by CN=rke2-server-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="certificate CN=system:node:ip-172-31-3-63,O=system:nodes signed by CN=rke2-client-ca@1673590772: notBefore=2023-01-13 06:19:32 +0000 UTC notAfter=2024-01-13 06:19:32 +0000 UTC"
rke2[1949]: level=info msg="Module overlay was already loaded"
rke2[1949]: level=info msg="Module br_netfilter was already loaded"
rke2[1949]: level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400"
rke2[1949]: level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600"
rke2[1949]: level=info msg="Set sysctl 'net/ipv4/conf/all/forwarding' to 1"
rke2[1949]: level=info msg="Set sysctl 'net/netfilter/nf_conntrack_max' to 131072"
rke2[1949]: level=info msg="Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.25.5-rke2r2"
rke2[1949]: level=warning msg="Failed to load runtime image index.docker.io/rancher/rke2-runtime:v1.25.5-rke2r2 from tarball: no local image available for index.docker.io/rancher/rke2-runtime:v1.25.5-rke2r2: not found in any file in /var/lib/rancher/rke2/ag>
rke2[1949]: level=info msg="Checking local image archives in /var/lib/rancher/rke2/agent/images for index.docker.io/rancher/rke2-runtime:v1.25.5-rke2r2"
rke2[1949]: level=warning msg="Failed to load runtime image index.docker.io/rancher/rke2-runtime:v1.25.5-rke2r2 from tarball: no local image available for index.docker.io/rancher/rke2-runtime:v1.25.5-rke2r2: not found in any file in /var/lib/rancher/rke2/ag>
rke2[1949]: level=info msg="Using private registry config file at /etc/rancher/rke2/registries.yaml"
rke2[1949]: level=info msg="Pulling runtime image index.docker.io/rancher/rke2-runtime:v1.25.5-rke2r2"
rke2[1949]: level=info msg="Creating directory /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin"
rke2[1949]: level=info msg="Extracting file bin/containerd to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/containerd"
rke2[1949]: level=info msg="Extracting file bin/containerd-shim to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/containerd-shim"
rke2[1949]: level=info msg="Extracting file bin/containerd-shim-runc-v1 to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/containerd-shim-runc-v1"
rke2[1949]: level=info msg="Extracting file bin/containerd-shim-runc-v2 to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/containerd-shim-runc-v2"
rke2[1949]: level=info msg="Extracting file bin/crictl to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/crictl"
rke2[1949]: level=info msg="Extracting file bin/ctr to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/ctr"
rke2[1949]: level=info msg="Extracting file bin/kubectl to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/kubectl"
rke2[1949]: level=info msg="Extracting file bin/kubelet to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/kubelet"
rke2[1949]: level=info msg="Extracting file bin/runc to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/bin/runc"
rke2[1949]: level=info msg="Creating directory /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/charts"
rke2[1949]: level=info msg="Extracting file charts/harvester-cloud-provider.yaml to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/charts/harvester-cloud-provider.yaml"
rke2[1949]: level=info msg="Extracting file charts/harvester-csi-driver.yaml to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/charts/harvester-csi-driver.yaml"
rke2[1949]: level=info msg="Extracting file charts/rancher-vsphere-cpi.yaml to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/charts/rancher-vsphere-cpi.yaml"
rke2[1949]: level=info msg="Extracting file charts/rancher-vsphere-csi.yaml to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08a/charts/rancher-vsphere-csi.yaml"
rke2[1949]: level=info msg="Extracting file charts/rke2-calico-crd.yaml to /var/lib/rancher/rke2/data/v1.25.5-rke2r2-a9acdcb2f08

Expected Result

  • Cluster should come into Active state with cis-1.23 profile
@rishabhmsra rishabhmsra added the kind/bug Issues that are defects reported by users or that we know have reached a real release label Sep 28, 2022
@rishabhmsra rishabhmsra added this to the v2.7.0 milestone Sep 28, 2022
@rishabhmsra rishabhmsra self-assigned this Sep 28, 2022
@rishabhmsra
Contributor Author

/backport v2.6.9

@rishabhmsra rishabhmsra added the team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support label Sep 28, 2022
@Sahota1225 Sahota1225 modified the milestones: v2.7.0, v2.7.1 Sep 29, 2022
@Sahota1225 Sahota1225 modified the milestones: v2.7.1, v2.7.2 Oct 10, 2022
@snasovich
Collaborator

@rishabhmsra , my understanding is that cis-1.23 profile is only supported on k8s 1.25+ which we don't currently support: https://docs.rke2.io/security/cis_self_assessment123/
See https://docs.rke2.io/security/hardening_guide/, section "RKE2 Releases Prior to v1.25" states:

The profile's flag only valid values are cis-1.5 or cis-1.6. It accepts a string value to allow for other profiles in the future.

If that's correct, let's close this bug, as well as the backport.

@MKlimuszka
Collaborator

Now that 2.7-head runs k8s 1.25, there are no profiles that can run on an RKE2 hardened cluster. #39851 (comment) shows that CIS can run on other cluster types, even with the 1.23 profile. I am increasing the priority of this ticket to priority/0 until QA or product approves the behavior.

@Jono-SUSE-Rancher Jono-SUSE-Rancher added the release-note Note this issue in the milestone's release notes label Jan 3, 2023
@MKlimuszka
Collaborator

@rishabhmsra , I talked to Sergey. Can you add/update the exact reproduction steps for the current 2.7-head? Thanks.

@doflamingo721
Contributor

Hey @MKlimuszka @snasovich, the steps to reproduce this on 2.7-head are in the issue description. On k8s 1.25 with the cis-1.6 profile, the cluster was stuck in the Updating state with the following message:

rke2[19172]: time="2023-01-09T13:26:58Z" level=fatal msg="invalid value provided for --profile flag"
systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: rke2-server.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).

And while using the 1.23 profile it was stuck at:

[INFO ] configuring control plane node(s) custom-5f68374b869c: waiting for cluster agent to connect

@snasovich
Collaborator

@doflamingo721 , per https://docs.rke2.io/security/hardening_guide cis-1.23 is the only option supported for k8s 1.25+ (while for previous versions cis-1.6 must be used).

The way I'm reading your previous comment, RKE2 provisioning on k8s 1.25.5 fails when the cis-1.23 profile is specified and everything is configured per the hardening steps from #36629 (comment). If this is correct, could you please update the repro steps to reflect that and provide as many specifics as possible on the exact errors / logs that you saw.
Running RKE2 1.25.5 with cis-1.6 is not supported, and logs/information from that test are not helping much.
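
The version rule described in this comment can be written down as a small check. This is a minimal sketch under the rule as stated here (cis-1.23 for k8s 1.25+, cis-1.6 for earlier versions); the function names are hypothetical and this is not how RKE2 itself validates the --profile flag. The hardening guide text quoted earlier in this thread also lists cis-1.5 for releases prior to v1.25, which this sketch leaves out.

```python
import re


def supported_profiles(k8s_version):
    """Return the CIS profiles supported for a version like 'v1.25.5+rke2r1'."""
    match = re.match(r"v?(\d+)\.(\d+)", k8s_version)
    if not match:
        raise ValueError(f"unrecognized Kubernetes version: {k8s_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Rule as stated in this thread: 1.25+ uses cis-1.23, earlier uses cis-1.6.
    if (major, minor) >= (1, 25):
        return {"cis-1.23"}
    return {"cis-1.6"}


def is_supported(k8s_version, profile):
    """True if the profile is supported for the given Kubernetes version."""
    return profile in supported_profiles(k8s_version)


print(is_supported("v1.25.5+rke2r1", "cis-1.6"))
print(is_supported("v1.25.5+rke2r1", "cis-1.23"))
```

A check like this would flag the cis-1.6 test on 1.25.5 as an unsupported combination before provisioning, which matches the "invalid value provided for --profile flag" failure reported above.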

@doflamingo721 doflamingo721 changed the title [BUG] RKE2 hardened cluster fails to provision when CIS profile is set to 1.23 [BUG] RKE2 hardened cluster fails to provision for k8s 1.25.5 Jan 13, 2023
@doflamingo721
Contributor

@snasovich I have updated the issue description with the steps to reproduce the behaviour and the relevant logs.

@snasovich
Collaborator

When cis-1.23 mode is used in RKE2, it

Configures the Pod Security Admission Controller to enforce restricted mode in all namespaces, with the exception of the kube-system, cis-operator-system, and tigera-operator namespaces.
(per https://docs.rke2.io/security/hardening_guide)

cluster-agent runs in the cattle-system namespace and is not compliant with this policy, so it never starts and the cluster can't communicate with Rancher, as evidenced by the following messages in the RKE2 log on the control plane node (log slightly reformatted to highlight the 4 requirements that are not satisfied) - credit to @jakefhyde for digging through the logs:

Jan 20 10:34:20 ip-x-x-x-x rke2[1806]: W0120 10:34:20.295443    1806 warnings.go:70] would violate PodSecurity "restricted:latest":
allowPrivilegeEscalation != false (container "cluster-register" must set securityContext.allowPrivilegeEscalation=false)
unrestricted capabilities (container "cluster-register" must set securityContext.capabilities.drop=["ALL"])
runAsNonRoot != true (pod or container "cluster-register" must set securityContext.runAsNonRoot=true)
seccompProfile (pod or container "cluster-register" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

Making cluster-agent compliant with these restrictions is unlikely to be possible, as it needs to run as root per @Oats87.
Most likely we will need to require Rancher PSACTs that exempt Rancher-specific namespaces from these restrictions (#39996).
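
To make the four requirements from the warning concrete, here is a simplified check of a pod spec against them. This is a minimal sketch for illustration only: it covers just the four fields named in the log message, it is not the actual Pod Security Admission implementation, and `restricted_violations` and the sample pod specs are hypothetical.

```python
def restricted_violations(pod_spec):
    """Check each container against the four requirements named in the warning."""
    pod_ctx = pod_spec.get("securityContext") or {}
    violations = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}

        # Must be set explicitly to false on the container.
        if ctx.get("allowPrivilegeEscalation") is not False:
            violations.append((name, "allowPrivilegeEscalation != false"))

        # The container must drop ALL capabilities.
        dropped = (ctx.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            violations.append((name, "unrestricted capabilities"))

        # runAsNonRoot and seccompProfile may be set at pod or container level.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if run_as_non_root is not True:
            violations.append((name, "runAsNonRoot != true"))

        seccomp = ctx.get("seccompProfile") or pod_ctx.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            violations.append((name, "seccompProfile"))
    return violations


# Hypothetical pod spec with no securityContext, like the case in the log above.
bare_pod = {"containers": [{"name": "cluster-register"}]}

# Hypothetical pod spec that sets all four fields.
compliant_pod = {
    "securityContext": {
        "runAsNonRoot": True,
        "seccompProfile": {"type": "RuntimeDefault"},
    },
    "containers": [
        {
            "name": "example",
            "securityContext": {
                "allowPrivilegeEscalation": False,
                "capabilities": {"drop": ["ALL"]},
            },
        }
    ],
}

for name, reason in restricted_violations(bare_pod):
    print(f"{name}: {reason}")
print(restricted_violations(compliant_pod))
```

Since cluster-agent needs to run as root, the runAsNonRoot requirement cannot be met, which is why a namespace exemption rather than a securityContext change is the approach discussed here.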

@sowmyav27
Contributor

@rishabhmsra please re-validate this issue once #39994 is fixed.

@deniseschannon

Is this for CIS scans or the CIS profile that's launched when RKE2 starts?

There is a difference between the two that people have been confusing.

@snasovich
Collaborator

@deniseschannon , CIS profile.

@Sahota1225
Contributor

Moving this issue to test, as #39994 is ready to test.

@vivek-shilimkar
Contributor

Tested the issue on Rancher v2.7-head. A hardened k8s cluster on v1.25.6+rke2r1 provisioned successfully with CIS profile 1.23.

Test steps:

  • Created a Rancher server on v2.7-head
  • Create 3 VMs and, on each VM, configure the following sysctl settings in /etc/sysctl.d/90-kubelet.conf:
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
  • Run sudo sysctl -p /etc/sysctl.d/90-kubelet.conf to apply the settings
  • Create the etcd user: sudo useradd -r -c "etcd user" -s /sbin/nologin -M etcd -U
  • Create a custom RKE2 cluster with the profile set to cis-1.23 in the cluster YAML:
spec:
  rkeConfig:
    machineSelectorConfig:
      - config:
          profile: cis-1.23
          protect-kernel-defaults: true
  • Registered the nodes in the custom cluster

Node registration is successful.

Hardened k8s cluster v1.25.6+rke2r1 provisioning was successful with CIS profile 1.23. Hence, closing the issue.


10 participants