rootless: support Bottlerocket OS (probably by porting moby/daemon.getUnprivilegedMountFlags()) #3098

Closed
AkihiroSuda opened this issue Sep 9, 2022 · 6 comments · Fixed by #3697

@AkihiroSuda
Member

On Bottlerocket OS, an emptyDir volume is still mounted with nosuid and nodev, so BuildKit fails to create bind mounts: Options:[rbind ro]}]: operation not permitted.

#3097 (comment)

Probably this can be fixed by porting moby/daemon.getUnprivilegedMountFlags() to containerd/mounts.Mount().

https://github.com/moby/moby/blob/v20.10.17/daemon/oci_linux.go#L420-L470

// Get the set of mount flags that are set on the mount that contains the given
// path and are locked by CL_UNPRIVILEGED. This is necessary to ensure that
// bind-mounting "with options" will not fail with user namespaces, due to
// kernel restrictions that require user namespace mounts to preserve
// CL_UNPRIVILEGED locked flags.
func getUnprivilegedMountFlags(path string) ([]string, error) {


@emboss64

Are there any plans on getting this working?

@nazarewk

nazarewk commented Mar 3, 2023

I'm also very interested in this. Note that I found a Bottlerocket issue: bottlerocket-os/bottlerocket#1934

@AkihiroSuda
Member Author

AkihiroSuda commented Mar 9, 2023

I tried to look into this, but the current version of Bottlerocket OS doesn't even seem to support creating user namespaces:

$ kubectl logs buildkitd
time="2023-03-09T07:51:20Z" level=warning msg="/proc/sys/user/max_user_namespaces needs to be set to non-zero."
[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: no space left on device

Version: (Created with eksctl --version=1.25 --node-ami-family=Bottlerocket using eksctl 0.132.0)

$ kubectl get nodes -o wide
NAME                                                  STATUS   ROLES    AGE     VERSION               INTERNAL-IP      EXTERNAL-IP       OS-IMAGE                                KERNEL-VERSION   CONTAINER-RUNTIME
ip-XXX-XXX-XXX-XXX.ap-northeast-1.compute.internal    Ready    <none>   8m36s   v1.25.5-eks-c248520   192.168.35.80    XXX.XXX.XXX.XXX   Bottlerocket OS 1.12.0 (aws-k8s-1.25)   5.15.79          containerd://1.6.15+bottlerocket

The sysctl can't be modified in the pod securityContext due to 'Pod forbidden sysctl: "sys.user.max_user_namespaces" not allowlisted'.

I guess the sysctl can still be modified by SSHing into the Bottlerocket nodes, but typical users would probably prefer to just create a non-Bottlerocket node group.

So I'm going to close this issue, but happy to reopen if Bottlerocket supports creating user namespaces once again.
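
The warning in the log above can be reproduced outside BuildKit with a small check. The sketch below is an assumption about how one might do that, not rootlesskit's code: it reads /proc/sys/user/max_user_namespaces and reports whether the limit is non-zero. According to clone(2), creating a user namespace beyond this limit fails with ENOSPC, which would explain the "no space left on device" message. `userNSAllowed` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

const maxUserNSPath = "/proc/sys/user/max_user_namespaces"

// userNSAllowed (hypothetical helper) parses the content of the sysctl
// file and reports whether at least one user namespace may be created.
func userNSAllowed(raw string) (bool, error) {
	n, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return false, err
	}
	return n > 0, nil
}

func main() {
	data, err := os.ReadFile(maxUserNSPath)
	if err != nil {
		fmt.Println("cannot read sysctl:", err)
		return
	}
	ok, err := userNSAllowed(string(data))
	if err != nil {
		fmt.Println("cannot parse sysctl:", err)
		return
	}
	if ok {
		fmt.Println("user namespaces can be created")
	} else {
		fmt.Println(maxUserNSPath, "needs to be set to non-zero")
	}
}
```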

@AkihiroSuda closed this as not planned on Mar 9, 2023
@AkihiroSuda
Member Author

AkihiroSuda commented Mar 9, 2023

There seems to be some configuration knob?
https://github.com/bottlerocket-os/bottlerocket/pull/1158/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R358

But EKS doesn't seem to support specifying custom user-data (at least via eksctl), so I'm still not sure how it is applicable to EKS.

Workaround: DaemonSet

# Run `sysctl -w user.max_user_namespaces=63359` on all the nodes,
# for errors like "/proc/sys/user/max_user_namespaces needs to be set to non-zero"
# on running rootless buildkitd pods.
#
# This workaround is known to be needed on Bottlerocket OS.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: sysctl-userns
  name: sysctl-userns
spec:
  selector:
    matchLabels:
      app: sysctl-userns
  template:
    metadata:
      labels:
        app: sysctl-userns
    spec:
      containers:
        - name: sysctl-userns
          image: busybox
          command: ["sh", "-euxc", "sysctl -w user.max_user_namespaces=63359 && sleep infinity"]
          securityContext:
            privileged: true

@nazarewk

nazarewk commented Mar 9, 2023

Sysctls were not the issue, as they can be set on the node. The issue is that the /local volume, on which everything is mounted, is set up with nosuid and nodev, which prevents BuildKit from working (with the current options).
