Rootless mode allows running the BuildKit daemon as a non-root user.
Using an Ubuntu kernel is recommended.
Make sure to mount an emptyDir volume as shown below:
spec:
  containers:
    - name: buildkitd
      volumeMounts:
        # Dockerfile has `VOLUME /home/user/.local/share/buildkit` by default too,
        # but the default VOLUME does not work with rootless on Google's Container-Optimized OS
        # as it is mounted with `nosuid,nodev`.
        # https://github.com/moby/buildkit/issues/879#issuecomment-1240347038
        - mountPath: /home/user/.local/share/buildkit
          name: buildkitd
  volumes:
    - name: buildkitd
      emptyDir: {}
See also the example manifests.
You need to run sysctl -w user.max_user_namespaces=N (where N is a positive integer, such as 63359) on the host nodes.
See ../examples/kubernetes/sysctl-userns.privileged.yaml.
Old distributions
Add kernel.unprivileged_userns_clone=1 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl -p.
This step is not needed for Debian GNU/Linux 11 and later.
Add user.max_user_namespaces=28633 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl -p.
This step is not needed for RHEL/CentOS 8 and later.
You may have to disable SELinux, or run BuildKit with --oci-worker-snapshotter=fuse-overlayfs.
- Using the overlayfs snapshotter requires kernel >= 5.11 or an Ubuntu kernel. On kernel >= 4.18, the fuse-overlayfs snapshotter is used instead of overlayfs. On kernel < 4.18, the native snapshotter is used.
- Network mode is always set to network.host.
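The kernel-version rule in the first limitation can be sketched as a small shell function. This is only an illustration of the rule as stated above: pick_snapshotter and its optional second argument are hypothetical names, not part of BuildKit, and the sketch does not check whether fuse-overlayfs is actually installed or usable.

```shell
# pick_snapshotter KERNEL_RELEASE [IS_UBUNTU_KERNEL]
# Prints the snapshotter implied by the rule above for a kernel release
# string such as "5.15.0-91-generic". Hypothetical helper, not a BuildKit API.
pick_snapshotter() {
  local release="$1" ubuntu="${2:-no}"
  local major rest minor
  major="${release%%.*}"        # text before the first dot
  rest="${release#*.}"          # text after the first dot
  minor="${rest%%[!0-9]*}"      # leading digits of the remainder
  minor="${minor:-0}"

  if [ "$ubuntu" = "yes" ]; then
    echo overlayfs              # Ubuntu kernels are treated as overlayfs-capable
  elif [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 11 ]; }; then
    echo overlayfs              # kernel >= 5.11
  elif [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 18 ]; }; then
    echo fuse-overlayfs         # 4.18 <= kernel < 5.11
  else
    echo native                 # kernel < 4.18
  fi
}
```

On a live host it could be called as pick_snapshotter "$(uname -r)"; pass yes as the second argument when you know the host runs an Ubuntu kernel.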
RootlessKit needs to be installed.
$ rootlesskit buildkitd
$ buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
To isolate BuildKit daemon's network namespace from the host (recommended):
$ rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
RootlessKit needs to be installed.
Run containerd in rootless mode using rootlesskit by following containerd's documentation.
$ containerd-rootless.sh
Then let buildkitd join the same namespace as containerd.
$ containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true --containerd-worker-snapshotter=native
Try running buildkitd with --oci-worker-snapshotter=fuse-overlayfs:
$ rootlesskit buildkitd --oci-worker-snapshotter=fuse-overlayfs
Try running buildkitd with --oci-worker-snapshotter=native:
$ rootlesskit buildkitd --oci-worker-snapshotter=native
See https://rootlesscontaine.rs/getting-started/common/subuid/
Make sure to mount an emptyDir volume on /home/user/.local/share/buildkit.
Error fork/exec /proc/self/exe: no space left on device with level=warning msg="/proc/sys/user/max_user_namespaces needs to be set to non-zero."
Run sysctl -w user.max_user_namespaces=N (where N is a positive integer, such as 63359) on the host nodes.
See ../examples/kubernetes/sysctl-userns.privileged.yaml.
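Before starting the daemon, the limit named in the warning can be checked directly. The following is a minimal sketch; userns_limit_ok is a hypothetical helper name, and the optional file argument exists only so the check can be exercised against a test file.

```shell
# userns_limit_ok [FILE]
# Succeeds when FILE (default: /proc/sys/user/max_user_namespaces) holds a
# positive integer. Returns 2 if the file is unreadable or not a number.
# Hypothetical helper for illustration only.
userns_limit_ok() {
  local file="${1:-/proc/sys/user/max_user_namespaces}"
  local value
  [ -r "$file" ] || return 2           # file missing or unreadable
  value="$(cat "$file")"
  case "$value" in
    ''|*[!0-9]*) return 2 ;;           # not a plain non-negative integer
  esac
  [ "$value" -gt 0 ]                   # zero means user namespaces are disabled
}
```

A non-zero exit status from userns_limit_ok on a node suggests that the sysctl above still needs to be applied there.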
This error is known to happen when BuildKit is executed in a container without the --oci-worker-no-process-sandbox flag.
Make sure that --oci-worker-no-process-sandbox is specified (see below).
$ docker run \
    --name buildkitd \
    -d \
    --security-opt seccomp=unconfined \
    --security-opt apparmor=unconfined \
    --device /dev/fuse \
    moby/buildkit:rootless --oci-worker-no-process-sandbox
$ buildctl --addr docker-container://buildkitd build ...
If you don't mind using --privileged (almost safe for rootless), the docker run flags can be shortened as follows:
$ docker run --name buildkitd -d --privileged moby/buildkit:rootless
Adding --device /dev/fuse to the docker run arguments is required only if you want to use the fuse-overlayfs snapshotter.
By adding --oci-worker-no-process-sandbox to the buildkitd arguments, BuildKit can be executed in a container without adding --privileged to the docker run arguments.
However, you still need to pass --security-opt seccomp=unconfined --security-opt apparmor=unconfined to docker run.
Note that --oci-worker-no-process-sandbox allows build executor containers to kill (and potentially ptrace, depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
To allow running rootless buildkitd without --oci-worker-no-process-sandbox, run docker run with --security-opt systempaths=unconfined. (For Kubernetes, set securityContext.procMount to Unmasked.)
The --security-opt systempaths=unconfined flag disables the masks for the /proc mount in the container and potentially allows reading and writing dangerous kernel files, but it is safe when you are running buildkitd as non-root.
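The three ways of launching the rootless image described above can be compared side by side with a small sketch that prints, but does not run, the corresponding docker run command. The function name and the mode labels are hypothetical. The sketch also assumes that the seccomp and AppArmor options are still passed alongside systempaths=unconfined, which the text above does not state explicitly, and it omits --device /dev/fuse.

```shell
# buildkitd_docker_cmd MODE
# Prints (does not execute) a `docker run` command line for the given MODE.
# The MODE names are hypothetical labels chosen for this sketch:
#   privileged          use --privileged, no extra buildkitd flags
#   no-process-sandbox  unconfined seccomp/AppArmor plus --oci-worker-no-process-sandbox
#   unmasked-proc       keep the process sandbox by unmasking /proc instead
#                       (assumption: seccomp/AppArmor options are still needed)
buildkitd_docker_cmd() {
  local base="docker run --name buildkitd -d"
  local unconfined="--security-opt seccomp=unconfined --security-opt apparmor=unconfined"
  case "$1" in
    privileged)
      echo "$base --privileged moby/buildkit:rootless" ;;
    no-process-sandbox)
      echo "$base $unconfined moby/buildkit:rootless --oci-worker-no-process-sandbox" ;;
    unmasked-proc)
      echo "$base $unconfined --security-opt systempaths=unconfined moby/buildkit:rootless" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}
```

For example, buildkitd_docker_cmd unmasked-proc prints a command line that contains systempaths=unconfined and no --oci-worker-no-process-sandbox flag.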
The moby/buildkit:rootless image has the following UID/GID configuration:
Actual ID (shown in the host and the BuildKit daemon container) | Mapped ID (shown in build executor containers) |
---|---|
1000 | 0 |
100000 | 1 |
... | ... |
165535 | 65536 |
$ docker exec buildkitd id
uid=1000(user) gid=1000(user)
$ docker exec buildkitd ps aux
PID USER TIME COMMAND
1 user 0:00 rootlesskit buildkitd --addr tcp://0.0.0.0:1234
13 user 0:00 /proc/self/exe buildkitd --addr tcp://0.0.0.0:1234
21 user 0:00 buildkitd --addr tcp://0.0.0.0:1234
29 user 0:00 ps aux
$ docker exec buildkitd cat /etc/subuid
user:100000:65536
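The table follows from the user's own UID (1000) and the /etc/subuid entry user:100000:65536: the daemon user maps to 0, and the subordinate range maps to 1 through 65536. As a rough sketch of that arithmetic, with mapped_id being a hypothetical helper whose defaults mirror the values shown above:

```shell
# mapped_id ACTUAL_ID [OWN_ID] [SUBID_START] [SUBID_COUNT]
# Prints the ID seen inside build executor containers for an ID seen on the
# host, or fails if the ID falls outside the mapping.
# Hypothetical helper; defaults mirror the table and /etc/subuid entry above.
mapped_id() {
  local actual="$1" own="${2:-1000}" start="${3:-100000}" count="${4:-65536}"
  if [ "$actual" -eq "$own" ]; then
    echo 0                                  # the daemon user appears as root
  elif [ "$actual" -ge "$start" ] && [ "$actual" -lt $((start + count)) ]; then
    echo $((actual - start + 1))            # subordinate range starts at 1
  else
    echo "ID $actual is not mapped" >&2
    return 1
  fi
}
```

With the defaults, mapped_id 1000 prints 0 and mapped_id 165535 prints 65536, matching the first and last rows of the table.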
To change the UID/GID configuration, you need to modify and build the BuildKit image manually.
$ vi Dockerfile
$ make images
$ docker run ... moby/buildkit:local-rootless ...