Containerized SSH-less provisioner #134

Open
jp39 opened this issue Sep 16, 2024 · 7 comments

Comments

@jp39

jp39 commented Sep 16, 2024

Hi,

The documentation states:

Making a container image and creating ZFS datasets from a container is not exactly easy, as ZFS runs in kernel. While it's possible to pass /dev/zfs to a container so it can create and destroy datasets within the container, sharing the volume with NFS does not work.

Setting the sharenfs property to anything other than off invokes exportfs(8), which also requires the NFS server to be running so it can reload its exports. That is not the case in a container (see zfs(8)).

But most importantly: mounting /dev/zfs inside the provisioner container would mean that datasets can only be created on the same host the container currently runs on.

So, in order to "break out" of the container, the zfs calls are wrapped and redirected to another host over SSH. This requires SSH private keys to be mounted in the container for an SSH user with sufficient permissions to run zfs commands on the target host.
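
For illustration, here is a minimal Go sketch of what that SSH redirection could look like; the host, user, and helper names below are hypothetical and not the provisioner's actual code:

package main

import (
    "fmt"
    "os/exec"
)

// zfsHost is a hypothetical SSH target with permission to run zfs on the ZFS node.
const zfsHost = "zfs-admin@zfsnode"

// runZfs executes a zfs subcommand on the remote host over SSH instead of
// calling the local zfs binary, e.g. runZfs("destroy", "tank/kubernetes/pvc-x").
func runZfs(args ...string) error {
    cmd := exec.Command("ssh", append([]string{zfsHost, "zfs"}, args...)...)
    out, err := cmd.CombinedOutput()
    if err != nil {
        return fmt.Errorf("remote zfs %v failed: %v: %s", args, err, out)
    }
    return nil
}

func main() {
    // Create a dataset on the remote host and share it over NFS.
    if err := runZfs("create", "-o", "sharenfs=rw", "tank/kubernetes/example"); err != nil {
        fmt.Println(err)
    }
}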

I spent some time working on a small proof of concept that shows it is possible to create ZFS datasets from within a container and have the volumes shared over NFS by the container. Also, the volume mounts are visible to both the host and the container, making them shareable using HostPath.

I'm using this Dockerfile:

FROM docker.io/library/alpine:3.20 AS runtime

ENTRYPOINT ["/entrypoint.sh"]

RUN apk add bash zfs nfs-utils

COPY kubernetes-zfs-provisioner /usr/bin/
COPY entrypoint.sh /

With this entrypoint.sh:

#!/bin/sh

rpcbind
rpc.statd --no-notify --port 32765 --outgoing-port 32766
rpc.mountd --port 32767
rpc.idmapd
rpc.nfsd --tcp --udp --port 2049 8

exec /usr/bin/kubernetes-zfs-provisioner

The secret sauce is to use mountPropagation: Bidirectional for the dataset volume mount, so each dataset mounted by the container is also visible on the host and vice versa:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: zfs-provisioner
  namespace: zfs-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: zfs-provisioner
  template:
    metadata:
      labels:
        app.kubernetes.io/name: zfs-provisioner
      namespace: zfs-system
    spec:
      serviceAccountName: zfs-provisioner
      containers:
      - name: provisioner
        image: jp39/zfs:latest
        volumeMounts:
        - name: dev-zfs
          mountPath: /dev/zfs
        - name: dataset
          mountPath: /tank/kubernetes
          mountPropagation: Bidirectional
        securityContext:
          privileged: true
          procMount: Unmasked
        ports:
        - containerPort: 2049
          protocol: TCP
        - containerPort: 111
          protocol: UDP
        - containerPort: 32765
          protocol: UDP
        - containerPort: 32767
          protocol: UDP
        env:
        - name: ZFS_NFS_HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
      volumes:
      - name: dev-zfs
        hostPath:
          path: /dev/zfs
      - name: dataset
        hostPath:
          path: /tank/kubernetes
      nodeSelector:
        kubernetes.io/hostname: zfsnode

Note that I had to make a small patch within kubernetes-zfs-provisioner so that the pod IP address (contained in the ZFS_NFS_HOSTNAME environment variable) gets used as the NFSVolumeSource's server address instead of the storage class's hostname parameter.
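
For reference, a minimal sketch of what such a patch could look like; the nfsServer helper and the fallback hostname are illustrative, and the actual change in kubernetes-zfs-provisioner may differ:

package main

import (
    "fmt"
    "os"

    core "k8s.io/api/core/v1"
)

// nfsServer returns the address to put into the NFSVolumeSource: the pod IP
// from ZFS_NFS_HOSTNAME if set, otherwise the storage class's hostname parameter.
func nfsServer(storageClassHostname string) string {
    if podIP := os.Getenv("ZFS_NFS_HOSTNAME"); podIP != "" {
        return podIP
    }
    return storageClassHostname
}

func main() {
    source := core.NFSVolumeSource{
        Server: nfsServer("zfsnode.example.com"), // hypothetical hostname parameter
        Path:   "/tank/kubernetes/pvc-example",
    }
    fmt.Printf("NFS volume source: %+v\n", source)
}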

Is this something that would be worth having as a default configuration? It requires the ZFS host to be part of the cluster, but has the advantage of not requiring extra setup such as SSH keys.

@ccremer
Owner

ccremer commented Sep 16, 2024

Hi
This is indeed interesting! Thanks for experimenting with this!
Is the privileged security context really necessary? It would be cumbersome to make this the default, as it requires elevated permissions to install the provisioner in certain contexts, e.g. OpenShift or ArgoCD (or at least makes it more difficult).

@jp39
Author

jp39 commented Sep 16, 2024

I will try without it. I assumed it was necessary because of the /dev/zfs hostPath mount, but maybe I'm wrong.

@jp39
Author

jp39 commented Sep 16, 2024

It does not seem to work without the privileged security context:

# * spec.template.spec.containers[0].volumeMounts.mountPropagation: Forbidden: Bidirectional mount propagation is available only to privileged containers

@jp39
Author

jp39 commented Sep 16, 2024

It does seem legitimate, though, that a container with access to the host mount namespace, as well as the ability to create or destroy datasets on the host, requires elevated permissions.

@ccremer
Owner

ccremer commented Sep 17, 2024

Hm, yeah, that makes sense.
I would refrain from making it the default configuration if it requires privileged containers and running on ZFS-enabled hosts. I would rather provide a preset for it, e.g. an additional values YAML file, or an entirely different deployment template if a parameter is given.

@jp39
Author

jp39 commented Sep 19, 2024

It seems that some architectural changes would be needed in the project if we wanted to integrate this feature.

For example, the container running the provisioner would have to be pinned to the ZFS node at the deployment level, rendering the storage class's node and hostname parameters useless.

Similarly, the mount path of the parent dataset has to be provided at the deployment level too, making it impossible to select a different parent dataset using storage class parameters.

Overall, it feels like it would make things less "flexible", although I'm pretty sure most users only use a single parent dataset on a single ZFS host (with a single storage class).

I think it will be very difficult to make the SSH-less and the SSH use case coexist, so if you're not prepared to give up the SSH use case, it's probably not worth carrying on with it.

Since it fits my own use-case better, I'm actually considering creating a fork of this project to implement this fully, if you don't have any objection.

@ccremer
Owner

ccremer commented Sep 20, 2024

I agree with you.

I have no data on what most users use for their setup, so any statements about usage are probably equally true :D

I think it will be very difficult to make the SSH-less and the SSH use case coexist

I agree.

Since it fits my own use-case better, I'm actually considering creating a fork of this project to implement this fully, if you don't have any objection.

No objection at all. I may even link to your repo for users that may be interested in an SSH-less version ;)

jp39 pushed a commit to jp39/zfs-provisioner that referenced this issue Sep 20, 2024