Multus 4.x broken with the latest rke versions #4568

Closed
xhejtman opened this issue Aug 3, 2023 · 11 comments · Fixed by rancher/image-build-multus#27

Comments

@xhejtman

xhejtman commented Aug 3, 2023

Environmental Info:
RKE2 Version: v1.26.7+rke2r1

Node(s) CPU architecture, OS, and Version:
Linux kub-b14.priv.cerit-sc.cz 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
cni: calico,multus

Describe the bug:
When a pod uses an additional network attachment, it does not start:

Events:
  Type     Reason                  Age              From               Message
  ----     ------                  ----             ----               -------
  Normal   Scheduled               26s              default-scheduler  Successfully assigned beegfs/beegfs-mgmtd-0 to kub-b10.priv.cerit-sc.cz
  Normal   AddedInterface          25s              multus             Add eth0 [10.42.11.18/32 2001:718:801:42cb:8:2:f625:7e21/128] from k8s-pod-network
  Normal   AddedInterface          25s              multus             Add ib0 [10.16.59.2/24] from beegfs/ibnet-beegfs
  Normal   AddedInterface          25s              multus             Add eth0 [10.42.11.18/32 2001:718:801:42cb:8:2:f625:7e21/128] from multus-cni-network
  Warning  FailedCreatePodSandBox  24s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e32935913443085ffc5aa0044dc4d893147d7d539e8358f13330bcdc7e2670d5": plugin type="multus" name="multus-cni-network" failed (add): [beegfs/beegfs-mgmtd-0/e34930ae-588e-4295-890c-7ac7a3a83ea2:ibnet]: error adding container to network "ibnet": DelegateAdd: cannot set "ipoib" interface name to "ib0": validateIfName: interface name ib0 already exists
  Normal   AddedInterface          23s              multus             Add eth0 [10.42.11.136/32 2001:718:801:42cb:8:2:f625:7e22/128] from k8s-pod-network
  Normal   AddedInterface          23s              multus             Add ib0 [10.16.59.2/24] from beegfs/ibnet-beegfs
  Normal   AddedInterface          23s              multus             Add eth0 [10.42.11.136/32 2001:718:801:42cb:8:2:f625:7e22/128] from multus-cni-network
  Warning  FailedCreatePodSandBox  22s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3547a281ec36762a2d08719401e8adea05f9cdb926d393028d16e7bd71409be4": plugin type="multus" name="multus-cni-network" failed (add): [beegfs/beegfs-mgmtd-0/e34930ae-588e-4295-890c-7ac7a3a83ea2:ibnet]: error adding container to network "ibnet": DelegateAdd: cannot set "ipoib" interface name to "ib0": validateIfName: interface name ib0 already exists
  Normal   AddedInterface          21s              multus             Add eth0 [10.42.11.89/32 2001:718:801:42cb:8:2:f625:7e1c/128] from k8s-pod-network
  Normal   AddedInterface          21s              multus             Add ib0 [10.16.59.2/24] from beegfs/ibnet-beegfs
  Normal   AddedInterface          21s              multus             Add eth0 [10.42.11.89/32 2001:718:801:42cb:8:2:f625:7e1c/128] from multus-cni-network
  Warning  FailedCreatePodSandBox  20s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "130f86345ef798dd736b3163d5fafaa5a6dd48fa4cc8f5e1b22a0450b97a1c76": plugin type="multus" name="multus-cni-network" failed (add): [beegfs/beegfs-mgmtd-0/e34930ae-588e-4295-890c-7ac7a3a83ea2:ibnet]: error adding container to network "ibnet": DelegateAdd: cannot set "ipoib" interface name to "ib0": validateIfName: interface name ib0 already exists
  Normal   AddedInterface          19s              multus             Add eth0 [10.42.11.102/32 2001:718:801:42cb:8:2:f625:7e01/128] from k8s-pod-network
  Normal   AddedInterface          19s              multus             Add ib0 [10.16.59.2/24] from beegfs/ibnet-beegfs
  Normal   AddedInterface          19s              multus             Add eth0 [10.42.11.102/32 2001:718:801:42cb:8:2:f625:7e01/128] from multus-cni-network
  Warning  FailedCreatePodSandBox  18s              kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "49f3583246b7e6cb8d83546867a3c9f1e3fe834687a56938f036f05c085ebe12": plugin type="multus" name="multus-cni-network" failed (add): [beegfs/beegfs-mgmtd-0/e34930ae-588e-4295-890c-7ac7a3a83ea2:ibnet]: error adding container to network "ibnet": DelegateAdd: cannot set "ipoib" interface name to "ib0": validateIfName: interface name ib0 already exists

It worked in v1.26.6+rke2r1, which ships Multus 3.x. This may be related to
k8snetworkplumbingwg/multus-cni#1130.
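
One pattern in the events above is easy to miss: within each sandbox attempt, eth0 is reported as added twice, once from k8s-pod-network and once from multus-cni-network, before the ib0 failure. The following is a minimal Python sketch that extracts this from the event messages. The helper names are hypothetical, and the sketch only summarizes the log; it does not establish the root cause.

```python
import re
from collections import defaultdict

# Messages copied from the first sandbox attempt in the events above.
MESSAGES = [
    "Add eth0 [10.42.11.18/32 2001:718:801:42cb:8:2:f625:7e21/128] from k8s-pod-network",
    "Add ib0 [10.16.59.2/24] from beegfs/ibnet-beegfs",
    "Add eth0 [10.42.11.18/32 2001:718:801:42cb:8:2:f625:7e21/128] from multus-cni-network",
]

# Matches "Add <interface> [<addresses>] from <network>".
ADD_RE = re.compile(r"^Add (\S+) \[[^\]]*\] from (\S+)$")


def networks_per_interface(messages):
    """Map each interface name to the networks that reported adding it."""
    result = defaultdict(list)
    for message in messages:
        match = ADD_RE.match(message)
        if match:
            interface, network = match.groups()
            result[interface].append(network)
    return dict(result)


def repeated_interfaces(messages):
    """Return only the interfaces that were reported more than once."""
    return {
        interface: networks
        for interface, networks in networks_per_interface(messages).items()
        if len(networks) > 1
    }


if __name__ == "__main__":
    print(repeated_interfaces(MESSAGES))
```

Running the same check on each of the four attempts in the log gives the same result, since the attempts differ only in the addresses assigned to eth0.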

@thomasferrandiz
Contributor

Hi @xhejtman
I tried to reproduce your issue but my deployment is working.

Can you share more details on what you're doing?
In particular, the manifests for the NetworkAttachmentDefinition and the pod would help me reproduce it.

@xhejtman
Author

xhejtman commented Aug 3, 2023

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ibnet-beegfs
spec:
  config: '{
    "cniVersion": "0.3.1",
    "name": "ibnet",
    "type": "ipoib",
    "master": "ibp225s0",
    "ipam": {
      "type": "whereabouts",
      "range": "10.16.59.0/24",
      "range_start": "10.16.59.2",
      "range_end": "10.16.59.2"
     }
    }'

This is an InfiniBand device with a pool containing a single address (emulating a service for a StatefulSet).
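
To make the "single address" point concrete, here is a small Python sketch that counts the addresses between range_start and range_end from the config above. It is plain arithmetic with the standard ipaddress module, not the whereabouts allocator, and the function name pool_size is hypothetical.

```python
import ipaddress


def pool_size(range_cidr, range_start, range_end):
    """Count the addresses from range_start to range_end, inclusive.

    This is only arithmetic on the configured bounds; the real IPAM
    plugin may apply further rules that are not modelled here.
    """
    network = ipaddress.ip_network(range_cidr)
    start = ipaddress.ip_address(range_start)
    end = ipaddress.ip_address(range_end)
    if start not in network or end not in network:
        raise ValueError("range_start and range_end must lie inside the range")
    if end < start:
        raise ValueError("range_end must not be lower than range_start")
    return int(end) - int(start) + 1


if __name__ == "__main__":
    # Values taken from the NetworkAttachmentDefinition above.
    print(pool_size("10.16.59.0/24", "10.16.59.2", "10.16.59.2"))
```

With only one address in the pool, every retry of the pod has to receive 10.16.59.2, which matches the ib0 address shown in each attempt in the events.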

@xhejtman
Author

xhejtman commented Aug 3, 2023

And the StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: beegfs-mgmtd
spec:
  replicas: 1
  serviceName: beegfs
  selector:
    matchLabels:
      app: beegfs-mgmtd
  template:
    metadata:
      labels:
        app: beegfs-mgmtd
      annotations:
         k8s.v1.cni.cncf.io/networks: "beegfs/ibnet-beegfs"
    spec:
      containers:
      - name: beegfs-mgmtd
        image: cerit.io/os/beegfs:7.3.3
        command:
          - /opt/beegfs/sbin/beegfs-mgmtd
          - cfgFile=/etc/beegfs/beegfs-mgmtd.conf
          - runDaemonized=false
        imagePullPolicy: IfNotPresent
        securityContext:
          runAsUser: 999
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        resources:
          limits:
            rdma/hca: 1
        volumeMounts:
          - mountPath: /mnt
            name: mgmt-dir
          - name: connauth
            mountPath: /etc/beegfs-auth
          - name: config
            mountPath: /etc/beegfs
      securityContext:
        fsGroup: 999
        runAsNonRoot: true
        seccompProfile:
           type: RuntimeDefault
      volumes:
      - name: mgmt-dir
        persistentVolumeClaim:
          claimName: pvc-beegfs-mgmt
      - name: connauth
        secret:
          secretName: connauthfile
      - name: config
        configMap:
          name: mgmtd-conf

@thomasferrandiz
Contributor

@xhejtman
Thanks for the info.
I still couldn't reproduce the problem as I don't have access to infiniband hardware.

I had no issue using multus 4 with sriov+dpdk or in a VM without special hardware.

It might be worth opening an issue upstream as I don't think we can help more without a way to reproduce the bug.

@work-smalin

Hi @thomasferrandiz,

maybe I can help - I observed the same problem without using any special hardware.
The behavior can be reproduced as follows:

  1. install rke2 v1.27.4 with cilium/multus on (at least 3) nodes, each node having two network adapters
  2. configure a multus network (including whereabouts) which uses the second network adapter
  3. deploy longhorn 1.5.1 using the multus network as storage network
  4. drain, reboot and uncordon the cluster nodes

After the reboot the longhorn instance-manager pods don't start and the log messages state that the adapter lhnet1 already exists.

This setup works fine using rke2 v1.26.6 with multus 3.9.3 or kubeadm 1.27.4 with multus 4.0.2.
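
Step 4 of the reproduction could be scripted roughly as follows. This Python sketch only builds the command lines and does not run them. The kubectl drain and uncordon subcommands and flags are standard, but rebooting over ssh with sudo and the node name node-1 are assumptions made for illustration; the report does not say how the nodes were rebooted.

```python
def drain_reboot_uncordon_commands(node):
    """Return the command lines for one node, in the order they would run."""
    return [
        # Evict workloads; DaemonSet pods and emptyDir data are handled explicitly.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        # Assumption: the node is reachable over ssh and the user may reboot it.
        ["ssh", node, "sudo", "reboot"],
        # Allow scheduling again once the node is back and Ready.
        ["kubectl", "uncordon", node],
    ]


if __name__ == "__main__":
    # "node-1" is a placeholder node name.
    for command in drain_reboot_uncordon_commands("node-1"):
        print(" ".join(command))
```

In practice the uncordon step would wait until the node reports Ready again; that wait is left out of the sketch.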

@xhejtman
Author

xhejtman commented Aug 5, 2023

I can confirm that I did something similar: I drained and rebooted the node before upgrading to 1.26.7, so draining while the address/interface is in use may be what causes this.

@thomasferrandiz
Contributor

@work-smalin Thanks for the details.
I will check that.

thomasferrandiz added a commit to thomasferrandiz/image-build-multus that referenced this issue Aug 9, 2023
and breaks pod networking by trying multiple times to create the same interface.

Issue fixed: rancher/rke2#4568
thomasferrandiz added a commit to thomasferrandiz/image-build-multus that referenced this issue Aug 11, 2023
and breaks pod networking by trying multiple times to create the same interface.

Issue fixed: rancher/rke2#4568
@thomasferrandiz
Contributor

I think I found a fix for the issue and submitted a PR upstream: k8snetworkplumbingwg/multus-cni#1137

An updated Rancher Multus image with the fix is also available: rancher/hardened-multus-cni:v4.0.2-build20230811

@manuelbuil
Contributor

Issue should be closed by QA once tested

Contributor

github-actions bot commented Feb 9, 2024

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 2, 2024
@micw

micw commented Apr 22, 2024

Hello,
Although the issue was closed, it's still present in multus-cni. rancher/hardened-multus-cni:v4.0.2-build20230811 fixes it for me.

Edit: sorry, I thought this issue was on the multus repo.
