host-device: failure when using multiple mellanox_hostdevice resources and multiple entries in the NetAttachDef #721

Closed
f18m opened this issue Mar 25, 2022 · 4 comments

f18m commented Mar 25, 2022

Hi,

I'm trying to deploy a pod in Kubernetes 1.20.14 with the multus-cni plugin (and the host-device plugin) on a worker node that has 3 Mellanox ports available.

The "resources" section of my pod specification contains:

intel.com/mellanox_hostdevice = 3

and my NetworkAttachmentDefinition contains:

> kubectl describe network-attachment-definitions.k8s.cni.cncf.io  -n empirix-cloud
Name:         eva-capture-test-mellanox-pci-passthrough
Namespace:    empirix-cloud
Labels:       app.kubernetes.io/instance=eva-capture-test
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=eva-capture
              app.kubernetes.io/version=8.0.0-local
              helm.sh/chart=eva-capture-8.0.0-local
              ranchercluster=edge
Annotations:  k8s.v1.cni.cncf.io/resourceName: intel.com/mellanox_hostdevice
              meta.helm.sh/release-name: eva-capture-test
              meta.helm.sh/release-namespace: empirix-cloud
API Version:  k8s.cni.cncf.io/v1
Kind:         NetworkAttachmentDefinition
Metadata:
...
Spec:
  Config:  {
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "host-device",
      "pciBusID": "0000:3b:00.0"
    },
    {
      "type": "host-device",
      "pciBusID": "0000:3b:00.1"
    },
    {
      "type": "host-device",
      "pciBusID": "0000:60:00.0"
    }

  ]
}
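
For completeness, the resource request above sits in the pod spec roughly like this (a minimal sketch; the container name and image are placeholders, the pod and network names are taken from the output in this issue):

apiVersion: v1
kind: Pod
metadata:
  name: eva-capture-test-edge-0
  namespace: empirix-cloud
  annotations:
    k8s.v1.cni.cncf.io/networks: eva-capture-test-mellanox-pci-passthrough
spec:
  containers:
    - name: eva-capture                # placeholder name
      image: eva-capture:8.0.0-local   # placeholder image
      resources:
        requests:
          intel.com/mellanox_hostdevice: 3
        limits:
          intel.com/mellanox_hostdevice: 3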

However, with this configuration my pod fails to deploy (not even the init containers are able to run), with the following error in the pod's events:

 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "57196486b1e1edc0eb4df92489d48c4196f9cb2c5c2c65e2a061013fdb7adf69" network for pod "eva-capture-test-edge-0": networkPlugin cni failed to set up pod "eva-capture-test-edge-0_empirix-cloud" network: [empirix-cloud/eva-capture-test-edge-0:eva-capture-test-mellanox-pci-passthrough]: error adding container to network "eva-capture-test-mellanox-pci-passthrough": failed to find host device: failed to find device name for pci address 0000:3b:00.0, failed to clean up sandbox container "57196486b1e1edc0eb4df92489d48c4196f9cb2c5c2c65e2a061013fdb7adf69" network for pod "eva-capture-test-edge-0": networkPlugin cni failed to teardown pod "eva-capture-test-edge-0_empirix-cloud" network: delegateDel: error invoking ConflistDel - "eva-capture-test-mellanox-pci-passthrough": conflistDel: error in getting result from DelNetworkList: failed to find "net3": Link not found / delegateDel: error invoking ConflistDel - "eva-capture-test-mellanox-pci-passthrough": conflistDel: error in getting result from DelNetworkList: failed to find "net2": Link not found / delegateDel: error invoking ConflistDel - "eva-capture-test-mellanox-pci-passthrough": conflistDel: error in getting result from DelNetworkList: failed to find "net1": Link not found]

Please note that if the NetworkAttachmentDefinition instead contains only one PCI address (but the pod's resources section still specifies intel.com/mellanox_hostdevice = 3), the pod deploys just fine and "ip a" inside the pod shows 3 network interfaces in addition to the eth0 interface:

[root@eva-capture-test-edge-0 /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if299: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether be:3d:a2:60:58:1e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.42.0.245/24 brd 10.42.0.255 scope global eth0
       valid_lft forever preferred_lft forever
69: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 04:3f:72:b2:af:78 brd ff:ff:ff:ff:ff:ff
70: net2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 04:3f:72:b2:af:79 brd ff:ff:ff:ff:ff:ff
71: net3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 98:03:9b:7f:bf:00 brd ff:ff:ff:ff:ff:ff

What am I doing wrong?

Thanks

@jellonek (Member) commented:

The code that handles this is at https://github.com/containernetworking/plugins/blob/main/plugins/main/host-device/host-device.go#L362
Can you post the output of ls -l /sys/bus/pci/devices/0000:3b:00.0/net here? It looks like that directory is empty in your case.
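
For reference, what the linked code essentially does is the following (a simplified, self-contained sketch, not the plugin's exact source; the function name getNetDeviceFromPCI is illustrative):

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// getNetDeviceFromPCI resolves a PCI address to a kernel network interface
// name by listing /sys/bus/pci/devices/<addr>/net; the kernel creates one
// entry there per netdev currently bound to that PCI device.
func getNetDeviceFromPCI(pciAddr string) (string, error) {
    netDir := filepath.Join("/sys/bus/pci/devices", pciAddr, "net")
    entries, err := os.ReadDir(netDir)
    if err != nil {
        return "", fmt.Errorf("failed to read net directory %s: %v", netDir, err)
    }
    if len(entries) == 0 {
        return "", fmt.Errorf("failed to find device name for pci address %s", pciAddr)
    }
    // Normally there is exactly one entry, e.g. "enp59s0f0".
    return entries[0].Name(), nil
}

func main() {
    name, err := getNetDeviceFromPCI("0000:3b:00.0")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(name)
}

An empty or missing net directory for the given PCI address is what produces the "failed to find device name for pci address ..." error quoted above.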

@jellonek (Member) commented:

Related to #300


f18m commented Apr 10, 2022

Hi @jellonek,
sorry for the delay. For now I worked around this in my C++ code, which basically moves the Ethernet device (my Mellanox NIC) into the network namespace of the pod. This workaround works fine (from my DPDK-based pod I can initialize just 1 Mellanox port instead of all 3) and it actually works without the host-device plugin. Nonetheless, I would like to help fix this bug...
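
Roughly, the workaround does the equivalent of the following (a sketch using iproute2 rather than my actual C++ code; the interface name and PID are placeholders):

# resolve the PCI address to an interface name on the host
ls /sys/bus/pci/devices/0000:3b:00.0/net
# move that interface into the pod's network namespace, identified here
# by the PID of a process running inside the pod
ip link set dev enp59s0f0 netns <pod-pid>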

You suggested modifying the Go source code of the plugin to get additional debug info (ls -l /sys/bus/pci/devices/0000:3b:00.0/net), right? If so, where would I read those additional logs? For example, I guess I should see "failed to read net directory %s: %q" emitted somewhere by the host-device plugin, right?

@jellonek (Member) commented:

Nope. My suggestion was to check the output of that command on the host node, but on reflection I guess that won't help, since everything works fine when there is a single PCI address in the NetAttachDef.
As for log messages, I guess they are collected by your container runtime implementation (most probably you are using dockershim, so they will likely be in the kubelet log or kubelet journal; otherwise they could be in the containerd logs).

Anyway, NetAttachDefs are out of scope for this organization/project, and the k8s.v1.cni.cncf.io/resourceName: intel.com/mellanox_hostdevice annotation has nothing to do with containernetworking. It looks like you are using a combination of Multus (which handles the NetAttachDef) and https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin, which is probably doing something that ends up giving you 3 interfaces, even though the config specifies only one address.
