Virtual Function Unhealthy when PCI address maps to two interfaces #58

Closed
aweimeow opened this issue Jan 8, 2019 · 8 comments

@aweimeow

aweimeow commented Jan 8, 2019

Environment

Hardware: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Branch: master

Problem

We have virtual functions enabled on an SR-IOV-capable device, but this network interface card has two ports: enp3s0 and enp3s0d1. If SR-IOV is not enabled on both interfaces, the health check fails.

Here is one interface of the card with SR-IOV enabled; it has 4 virtual functions:

6: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:02:c9:1e:b4:60 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:52:44:11:22:33, vlan 4095, spoof checking off, link-state enable
    vf 1 MAC 00:52:44:11:22:34, vlan 4095, spoof checking off, link-state enable
    vf 2 MAC 00:52:44:11:22:35, vlan 4095, spoof checking off, link-state enable
    vf 3 MAC 00:52:44:11:22:36, vlan 4095, spoof checking off, link-state enable

However, the interface's PCI address maps to two network devices, enp3s0 and enp3s0d1:

ubuntu@master:~$ ls -al /sys/bus/pci/devices/0000\:03\:00.0/net
total 0
drwxr-xr-x  4 root root 0 Jan  8 00:21 .
drwxr-xr-x 10 root root 0 Jan  8 00:21 ..
drwxr-xr-x 10 root root 0 Jan  8 00:21 enp3s0
drwxr-xr-x 10 root root 0 Jan  8 00:21 enp3s0d1

In pkg/resources/netDevicePool.go, the health check iterates over the pf values, which I believe are the PCI addresses we supply in /etc/pcidp/config.json:

for _, pf := range rc.RootDevices {
    // If the PF link is not up = "Unhealthy"
    if !utils.IsNetlinkStatusUp(pf) {
        healthValue = pluginapi.Unhealthy
    }
    // ... snip ...
}

Each pf is then checked by the IsNetlinkStatusUp function in utils/utils.go. Because the interface name for a given PCI address is not known in advance, it uses a wildcard to find the interface's operstate:

// The netdev name is not known in advance, so match every interface under the PCI device.
if opsFiles, err := filepath.Glob(filepath.Join(sysBusPci, dev, "net", "*", "operstate")); err == nil {
    for _, f := range opsFiles {
        bytes, err := ioutil.ReadFile(f)
        if err != nil || strings.TrimSpace(string(bytes)) != "up" {
            return false
        }
    }
}

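So unless both enp3s0 and enp3s0d1 are up, the health check fails: the glob matches the operstate file of every interface under the PCI address, and a single interface that is not "up" makes the function return false.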

Further question

These are two isolated interfaces that share the same PCI address. When VFs are allocated to a container, how can I tell which interface a given VF belongs to?

@zshi-redhat
Collaborator

I'd guess this is a special case for Mellanox card that two interfaces share the same PCI address.
Is there an easy way to identify which PF interface a VF belongs to?
In the current implementation, we use the PCI address as the unique identifier for each PF device, which is basically a one-to-one mapping between PCI address and PF device. This issue breaks that assumption and may require a major change to the current code base to support.

@moshe010
Contributor

In OpenStack we had the same problem, so we used the config option exclude_devices to exclude all unneeded PF device to VF PCI address mappings; see [1], section "SR-IOV with ConnectX-3/ConnectX-3 Pro Dual Port Ethernet". This keeps the same logic as today, and the user just needs to list the unneeded PF device to VF PCI address mappings according to the mlx4_core num_vfs configuration.
[1] - https://github.com/openstack/neutron/blob/master/doc/source/admin/config-sriov.rst
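For illustration, here is a Go sketch of parsing an exclude_devices value, assuming the Neutron SR-IOV agent's format of comma-separated `<network_device>:<vf_pci>;<vf_pci>` entries. parseExcludeDevices is a hypothetical name, and the device names and addresses in main are placeholders, not values from this host.

```go
package main

import (
	"fmt"
	"strings"
)

// parseExcludeDevices parses a value such as
// "<netdev>:<vf_pci>;<vf_pci>,<netdev>:<vf_pci>" into a map from PF netdev
// name to the set of VF PCI addresses that should not be used on it.
func parseExcludeDevices(s string) (map[string]map[string]bool, error) {
	out := make(map[string]map[string]bool)
	s = strings.TrimSpace(s)
	if s == "" {
		return out, nil
	}
	for _, entry := range strings.Split(s, ",") {
		// Split only on the first ':' because PCI addresses contain ':' too.
		parts := strings.SplitN(strings.TrimSpace(entry), ":", 2)
		if len(parts) != 2 || parts[0] == "" {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		dev := parts[0]
		if out[dev] == nil {
			out[dev] = make(map[string]bool)
		}
		for _, vf := range strings.Split(parts[1], ";") {
			if vf = strings.TrimSpace(vf); vf != "" {
				out[dev][vf] = true
			}
		}
	}
	return out, nil
}

func main() {
	// Placeholder netdev names and VF PCI addresses, for illustration only.
	ex, err := parseExcludeDevices("eth1:0000:07:00.2;0000:07:00.3,eth2:0000:05:00.1;0000:05:00.2")
	if err != nil {
		panic(err)
	}
	fmt.Println(ex["eth1"]["0000:07:00.2"], ex["eth2"]["0000:07:00.2"]) // true false
}
```

Because exclusions are keyed by PF netdev rather than by PCI address, two ports behind one PCI address can each be given their own VF list.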

@zshi-redhat
Collaborator

zshi-redhat commented Jan 16, 2019

Since we already use the pcidp/config.json file to specify which rootDevices shall be discovered and managed by the device plugin, I think it makes sense to use the same mechanism for the exclude_devices config. Instead of adding a new field, I'm wondering whether we could change the current format of rootDevices to add an associated device name for each PCI address, for cases where different device names under the same PCI address need to be distinguished. For example,
assuming there are two interfaces (pf_name_0, pf_name_1) sharing the same PCI address 02:00.0:

"rootDevices": ["02:00.0/pf_name_0", "02:00.0/pf_name_1"]

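A minimal Go sketch of how such `<pci>/<pf_name>` entries could be parsed. The format was only a proposal in this thread; rootDevice and parseRootDevice are hypothetical names, and treating an entry without `/` as a bare PCI address (today's format) is a design assumption for backward compatibility.

```go
package main

import (
	"fmt"
	"strings"
)

// rootDevice is a hypothetical parsed form of one rootDevices entry under the
// proposed "<pci>/<pf_name>" format.
type rootDevice struct {
	PCIAddr string
	PFName  string // empty for a bare PCI address, i.e. today's format
}

// parseRootDevice splits an entry on the first '/'. PCI addresses never
// contain '/', so the remainder is taken as the PF interface name.
func parseRootDevice(entry string) (rootDevice, error) {
	parts := strings.SplitN(entry, "/", 2)
	rd := rootDevice{PCIAddr: parts[0]}
	if rd.PCIAddr == "" {
		return rootDevice{}, fmt.Errorf("empty PCI address in %q", entry)
	}
	if len(parts) == 2 {
		if parts[1] == "" {
			return rootDevice{}, fmt.Errorf("empty PF name in %q", entry)
		}
		rd.PFName = parts[1]
	}
	return rd, nil
}

func main() {
	for _, e := range []string{"02:00.0/pf_name_0", "02:00.0/pf_name_1", "02:00.0"} {
		rd, err := parseRootDevice(e)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", rd)
	}
}
```

With the PF name kept next to the address, (PCIAddr, PFName) rather than PCIAddr alone can serve as the key for a root device.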
@moshe010
Contributor

But rootDevices is a list of PF PCI addresses, and we need to map each VF PCI address to a PF netdevice.

@zshi-redhat
Copy link
Collaborator

> But rootDevices is a list of PF PCI addresses, and we need to map each VF PCI address to a PF netdevice.

Yes. With both the PCI address and the PF interface name provided, we can uniquely identify which rootDevice a given VF belongs to. If one wants to query the link status of a VF, the PCI address and PF name together decide which sysfs path to query. For example, in the issue description, we can decide to use either /sys/bus/pci/devices/0000\:03\:00.0/net/enp3s0 or /sys/bus/pci/devices/0000\:03\:00.0/net/enp3s0d1 to get the VF status.

@zshi-redhat
Copy link
Collaborator

@ahalim-intel is working on a new approach to defining resource groups (see comments in #59), which might change how config.json is configured. Let's hold the discussion until the new approach is ready.

@killianmuldoon
Copy link
Collaborator

@moshe010 do you think we can close this issue?

@moshe010
Copy link
Contributor

moshe010 commented Jul 2, 2020

Yes, the "pfNames" selector, which can specify a range of VFs, should solve it.
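As an illustration only, the Go sketch below parses a pfNames-style selector, assuming a `<pfName>#<vf>,<first>-<last>` notation for VF indexes; pfSelector, parsePFName and selects are hypothetical names, and this is not the device plugin's own parser, so check the plugin's README for the exact syntax your version accepts. The point is that VFs are matched by PF name plus VF index, so two ports behind one PCI address no longer collide.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pfSelector is a hypothetical parsed form of one pfNames entry such as
// "enp3s0" or "enp3s0#0-1,3".
type pfSelector struct {
	PFName string
	VFs    map[int]bool // nil means every VF of PFName is selected
}

// parsePFName splits "<pf>#<list>", where <list> is a comma-separated mix of
// single VF indexes and inclusive "first-last" ranges.
func parsePFName(s string) (pfSelector, error) {
	parts := strings.SplitN(s, "#", 2)
	sel := pfSelector{PFName: parts[0]}
	if sel.PFName == "" {
		return pfSelector{}, fmt.Errorf("empty PF name in %q", s)
	}
	if len(parts) == 1 {
		return sel, nil
	}
	sel.VFs = make(map[int]bool)
	for _, item := range strings.Split(parts[1], ",") {
		bounds := strings.SplitN(item, "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return pfSelector{}, fmt.Errorf("bad VF index in %q: %v", s, err)
		}
		last := first
		if len(bounds) == 2 {
			if last, err = strconv.Atoi(bounds[1]); err != nil {
				return pfSelector{}, fmt.Errorf("bad VF index in %q: %v", s, err)
			}
		}
		if first < 0 || last < first {
			return pfSelector{}, fmt.Errorf("invalid VF range %q in %q", item, s)
		}
		for i := first; i <= last; i++ {
			sel.VFs[i] = true
		}
	}
	return sel, nil
}

// selects reports whether VF number vf of interface pf matches the selector.
func (sel pfSelector) selects(pf string, vf int) bool {
	if pf != sel.PFName {
		return false
	}
	return sel.VFs == nil || sel.VFs[vf]
}

func main() {
	sel, err := parsePFName("enp3s0#0-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(sel.selects("enp3s0", 1), sel.selects("enp3s0", 2), sel.selects("enp3s0d1", 0)) // true false false
}
```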
