Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support VFIO backed VFs #63

Closed
booxter opened this issue Feb 14, 2019 · 7 comments
Closed

Support VFIO backed VFs #63

booxter opened this issue Feb 14, 2019 · 7 comments

Comments

@booxter
Copy link
Contributor

booxter commented Feb 14, 2019

When SR-IOV device plugin allocates a VFIO typed VF, the device doesn't have a netlink representation. So when SR-IOV CNI then tries to move the device into the pod, it fails because netlink interface is not present.

While one may argue that in case of VFIO devices we may not need a CNI at all, Multus requires that a network has type specified. If it's not sriov then it should be something else. We could write a new CNI plugin that would do nothing in terms of device binding into the namespace; but it's not convenient if we already have SR-IOV CNI with meaningful type name. We could also benefit from SR-IOV CNI handling the binding if we e.g. want IPAM info to be allocated for the device (perhaps because later we will use this information to pass it into software running inside the pod; in case of kubevirt it would be libvirt-backed VM that receives IPAM configuration via cloud-init; #37 is the place where one of potential approaches to pass IPAM into pod is discussed.)

It's my belief that SR-IOV CNI should gracefully handle VFs that are registered with VFIO driver like it already does for non-VFIO devices. This issue is just that - making the plugin not fail on VFIO registered VF and return desired IPAM information in result returned to the caller.

@booxter
Copy link
Contributor Author

booxter commented Feb 15, 2019

@zshi-redhat @ahalim-intel what's the official strategy for userspace devices for this CNI plugin? Would a PR that would make binding for e.g. VFIO devices be accepted into the CNI plugin, or should we go and contribute to another repo? (Perhaps start a new one.)

I'd like to make sure that we are on the same page before I start working on the implementation.

The implementation would resemble the shell CNI plugin mentioned in #37 (comment) but as part of the SR-IOV CNI plugin.

@booxter
Copy link
Contributor Author

booxter commented Feb 15, 2019

/assign booxter

@zshi-redhat
Copy link
Collaborator

zshi-redhat commented Feb 18, 2019

@zshi-redhat @ahalim-intel what's the official strategy for userspace devices for this CNI plugin?

There is a repo for userspace CNI plugins, it supports userspace interfaces with OVS or VPP as backend. But it doesn't support DPDK over VF.

Would a PR that would make binding for e.g. VFIO devices be accepted into the CNI plugin, or should we go and contribute to another repo? (Perhaps start a new one.)

Could you please elaborate a little bit on how SR-IOV CNI knows which driver the VF shall be bind to?

I'd like to make sure that we are on the same page before I start working on the implementation.

The implementation would resemble the shell CNI plugin mentioned in #37 (comment) but as part of the SR-IOV CNI plugin.

@zshi-redhat
Copy link
Collaborator

zshi-redhat commented Feb 18, 2019

@zshi-redhat @ahalim-intel what's the official strategy for userspace devices for this CNI plugin?

There is a repo for userspace CNI plugins, it supports userspace interfaces with OVS or VPP as backend.

Would a PR that would make binding for e.g. VFIO devices be accepted into the CNI plugin, or should we go and contribute to another repo? (Perhaps start a new one.)

Could you please elaborate a little bit on how SR-IOV CNI knows which driver the VF shall be bind to?

I'd like to make sure that we are on the same page before I start working on the implementation.
The implementation would resemble the shell CNI plugin mentioned in #37 (comment) but as part of the SR-IOV CNI plugin.

Would it make sense that we do the binding of VF in SR-IOV CNI by providing a net-attach-def Custom Resource with dpdk-driver as one of its parameters? Then we could have SR-IOV CNI check whether additional driver(dpdk-driver) is configured and do the binding accordingly. This doesn't solve the problem of publishing IPAM information to container Pod, but it avoid device plugin to track all the vfio device and
avoid having another shell CNI to do the binding since SR-IOV CNI already supports binding with driver provided.

@ahalimx86
Copy link
Collaborator

@zshi-redhat the device cgroup permissions for the container cannot be taken care of with a CNI plugin. Thats why it makes more sense to leave driver binding/unbinding out of a CNI plugin. We had this in sriov-cni as a legacy feature(before any notion of device plugin). A device plugin is right place for device file permission related issue. The SRIOV DP takes care of it already, thus we no longer should do anything with driver changes in the CNI plugin.

@booxter
Copy link
Contributor Author

booxter commented Feb 18, 2019

@ahalim-intel @zshi-redhat I think we can determine the current driver for a device using sysfs. If it's VFIO, do nothing to move (non-existing) netlink interface into namespace. (But still we can create a veth to carry IPAM information. That would be a separate follow-up feature.)

@booxter
Copy link
Contributor Author

booxter commented Feb 18, 2019

@zshi-redhat

Would it make sense that we do the binding of VF in SR-IOV CNI by providing a net-attach-def Custom Resource with dpdk-driver as one of its parameters?

Note that in my use case, it's not DPDK at all, it's a regular VF that happens to be registered with VFIO userspace driver. So using dpdk_driver for this need seems incorrect. I also think that it should be easy to automatically detect whether a device is bound to vfio driver and act accordingly (by skipping the move-netlink-to-namespace step of the plugin).

booxter added a commit to booxter/sriov-cni that referenced this issue Feb 19, 2019
If a pci device allocated to a container by SR-IOV device plugin is of
vfio type then it doesn't have a netlink representation, so moving the
interface into container namespace is useless and actually breaks the
CNI plugin.

We could perhaps go with a separate noop CNI plugin to do the same but
there are plans to, in the future, allocate IPAM info for the device
and expose it by one means or another (it could be a fake veth pair to
carry the information: see issue k8snetworkplumbingwg#37; or some other mechanism).

Fixes k8snetworkplumbingwg#63
booxter added a commit to booxter/sriov-cni that referenced this issue Feb 19, 2019
If a pci device allocated to a container by SR-IOV device plugin is of
vfio type then it doesn't have a netlink representation, so moving the
interface into container namespace is useless and actually breaks the
CNI plugin.

We could perhaps go with a separate noop CNI plugin to do the same but
there are plans to, in the future, allocate IPAM info for the device
and expose it by one means or another (it could be a fake veth pair to
carry the information: see issue k8snetworkplumbingwg#37; or some other mechanism).

Fixes k8snetworkplumbingwg#63
booxter added a commit to booxter/sriov-cni that referenced this issue Feb 19, 2019
If a pci device allocated to a container by SR-IOV device plugin is of
vfio type then it doesn't have a netlink representation, so moving the
interface into container namespace is useless and actually breaks the
CNI plugin.

We could perhaps go with a separate noop CNI plugin to do the same but
there are plans to, in the future, allocate IPAM info for the device
and expose it by one means or another (it could be a fake veth pair to
carry the information: see issue k8snetworkplumbingwg#37; or some other mechanism).

Fixes k8snetworkplumbingwg#63
booxter added a commit to booxter/sriov-cni that referenced this issue Feb 19, 2019
If a pci device allocated to a container by SR-IOV device plugin is of
vfio type then it doesn't have a netlink representation, so moving the
interface into container namespace is useless and actually breaks the
CNI plugin.

We could perhaps go with a separate noop CNI plugin to do the same but
there are plans to, in the future, allocate IPAM info for the device
and expose it by one means or another (it could be a fake veth pair to
carry the information: see issue k8snetworkplumbingwg#37; or some other mechanism).

Fixes k8snetworkplumbingwg#63
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants