[feature-request] Direct Attachable NICs for VM-based containers #837

Closed
bergwolf opened this issue May 28, 2021 · 10 comments

@bergwolf

Background

Kata Containers is an open source container runtime, building lightweight virtual machines that seamlessly plug into the containers ecosystem. It aims to bring the speed of a container and the security of a virtual machine to its users.

As Kata Containers matures, how it interacts with Kubernetes CNI and connects to the outside network has become increasingly important. This issue covers the current status of the Kata Containers networking model, its pros and cons, and a proposal to further improve it. We'd like to work with the kube-ovn community to implement an optimized network solution for VM-based containers like Kata Containers.

Status

A classic CNI deployment would result in a networking model like below:
[image: classic CNI networking model]

Here a pod sits inside a network namespace and connects to the outside world via a veth pair. In order to work with this networking model, Kata Containers has implemented a TC-based networking model.
[image: Kata Containers TC-based networking model]
Here, inside the pod network namespace, a tap device tap0_kata is created and Kata sets up TC mirror rules to copy packets between eth0 and tap0_kata. The eth0 device is a veth pair endpoint, and its peer is a veth device attached to the host bridge. So the data flow looks like:
[image: data path from the host NIC to the guest]
As we can see, there are as many as five hops on the host before a packet can reach the guest. The network stack hops are costly, and the architecture needs to be simplified.
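
For reference, a minimal sketch of the kind of TC mirroring rules involved, run from inside the pod network namespace (illustrative tc commands only; Kata programs equivalent rules itself, and the device names are simply the ones shown above):

# mirror all ingress traffic from eth0 to tap0_kata, and vice versa
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev tap0_kata
tc qdisc add dev tap0_kata ingress
tc filter add dev tap0_kata parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev eth0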

Proposal

We can see that all Kata needs is a tap device on the host, and it doesn't care how the device is created (be it a tuntap, an OVS tap, an ipvtap, or a macvtap). So we can create a simpler architecture and use tap devices (or similar devices) as the pod network setup entry point rather than veth pairs. Something like:
[image: proposed direct attachable NIC architecture]
With this architecture, we can remove the need for a host network namespace and the veth pair connecting through it. And since we don't care how the tap device is created, CNI plugins can still keep their different implementation details hidden from us.
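
For illustration, once a tap device exists on the host, the VMM can consume it directly. A minimal sketch with plain iproute2 and QEMU (the device name and VM options below are assumptions for the example, not what Kata actually generates):

# create a tap device on the host
ip tuntap add dev tap0_direct mode tap
ip link set tap0_direct up
# attach it to the guest as a virtio-net device (the rest of the VM definition is omitted)
qemu-system-x86_64 -m 2048 -nographic -netdev tap,id=net0,ifname=tap0_direct,script=no,downscript=no -device virtio-net-pci,netdev=net0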

A possible control flow for the direct attachable CNIs:
[image: possible control flow for direct attachable CNIs]

To make it work, kube-ovn needs to be notified that the CNI ADD command should create a direct attachable network device, and to return that device's information back to the CRI runtime (e.g., containerd). The CRI runtime can then pass the NIC information to Kata Containers, where it will be handled further.

Please help review and comment on whether the proposal is reasonable and doable. Thanks a lot!

Ref: corresponding Kata Containers issue kata-containers/kata-containers#1922

@oilbeater
Collaborator

After some investigation: the behavior where kubelet enters the pod netns and inspects the eth0 address is an implementation detail of dockershim. Unfortunately, most of our users still use Docker, so we have to adapt to it on the Kube-OVN side.

The steps will look like this:

  1. The Pod needs a new annotation to tell kube-ovn to create a tap device rather than the default veth pair. The annotation may look like ovn.kubernetes.io/pod_nic_type=tap, and it can later become an installation option to set the default NIC type to tap (see the sketch after this list).
  2. When CNI ADD is invoked, kube-ovn-cni will read the annotation above to decide the pod NIC type. For a tap NIC, it will create a tap device, link it to OVS, move it to the Pod netns, and set the IP/MAC/routes. For compatibility with dockershim, we also need to create a dummy eth0 with the same IP but with its link status down.
  3. kube-ovn-cni then returns a CNI response with the tap device name in the interface field https://github.com/containernetworking/cni/blob/v0.8.1/pkg/types/current/types.go#L127
  4. Then containerd and Kata can use this response to set up their own networking.
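
A minimal sketch of how a Pod could request the proposed NIC type, assuming the annotation key/value above (the pod name, image, and runtime class are placeholders, not part of the proposal):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: kata-tap-demo
  annotations:
    ovn.kubernetes.io/pod_nic_type: tap
spec:
  runtimeClassName: kata
  containers:
  - name: app
    image: nginx
EOF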

We know that for Kata the extra netns and the addresses on the tap device are not required, but for other CRIs, especially Docker, these steps are required.

@bergwolf
Author

@oilbeater Thanks a lot! I agree that it is better to keep netns and addresses on the tap device for compatibility with other container runtimes.

As for the pod annotation ovn.kubernetes.io/pod_nic_type=tap, can we make it something like ovn.kubernetes.io/pod_nic_direct_attachable=true as we discussed during the hackathon? The idea is to make the interface general enough to allow VFIO or vhost-user based NICs to be usable in the same workflow.

While kube-ovn only implements tap-based NICs at the moment, we want the interface to be future-proof and allow more possibilities. And kube-ovn or other CNIs can choose to implement more NIC types in the future.

wdyt?

@oilbeater
Collaborator

As for the pod annotation ovn.kubernetes.io/pod_nic_type=tap, can we make it something like ovn.kubernetes.io/pod_nic_direct_attachable=true

@bergwolf we already use this annotation to support veth and OVS internal-port type NICs. It's more natural to reuse this annotation, and we can use different annotation values to implement different interface types in the future.

@bergwolf
Author

bergwolf commented Jun 2, 2021

@oilbeater Fair enough. We can make the whole annotation a config option for containerd so that Kata can request different NIC types via the runtime handler config. Something like runtime_cni_annotations = ["annotation ovn.kubernetes.io/pod_nic_type=tap"] for each runtime handler.

@oilbeater
Collaborator

@bergwolf as we have discussed, when the tap device is moved into the netns, OVS loses its connection to it. That means we would have to leave the tap device in the host netns; however, that would break other CRIs' assumptions about the network.

Another way is to use an OVS internal port, which can be moved into a netns and has better performance than a veth pair. Can you help provide some guidance on how QEMU can integrate with an OVS internal port, so that we can check whether this method can work?

@bergwolf
Author

@oilbeater What is special about an OVS internal port? QEMU works well with tap devices on the host. IIUC, an OVS internal port still looks like a tap device to its users. If so, it should JUST WORK (TM) ;)

@zhaojizhuang

Any progress?

@fizzers123

fizzers123 commented Apr 7, 2023

I would be interested in this feature as well.

Does Kube-OVN provide the functionality to add a veth (or any other interface) to a Subnet?
Thanks to kubevirt/macvtap-cni#98, we would then be able to connect to KubeVirt.

@fizzers123

I managed to get this working by adding a veth1 to the VMs via macvtap.

# create a veth pair and bring both ends up (veth1 is handed to the VMs via macvtap)
ip link add veth0 type veth peer name veth1
ip link set veth0 up
ip link set veth1 up

Then I added veth0 to Kube-OVN with the following commands:

# first node
kubectl ko vsctl node1 add-port br-int veth0
kubectl ko vsctl node1 set Interface veth0 external_ids:iface-id=veth0.node1
kubectl ko nbctl lsp-add subnet1 veth0.node1

# second node
kubectl ko vsctl node2 add-port br-int veth0
kubectl ko vsctl node2 set Interface veth0 external_ids:iface-id=veth0.node2
kubectl ko nbctl lsp-add subnet1 veth0.node2
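
As a rough check (output will vary with the environment), the resulting OVS port and logical switch port can be inspected with:

kubectl ko vsctl node1 show
kubectl ko nbctl lsp-list subnet1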

@github-actions

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

github-actions bot closed this as not planned (stale) on Dec 18, 2023