[feature-request] Direct Attachable NICs for VM-based containers #837

Closed
bergwolf opened this issue May 28, 2021 · 10 comments

@bergwolf

Background

Kata Containers is an open source container runtime, building lightweight virtual machines that seamlessly plug into the containers ecosystem. It aims to bring the speed of a container and the security of a virtual machine to its users.

As Kata Containers matures, how it interacts with Kubernetes CNI and connects to the outside network has become increasingly important. This issue covers the current status of the Kata Containers networking model, its pros and cons, and a proposal to further improve it. We'd like to work with the kube-ovn community to implement an optimized network solution for VM-based containers like Kata Containers.

Status

A classic CNI deployment would result in a networking model like below:
[image: classic CNI networking model]

Here a pod sits inside a network namespace and connects to the outside world via a veth pair. In order to work with this networking model, Kata Containers has implemented a TC-based networking model.
[image: Kata Containers TC-based networking model]
Here, inside the pod network namespace, a tap device tap0_kata is created and Kata sets up TC mirror rules to copy packets between eth0 and tap0_kata. The eth0 device is a veth pair endpoint, and its peer is a veth device attached to the host bridge. So the data flow looks like:
[image: data path from the host NIC to the guest]
As we can see, there are as many as five hops on the host before a packet can reach the guest. The network stack hops are costly, and the architecture needs to be simplified.
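
For reference, a minimal sketch of the kind of TC mirroring rules involved, run from inside the pod network namespace (illustrative tc commands only; Kata programs equivalent rules itself, and the device names are simply the ones shown above):

# mirror all ingress traffic from eth0 to tap0_kata, and vice versa
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev tap0_kata
tc qdisc add dev tap0_kata ingress
tc filter add dev tap0_kata parent ffff: protocol all u32 match u8 0 0 action mirred egress redirect dev eth0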

Proposal

We can see that all Kata needs is a tap device on the host, and it doesn't care how the device is created (be it a tuntap, an OVS tap, an ipvtap, or a macvtap). So we can create a simpler architecture and use tap devices (or similar devices) as the pod network setup entry point rather than veth pairs. Something like:
[image: proposed direct attachable NIC architecture]
With this architecture, we can remove the need for a host network namespace and the veth pair connecting through it. And since we don't care how the tap device is created, CNI plugins can still keep their different implementation details hidden from us.
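
For illustration, once a tap device exists on the host, the VMM can consume it directly. A minimal sketch with plain iproute2 and QEMU (the device name and VM options below are assumptions for the example, not what Kata actually generates):

# create a tap device on the host
ip tuntap add dev tap0_direct mode tap
ip link set tap0_direct up
# attach it to the guest as a virtio-net device (the rest of the VM definition is omitted)
qemu-system-x86_64 -m 2048 -nographic -netdev tap,id=net0,ifname=tap0_direct,script=no,downscript=no -device virtio-net-pci,netdev=net0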

A possible control flow for the direct attachable CNIs:
[image: possible control flow for direct attachable CNIs]

To make it work, kube-ovn needs to be notified that the CNI ADD command should create a direct attachable network device, and to return that device's information back to the CRI runtime (e.g., containerd). The CRI runtime can then pass the NIC information to Kata Containers, where it will be handled further.

Please help review and comment on whether the proposal is reasonable and doable. Thanks a lot!

Ref: corresponding Kata Containers issue kata-containers/kata-containers#1922

@oilbeater
Collaborator

After some investigation: the behavior where kubelet enters the pod netns and inspects the eth0 address is an implementation detail of dockershim. Unfortunately, most of our users still use Docker, so we have to adapt to it on the Kube-OVN side.

The steps will look like this:

  1. The Pod needs a new annotation to tell kube-ovn to create a tap device rather than the default veth pair. The annotation may look like ovn.kubernetes.io/pod_nic_type=tap, and it can later become an installation option to set the default NIC type to tap (see the sketch after this list).
  2. When CNI ADD is invoked, kube-ovn-cni will read the annotation above to decide the pod NIC type. For a tap NIC, it will create a tap device, link it to OVS, move it to the Pod netns, and set the IP/MAC/routes. For compatibility with dockershim, we also need to create a dummy eth0 with the same IP but with its link status down.
  3. kube-ovn-cni then returns a CNI response with the tap device name in the interface field https://github.com/containernetworking/cni/blob/v0.8.1/pkg/types/current/types.go#L127
  4. Then containerd and Kata can use this response to set up their own networking.
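
A minimal sketch of how a Pod could request the proposed NIC type, assuming the annotation key/value above (the pod name, image, and runtime class are placeholders, not part of the proposal):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: kata-tap-demo
  annotations:
    ovn.kubernetes.io/pod_nic_type: tap
spec:
  runtimeClassName: kata
  containers:
  - name: app
    image: nginx
EOF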

We know that for Kata the extra netns and the addresses on the tap device are not required, but for other CRIs, especially Docker, these steps are required.

@bergwolf
Author

@oilbeater Thanks a lot! I agree that it is better to keep netns and addresses on the tap device for compatibility with other container runtimes.

As for the pod annotation ovn.kubernetes.io/pod_nic_type=tap, can we make it something like ovn.kubernetes.io/pod_nic_direct_attachable=true as we discussed during the hackathon? The idea is to make the interface general enough to allow VFIO or vhost-user based NICs to be usable in the same workflow.

While kube-ovn only implements tap-based NICs at the moment, we want the interface to be future-proof and allow more possibilities. And kube-ovn or other CNIs can choose to implement more NIC types in the future.

wdyt?

@oilbeater
Collaborator

As for the pod annotation ovn.kubernetes.io/pod_nic_type=tap, can we make it something like ovn.kubernetes.io/pod_nic_direct_attachable=true

@bergwolf we already use this annotation to support veth and OVS internal-port type NICs. It's more natural to reuse this annotation, and we can use different annotation values to implement different interface types in the future.

@bergwolf
Author

bergwolf commented Jun 2, 2021

@oilbeater Fair enough. We can make the whole annotation a config option for containerd so that Kata can request different NIC types via the runtime handler config. Something like runtime_cni_annotations = ["annotation ovn.kubernetes.io/pod_nic_type=tap"] for each runtime handler.

@oilbeater
Collaborator

@bergwolf as we have discussed, when the tap device is moved into the netns, OVS loses its connection to it. That means we would have to leave the tap device in the host netns; however, that would break other CRIs' assumptions about the network.

Another way is to use an OVS internal port, which can be moved into a netns and has better performance than a veth pair. Can you help provide some guidance on how QEMU can integrate with an OVS internal port, so that we can check whether this method can work?

@bergwolf
Author

@oilbeater What is special about an OVS internal port? QEMU works well with tap devices on the host. IIUC, an OVS internal port still looks like a tap device to its users. If so, it should JUST WORK (TM) ;)

@zhaojizhuang

Any progress?

@fizzers123

fizzers123 commented Apr 7, 2023

I would be interested in this feature as well.

Does Kube-OVN provide the functionality to add a veth (or any other interface) to a Subnet?
Thanks to kubevirt/macvtap-cni#98, we would then be able to connect to KubeVirt.

@fizzers123

I managed to get this working by adding a veth1 to the VMs via macvtap.

# create a veth pair and bring both ends up (veth1 is handed to the VMs via macvtap)
ip link add veth0 type veth peer name veth1
ip link set veth0 up
ip link set veth1 up

Then I added veth0 to Kube-OVN with the following commands:

# first node
kubectl ko vsctl node1 add-port br-int veth0
kubectl ko vsctl node1 set Interface veth0 external_ids:iface-id=veth0.node1
kubectl ko nbctl lsp-add subnet1 veth0.node1

# second node
kubectl ko vsctl node2 add-port br-int veth0
kubectl ko vsctl node2 set Interface veth0 external_ids:iface-id=veth0.node2
kubectl ko nbctl lsp-add subnet1 veth0.node2
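
As a rough check (output will vary with the environment), the resulting OVS port and logical switch port can be inspected with:

kubectl ko vsctl node1 show
kubectl ko nbctl lsp-list subnet1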

@github-actions

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

github-actions bot closed this as not planned (stale) on Dec 18, 2023