[feature-request] Direct Attachable NICs for VM-based containers #837
Comments
After some investigation, the issue that kubelet tries to enter the pod netns and inspect the eth0 address turns out to be an implementation detail of dockershim. Unfortunately, most of our users still use docker, so we have to adapt to it on the Kube-OVN side. The steps would look roughly like the sketch below.
We know that for Kata the extra netns and the addresses on the tap device are not required. But for other CRIs, especially docker, these steps are required.
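(For illustration, those compatibility steps might look roughly like this. The netns name, device names, and address are placeholders, not kube-ovn's actual implementation.)

```
# Illustration only: dockershim enters the pod netns and reads the address on
# eth0, so the tap device is placed there with the pod IP even though Kata
# itself would not need this.
ip netns add pod-ns
ip link set tap0_kata netns pod-ns
ip netns exec pod-ns ip link set tap0_kata name eth0
ip netns exec pod-ns ip addr add 10.16.0.10/16 dev eth0
ip netns exec pod-ns ip link set eth0 up
```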
@oilbeater Thanks a lot! I agree that it is better to keep the netns and the addresses on the tap device for compatibility with other container runtimes. As for the pod annotation: while kube-ovn only implements tap-based NICs at the moment, we want the interface to be future-proof and allow more possibilities, so that kube-ovn or other CNIs can choose to implement more NIC types in the future. wdyt?
@bergwolf we already use this annotation to support …
@oilbeater Fair enough. We can make it (the annotation) entirely a config option for containerd, so that Kata can request different NIC types via the runtime handler config. Something like …
@bergwolf as we have discussed, when … Another way is to use an OVS internal port, which can be moved into the netns and has better performance than a veth pair. Can you help provide some guidance on how QEMU can be integrated with an OVS internal port, so that we can check whether this method can work?
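(For reference, a minimal sketch of an OVS internal port; the port and netns names are made up. It shows up as an ordinary network device on the host and, unlike a plain veth attachment, needs no peer device.)

```
# Create an internal port on br-int; it appears on the host as device "pod0".
ovs-vsctl add-port br-int pod0 -- set Interface pod0 type=internal
# Internal ports can be moved directly into a network namespace.
ip link set pod0 netns pod-ns
ip netns exec pod-ns ip link set pod0 up
```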
@oilbeater What is special about the OVS internal port? QEMU works well with tap devices on the host. IIUC, an OVS internal port is still a tap device to its users. If so, it should JUST WORK (TM) ;)
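(For what it's worth, a minimal sketch of how QEMU consumes a pre-created host tap device; the disk image and device names are placeholders.)

```
# Attach the guest NIC to an existing host tap device; QEMU does not care
# whether the tap was created by hand or by a CNI plugin.
qemu-system-x86_64 \
  -machine q35,accel=kvm -m 2048 \
  -netdev tap,id=net0,ifname=tap0_kata,script=no,downscript=no \
  -device virtio-net-pci,netdev=net0 \
  -drive file=guest.img,format=qcow2
```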
Any progress?
I would be interested in this feature as well. Does Kube-OVN provide the functionality to add a veth (or any other interface) to a Subnet?
I managed to get this working by attaching veth1 to the VMs via macvtap.

```
ip link add veth0 type veth peer name veth1
ip link set veth0 up
ip link set veth1 up
```

Then adding veth0 to kube-ovn with the following commands:

```
# first node
kubectl ko vsctl node1 add-port br-int veth0
kubectl ko vsctl node1 set Interface veth0 external_ids:iface-id=veth0.node1
kubectl ko nbctl lsp-add subnet1 veth0.node1

# second node
kubectl ko vsctl node2 add-port br-int veth0
kubectl ko vsctl node2 set Interface veth0 external_ids:iface-id=veth0.node2
kubectl ko nbctl lsp-add subnet1 veth0.node2
```
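In case it helps others, the result can be checked afterwards (assuming the kubectl-ko plugin is installed; these are standard ovn-nbctl/ovn-sbctl pass-through commands, and output shapes may differ by version):

```
# Show the logical switch and its ports in the OVN northbound DB.
kubectl ko nbctl show subnet1
# Show chassis and port bindings in the southbound DB.
kubectl ko sbctl show
```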
Background
Kata Containers is an open source container runtime that builds lightweight virtual machines which seamlessly plug into the containers ecosystem. It aims to bring the speed of containers and the security of virtual machines to its users.
As Kata Containers matures, how it interacts with Kubernetes CNI and connects to the outside network has become increasingly important. This issue covers the current status of the Kata Containers networking model, its pros and cons, and a proposal to improve it further. We'd like to work with the kube-ovn community to implement an optimized network solution for VM-based containers like Kata Containers.
Status
A classic CNI deployment results in a networking model where a pod sits inside a network namespace and connects to the outside world via a veth pair. To work with this model, Kata Containers has implemented a TC-based networking scheme.
Inside the pod network namespace, a tap device tap0_kata is created, and Kata sets up TC mirror rules to copy packets between eth0 and tap0_kata. The eth0 device is a veth-pair endpoint whose peer is attached to the host bridge. So the data flow looks like: host NIC → host bridge → veth → eth0 → tap0_kata → guest.
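Conceptually, the TC mirroring amounts to a pair of ingress redirect filters like the sketch below (illustrative only; the exact rules Kata installs may differ):

```
# Redirect everything arriving on eth0 to tap0_kata, and vice versa.
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 \
   action mirred egress redirect dev tap0_kata
tc qdisc add dev tap0_kata ingress
tc filter add dev tap0_kata parent ffff: protocol all u32 match u32 0 0 \
   action mirred egress redirect dev eth0
```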
As we can see, there are as many as five hops on the host before a packet can reach the guest. These network-stack traversals are costly, and the architecture needs to be simplified.
Proposal
We can see that all Kata needs is a tap device on the host, and it doesn't care how that device is created (be it a tuntap, an OVS tap, an ipvtap, or a macvtap). So we can build a simpler architecture that uses tap devices (or similar devices) as the pod network setup entry point rather than veth pairs.
With this architecture, we remove the need for the extra network namespace on the host and the veth pair that connects through it. And since we don't care how the tap device is created, CNI plugins can still keep their implementation details hidden from us.
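For illustration, any of the following host-side devices could serve as that entry point (device names are placeholders):

```
ip tuntap add dev tap0_kata mode tap                        # plain tuntap
ovs-vsctl add-port br-int tap0_kata \
  -- set Interface tap0_kata type=internal                  # OVS internal port
ip link add link eth0 name mvtap0 type macvtap mode bridge  # macvtap on eth0
```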
A possible control flow for direct attachable CNIs: to make it work, kube-ovn will need to be notified that the CNI ADD command is to create a direct attachable network device, and to return the device's information back to the CRI runtime (e.g., containerd). The CRI runtime can then pass the NIC information to Kata Containers, where it will be further handled.

Please help review and comment on whether the proposal is reasonable and doable. Thanks a lot!
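For concreteness, the standard CNI result format can already describe host-side interfaces (an interface entry with an empty sandbox field lives on the host), so a hypothetical ADD result for a direct attachable tap might look like this (all values are made up):

```
{
  "cniVersion": "0.4.0",
  "interfaces": [
    { "name": "tap0_kata", "mac": "0a:58:0a:10:00:05", "sandbox": "" }
  ],
  "ips": [
    { "version": "4", "address": "10.16.0.5/16", "interface": 0 }
  ]
}
```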
Ref: the corresponding Kata Containers issue, kata-containers/kata-containers#1922