
agent-protocol-forwarder failing to bootstrap because of routes conflicting #1909

Closed
beraldoleal opened this issue Jul 5, 2024 · 4 comments · Fixed by #1920
@beraldoleal
Member

I'm provisioning a new provider (GCP/GKE), and for some reason the agent-protocol-forwarder on the PodVM is having trouble bootstrapping.

After realizing the forwarder was not starting due to vxlan conflicts, I deleted the vxlan manually and executed the following tests:

Here is the network on the PodVM before:

[root@fedora ~]# ip netns exec podns ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
5: veth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:e0:a7:64:cd:11 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::54e0:a7ff:fe64:cd11/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

[root@fedora ~]# ip netns exec podns ip r
169.254.169.254 dev veth2 scope link

Note that there is no vxlan interface and no routes at this point. Then I start the agent-protocol-forwarder:

/usr/local/bin/agent-protocol-forwarder -kata-agent-namespace /run/netns/podns -kata-agent-socket /run/kata-containers/agent.sock $TLS_OPTIONS $OPTIONS

agent-protocol-forwarder version v0.8.2-dev
  commit: 277e518665c96e5df62b3e63b2e9dae3ac5bf587-dirty
  go: go1.21.11
/usr/local/bin/agent-protocol-forwarder: error running a service *forwarder.daemon: failed to set up pod network: failed to set up tunnel "vxlan": failed to add a route to 10.120.1.0/24 via 10.120.1.1 on pod network namespace /run/netns/podns: failed to create a route (table: 0, dest: 10.120.1.0/24, gw: 10.120.1.1) with flags 0: file exists

It is complaining that the route already exists, yet there was no such route in podns before. However, if I check now:

# ip netns exec podns ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
5: veth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:e0:a7:64:cd:11 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::54e0:a7ff:fe64:cd11/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
13: vxlan-peerpods@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 66:a2:52:de:5d:35 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.120.1.13/24 brd 10.120.1.255 scope global vxlan-peerpods
       valid_lft forever preferred_lft forever
    inet6 fe80::64a2:52ff:fede:5d35/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

[root@fedora ~]# ip netns exec podns ip r
default via 10.120.1.1 dev vxlan-peerpods
10.120.1.0/24 dev vxlan-peerpods proto kernel scope link src 10.120.1.13
10.120.1.1 dev vxlan-peerpods scope link
169.254.169.254 dev veth2 scope link

GKE network is currently a "default" VPC with one subnet for each region (also called "default"). Both VMs and GKE nodes are using this default VPC. What I tried so far:

  • Changed the VXLAN_PORT as suggested by @bpradipt
  • Tried GKE cluster with -no-alias-ip
  • Investigated the possibility of using Calico, but GKE does not support changing the CNI plugin

Here is also the daemon.json:

    "pod-network": {
        "podip": "10.84.0.9/24",
        "pod-hw-addr": "ba:47:d0:73:da:b9",
        "interface": "eth0",
        "worker-node-ip": "10.138.0.9/32",
        "tunnel-type": "vxlan",
        "routes": [
            {
                "Dst": "0.0.0.0/0",
                "GW": "10.84.0.1",
                "Dev": "eth0"
            },
            {
                "Dst": "10.84.0.0/24",
                "GW": "10.84.0.1",
                "Dev": "eth0"
            },
            {
                "Dst": "10.84.0.1/32",
                "GW": "",
                "Dev": "eth0"
            }
        ],
        "mtu": 1460,
        "index": 0,
        "vxlan-port": 4790,
        "vxlan-id": 555000,
        "dedicated": false
    },
    "pod-namespace": "coco-pp-e2e-test-5fa2ec1f",
    "pod-name": "simple-test",

And detailed interfaces for both PodVM and podns:

[root@fedora ~]# ip -d a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 42:01:0a:8a:00:0d brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 parentbus virtio parentdev virtio1 
    altname enp0s4
    inet 10.138.0.13/32 metric 1024 scope global dynamic ens4
       valid_lft 3444sec preferred_lft 3444sec
    inet6 fe80::4001:aff:fe8a:d/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
4: veth1@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:e1:0f:21:53:2a brd ff:ff:ff:ff:ff:ff link-netns podns promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 
    veth numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet 169.254.99.99/32 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4e1:fff:fe21:532a/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
[root@fedora ~]# ip netns exec podns ip -d a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo 
       valid_lft forever preferred_lft forever
3: vxlan-peerpods@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether ba:47:d0:73:da:b9 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 
    vxlan id 555000 remote 10.138.0.9 srcport 0 0 dstport 4790 ttl auto ageing 300 nolearning numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet 10.84.0.9/24 brd 10.84.0.255 scope global vxlan-peerpods
       valid_lft forever preferred_lft forever
    inet6 fe80::b847:d0ff:fe73:dab9/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
5: veth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8a:ca:98:5c:3f:c5 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 
    veth numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet6 fe80::88ca:98ff:fe5c:3fc5/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

Full agent-protocol-forwarder log:

https://paste.centos.org/view/4cc600f9

@beraldoleal
Member Author

Based on the current code, my suspicion is that when the interface is created, the kernel adds routes by default, and the agent then tries to create them again. Maybe we should check whether the routes already exist before creating them? Or maybe I'm still missing something.
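
One way to sketch that suggestion: tolerate EEXIST from the route-add call instead of failing. The addRoute function below is a hypothetical stand-in for the netlink call the forwarder makes (here it simulates the kernel having already added the route); it is not the actual cloud-api-adaptor code:

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// addRoute is a hypothetical stand-in for the forwarder's netlink route-add
// call. It simulates the failure from the log above: the kernel has already
// installed the route, so the add fails with EEXIST.
func addRoute(dest, gw string) error {
	return fmt.Errorf("failed to create a route (dest: %s, gw: %s): %w",
		dest, gw, syscall.EEXIST)
}

// ensureRoute makes route creation idempotent by treating EEXIST as success.
func ensureRoute(dest, gw string) error {
	if err := addRoute(dest, gw); err != nil && !errors.Is(err, syscall.EEXIST) {
		return err
	}
	return nil
}

func main() {
	if err := ensureRoute("10.120.1.0/24", "10.120.1.1"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("route ensured")
}
```

As the rest of the thread shows, ignoring EEXIST alone would not be enough here, because the kernel-added route has a different gateway than the desired one; the existing route must actually be replaced.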

@yoheiueda
Member

@beraldoleal could you get additional network info in the worker node?

If you can identify the network namespace for the pod in worker node side, please post the output of ip netns exec <namespace> ip -d a and ip netns exec <namespace> ip -d r.

In my environment, a network namespace of a pod is shown as follows.

# ip netns ls
cni-bda0c478-402f-b1b8-9153-ae987752a8b2 (id: 2)

Alternatively, you can create a regular runc pod and execute kubectl exec <pod> ip -d a and kubectl exec <pod> ip -d r

I guess the GKE CNI plugin manipulates the routes that are automatically set by the kernel.

@beraldoleal
Member Author

beraldoleal commented Jul 5, 2024

Since the pod is still in ContainerCreating, I could not get the namespace. Here are all the namespace details:

https://paste.centos.org/view/3cf115da

And the detailed address and routes for the pod:

kubectl exec hello-world2 -- sh -c "ip -d address; ip -d route"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP group default 
    link/ether f6:fb:e1:48:e8:67 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535 
    veth numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    inet 10.84.1.16/24 brd 10.84.1.255 scope global eth0
       valid_lft forever preferred_lft forever


unicast default via 10.84.1.1 dev eth0 proto boot scope global 
unicast 10.84.1.0/24 via 10.84.1.1 dev eth0 proto boot scope global src 10.84.1.16 
unicast 10.84.1.1 dev eth0 proto boot scope link src 10.84.1.16 

@yoheiueda
Member

The GKE CNI plugin is similar to the PTP CNI plugin, which deletes a route that is automatically set by the kernel, as follows.

https://github.com/containernetworking/plugins/blob/acf8ddc8e1128e6f68a34f7fe91122afeb1fa93d/plugins/main/ptp/ptp.go#L58-L61

	// Our solution is to configure the interface with 192.168.3.5/24, then delete the
	// "192.168.3.0/24 dev $ifName" route that was automatically added. Then we add
	// "192.168.3.1/32 dev $ifName" and "192.168.3.0/24 via 192.168.3.1 dev $ifName".
	// In other words we force all traffic to ARP via the gateway except for GW itself.

So, we need to delete the route that is set by the kernel on the peer-pod side as well.

I will think of how to fix it.
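
The direction described above can be sketched as a pure routing-table transformation: drop the connected route the kernel auto-added for the interface, then install the desired routes from daemon.json in order. This is an assumption about the shape of the eventual fix, not the actual patch in #1920:

```go
package main

import "fmt"

// Route mirrors the fields shown in daemon.json; illustrative only.
type Route struct{ Dst, GW, Dev string }

// applyPTPStyle removes the kernel's auto-added connected route
// ("subnet dev ifname", no gateway) and then appends the desired routes,
// mimicking what the PTP and GKE CNI plugins do on the worker-node side.
func applyPTPStyle(table []Route, kernelDst, dev string, desired []Route) []Route {
	var out []Route
	for _, r := range table {
		// Skip the kernel-added connected route for this interface.
		if r.Dst == kernelDst && r.Dev == dev && r.GW == "" {
			continue
		}
		out = append(out, r)
	}
	return append(out, desired...)
}

func main() {
	// Kernel state right after the vxlan interface got 10.84.0.9/24.
	table := []Route{{Dst: "10.84.0.0/24", Dev: "eth0"}}
	// Routes from daemon.json, in order.
	desired := []Route{
		{Dst: "10.84.0.1/32", Dev: "eth0"},
		{Dst: "10.84.0.0/24", GW: "10.84.0.1", Dev: "eth0"},
		{Dst: "0.0.0.0/0", GW: "10.84.0.1", Dev: "eth0"},
	}
	for _, r := range applyPTPStyle(table, "10.84.0.0/24", "eth0", desired) {
		fmt.Printf("%s via %q dev %s\n", r.Dst, r.GW, r.Dev)
	}
}
```

The end state matches the three PTP-style routes seen in the runc pod above: all traffic is forced to ARP via the gateway except the gateway itself.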

@beraldoleal beraldoleal changed the title agent-protocol-forwarder failing to boostreap because of routes conflicting agent-protocol-forwarder failing to bootstrap because of routes conflicting Jul 10, 2024
yoheiueda added a commit to yoheiueda/cloud-api-adaptor that referenced this issue Jul 12, 2024
CNI plugins like PTP and GKE remove a route that
is automatically added by kernel for eth0, and then
add another route for the same destination.

This patch changes the code that manipulates routes to
support such CNI plugins.

Fixes confidential-containers#1909

Signed-off-by: Yohei Ueda <[email protected]>
beraldoleal added a commit to beraldoleal/cloud-api-adaptor that referenced this issue Jul 18, 2024
This is basically Cfir's work with some modifications to support the
repository layout and small fixes. Right now, GKE is not supported
because of confidential-containers#1909, so this initial implementation requires a k8s cluster
(either local or at Google Compute Engine).

Signed-off-by: Cfir Cohen <[email protected]>
Signed-off-by: Beraldo Leal <[email protected]>