
agent-protocol-forwarder failing to bootstrap because of routes conflicting #1909

Closed
beraldoleal opened this issue Jul 5, 2024 · 4 comments · Fixed by #1920
@beraldoleal
Member

I'm provisioning a new provider (GCP/GKE), and for some reason the agent-protocol-forwarder on the PodVM is having trouble bootstrapping.

After realizing the forwarder was not starting due to vxlan conflicts, I deleted the vxlan manually and executed the following tests:

Here is the network on the PodVM before:

[root@fedora ~]# ip netns exec podns ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
5: veth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:e0:a7:64:cd:11 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::54e0:a7ff:fe64:cd11/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

[root@fedora ~]# ip netns exec podns ip r
169.254.169.254 dev veth2 scope link

Note that there is no vxlan interface and no routes at this point. Then I start the agent-protocol-forwarder:

/usr/local/bin/agent-protocol-forwarder -kata-agent-namespace /run/netns/podns -kata-agent-socket /run/kata-containers/agent.sock $TLS_OPTIONS $OPTIONS

agent-protocol-forwarder version v0.8.2-dev
  commit: 277e518665c96e5df62b3e63b2e9dae3ac5bf587-dirty
  go: go1.21.11
/usr/local/bin/agent-protocol-forwarder: error running a service *forwarder.daemon: failed to set up pod network: failed to set up tunnel "vxlan": failed to add a route to 10.120.1.0/24 via 10.120.1.1 on pod network namespace /run/netns/podns: failed to create a route (table: 0, dest: 10.120.1.0/24, gw: 10.120.1.1) with flags 0: file exists

It is complaining that the route already exists, yet there was no such route in podns before. However, if I check now:

# ip netns exec podns ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
5: veth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 56:e0:a7:64:cd:11 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::54e0:a7ff:fe64:cd11/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
13: vxlan-peerpods@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 66:a2:52:de:5d:35 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.120.1.13/24 brd 10.120.1.255 scope global vxlan-peerpods
       valid_lft forever preferred_lft forever
    inet6 fe80::64a2:52ff:fede:5d35/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

[root@fedora ~]# ip netns exec podns ip r
default via 10.120.1.1 dev vxlan-peerpods
10.120.1.0/24 dev vxlan-peerpods proto kernel scope link src 10.120.1.13
10.120.1.1 dev vxlan-peerpods scope link
169.254.169.254 dev veth2 scope link

GKE network is currently a "default" VPC with one subnet for each region (also called "default"). Both VMs and GKE nodes are using this default VPC. What I tried so far:

  • Changed the VXLAN_PORT as suggested by @bpradipt
  • Tried GKE cluster with -no-alias-ip
  • Investigated the possibility of using Calico, but GKE does not support changing the CNI plugin

Here is also the daemon.json:

    "pod-network": {
        "podip": "10.84.0.9/24",
        "pod-hw-addr": "ba:47:d0:73:da:b9",
        "interface": "eth0",
        "worker-node-ip": "10.138.0.9/32",
        "tunnel-type": "vxlan",
        "routes": [
            {
                "Dst": "0.0.0.0/0",
                "GW": "10.84.0.1",
                "Dev": "eth0"
            },
            {
                "Dst": "10.84.0.0/24",
                "GW": "10.84.0.1",
                "Dev": "eth0"
            },
            {
                "Dst": "10.84.0.1/32",
                "GW": "",
                "Dev": "eth0"
            }
        ],
        "mtu": 1460,
        "index": 0,
        "vxlan-port": 4790,
        "vxlan-id": 555000,
        "dedicated": false
    },
    "pod-namespace": "coco-pp-e2e-test-5fa2ec1f",
    "pod-name": "simple-test",

And detailed interfaces for both PodVM and podns:

[root@fedora ~]# ip -d a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 42:01:0a:8a:00:0d brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 parentbus virtio parentdev virtio1 
    altname enp0s4
    inet 10.138.0.13/32 metric 1024 scope global dynamic ens4
       valid_lft 3444sec preferred_lft 3444sec
    inet6 fe80::4001:aff:fe8a:d/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
4: veth1@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:e1:0f:21:53:2a brd ff:ff:ff:ff:ff:ff link-netns podns promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 
    veth numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet 169.254.99.99/32 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4e1:fff:fe21:532a/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
[root@fedora ~]# ip netns exec podns ip -d a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo 
       valid_lft forever preferred_lft forever
3: vxlan-peerpods@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether ba:47:d0:73:da:b9 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 
    vxlan id 555000 remote 10.138.0.9 srcport 0 0 dstport 4790 ttl auto ageing 300 nolearning numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet 10.84.0.9/24 brd 10.84.0.255 scope global vxlan-peerpods
       valid_lft forever preferred_lft forever
    inet6 fe80::b847:d0ff:fe73:dab9/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
5: veth2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8a:ca:98:5c:3f:c5 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 
    veth numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536 
    inet6 fe80::88ca:98ff:fe5c:3fc5/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

Full agent-protocol-forwarder log:

https://paste.centos.org/view/4cc600f9

@beraldoleal
Member Author

Based on the current code, my suspicion is that when the interface is created, the kernel adds routes by default, and the agent then tries to create them again. Maybe we should check whether the routes already exist before creating them? Or maybe I'm still missing something.
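
One way to sketch that suggestion: tolerate EEXIST from the route-add call instead of failing. The addRoute function below is a hypothetical stand-in for the netlink call the forwarder makes (here it simulates the kernel having already added the route); it is not the actual cloud-api-adaptor code:

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// addRoute is a hypothetical stand-in for the forwarder's netlink route-add
// call. It simulates the failure from the log above: the kernel has already
// installed the route, so the add fails with EEXIST.
func addRoute(dest, gw string) error {
	return fmt.Errorf("failed to create a route (dest: %s, gw: %s): %w",
		dest, gw, syscall.EEXIST)
}

// ensureRoute makes route creation idempotent by treating EEXIST as success.
func ensureRoute(dest, gw string) error {
	if err := addRoute(dest, gw); err != nil && !errors.Is(err, syscall.EEXIST) {
		return err
	}
	return nil
}

func main() {
	if err := ensureRoute("10.120.1.0/24", "10.120.1.1"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("route ensured")
}
```

As the rest of the thread shows, ignoring EEXIST alone would not be enough here, because the kernel-added route has a different gateway than the desired one; the existing route must actually be replaced.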

@yoheiueda
Member

@beraldoleal could you get additional network info in the worker node?

If you can identify the network namespace for the pod in worker node side, please post the output of ip netns exec <namespace> ip -d a and ip netns exec <namespace> ip -d r.

In my environment, a network namespace of a pod is shown as follows.

# ip netns ls
cni-bda0c478-402f-b1b8-9153-ae987752a8b2 (id: 2)

Alternatively, you can create a regular runc pod and execute kubectl exec <pod> ip -d a and kubectl exec <pod> ip -d r

I guess the GKE CNI plugin manipulates the routes that are automatically set by the kernel.

@beraldoleal
Member Author

beraldoleal commented Jul 5, 2024

Since the pod is still in ContainerCreating, I could not get the namespace. Here are all the namespace details:

https://paste.centos.org/view/3cf115da

And the detailed address and routes for the pod:

kubectl exec hello-world2 -- sh -c "ip -d address; ip -d route"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP group default 
    link/ether f6:fb:e1:48:e8:67 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535 
    veth numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    inet 10.84.1.16/24 brd 10.84.1.255 scope global eth0
       valid_lft forever preferred_lft forever


unicast default via 10.84.1.1 dev eth0 proto boot scope global 
unicast 10.84.1.0/24 via 10.84.1.1 dev eth0 proto boot scope global src 10.84.1.16 
unicast 10.84.1.1 dev eth0 proto boot scope link src 10.84.1.16 

@yoheiueda
Member

The GKE CNI plugin is similar to the PTP CNI plugin, which deletes a route that is automatically set by the kernel, as follows.

https://github.com/containernetworking/plugins/blob/acf8ddc8e1128e6f68a34f7fe91122afeb1fa93d/plugins/main/ptp/ptp.go#L58-L61

	// Our solution is to configure the interface with 192.168.3.5/24, then delete the
	// "192.168.3.0/24 dev $ifName" route that was automatically added. Then we add
	// "192.168.3.1/32 dev $ifName" and "192.168.3.0/24 via 192.168.3.1 dev $ifName".
	// In other words we force all traffic to ARP via the gateway except for GW itself.

So, we need to delete the route that is set by the kernel on the peer-pod side as well.

I will think of how to fix it.
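
The direction described above can be sketched as a pure routing-table transformation: drop the connected route the kernel auto-added for the interface, then install the desired routes from daemon.json in order. This is an assumption about the shape of the eventual fix, not the actual patch in #1920:

```go
package main

import "fmt"

// Route mirrors the fields shown in daemon.json; illustrative only.
type Route struct{ Dst, GW, Dev string }

// applyPTPStyle removes the kernel's auto-added connected route
// ("subnet dev ifname", no gateway) and then appends the desired routes,
// mimicking what the PTP and GKE CNI plugins do on the worker-node side.
func applyPTPStyle(table []Route, kernelDst, dev string, desired []Route) []Route {
	var out []Route
	for _, r := range table {
		// Skip the kernel-added connected route for this interface.
		if r.Dst == kernelDst && r.Dev == dev && r.GW == "" {
			continue
		}
		out = append(out, r)
	}
	return append(out, desired...)
}

func main() {
	// Kernel state right after the vxlan interface got 10.84.0.9/24.
	table := []Route{{Dst: "10.84.0.0/24", Dev: "eth0"}}
	// Routes from daemon.json, in order.
	desired := []Route{
		{Dst: "10.84.0.1/32", Dev: "eth0"},
		{Dst: "10.84.0.0/24", GW: "10.84.0.1", Dev: "eth0"},
		{Dst: "0.0.0.0/0", GW: "10.84.0.1", Dev: "eth0"},
	}
	for _, r := range applyPTPStyle(table, "10.84.0.0/24", "eth0", desired) {
		fmt.Printf("%s via %q dev %s\n", r.Dst, r.GW, r.Dev)
	}
}
```

The end state matches the three PTP-style routes seen in the runc pod above: all traffic is forced to ARP via the gateway except the gateway itself.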

@beraldoleal beraldoleal changed the title agent-protocol-forwarder failing to boostreap because of routes conflicting agent-protocol-forwarder failing to bootstrap because of routes conflicting Jul 10, 2024
yoheiueda added a commit to yoheiueda/cloud-api-adaptor that referenced this issue Jul 12, 2024
CNI plugins like PTP and GKE remove a route that
is automatically added by kernel for eth0, and then
add another route for the same destination.

This patch changes the code that manipulates routes to
support such CNI plugins.

Fixes confidential-containers#1909

Signed-off-by: Yohei Ueda <[email protected]>
beraldoleal added a commit to beraldoleal/cloud-api-adaptor that referenced this issue Jul 18, 2024
This is basically Cfir's work with some modifications to support the
repository layout and small fixes. Right now, GKE is not supported
because of confidential-containers#1909, so this initial implementation requires a k8s cluster
(either local or at Google Compute Engine).

Signed-off-by: Cfir Cohen <[email protected]>
Signed-off-by: Beraldo Leal <[email protected]>