VM access was blocked when eBPF dataplane used #6450
CC @tomastigera
When the Calico eBPF dataplane is enabled, a packet (e.g. an ICMP ping) arriving on the physical interface and destined for a VM on the host (VMs are usually attached to the host's physical interface through a macvtap/macvlan interface in bridge, VEPA, or passthrough mode) is falsely bypassed by the eBPF program here and can't reach the target VM via the virtual interface (macvtap/macvlan).

When a packet enters the traffic-control eBPF program, its destination address (daddr) should be checked against the route map. If the route is unknown, the packet should be treated as not destined for this system, and we should let it pass through by setting the action to TC_ACT_OK, skipping all subsequent eBPF checks and processing. In particular, we should not run unknown-route traffic through bpf_fib_lookup (in forward_or_drop()), because on some systems the lookup succeeds, like this:

```
<idle>-0 [088] d.s. 1810775.267240: bpf_trace_printk: enp9s0---I: Traffic is towards the host namespace, doing Linux FIB lookup
<idle>-0 [088] d.s. 1810775.267243: bpf_trace_printk: enp9s0---I: FIB lookup succeeded - with neigh
<idle>-0 [088] d.s. 1810775.267244: bpf_trace_printk: enp9s0---I: Got Linux FIB hit, redirecting to iface 2.
<idle>-0 [088] d.s. 1810775.267245: bpf_trace_printk: enp9s0---I: Traffic is towards host namespace, marking with 0x3000000.
<idle>-0 [088] d.s. 1810775.267247: bpf_trace_printk: enp9s0---I: Final result=ALLOW (0). Program execution time: 31307ns
<idle>-0 [088] d.s. 1810775.267249: bpf_trace_printk: enp9s0---E: New packet at ifindex=2; mark=3000000
<idle>-0 [088] d.s. 1810775.267250: bpf_trace_printk: enp9s0---E: Final result=ALLOW (3). Bypass mark bit set.
```

This processing is wrong: a packet carrying mark 0x3000000 in the egress direction is discarded by the system.
On the other hand, we also noticed that on some systems the VM-blocking issue seems to disappear, and the packet passes through the eBPF program and finally reaches the target VM. This does not mean the original behaviour is correct; it is only because the FIB lookup happens to fail there (see the log below), so the packet is bypassed by the eBPF program with mark 0x1000000:

```
<idle>-0 [014] ..s. 17619198.981285: 0: eno1np0--I: Traffic is towards the host namespace, doing Linux FIB lookup
<idle>-0 [014] ..s. 17619198.981287: 0: eno1np0--I: FIB lookup failed (FIB problem): 7.
<idle>-0 [014] ..s. 17619198.981287: 0: eno1np0--I: Traffic is towards host namespace, marking with 0x1000000.
<idle>-0 [014] ..s. 17619198.981288: 0: eno1np0--I: Final result=ALLOW (0). Program execution time: 16040ns
```

So in that case the wrong marking step above is skipped by accident.

There is similar handling of unrelated traffic in the Cilium eBPF implementation:

```c
ep = lookup_ip4_endpoint(ip4);
```

https://github.com/cilium/cilium/blob/master/bpf/bpf_host.c#L571

and

```c
if (!from_host)
	return CTX_ACT_OK;
```

https://github.com/cilium/cilium/blob/master/bpf/bpf_host.c#L586

Here Cilium's endpoint plays the same role as Calico's route. This patch is also a fix for the issue "VM access was blocked when eBPF dataplane used" projectcalico#6450.

Signed-off-by: trevor tao <[email protected]>
I first hit this issue on an arm64 platform, but it seems there is no such issue on some other platforms or systems, e.g. some x86 systems. I checked the eBPF output log by setting bpfLogLevel to Debug; the output showed the difference between the two cases.
For other systems (x86 currently), the log showed:
The test process is the same on both systems: we simply ping a VM on a host with the Calico eBPF dataplane enabled from another host. I think a packet destined for a VM rather than for the host itself should first be checked against the eBPF route map to see whether it is actually for the host. If the route lookup result is unknown, the packet should be treated as NOT destined for this host and returned TC_ACT_OK so that subsequent eBPF processing is skipped. I saw similar handling of unrelated traffic in the Cilium eBPF implementation, where Cilium's endpoint plays the same role as Calico's route. I will put up a PR to address this issue; thanks for your review. The Calico versions used:
@tomastigera @mazdakn could you guys please take a look?
@TrevorTaoARM sorry for not responding sooner, totally missed this, 👀 now! And thanks for a great analysis! 🙏
@TrevorTaoARM I commented on your patch ⬆️
It seems like the packets ultimately ended up on the egress of the same device regardless of whether the FIB lookup failed or not. But I am not quite sure what the packet looks like in the ARM case, as that is missing in the logs when the BYPASS mark is set. Perhaps the host mangled that packet?
Disable FIB and let the packet go through the host after it is policed. It is ingress into the system and we do not know what exactly the packet's destination is. It may be a local VM or something similar, so let the host route it or drop it. projectcalico#6450
@tomastigera Yes, the difference in FIB lookup results between the two platforms really confused me. But it looks like the packet flow to a certain VM is blocked only when eBPF is enabled. I didn't know what the subsequent data path for the packet was once the BYPASS mark was set. The only trace I saw was: which showed the packet had been transferred to the egress direction, while on x86 the packet is still in the ingress direction:
@tomastigera Fixed, but not completely: dropped by RPF check
@Dimonyga Not sure whether this is related to the original issue; however, if you apply BPF programs to eth0 in this setup, then surely you cannot pass a strict RPF check, because routing says that the return path is via
Sorry, my mistake. When we start calico-node with
* fix: when CALI_ST_SKIP_FIB is set on the way to the host, set CALI_CT_FLAG_SKIP_FIB on conntrack, not just when the packet comes from a WEP
* add a test for the above and for issue projectcalico#6450
* in addition to skipping FIB when there is no route to the post-DNAT destination, also skip FIB when there is a route but it is not local and no service was involved; in that case we are not forwarding a service (NodePort) to another node, so we should only forward locally and let the host decide what to do with such a packet

Fixes projectcalico#8918
(cherry picked from commit 327c4fd)
When I enabled the Calico eBPF dataplane for a K8s cluster, the VMs (whose NICs were bridged onto the physical NIC of the server) on the node configured with the eBPF dataplane could not be accessed via normal SSH.
When kube-proxy was restored and the eBPF dataplane disabled, SSH access to the VMs was also restored.
Steps to Reproduce (for bugs)
The following script was used to enable the eBPF dataplane (the here-doc marker on the `cat` line was lost in the original paste and has been restored, and the `sed` calls now target the quoted placeholder values, since a bare substitution would hit the YAML key first):

```bash
#!/bin/bash
set -x
WORKDIR=$(pwd)
TMP_DIR=$(mktemp -d)
MARCH=$(uname -m)
CALICO_VERSION=${1:-3.23.2}

if [ "$MARCH" == "aarch64" ]; then ARCH=arm64;
elif [ "$MARCH" == "x86_64" ]; then ARCH=amd64;
else ARCH="unknown";
fi
echo ARCH=$ARCH

k8s_ep=$(kubectl get endpoints kubernetes -o wide | grep kubernetes | cut -d " " -f 4)
k8s_host=$(echo $k8s_ep | cut -d ":" -f 1)
k8s_port=$(echo $k8s_ep | cut -d ":" -f 2)

cat <<EOF > ${WORKDIR}/k8s_service.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  KUBERNETES_SERVICE_HOST: "KUBERNETES_SERVICE_HOST"
  KUBERNETES_SERVICE_PORT: "KUBERNETES_SERVICE_PORT"
EOF
# Replace only the quoted placeholder values, not the keys.
sed -i "s/: \"KUBERNETES_SERVICE_HOST\"/: \"${k8s_host}\"/" ${WORKDIR}/k8s_service.yaml
sed -i "s/: \"KUBERNETES_SERVICE_PORT\"/: \"${k8s_port}\"/" ${WORKDIR}/k8s_service.yaml
kubectl apply -f ${WORKDIR}/k8s_service.yaml

echo "Disable kube-proxy:"
kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'

if [ ! -f /usr/local/bin/calicoctl ]; then
    echo "No calicoctl, install now:"
    curl -L https://github.com/projectcalico/calico/releases/download/v${CALICO_VERSION}/calicoctl-linux-${ARCH} -o ${WORKDIR}/calicoctl
    chmod +x ${WORKDIR}/calicoctl
    sudo cp ${WORKDIR}/calicoctl /usr/local/bin
    rm ${WORKDIR}/calicoctl
fi

echo "Enable eBPF:"
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": true}}' --allow-version-mismatch

echo "Enable Direct Server Return (DSR) mode: optional"
#calicoctl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "DSR"}}'
```
Context
I tried to access the VM (10.169.210.139), located on a server with Calico eBPF enabled, from another server (10.169.242.130); only the first ping packet got through, and all subsequent ping packets were lost.
The conntrack dump on the Calico node showed the SSH access (from 10.169.242.130) to the VM (10.169.210.139):
```
# calico-node -bpf conntrack dump | grep "10.169.210.139"
2022-07-15 08:21:37.276 [INFO][13703] confd/maps.go 433: Loaded map file descriptor. fd=0x7 name="/sys/fs/bpf/tc/globals/cali_v4_ct2"
ConntrackKey{proto=6 10.169.242.130:61701 <-> 10.169.210.139:22} -> Entry{Type:0, Created:17278773931441431, LastSeen:17278777015499210, Flags: Data: {A2B:{Seqno:92691206 SynSeen:true AckSeen:true FinSeen:false RstSeen:false Whitelisted:true Opener:true Ifindex:2} B2A:{Seqno:959809259 SynSeen:true AckSeen:true FinSeen:false RstSeen:false Whitelisted:false Opener:false Ifindex:0} OrigDst:0.0.0.0 OrigPort:0 OrigSPort:0 TunIP:0.0.0.0}} Age: 3.143463957s Active ago 59.406178ms ESTABLISHED
```
Your Environment