NAT64 implementation for Kubernetes deployments (mainly)
Although you have been able to run IPv6-only clusters in Kubernetes since 2019, the Internet is still far from having parity between the IPv4 and IPv6 worlds. DNS64 and NAT64 are commonly used to solve this problem, and Kubernetes is no different; on the contrary, thanks to its "simple" network principles this model is easy to implement.
```
+---------------------+         +---------------+
|IPv6 network         |         |    IPv4       |
|           |  +-------------+  |    network    |
|           |--| Name server |--|               |
|           |  | with DNS64  |  |  +----+       |
|  +----+   |  +-------------+  |  | H2 |       |
|  | H1 |---|         |         |  +----+       |
|  +----+   |      +-------+    |  192.0.2.1    |
|2001:db8::1|------| NAT64 |----|               |
|           |      +-------+    |               |
|           |         |         |               |
+---------------------+         +---------------+
```
Figure from [rfc6146](https://datatracker.ietf.org/doc/html/rfc6146)
The main problem with DNS64 in Kubernetes is that the DNS service tends to be implemented as a Deployment, so the Pods can only communicate with the upstream DNS server via IPv6. This is one of the main reasons we need this solution: to get rid of the hack we currently use in KIND, since the GitHub runners are IPv4 only.
For DNS64 we can just forward requests to a public DNS64 server; CoreDNS also has a [dns64 plugin](https://coredns.io/plugins/dns64/).
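As an illustration (this is not the configuration shipped by this project), a minimal Corefile sketch enabling the dns64 plugin with the well-known prefix could look like this, assuming the NAT64 described below is in place so the resolver can reach an IPv4-only upstream:

```
.:53 {
    # synthesize AAAA records from A records using the well-known prefix
    dns64 {
        prefix 64:ff9b::/96
    }
    # reach an IPv4-only resolver through the NAT64 prefix
    forward . 64:ff9b::8.8.8.8
    cache 30
}
```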
NAT64 is more tricky. One of the common solutions is to use an external gateway to perform NAT64, but that requires additional infrastructure, probably more cost and complexity, and it is hard to implement on CI systems with KIND that run nested in VMs.
One of the nice things about Kubernetes is that it is decoupled from the underlying infrastructure: in an IPv6-only Kubernetes cluster the family depends on the addresses assigned to the different API objects, so Pods, Services and Nodes only have IPv6 addresses and communicate using them, but the infrastructure can be dual-stack. Using VMs with dual-stack addresses allows us to implement NAT64 on the host.
There are many Open Source NAT64 implementations, but I didn't find any that fit my needs in terms of simplicity, performance, dependencies, ...
Some time ago I hacked together a solution proxying IPv6 on IPv4, but it was just that ... a hack. However, I recently found out that Android has a NAT64 implementation in eBPF and started to think more about this ...
The main problem is that we would need to implement Stateful NAT64, and writing the NAT/conntrack logic is complex and hard to support, not to mention that the two NAT/conntrack systems would not be synchronized, so there could be collisions and packet drops :/
I also wanted a solution that is simple to troubleshoot and hermetic, so I remembered my old days configuring routers, and I liked the existing solutions that use a NAT64 interface.
With all of these ideas I came up with this solution, which basically works as follows:
1. The program runs as a DaemonSet on all nodes.
2. It configures a dummy interface, named `nat64` by default:

   ```
   5: nat64: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
       link/ether ca:a6:ab:76:fb:7c brd ff:ff:ff:ff:ff:ff
       inet 169.254.64.0/24 scope global nat64
          valid_lft forever preferred_lft forever
       inet6 64:ff9b::/96 scope global
          valid_lft forever preferred_lft forever
       inet6 fe80::c8a6:abff:fe76:fb7c/64 scope link
          valid_lft forever preferred_lft forever
   ```

3. This interface has two subnets assigned:
   - The IPv6 one is the IPv4-in-IPv6 prefix, `64:ff9b::/96` by default, per [rfc6052](https://datatracker.ietf.org/doc/html/rfc6052).
   - The IPv4 one is `169.254.64.0/24`; being link-local, it also alleviates the risk of leaking traffic or overlapping with other networks.
4. Packets with the IPv6 prefix that are routed to the interface undergo stateless NAT64 (see the sketch after this list):
   - The Pod IPv6 source address is replaced by an address in the configured IPv4 range.
   - The IPv6 destination address has the IPv4 destination embedded in it.
5. After the static NAT is performed, the packet goes through the kernel again and is MASQUERADEd to the Internet with the IPv4 address of the host, replacing the source address from the `nat64` interface range.
6. When the packet comes back, the MASQUERADE is reverted and the packet is destined to the `nat64` interface, where the static NAT64 is reverted:
   - The IPv6 source address is the IPv4-in-IPv6 address.
   - The IPv6 destination address is the Pod address we used in step 4.
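To make the stateless translation concrete, here is a minimal Go sketch of the rfc6052 /96 address embedding and of the simple last-octet source mapping described above; the names `embed`, `extract` and `mapSource` are illustrative, not the actual functions in this repo's eBPF code:

```go
package main

import (
	"fmt"
	"net/netip"
)

// embed places an IPv4 address in the last 32 bits of the
// IPv6 prefix, the rfc6052 layout for a /96 prefix.
func embed(prefix, v4 netip.Addr) netip.Addr {
	p := prefix.As16()
	b := v4.As4()
	copy(p[12:], b[:])
	return netip.AddrFrom16(p)
}

// extract recovers the IPv4 address embedded in the last
// 32 bits of an IPv4-in-IPv6 address.
func extract(v6 netip.Addr) netip.Addr {
	p := v6.As16()
	return netip.AddrFrom4([4]byte{p[12], p[13], p[14], p[15]})
}

// mapSource mimics the naive 6-to-4 source mapping: reuse the
// last octet of the Pod IPv6 address inside 169.254.64.0/24.
func mapSource(pod netip.Addr) netip.Addr {
	return netip.AddrFrom4([4]byte{169, 254, 64, pod.As16()[15]})
}

func main() {
	prefix := netip.MustParseAddr("64:ff9b::")
	dst := embed(prefix, netip.MustParseAddr("8.8.8.8"))
	fmt.Println(dst)          // 64:ff9b::808:808
	fmt.Println(extract(dst)) // 8.8.8.8
	fmt.Println(mapSource(netip.MustParseAddr("2001:db8::a"))) // 169.254.64.10
}
```

Because the source mapping only keeps the last octet, two Pods whose addresses share it would collide, which is the 254-address limitation mentioned in the TODO list below.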
To install it, just do:

```sh
kubectl apply -f https://raw.githubusercontent.com/aojea/nat64/main/install.yaml
```
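You can then check that the daemonset Pods came up on every node (this assumes the Pod names contain nat64; see install.yaml for the actual names):

```sh
kubectl get pods -A -o wide | grep nat64
```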
Alternatively, to build and install it from source, assuming you have checked out the repo and you are already in the repo folder:
- Install a kind cluster with IPv6 only, saving this config as `kind-ipv6.yaml`:

  ```yaml
  kind: Cluster
  apiVersion: kind.x-k8s.io/v1alpha4
  featureGates:
  networking:
    ipFamily: ipv6
  nodes:
  - role: control-plane
  - role: worker
  ```

  ```sh
  kind create cluster --name ipv6 --config kind-ipv6.yaml
  ```
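Once the cluster is up, you can confirm it is IPv6-only by looking at the node addresses; the INTERNAL-IP column should show IPv6 addresses:

```sh
kubectl get nodes -o wide
```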
- Build the project (it already compiles the eBPF code too):

  ```sh
  docker build . -t aojea/nat64:v0.1.0
  ```
- Preload the image in the kind cluster we just created:

  ```sh
  kind load docker-image aojea/nat64:v0.1.0 --name ipv6
  ```
- Install the nat64 daemonset:

  ```sh
  kubectl apply -f install.yaml
  ```

  In case you already have it installed, you can rollout restart the daemonset (see the sketch below) or just delete and create it again:

  ```sh
  kubectl delete -f install.yaml && kubectl apply -f install.yaml
  ```
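The rollout restart alternative would look like this, assuming the DaemonSet is named nat64 and lives in the kube-system namespace (check install.yaml for the actual name and namespace):

```sh
kubectl -n kube-system rollout restart daemonset nat64
```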
- Once it is installed, you can test it by creating a Pod and checking the connectivity to IPv4 sites using the NAT64 prefix:

  ```sh
  $ kubectl run test --image k8s.gcr.io/e2e-test-images/agnhost:2.39 --command -- /agnhost netexec --http-port=8080
  $ kubectl exec -it test -- bash
  ...
  # UDP test
  dig @64:ff9b::8.8.8.8 www.google.es
  # TCP test
  curl [64:ff9b::140.82.121.4]:80
  ```
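The same idea works for any IPv4-only destination: resolve its A record and prepend the NAT64 prefix yourself. A small sketch (the dotted-quad tail is valid IPv6 literal syntax):

```sh
# resolve an IPv4-only name via the DNS64 server, then reach it through NAT64
IP4=$(dig +short A www.google.es @64:ff9b::8.8.8.8 | head -1)
curl "http://[64:ff9b::${IP4}]/"
```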
This is far from complete; features and suggestions are welcome:
- metrics: number of NAT64 translations: connections, packets, protocol, ...
- Right now the algorithm to map 6 to 4 is very simple, using the last octet of the Pod IPv6 address; this limits us to 254 simultaneous translations. Is that enough?
- TCP and UDP checksum (fixed by @siwiutki)
- ICMP
- Testing, testing, ....