Static Egress IP #434
@murali-reddy I've solved this manually with a script I wrote. It routes all outgoing pod traffic that is not destined for another pod or service to the VIP and SNATs it to appear as the VIP. It uses a custom routing table, a custom routing rule and a custom iptables chain. The iptables chain marks the appropriate packets to be forwarded via the routing table. Any packets that hit the FORWARD chain on the main interface are also marked, since that should only occur on the machine with the VIP. I'm not sure if kube-router internally uses the same concepts, but hopefully this script will help. It routes all pod traffic to a single VIP, so support for multiple egress IPs and a subset of pods would still have to be figured out.

```bash
#!/usr/bin/env bash
set -e
# Variables
ROUTE_ID="64"
ROUTE_TABLE="egress"
INTERFACE="eth0"
POD_CIDR="10.32.0.0/12"
SERVICE_CIDR="10.96.0.0/12"
VIP="10.0.2.100"
# SNAT only on VIP, forward only on non-VIP
if (ip -o addr show "${INTERFACE}" | grep " ${VIP}/32 "); then
iptables -t nat -I POSTROUTING -o "${INTERFACE}" -m mark --mark "${ROUTE_ID}" -j SNAT --to "${VIP}"
iptables -t mangle -A FORWARD -i "${INTERFACE}" -o "${INTERFACE}" -j MARK --set-mark "${ROUTE_ID}/${ROUTE_ID}"
else
echo "${ROUTE_ID} ${ROUTE_TABLE}" >> /etc/iproute2/rt_tables
ip route add default via "${VIP}" dev "${INTERFACE}" table "${ROUTE_TABLE}"
ip rule add fwmark "${ROUTE_ID}" table "${ROUTE_TABLE}"
ip route flush cache
fi
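# On every node: mark pod-sourced traffic unless it stays inside the cluster
# (pod or service CIDR), so it gets SNATed (VIP node) or routed to the VIP.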
iptables -t mangle -N EGRESS
iptables -t mangle -A EGRESS -d "${POD_CIDR}" -j RETURN
iptables -t mangle -A EGRESS -d "${SERVICE_CIDR}" -j RETURN
iptables -t mangle -A EGRESS -s "${POD_CIDR}" -j MARK --set-mark "${ROUTE_ID}/${ROUTE_ID}"
iptables -t mangle -A PREROUTING -j EGRESS
```
Thanks for sharing this @steven-sheehy.
Which VIP? Is it
@murali-reddy Yes, it is a single layer-2 VIP created by kube-keepalived-vip and used as an ExternalIP on each service. So it's a single VIP for all nodes, but only one machine at a time has the VIP on its main interface and does the SNAT. The fact that it's a VIP is not strictly required; our requirement is just that we use the same IP for both ingress and egress.
@murali-reddy I've refactored it, packaged our egress script into a reusable DaemonSet, and open-sourced it as kube-egress. It works for our needs, but it's probably not as useful for most people since it can't support multiple IPs or restrict by pod/service/namespace. That's something only a project like kube-router or kube-proxy can provide, since they operate at a higher level.
Thanks @steven-sheehy for sharing. Looks neat. Does it work across subnets? I mean, if traffic from pod A running on node A is sent to node B, which owns the VIP, and node A and node B are in different subnets, does this work?
@murali-reddy If you mean the pod subnet specifically allocated to each node by the CNI, then yes, that works correctly. I use the overall cluster pod subnet and only use it to exclude pod-to-pod traffic, not for routing. The routing tables are only updated to forward via the VIP, so as long as the VIP is routable it works across all nodes.
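For anyone reproducing this by hand, the state the script leaves behind can be inspected with standard iproute2/iptables commands (the table name, mark value and addresses below match the script's variables; adjust them for your environment):

```bash
# On a node that does NOT hold the VIP: the fwmark rule and the egress table
ip rule show                   # expect a line like: "from all fwmark 0x40 lookup egress"
ip route show table egress     # expect: "default via 10.0.2.100 dev eth0"

# On the node that holds the VIP: the SNAT rule should be present instead
iptables -t nat -S POSTROUTING | grep 10.0.2.100
iptables -t mangle -S EGRESS   # the marking chain exists on every node
```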
When I first decided to use kube-router I was certain in my head that this was a core feature :) In any case I think this would be really useful if applied in a similar way to DSR: annotate a service with something like "egress-vip-enabled", and the service's external IP (and/or ClusterIP maybe) would be used for the pods' egress traffic.
Any news here? This feature would be really great.
I recently made a custom version of kube-router with such a feature. The commit that adds the static pod egress feature is here: Trojan295@d48fd0a (ignore the parts with gitlab-ci and my Makefile adjustments). It basically works by annotating the pod with:

```yaml
metadata:
  name: egress-example-pod
  annotations:
    egressSNAT.IPAddress: "1.1.1.1"
    egressSNAT.FixedPorts: "27015,udp:27015"
```

Such annotations make the egress packets from the pod exit the node with source IP 1.1.1.1 and preserve the source port 27015 when doing SNAT. Under the hood, egress packets that go to the internet are SNATed in the nat POSTROUTING chain. As I'm running on OVH vRack (with MetalLB as the LoadBalancer service provider) I needed multiple routing tables, so you can see that in there as well. @murali-reddy @noillir WDYT about this?
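For illustration only, the effect of such an annotation boils down to a per-pod SNAT rule of roughly this shape (a minimal sketch, not the actual rules generated by the linked commit; the pod IP and cluster CIDR are placeholders):

```bash
#!/usr/bin/env bash
# Sketch of the kind of rule an egressSNAT annotation implies; a real
# implementation would derive these values from the pod object and its config.
POD_IP="10.32.1.17"            # placeholder pod IP
CLUSTER_CIDR="10.32.0.0/12"    # placeholder cluster CIDR
EGRESS_IP="1.1.1.1"

# SNAT traffic leaving the cluster from this pod; pod-to-pod and
# pod-to-service traffic inside the cluster CIDR is left untouched.
iptables -t nat -A POSTROUTING -s "${POD_IP}" ! -d "${CLUSTER_CIDR}" \
  -j SNAT --to-source "${EGRESS_IP}"

# The FixedPorts annotation would additionally pin the source port, e.g. for udp:27015:
iptables -t nat -I POSTROUTING -s "${POD_IP}" ! -d "${CLUSTER_CIDR}" \
  -p udp --sport 27015 -j SNAT --to-source "${EGRESS_IP}:27015"
```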
@Trojan295 Quick question: how does return traffic work? What if two pods on different nodes are annotated with the same IP?
That's actually a good question. I would need to perform a few experiments, because so far I've only tried it with a single pod per IP. In my setup with MetalLB running in ARP mode, one of the machines where a pod for the service is present receives all the traffic for an IP and then distributes it. Now I see that there can be problems when the pod runs on node A and node B receives the IP traffic: node A would know about the SNATing, but node B, which received the response, wouldn't have the NAT table to correctly process the packets. So it looks like the limitation is that the pod has to run on the node which receives the traffic for this IP...
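The asymmetry is visible in conntrack: the SNAT mapping only exists on the node that performed it, so a node that receives the return traffic has nothing to reverse the translation with. A quick way to check (requires conntrack-tools; 1.1.1.1 is the example egress IP from above):

```bash
# On the node where the pod runs and the SNAT happened, this lists the
# translated connections; on the node that receives the return traffic for
# the egress IP there is no matching entry, so replies cannot be de-NATed.
conntrack -L --src-nat | grep 1.1.1.1
```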
@murali-reddy, yes, I confirmed it: it doesn't work with pods on different nodes. Only the pod on the node which handles the ingress traffic for this IP can access external services. I think some solution like https://github.com/nirmata/kube-static-egress-ip, with a node acting as a proxy for the traffic, would be needed.
Yes @Trojan295
I need a solution where the egress IP is the same as the load balancer IP, so that I can have a single IP for a set of pods belonging to a service, for both incoming and outgoing packets. An option to have different IPs for incoming and outgoing would also be useful. I looked at kube-static-egress-ip: it works for outgoing traffic, but fails when we want to use the MetalLB load balancer IP for SNAT too.
Any progress? I am currently trying to host a secondary DNS server on Kubernetes with MetalLB in BGP mode. For DNS replication it is essential to have the same IP address for egress and ingress.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stale for 5 days with no activity.
Cilium has an egress gateway: https://docs.cilium.io/en/stable/network/egress-gateway/.
It would be useful if kube-router supported making egress pod traffic appear as a specified static IP. Normally egress traffic gets a SNAT to appear as the node IP when it leaves the cluster. Instead, kube-router would intercept this traffic on each node and route, proxy or NAT it to appear as a VIP. This VIP would already be routed to one of the nodes via kube-router, MetalLB or a cloud provider's LB. In some cloud providers, this static egress IP is handled by a NAT gateway, but that is not feasible in a bare metal environment.
Use Case:
Many devices external to Kubernetes that pods connect to have IP-based ACLs to restrict incoming traffic, for security reasons and bandwidth limitations. For example, at our company we connect to routers and switches to retrieve SNMP information, and they only allow a single IP to connect so that rogue machines can't cause a DoS attack on critical network infrastructure. With a multi-node Kubernetes cluster, the IP connecting to these devices could be any one of the nodes and would change as nodes come and go. Our use case is very simple: we only have a single VIP for incoming traffic, and we want all egress traffic to also appear as that same VIP.
Proposal:
Specify the egress VIP via an annotation on either the service or the pod/deployment. A service is generally for incoming traffic, but it provides a convenient grouping of pods via label matching that is preferable to hardcoding at the pod level. A harder, but possibly cleaner, solution would be to create a Custom Resource Definition such as EgressService to represent it.
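As a sketch of what the annotation-based variant could look like from a user's point of view (the annotation key and service name are hypothetical; kube-router does not define them today):

```bash
# Hypothetical: mark a service so its backing pods egress via the service's VIP.
kubectl annotate service snmp-collector kube-router.io/egress-vip-enabled=true
```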
This was discussed in the Slack channel.