Static Egress IP #434

Closed · steven-sheehy opened this issue May 11, 2018 · 18 comments

@steven-sheehy

It would be useful if kube-router supported making egress pod traffic appear as a specified static IP. Normally, egress traffic is SNATed to appear as the node IP when it leaves the cluster. Instead, kube-router would intercept this traffic on each node and route, proxy, or NAT it to appear as a VIP. This VIP would already be routed to one of the nodes via kube-router, MetalLB, or a cloud provider's LB. In some cloud providers, this static egress IP is handled by a NAT gateway, but that is not feasible in a bare metal environment.
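For reference, the default behavior being replaced is typically a masquerade rule along these lines (a sketch of a common CNI/kubelet setup, not kube-router's exact rule; the pod CIDR is an assumption):

# Typical default: pod traffic leaving the cluster is rewritten to the node IP
iptables -t nat -A POSTROUTING -s 10.32.0.0/12 ! -d 10.32.0.0/12 -j MASQUERADE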

Use Case:
Many devices external to Kubernetes that pods connect to have IP-based ACLs to restrict incoming traffic, for security reasons and bandwidth limitations. For example, at our company we connect to routers and switches to retrieve SNMP information, and they only allow a single IP to connect so that rogue machines can't mount a DoS attack on critical network infrastructure. With a multi-node Kubernetes cluster, the IP connecting to these devices could be any one of the nodes and would change as nodes come and go. Our use case is very simple: we have a single VIP for incoming traffic, and we want all egress traffic to also appear as that same VIP.

Proposal:
Specify the egress VIP via an annotation on either the service or the pod/deployment. A service is generally for incoming traffic, but it provides a convenient grouping of pods via label matching that is preferable to hardcoding at the pod level. A harder, but possibly cleaner, solution would be to create a Custom Resource Definition to represent it, such as an EgressService.
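As a rough illustration of the annotation approach (the kube-router.io/egress-ip key is hypothetical, not an existing kube-router feature; names and the IP are made up):

apiVersion: v1
kind: Service
metadata:
  name: snmp-poller
  annotations:
    kube-router.io/egress-ip: "10.0.2.100"  # hypothetical annotation, not implemented today
spec:
  selector:
    app: snmp-poller
  ports:
    - port: 161
      protocol: UDP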

This was discussed in the Slack channel.

@steven-sheehy
Author

steven-sheehy commented Sep 4, 2018

@murali-reddy I've solved this manually with a script I wrote. It routes all outgoing pod traffic that is not destined for another pod or service to the VIP and SNATs it to appear as the VIP. It uses a custom routing table, a custom routing rule, and a custom iptables chain. The iptables chain marks the appropriate packets to be forwarded via the routing table. Any packets that hit the FORWARD chain on the main interface are also marked, since that should only occur on the machine holding the VIP.

I'm not sure if kube-router internally uses the same concepts, but hopefully this script will help. It routes all pod traffic to a single VIP, so support for multiple egress IPs and a subset of pods would have to be figured out.

#!/usr/bin/env bash
set -e

# Variables
ROUTE_ID="64"
ROUTE_TABLE="egress"
INTERFACE="eth0"
POD_CIDR="10.32.0.0/12"
SERVICE_CIDR="10.96.0.0/12"
VIP="10.0.2.100"

# SNAT only on the node holding the VIP; forward to the VIP from all other nodes
if ip -o addr show "${INTERFACE}" | grep -q " ${VIP}/32 "; then
  # This node owns the VIP: SNAT marked packets leaving the main interface to the VIP
  iptables -t nat -I POSTROUTING -o "${INTERFACE}" -m mark --mark "${ROUTE_ID}" -j SNAT --to "${VIP}"
  # Traffic forwarded in and back out the main interface only happens on the VIP holder; mark it too
  iptables -t mangle -A FORWARD -i "${INTERFACE}" -o "${INTERFACE}" -j MARK --set-mark "${ROUTE_ID}/${ROUTE_ID}"
else
  # Other nodes: send marked packets to the VIP via a dedicated routing table
  echo "${ROUTE_ID} ${ROUTE_TABLE}" >> /etc/iproute2/rt_tables
  ip route add default via "${VIP}" dev "${INTERFACE}" table "${ROUTE_TABLE}"
  ip rule add fwmark "${ROUTE_ID}" table "${ROUTE_TABLE}"
  ip route flush cache
fi

# Mark pod egress traffic, excluding pod-to-pod and pod-to-service traffic
iptables -t mangle -N EGRESS
iptables -t mangle -A EGRESS -d "${POD_CIDR}" -j RETURN
iptables -t mangle -A EGRESS -d "${SERVICE_CIDR}" -j RETURN
iptables -t mangle -A EGRESS -s "${POD_CIDR}" -j MARK --set-mark "${ROUTE_ID}/${ROUTE_ID}"
iptables -t mangle -A PREROUTING -j EGRESS

@murali-reddy
Member

thanks for sharing this @steven-sheehy

It routes all outgoing pod traffic that is not to another pod or service to the VIP and SNATs it to appear as the VIP.

Which VIP? Is it the specified static IP you mentioned in the earlier comment? Does a single VIP on all nodes work?

@steven-sheehy
Author

@murali-reddy Yes, it is a single layer-2 VIP created by kube-keepalived-vip and used as an ExternalIP on each service. So it's a single VIP for all nodes, but only one machine at a time holds the VIP on its main interface and does the SNAT. The fact that it's a VIP is not required, though; it's just a requirement for us that we use the same IP for both ingress and egress.
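For context, a minimal sketch of the Service side of this setup, assuming the VIP from the script above (service name and selector are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example
  externalIPs:
    - 10.0.2.100  # the keepalived-managed VIP; the egress script SNATs to the same IP
  ports:
    - port: 80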

@steven-sheehy
Author

@murali-reddy I've refactored our egress script, packaged it into a reusable daemonset, and open sourced it as kube-egress. It works for our needs, but it's probably not as useful for most people since it can't support multiple IPs or restrict by pod/service/namespace. That's something only a project like kube-router or kube-proxy can provide, since they operate at a higher level.

@murali-reddy
Member

thanks @steven-sheehy for sharing. Looks neat. Does it work across subnets? I mean, if traffic from pod A running on node A is sent to node B owning the VIP, and node A and node B are in different subnets, does this work?

@steven-sheehy
Author

@murali-reddy If you mean the per-node pod subnet allocated by the CNI, then yes, that works correctly. I use the overall cluster pod subnet and only use it to exclude pod-to-pod traffic, not for routing. Routing tables are only updated to forward via the VIP. So as long as the VIP is routable, it works across all nodes.

@noillir
Contributor

noillir commented Apr 26, 2019

When I first decided to use kube-router I was certain in my head that this was a core feature :)

In any case, I think this would be really useful applied in a similar way to DSR: annotate the service with something like "egress-vip-enabled", and the service's external IP (and/or cluster IP, maybe) would be used for the pods' egress traffic.

@HaveFun83

Any news here? This feature would be really great.

@Trojan295

Trojan295 commented May 18, 2020

I recently made a custom version of kube-router with such a feature. The commit which adds the static pod egress feature is here: Trojan295@d48fd0a (ignore the parts with gitlab-ci and my makefile adjustments)

It basically works by annotating the pod with egressSNAT.IPAddress. Optionally, you can define egressSNAT.FixedPorts to not perform SPAT (source port translation) on some ports.

metadata:
  name: egress-example-pod
  annotations:
    egressSNAT.IPAddress: 1.1.1.1
    egressSNAT.FixedPorts: 27015,udp:27015

Such annotations make egress packets from the pod exit the node with source IP 1.1.1.1, and preserve the source port 27015 when doing SNAT.

Under the hood, egress packets which go to the internet are SNATed in the nat POSTROUTING chain.

As I'm running on OVH vRack (with MetalLB as the LoadBalancer service provider), I needed multiple routing tables, so you can also see ip rule calls there that select the proper routing tables for the packets marked in mangle PREROUTING. This isn't necessary when the default routing table can be used for egress traffic.
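A rough sketch of the mechanism described above (illustrative commands, not the exact ones from the commit; the pod IP, pod CIDR, mark, table number, and gateway are all assumptions):

# SNAT egress traffic from the annotated pod to the static IP (pod IP assumed)
iptables -t nat -A POSTROUTING -s 10.32.1.5/32 ! -d 10.32.0.0/12 -j SNAT --to-source 1.1.1.1

# For fixed ports, SNAT with an explicit port so the source port is preserved
iptables -t nat -I POSTROUTING -s 10.32.1.5/32 -p udp --sport 27015 ! -d 10.32.0.0/12 \
  -j SNAT --to-source 1.1.1.1:27015

# Mark the pod's packets in mangle PREROUTING and route them via a dedicated table
iptables -t mangle -A PREROUTING -s 10.32.1.5/32 -j MARK --set-mark 100
ip rule add fwmark 100 table 100
ip route add default via 10.0.2.1 dev eth0 table 100  # gateway is an assumption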

@murali-reddy @noillir WDYT about this?

@murali-reddy
Member

@Trojan295 quick question: how does return traffic work? If two pods on different nodes are annotated with the same egressSNAT.IPAddress, how does return traffic destined for egressSNAT.IPAddress reach the correct node?

@Trojan295

That's actually a good question. I would need to perform a few experiments, because so far I've only tried it with a single pod per IP.

In my setup, with MetalLB running in ARP mode, one of the machines where a pod for the service is present receives all the traffic for an IP and then distributes it.
I'm using the same IP which MetalLB assigned to the LB service as egressSNAT.IPAddress in the pod definitions behind the service. So in a single-pod setup, the node which receives the traffic is the same node on which the pod exists (that's how MetalLB works here).

Now I see that there can be problems when the pod runs on node A and node B receives the IP traffic. Node A would know about the SNATing, but node B, which received the response, wouldn't have the NAT state to correctly process the packets.

So it looks like the limitation is that the pod has to run on the node which receives the traffic for this IP...

@Trojan295

@murali-reddy, yes, I confirmed it: it doesn't work with pods on different nodes. Only the pod on the node which handles the ingress traffic for this IP can access external services.

I think a solution like https://github.com/nirmata/kube-static-egress-ip, with a node acting as a proxy for the traffic, would be needed.

@murali-reddy
Member

Yes @Trojan295, kube-static-egress-ip is a dedicated solution for this. Scalability, failover, clusters spanning multiple zones/subnets, etc. need to be thought out as well.

@shubham2110

I need a solution where the egress IP is the same as the load balancer IP. That way I can have a single IP for a set of pods belonging to a service, for both incoming and outgoing packets.

An option to have different IPs for incoming and outgoing would also be useful.

I looked at kube-static-egress-ip. It works for outgoing traffic but fails when we want to use the MetalLB load balancer IP for SNAT too.

@Zetanova

Any progress?

I am currently trying to host a secondary DNS on a Kubernetes cluster with MetalLB in BGP mode. With the metallb.universe.tf/allow-shared-ip: "powerdns" workaround, it is possible to use a UDP and a TCP port on the same IP address.

For DNS replication, it would be essential to have the same IP address for egress and ingress.
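For reference, the MetalLB IP-sharing workaround looks roughly like this (a sketch; service names, selector, and the IP are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: powerdns-tcp
  annotations:
    metallb.universe.tf/allow-shared-ip: "powerdns"
spec:
  type: LoadBalancer
  loadBalancerIP: 192.0.2.53  # both services request the same IP
  selector:
    app: powerdns
  ports:
    - port: 53
      protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: powerdns-udp
  annotations:
    metallb.universe.tf/allow-shared-ip: "powerdns"
spec:
  type: LoadBalancer
  loadBalancerIP: 192.0.2.53
  selector:
    app: powerdns
  ports:
    - port: 53
      protocol: UDP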

@github-actions

github-actions bot commented Sep 6, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Sep 6, 2023
@github-actions

This issue was closed because it has been stale for 5 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 11, 2023
@r0123456789

Cilium has an egress gateway: https://docs.cilium.io/en/stable/network/egress-gateway/.
It would be wonderful if kube-router developed something similar.
I'm going to switch to Cilium because of that.
