sig-network: Add egress-source-ip-support KEP #1105
Conversation
Welcome @mkimuram!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: mkimuram. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
As a user of Kubernetes, I have some pods which require access to different databases that restrict access by source IP and exist outside the k8s cluster.
So, some pods which require database access need a specific egress source IP when sending packets to the database, and other pods need another specific egress source IP.

### Implementation Details/Notes/Constraints [optional]
Should explain how this will work on the nodes themselves. For example (and I haven't looked too deep into the implementation) does your implementation assign all the egress-source-ips to the node that the pod lives on? If so, how would that work in cloud environments that have tighter constraints on the IPs available to nodes. That kind of thing.
Thank you for your comment.
It assigns each IP to one of the nodes by leveraging keepalived-vip (https://github.com/kubernetes-retired/contrib/tree/master/keepalived-vip), then forwards packets from the node the pod lives on to the node that holds the specific IP, using iptables rules and routing tables.
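To make this concrete, here is a rough, hypothetical sketch of the kind of node-level configuration such an approach implies (all addresses are made up; this is not the actual PoC code):

```sh
# Rough sketch only (assumed addresses, not the actual PoC implementation).
# Pod IP 10.244.1.5 runs on node A; the VIP 192.168.122.222 is held by
# gateway node B (10.0.0.20) via keepalived-vip.

# On node A: policy-route the pod's traffic towards the VIP-holding node.
ip rule add from 10.244.1.5 lookup 100
ip route add default via 10.0.0.20 table 100

# On node B: SNAT the pod's out-of-cluster traffic to the VIP
# (10.244.0.0/16 is assumed to be the pod CIDR and is excluded).
iptables -t nat -A POSTROUTING -s 10.244.1.5 ! -d 10.244.0.0/16 \
  -j SNAT --to-source 192.168.122.222
```

The actual PoC also has to keep rules like these reconciled as pods and the VIP move between nodes, which is why it is implemented as an operator.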
## Proposal

Expose an egress API to users, like the one below, to allow users to assign a static egress source IP to specific pod(s).
Are you proposing a Custom Resource Definition, or an actual API resource here?
In my PoC implementation, it uses a CRD, since it is implemented as a k8s operator that reconciles the iptables rules and routing tables on all nodes. However, I think we still have a choice between defining a k8s API and keeping it as a CRD.
## Summary

Egress source IP is a feature to assign a static egress source IP for packets from a pod to outside the k8s cluster.
Do we want to limit this to a pod or something more stable like a label selector? Naming pods explicitly in resources can be fragile -- pod names are meant to be temporary.
It would also be good to define what "outside the k8s cluster" means. Where is the boundary? Is it when the packet leaves the node, some notion of network, etc?
Good point. As I mentioned in the SIG meeting, I meant "within the private network"; networks that span clouds, for example, were not within my scope. I will update the KEP.
Also, a label-based approach sounds good, because the use cases include mapping multiple pods to one IP.
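For illustration, a label-selector-based variant of the egress object could look roughly like the following (the API group, version, and the podSelector field name are hypothetical, not something the KEP has defined):

```sh
# Hypothetical, selector-based Egress example; the group/version and the
# podSelector field are illustrative only.
kubectl apply -f - <<EOF
apiVersion: egress.example.com/v1alpha1
kind: Egress
metadata:
  name: database-clients-egress
spec:
  ip: 192.168.122.222          # static egress source IP seen by the external database
  podSelector:
    matchLabels:
      app: database-client     # all pods with this label share the egress IP
EOF
```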
(TL;DR: I'm a newbie, and you are allowed to ignore this comment.)
I would say that I understand the original term "outside of the k8s cluster", as the destination seems to hint that it is outside the k8s cluster's pod CIDR and service CIDR.
If the destination were inside the Kubernetes cluster, this wouldn't make much sense IMO
(i.e. the destination could use a NetworkPolicy to secure itself).
From my understanding, keepalived just provides yet another VIP that acts like a Service of type LoadBalancer, which kube-egress needs to be able to bind, plus iptables and routing entries to do the right thing so that a pod's source IP is set to the keepalived VIP before packets leave the node.
(Note: I'm new here and might be totally off, just wanted to leave a note. Coming from a world where I used to pet servers in a datacenter, keepalived is a good friend for providing HA for a load balancer IP that is put into DNS, as it uses software-defined VRRP to ensure the VIP is always available. From the sig-network meeting and the topic of getting ingress/egress towards such a keepalived VIP, it seems someone has already been on an adventure to get it working together with cloud providers using Service type LoadBalancer: https://github.com/munnerz/keepalived-cloud-provider )
As also mentioned during the sig-network meeting, if running in a cloud, you could probably achieve the stories by getting a dedicated source IP for the whole cluster using platform/cloud-provider-dependent configuration of ingress/egress to/from the Kubernetes cluster; that would solve the use cases of reaching a destination that requires a whitelist of source IPs.
Since the KEP is targeting a set of pods, using a label selector makes sense.
I might have confused you by not putting details on what should be done and how it is achieved in my PoC implementation. I will update the KEP to clarify them.
However, a short explanation is:
My goal is not to make the PodIP or ClusterIP visible to applications not running on the k8s cluster.
Instead, the goal is to make certain pods' source IPs appear as a fixed one to applications not running on the k8s cluster. To do that, I'm thinking about making an N:1 mapping of pods to a VIP when the PodIP is SNATed. (Actually, the VIP could be assigned in any way that a k8s LoadBalancer allows; my PoC implementation just used keepalived-vip.)
My intention in excluding use cases like cross-cloud above was to exclude scenarios where there is another SNAT, a VPN, and so on between the k8s cluster and the applications, which would require much more work than just doing such a mapping when going out of the k8s cluster.
I think there are use cases for both "outside the cluster but within the private network" and "all the way out to the internet". (eg, take any of the user stories below and assume that the kubernetes cluster is in a public cloud but the database server is not)
Some things popping up in my mind:
I wonder if this is something that could be done on service IPs; they are already VIPs inside the k8s cluster. So could the SNATing you already do in the PoC be applied to kube-proxy?
How would this work for IPVS?
I.e., is it okay to only support iptables and not IPVS?
(I have a vague memory of @thockin maybe mentioning a customer who wanted similar egress support on service VIPs during the sig-network call?)
> I think there are use cases for both "outside the cluster but within the private network" and "all the way out to the internet". (eg, take any of the user stories below and assume that the kubernetes cluster is in a public cloud but the database server is not)

O.K. Let's also consider "all the way out to the internet", and if needed, let's set another milestone to achieve it.

> is it okay to only support iptables and not IPVS?

I think it is a good idea to allow different implementations to forward packets for egress, as kube-proxy does. Also, we might be able to leverage Service as a mechanism to trace the PodIP. I will add a description of this to the "Design Details" section of this KEP so we can discuss it in detail.
metadata:
  name: example-pod1-egress
spec:
  ip: 192.168.122.222
What would the restrictions/instructions for this IP be? Any IP in the node CIDR?
Also, is this meant to be sharable, or unique per-Egress?
It is "Any IP in the node CIDR" and is sharable. I will add this to KEP.
This is the calico feature I mentioned on the last sig-network call:
### Goals

Provide users with an official and common way to assign a static egress source IP for packets from a pod to outside the k8s cluster.
"a static egress source IP for packages from one or more pods" isn't it? (eg, Story 2 below)
I will fix it.
  name: pod1

PoC implementation is
So what is the expectation of how the KEP'ed version of the feature would differ from the PoC?
More specifically, if this is something that can already be implemented entirely outside of kubernetes, then does it benefit from being moved into kubernetes?
Are you expecting that kubernetes would adopt essentially the PoC implementation, or merely the API surrounding it? Would this be something that would be core to Kubernetes, or would it be implemented by network plugins (who might be able to optimize in various ways that a generic implementation could not)?
> More specifically, if this is something that can already be implemented entirely outside of kubernetes, then does it benefit from being moved into kubernetes?

Benefits that I expect are:
- Make it work with any CNI driver. (Some CNI drivers won't work well with my current PoC implementation alone.)
- Define a stable API that can stay compatible with future k8s versions. (I won't insist on making it a core k8s API, as long as it can remain compatible, just as the volume snapshot feature is implemented as a CRD.)
- Make use of existing k8s mechanisms like kube-proxy and Service, if possible and useful.

Then, it will provide users with the same UX across any k8s cluster, and it will decrease developers' burden of maintaining compatibility for this feature.
## Motivation

In k8s, egress traffic has its source IP translated (SNAT) to appear as the node IP when it leaves the cluster. However, there are many devices and pieces of software that use IP-based ACLs to restrict incoming traffic for security reasons and bandwidth limitations. As a result, these kinds of ACLs outside the k8s cluster will block packets from the pod, which causes connectivity issues. To resolve this issue, we need a feature to assign a particular static egress source IP to one or more particular pods.
> In k8s, egress traffic has its source IP translated (SNAT) to appear as the node IP when it leaves the cluster
Is this defined somewhere? My understanding is that many network plugins can do this, but it's not mandated or enforced by Kubernetes itself. You could set up a cluster to maintain original source IP address if so desired.
In that sense, I feel like this is adding an additional responsibility to Kubernetes which currently exists on the other side of the plugin boundary. Are we sure that's the right thing here?
Thank you for your feedback.

> Is this defined somewhere? My understanding is that many network plugins can do this, but it's not mandated or enforced by Kubernetes itself.

You are right that this is not defined as part of the k8s network model; it is just the behavior of some CNI plugins' implementations. So, I will rephrase it to make it more accurate in the KEP.

> My understanding is that many network plugins can do this, but it's not mandated or enforced by Kubernetes itself.
> You could set up a cluster to maintain original source IP address if so desired.

In my understanding, some plugins can directly send packets from pods to outside the k8s cluster, and they can assign a particular PodIP to a pod in their own ways. However, this feature still has value that I believe existing plugins alone won't provide:
- Compatibility and interoperability: it provides a common interface to achieve this for any CNI plugin or cloud provider.
- Multiple-pod use case: it will also solve use cases like Story 2, without adding many IPs to the ACL.
- Across-the-internet use case: it will also add a knob to expand the ability to handle the "(2) internet that is outside the private network where the k8s cluster is running" use case.

> In that sense, I feel like this is adding an additional responsibility to Kubernetes which currently exists on the other side of the plugin boundary. Are we sure that's the right thing here?

I hope the above justifies adding the new responsibility to k8s. This won't change the k8s network model, but it adds a common way to solve common use cases, just like Service and Ingress do.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Wondering if there has been any progress on this?
Kubernetes can define an Egress API, but the implementation of an egress-ip function will be dependent on the network environment, such as the CNI plugin, external routers, and network security policies. A generic in-cluster solution cannot be provided, and therefore Kubernetes should not implement this function IMO. And BTW, the PoC referred to in this PR and the more mature https://github.com/nirmata/kube-static-egress-ip are not generic.
Agree that an Egress object in K8s with a cloud-provider-specific implementation would be ideal, similar to how Service type LoadBalancer allows each cloud provider to assign IP addresses in its own way. But just like the Service object, which specifies which pods should receive packets from that ingress external IP, we need some native K8s object which specifies which pods should be able to send out through the external egress IP.
I went to the bottom of my email list and found this - it wasn't assigned to me so it never got priority. Sorry.
Thoughts:
- Does the IP have to be a node's IP, or can it be any IP in the network? Can it be a public IP? Can it be different for different destinations (e.g. an RFC-1918 IP if the dest is RFC-1918, but a public IP if the dest is not RFC-1918)?
- Can or must the user ask for a specific IP, or is it allocated by the controller?
- What happens if the user asks for an IP that is already in use?
- What happens if two Egress resources select the same pod? Which IP is used? Perhaps this should be based on ServiceAccount or something else that is 1:1 instead.
- I understand how you can make a single pod work. How can you make multiple pods work?
- Overall, if something like this is to be developed, I think it should be a CRD and live outside the core for the foreseeable future. I'd want to see several implementations to explore the intricacies of the idea.
That said, this is a few months old - is it something you still want to pursue?
@thockin Thanks for sharing those thoughts, which are definitely something to be considered. This feature is still something we need and wish to pursue.
Thank you for the comments, and sorry for the late response. I'm thinking about making this feature available outside k8s, as it would require CNI-plugin- or cloud-provider-specific implementations, which the k8s community would like to avoid adding to k8s. (For me, this feature doesn't need to be in k8s, as long as there is a well-maintained common way. Some might not agree with that, though.) So, improving a project like kube-static-egress-ip would be one way to achieve this goal. (Thank you for sharing information on the project, @uablrek.) Another idea is to extend submariner's scope from cluster-to-cluster to cluster-to-non-cluster. The summary of my idea is making pods accessible via PodIP from outside the cluster without NAT, instead of assigning a 1:1 static NAT accessible from outside the cluster. (It's still just an idea and I haven't discussed it with the submariner community yet, though.) By doing this, it would allow:
However, the ability to assign a specific IP is missing in this case. So, we need to think about a common way to solve it for some use cases, like an IP-based external firewall where the IPs are expected to be specific fixed values. This feature would be challenging, as @thockin pointed out, and it seems to be the kind of feature that would hardly be accepted into k8s.
Thanks for sharing the details of your mechanism @mkimuram. In part, this shows the complexity required to ensure that egressing packets from a given pod type have a static source IP. A native approach would enlist the help of iptables at the node level to SNAT the packets going out. The key thing to think about is what entity creates the right iptables rules: an independent operator, or possibly kube-proxy?
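As a purely illustrative sketch of that node-level piece, whichever entity owns the rules (an independent operator or kube-proxy) would essentially keep a set of the selected pods' IPs in sync and attach a single SNAT rule to it, for example with ipset (the set name, addresses, and pod CIDR here are assumptions):

```sh
# Illustrative only: SNAT out-of-cluster traffic from the selected pods
# to the desired egress IP. The controlling entity keeps the ipset in sync.
ipset create egress-db-clients hash:ip
ipset add egress-db-clients 10.244.1.5
ipset add egress-db-clients 10.244.2.7

# 10.244.0.0/16 is assumed to be the pod CIDR; in-cluster traffic is skipped.
iptables -t nat -A POSTROUTING \
  -m set --match-set egress-db-clients src ! -d 10.244.0.0/16 \
  -j SNAT --to-source 192.168.122.222
```

Reconciling that set as pods come and go is exactly the part that has to live somewhere, which is the operator-vs-kube-proxy question above.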
@thockin: A common requirement in this regard is the need for symmetry: when a LoadBalancer Service is used, an external client sends packets to the LoadBalancer IP. When pods that are part of that Service initiate packets towards an external server, the packets should go out using the LoadBalancer IP as the source IP.
Yes. I meant to make a k8s operator, and possibly k8s itself, handle this complexity.
External IPs are already used by the gateway server, so we won't be able to assign the same external IP to k8s's load balancer when the above idea is used. However, if we create a per-external-IP forwarder pod and run an ssh client in it to set up a remote-forward ssh tunnel, packets from external servers could be sent to client pods via the corresponding external IPs. In that case, the source IP of the packets won't be the ClusterIP of the external server's forwarder-pod Service; instead it will be the IP of the per-external-IP forwarder pod. So, it won't be symmetric in this respect. I will consider whether there is a better way.
Let me also share the detailed idea of preserving source IPs for reverse access. By using a remote ssh tunnel, the source IP of packets from an external server to each pod can be that external server's fwd-pod IP. So, each pod will be able to distinguish which external server sent the packets. (Each pod will be able to access the right external server via the fwd-pod's IP, although the IP might change.) This won't be a perfect solution, but I couldn't find a good way to send a packet from the service IP. Note that in this idea, the tunnel is created by running the ssh client in the forwarder pod, instead of creating it in a per-external-IP pod as I mentioned in my previous comment. This is because a remote ssh tunnel doesn't preserve the source IP, so it is too late to find the original source IP after the packet has been tunneled into the pod network. As a result, iptables rules need to be updated on the gateway server, which would make management complex, but I guess it is still possible if we develop a k8s operator to handle it. Set up the same configuration as in #1105 (comment); an illustrative sketch of the remote-forwarding piece follows the steps below.
[On client1 (
[On client2 (
[On k8s]
[On forwarder pod (
(To make this work, sshd on the gateway node needs to allow external access for remote forwarding, by setting
[On Gateway node]
2. Add an iptables rule to forward packets to a particular port depending on the source IP in namespace net2 (
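As a generic illustration of the remote-forwarding building block described in these steps (hypothetical hosts and ports, not the exact commands from this comment), the ssh client in the forwarder pod would open something like:

```sh
# Generic ssh remote-forward illustration; hosts and ports are made up.
# It assumes "GatewayPorts clientspecified" (or "yes") in the gateway node's
# sshd_config so the remote-forwarded port binds on a non-loopback address.
ssh -N -R 0.0.0.0:15432:127.0.0.1:5432 user@gateway.example.com
# Connections that external servers make to gateway.example.com:15432 are
# then tunneled back to port 5432 inside the forwarder pod.
```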
Just for your information, I've implemented PoC code of the above idea, here.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten