
sig-network: Add egress-source-ip-support KEP #1105

Closed
wants to merge 3 commits

Conversation

@mkimuram (Contributor):

No description provided.

@k8s-ci-robot (Contributor):

Welcome @mkimuram!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Jun 13, 2019
@k8s-ci-robot added the size/L (denotes a PR that changes 100-499 lines, ignoring generated files), kind/kep (categorizes KEP tracking issues and PRs modifying the KEP directory), and sig/network (categorizes an issue or PR as relevant to SIG Network) labels on Jun 13, 2019
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkimuram
To complete the pull request process, please assign dcbw
You can assign the PR to them by writing /assign @dcbw in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

As a user of Kubernetes, I have some pods which require access to different databases that restrict access by source IP and exist outside the k8s cluster.
So, some pods which require database access need a specific egress source IP when sending packets to the database, and other pods need another specific egress source IP.

### Implementation Details/Notes/Constraints [optional]
Member:

Should explain how this will work on the nodes themselves. For example (and I haven't looked too deep into the implementation) does your implementation assign all the egress-source-ips to the node that the pod lives on? If so, how would that work in cloud environments that have tighter constraints on the IPs available to nodes. That kind of thing.

@mkimuram (PR author):

Thank you for your comment.

It assigns each IP to one of the nodes by leveraging keepalived-vip (https://github.com/kubernetes-retired/contrib/tree/master/keepalived-vip), then forwards packets from the node that the pod lives on to the node that has the specific IP, using iptables rules and a routing table.
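
For illustration, the per-node plumbing this implies looks roughly like the following (a minimal sketch only; the IPs, fwmark, and routing-table number are made up, and the PoC generates the equivalent rules automatically):

```
# On the node where the pod runs: steer the pod's non-cluster traffic toward
# the node currently holding the egress VIP instead of SNATing it locally.
POD_IP=10.244.2.7            # example pod IP
POD_CIDR=10.244.0.0/16       # example cluster pod CIDR
VIP_NODE=192.168.1.12        # example node currently holding the VIP via keepalived
EGRESS_IP=192.168.122.222    # example egress source IP (the VIP)

iptables -t mangle -A PREROUTING -s "$POD_IP" ! -d "$POD_CIDR" -j MARK --set-mark 100
ip rule add fwmark 100 table 100
ip route add default via "$VIP_NODE" table 100

# On the node holding the VIP: SNAT the pod's outbound traffic to the egress IP.
iptables -t nat -A POSTROUTING -s "$POD_IP" ! -d "$POD_CIDR" -j SNAT --to-source "$EGRESS_IP"
```

Keepalived is only responsible for keeping the VIP alive on exactly one node; rules like the above are what actually pin the source IP.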


## Proposal

Expose an egress API to users, like the one below, to allow them to assign a static egress source IP to specific pod(s).
Member:

Are you proposing a Custom Resource Definition, or an actual API resource here?

@mkimuram (PR author):

In my PoC implementation it uses a CRD, because it is implemented as a k8s operator that reconciles the iptables rules and routing tables on all nodes. However, I think that we still have the choice to define a k8s API or keep it as a CRD.
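
For reference, reassembling the fragments quoted later in this review, the PoC's object looks roughly like the following (the apiVersion/kind and the pod-reference field name are placeholders, not taken verbatim from the KEP text):

```
$ cat << EOF | kubectl create -f -
# Sketch only: group/version, kind, and the pod-reference field are illustrative.
apiVersion: egress.example.com/v1alpha1
kind: Egress
metadata:
  name: example-pod1-egress
spec:
  ip: 192.168.122.222
  podName: pod1
EOF
```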


## Summary

Egress source IP is a feature to assign a static egress source IP to packets sent from a pod to outside the k8s cluster.
Member:

Do we want to limit this to a pod or something more stable like a label selector? Naming pods explicitly in resources can be fragile -- pod names are meant to be temporary.

Member:

It would also be good to define what "outside the k8s cluster" means. Where is the boundary? Is it when the packet leaves the node, some notion of network, etc?

@mkimuram (PR author):

Good point. As I mentioned in the SIG meeting, I meant "within the private network"; networking across clouds was not within my scope. I will update the KEP.

Also, a label-based approach sounds good, because the use cases include mapping multiple pods to one IP; see the sketch below.
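
A selector-based variant could look something like this (a hypothetical sketch only; none of these field names are settled in the KEP):

```
$ cat << EOF | kubectl create -f -
# Hypothetical label-selector form of the egress object discussed above.
apiVersion: egress.example.com/v1alpha1
kind: Egress
metadata:
  name: db-clients-egress
spec:
  ip: 192.168.122.222
  podSelector:
    matchLabels:
      app: db-client
EOF
```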

Comment:

(TL;DR: I'm a newbie, and you are allowed to ignore this comment.)

I would say that I understand the original term "outside of the k8s cluster" to hint that the destination is outside the k8s cluster's pod CIDR and service CIDR.

If the destination were inside the kubernetes cluster, this wouldn't make much sense IMO
(i.e., the destination could use a NetworkPolicy for securing it).

From my understanding, keepalived just provides yet another VIP that acts like a Service of type LoadBalancer, which kube-egress needs to be able to bind, plus iptables and routing entries to do the right thing to ensure a pod's traffic gets its source IP set to the keepalived VIP before leaving the node.

(Note: I'm new here and might be totally off; I just wanted to leave a note. Coming from a world where I used to pet servers in a datacenter, keepalived is a good friend for providing HA for a load-balancer IP that is put into DNS, as it uses software-defined VRRP to ensure the VIP is always available. Regarding the sig-network meeting and getting ingress/egress towards such a keepalived VIP, it seems someone has already been on an adventure to get it to work together with cloud providers using Service type LoadBalancer: https://github.com/munnerz/keepalived-cloud-provider)

As also mentioned during the sig-network meeting, if running in a cloud you could probably achieve the stories by getting a dedicated source IP for the whole cluster using platform/cloud-provider-dependent configuration of ingress/egress to/from the kubernetes cluster; that would solve the use cases of reaching a destination that requires a whitelist of source IPs.

Since the KEP is targeting a set of pods, using a label selector makes sense.

@mkimuram (PR author):

I might have confused you by not putting details of what should be done and how it is achieved in my PoC implementation. I will update the KEP to clarify them.

However, a short explanation is:

My goal is not to make the PodIP or ClusterIP visible to applications not running on the k8s cluster.
Instead, the goal is to make certain pods' source IPs appear as a fixed one to applications not running on the k8s cluster. To do that, I'm thinking about making an N:1 mapping of Pods to a VIP when the PodIP is SNATed. (Actually, the VIP could be assigned in any way, as a k8s LoadBalancer allows, but my PoC implementation just used keepalived-vip.)

My intention in excluding use cases like cross-cloud above was to exclude scenarios where there is another SNAT, VPN, and so on between the k8s cluster and the applications, which would require much more work than just doing such a mapping when leaving the k8s cluster.

Contributor:

I think there are use cases for both "outside the cluster but within the private network" and "all the way out to the internet". (eg, take any of the user stories below and assume that the kubernetes cluster is in a public cloud but the database server is not)

Comment:

Some things popping up in my mind:

I wonder if this is something that could be done on service IPs; they are already VIPs inside the k8s cluster. So could the SNATing you already do in the PoC be applied in kube-proxy?
How would this work for IPVS?
I.e., is it okay to only support iptables and not IPVS?

(I think I have a vague memory of @thockin maybe mentioning a customer who wanted similar support for egress on service VIPs during the sig-network call?)

@mkimuram (PR author):

> I think there are use cases for both "outside the cluster but within the private network" and "all the way out to the internet". (eg, take any of the user stories below and assume that the kubernetes cluster is in a public cloud but the database server is not)

O.K. Let's also consider "all the way out to the internet". And if needed, let's set another milestone to achieve it.

> Is it okay to only support iptables and not IPVS?

I think that it is a good idea to allow different implementations to forward packets for egress, as kube-proxy does. Also, we might be able to leverage Service as a mechanism to trace the PodIP. I will add a description of this to the "Design Details" section of this KEP so it can be discussed in detail.

metadata:
  name: example-pod1-egress
spec:
  ip: 192.168.122.222
Contributor:

What would the restrictions/instructions for this IP be? Any IP in the node CIDR?

Also, is this meant to be sharable, or unique per-Egress?

@mkimuram (PR author):

It is "Any IP in the node CIDR" and is sharable. I will add this to KEP.

@bjhaid commented Jun 16, 2019:

This is the calico feature I mentioned on the last sig-network call:

https://docs.projectcalico.org/v3.7/reference/cni-plugin/configuration#requesting-a-specific-ip-address


### Goals

Provide users with an official and common way to assign a static egress source IP to packets sent from a pod to outside the k8s cluster.
Contributor:

"a static egress source IP for packages from one or more pods" isn't it? (eg, Story 2 below)

@mkimuram (PR author):

I will fix it.

name: pod1
```

PoC implementation is
Contributor:

So what is the expectation of how the KEP'ed version of the feature would differ from the PoC?

More specifically, if this is something that can already be implemented entirely outside of kubernetes, then does it benefit from being moved into kubernetes?

Are you expecting that kubernetes would adopt essentially the PoC implementation, or merely the API surrounding it? Would this be something that would be core to Kubernetes, or would it be implemented by network plugins (who might be able to optimize in various ways that a generic implementation could not)?

@mkimuram (PR author):

> More specifically, if this is something that can already be implemented entirely outside of kubernetes, then does it benefit from being moved into kubernetes?

Benefits that I expect are:

  • Make it work with any CNI driver (some CNI drivers won't work well with just my current PoC implementation)
  • Define a stable API that can stay compatible with future k8s versions (I won't insist on making it a core k8s API, as long as it can keep compatibility, just as the volume snapshot feature is implemented as a CRD)
  • Make use of existing k8s mechanisms like kube-proxy and Service, if possible and useful

Then it will provide users with the same UX across any k8s cluster, and it will decrease developers' burden of maintaining compatibility for this feature.

@mkimuram (PR author):

@dcbw @bowei @norrs @vllry @bjhaid @danwinship

Thank you for your feedback. I've updated the KEP based on feedback.
Please check it.


## Motivation

In k8s, egress traffic has its source IP translated (SNAT) to appear as the node IP when it leaves the cluster. However, many devices and applications use IP-based ACLs to restrict incoming traffic for security reasons and bandwidth limitations. As a result, this kind of ACL outside the k8s cluster will block packets from the pod, which causes a connectivity issue. To resolve this issue, we need a feature to assign a particular static egress source IP to one or more particular pods.
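
For context, the SNAT behavior described above typically comes from a masquerade rule that many network plugins (not Kubernetes itself) install on each node, along these lines (the pod CIDR is an example):

```
# Pod traffic leaving the cluster network gets rewritten to the node's IP,
# which is what the external IP-based ACLs end up seeing.
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE
```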
Member:

> In k8s, egress traffic has its source IP translated (SNAT) to appear as the node IP when it leaves the cluster

Is this defined somewhere? My understanding is that many network plugins can do this, but it's not mandated or enforced by Kubernetes itself. You could set up a cluster to maintain original source IP address if so desired.

In that sense, I feel like this is adding an additional responsibility to Kubernetes which currently exists on the other side of the plugin boundary. Are we sure that's the right thing here?

@mkimuram (PR author):

@caseydavenport

Thank you for your feedback.

> Is this defined somewhere? My understanding is that many network plugins can do this, but it's not mandated or enforced by Kubernetes itself.

You are right that this is not defined as part of the k8s network model; it is just the behavior of some CNI plugins' implementations. So, I will rephrase it to make it more accurate in the KEP.

> My understanding is that many network plugins can do this, but it's not mandated or enforced by Kubernetes itself.
> You could set up a cluster to maintain original source IP address if so desired.

In my understanding, some plugins can send packets from Pods directly to outside the k8s cluster, and they can assign a particular PodIP to a Pod in their own ways. However, this feature still has value that I believe existing plugins alone won't provide, as below:

  • Compatibility and interoperability: it provides a common interface to achieve this for any CNI plugin or cloud provider
  • Multiple-Pod use case: it will also solve use cases like Story 2 without adding many IPs to the ACL
  • Across-the-internet use case: it will also add a knob to expand the ability to handle the "(2) internet that is outside the private network where the k8s cluster is running" use case

> In that sense, I feel like this is adding an additional responsibility to Kubernetes which currently exists on the other side of the plugin boundary. Are we sure that's the right thing here?

I hope the above justifies adding the new responsibility to k8s.
This won't change the k8s network model, but adds a common way to solve common use cases, just like Service and Ingress do.

@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Oct 9, 2019
@fejta-bot:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on Nov 8, 2019
@skydoctor:

Wondering if there has been any progress on this?

@uablrek commented Dec 6, 2019:

Kubernetes can define an Egress object in the same way as the Ingress object. Some cloud providers already allow the egress IP to be specified, but outside "standard" Kubernetes (*). With a defined Egress object the way of defining an egress IP would be uniform; again, compare with Ingress.

The implementation of an egress-IP function will be dependent on the network environment, such as the CNI plugin, external routers, and network security policies. A generic in-cluster solution cannot be provided, and therefore Kubernetes should not implement this function IMO.

And BTW, the PoC referred to in this PR and the more mature https://github.com/nirmata/kube-static-egress-ip are not generic.

@skydoctor:

Agree that an Egress object in K8s with cloud-provider specific implementation would be ideal. Similar to how Service type LoadBalancer allows each cloud-provider to assign IP addresses in their own ways. But just like the Service object which specifies which pods should receive packets from that ingress external IP, we need some native K8s object which specifies which pods should be able to send out through the external egress IP.

@thockin self-assigned this on Dec 18, 2019
@thockin (Member) left a comment:

I went to the bottom of my email list and found this - it wasn't assigned to me so it never got priority. Sorry.

Thoughts.

  • Does the IP have to be a node's IP or can it be any IP in the network? Can it be a public IP? Can it be different for different destinations (e.g. an RFC-1918 IP if dest is RFC-1918, but a public IP if dest is not RFC-1918)?

  • Can or must the user ask for a specific IP or is it allocated by the controller?

  • What happens if the user asks for an IP that is already in use?

  • What happens if two Egress resources select the same pod, which IP is used? Perhaps this should be based on ServiceAccount or something else that is 1:1 instead?

  • I understand how you can make a single pod work. How can you make multiple pods work?

  • Overall if something like this is to be developed, I think it should be a CRD and live outside the core for the foreseeable future. I'd want to see several implementations to explore the intricacies of the idea.

That said, this is a few months old - is it something you still want to pursue?

@nitishm commented Dec 18, 2019:

@thockin Thanks for sharing those thoughts which are definitely something to be considered. This feature is still something we need and wish to pursue.
I can imagine this requirement is going to be something more and more internet/telco based organizations will require (similar to ours).

@bowei (Member) commented Dec 18, 2019:

cc: @satyasm @vbannai

@mkimuram (PR author):

Thank you for the comments, and sorry for the late response.

I'm thinking about making this feature available outside k8s, as it would require a CNI-plugin- or cloud-provider-specific implementation, which the k8s community would like to avoid adding to k8s. (For me, this feature doesn't need to be in k8s, as long as there is a well-maintained common way. Some might not agree with that, though.)

So, improving a project like kube-static-egress-ip would be one way to achieve this goal. (Thank you for sharing information on the project. @uablrek)

Another idea is to extend submariner's scope from cluster-to-cluster to cluster-to-non-cluster. The summary of my idea is making pods accessible via podIP from outside the cluster without NAT, instead of assigning a 1:1 static NAT accessible from outside the cluster. (It's still just an idea and I haven't discussed it with the submariner community yet, though.)

By doing this, it would allow:

  • Preserving the source IP (podIP) to outside the cluster where the pod is running
  • Connectivity over the internet, not just inside the same LAN
  • An HA gateway on the k8s cluster
  • Offloading IP address management to k8s (by utilizing podIP)

However, the ability to assign a specific IP is missing in this case. So, we need to think about a common way to solve it for some use cases, like an IP-based external firewall where the IPs are expected to be specific fixed values. This feature would be challenging, as @thockin pointed out, and it seems to be the kind of feature that would hardly be accepted into k8s.
(Actually, we won't need to stick to assigning static podIPs in the k8s cluster to achieve this goal. For example, we could do 1:1 static NAT in the remote (non-cluster-side) gateway by reconciling the mapping between podIP and floating IP, as is done in the "local" gateway in kube-static-egress-ip; see the sketch below.)
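
A minimal sketch of such a remote-gateway mapping, assuming the gateway can route to the pod network and using example addresses only:

```
POD_IP=10.244.2.7            # example pod IP, routable from the remote gateway
FLOATING_IP=192.168.122.222  # example floating IP owned by the remote gateway

# Outbound: traffic from the pod appears to come from the floating IP.
iptables -t nat -A POSTROUTING -s "$POD_IP" -j SNAT --to-source "$FLOATING_IP"
# Inbound: traffic sent to the floating IP is forwarded to the pod.
iptables -t nat -A PREROUTING -d "$FLOATING_IP" -j DNAT --to-destination "$POD_IP"
```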

@mkimuram (PR author) commented Jan 7, 2020:

Let me share the details of my implementation idea that I mentioned in my previous comment.
(It uses iptables and an ssh tunnel; however, there might be a better way. Any feedback is welcome.)

Let's assume a use case as described in figure 1:

  • The source IP for access from Client Pod#1 to port 8000 of External Server (A) should be External IP#1,
  • The source IP for access from Client Pod#2 to port 8000 of External Server (A) should be External IP#2.

[figure 1]

(We will be able to extend it to multiple external servers and/or N:1 mapping of pods to an external source IP, later.)

It can be achieved by the steps below (see figure 2):

  • Create a gateway server that has External IP#1 and External IP#2, and run an sshd service for each IP on the server (the gateway server needs to be able to reach the external server from those IPs),
  • Create a forwarder pod for the external server, create an ssh tunnel to the server on a different local port for each external IP, and add iptables rules to forward to the right tunnel depending on the source IP of the client pod (sshd needs to be accessible from the forwarder pod),
  • Create a service for the forwarder pod to expose it to client pods,
  • Then, each client pod will be able to access the external server via the service, and the source IP for the access to the external server will be as expected.

[figure 2]

This concept can be extended as below:

  • To add more external IPs, assign the external IPs to the gateway server and run sshd for each external IP,
  • To consume an external IP, create an ssh tunnel for the external IP in the forwarder pod,
  • To assign multiple pods to an external IP, add iptables rules forwarding those pods to the ssh tunnel,
  • To add external servers, add a forwarder pod for each external server and create iptables rules and an ssh tunnel for it inside the forwarder pod.

One forwarder pod needs to be created per external server, but the forwarding rules and ssh tunnels for a server exist only in the forwarder pod for that particular external server. Therefore, no change is required on the gateway server side for podIP changes or for mapping changes for other external servers. We will be able to create a k8s operator to reconcile the mapping in forwarder pods without introducing much complexity.

In addition, this idea only assumes that "pods on a node can communicate with all pods on all nodes without NAT", therefore it should work well with any CNI plugin and any cloud provider.
Also, if k8s clusters are connected at the pod network level, for example by using submariner, any external network accessible by one of the k8s clusters will be accessible by all the other k8s clusters (we can even create a small k8s cluster, like kind, in a certain external network just to access that external network from the other k8s clusters).

I manually confirmed that the above concept works well for the configuration of the above use case, using the steps below:

  • Setup test environment

[On external server]

  1. Confirm IP address of external server
# ip addr show eth0 | awk '/inet /{print $2}'
192.168.122.139/24
  2. Run service on the node (HTTP server on port 8000)
# python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

[On Gateway node]

  1. Assign IP addresses to be used as external IPs and run sshd for each IP (Just for test, see (*1))

[On k8s]

  1. Create client1 and client2
$ kubectl run client1 --image=centos:7 --restart=Never --command -- bash -c "sleep 10000"
$ kubectl run client2 --image=centos:7 --restart=Never --command -- bash -c "sleep 10000"
  2. Create forwarder pod
$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: fwd-pod
  labels:
    app: fwd-pod
spec:
  containers:
  - command:
    - bash
    - -c
    - sleep 10000
    image: centos:7
    name: fwd-pod
    securityContext: 
      privileged: true
EOF
  3. Confirm pod IPs
$ kubectl get pod -o wide
NAME      READY   STATUS    RESTARTS   AGE    IP           NODE               NOMINATED NODE   READINESS GATES
client1   1/1     Running   0          8m7s   10.244.2.7   cluster1-worker    <none>           <none>
client2   1/1     Running   0          8m3s   10.244.2.8   cluster1-worker    <none>           <none>
fwd-pod   1/1     Running   0          24s    10.244.2.9   cluster1-worker    <none>           <none>

[On forwarder pod ($ kubectl exec -it fwd-pod bash)]

  1. Create ssh tunnel to external server via gateway node
# yum install -y openssh-clients

# EXTIP1=192.168.122.200
# EXTIP2=192.168.122.201
# EXTSVRIP=192.168.122.139

# ssh -g -f -N -L 8001:$EXTSVRIP:8000 $EXTIP1
# ssh -g -f -N -L 8002:$EXTSVRIP:8000 $EXTIP2
  2. Test accessibility and check source IP on HTTP server
# curl localhost:8001
# curl localhost:8002

(Stdout from SimpleHTTPServer)

192.168.122.200 - - [20/Dec/2019 17:04:13] "GET / HTTP/1.1" 200 -
192.168.122.201 - - [20/Dec/2019 17:04:15] "GET / HTTP/1.1" 200 -
  3. Set up iptables rules to forward to different ports depending on the source IP
# yum install -y iptables

# CLIENT1IP=10.244.2.7
# CLIENT2IP=10.244.2.8
# FWDPODIP=10.244.2.9
# EXTSVRIP=192.168.122.139

# iptables -A PREROUTING -t nat -m tcp -p tcp --dst $FWDPODIP --src $CLIENT1IP --dport 8000 -j DNAT --to-destination $FWDPODIP:8001
# iptables -A POSTROUTING -t nat -m tcp -p tcp --dst $EXTSVRIP --dport 8001 -j SNAT --to-source $FWDPODIP
# iptables -A PREROUTING -t nat -m tcp -p tcp --dst $FWDPODIP --src $CLIENT2IP --dport 8000 -j DNAT --to-destination $FWDPODIP:8002
# iptables -A POSTROUTING -t nat -m tcp -p tcp --dst $EXTSVRIP --dport 8002 -j SNAT --to-source $FWDPODIP

# iptables -t nat -nL
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
DNAT       tcp  --  10.244.2.7           10.244.2.9           tcp dpt:8000 to:10.244.2.9:8001
DNAT       tcp  --  10.244.2.8           10.244.2.9           tcp dpt:8000 to:10.244.2.9:8002

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
SNAT       tcp  --  0.0.0.0/0            192.168.122.139      tcp dpt:8001 to:10.244.2.9
SNAT       tcp  --  0.0.0.0/0            192.168.122.139      tcp dpt:8002 to:10.244.2.9

[On k8s]

  1. Expose forwarder pod as service
# kubectl expose pod fwd-pod --name=ext-service --port=8000
# kubectl get svc
NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
ext-service   ClusterIP   100.94.90.120   <none>        8000/TCP   16s
kubernetes    ClusterIP   100.94.0.1      <none>        443/TCP    26h
  • Test
    [On client1 ($ kubectl exec -it client1 bash)]
  1. Access to ext-service
$ curl ext-service:8000
  2. Check source IP for the access on external server
    (Stdout from SimpleHTTPServer)
192.168.122.200 - - [20/Dec/2019 17:30:14] "GET / HTTP/1.1" 200 -

[On client2 ($ kubectl exec -it client2 bash)]

  1. Access to ext-service
$ curl ext-service:8000
  2. Check source IP for the access on external server
    (Stdout from SimpleHTTPServer)
192.168.122.201 - - [20/Dec/2019 17:30:52] "GET / HTTP/1.1" 200 -

(*1)

# ip link add macvlan1 link eth0 type macvlan mode bridge
# ip netns add net1
# ip link set macvlan1 netns net1
# ip netns exec net1 bash
# ip link set lo up
# ip link set macvlan1 up
# ip addr add 192.168.122.200/24 dev macvlan1
# ip route add default via 192.168.122.1
# /usr/sbin/sshd -o PidFile=/run/sshd-net1.pid
# exit

# ip link add macvlan2 link eth0 type macvlan mode bridge
# ip netns add net2
# ip link set macvlan2 netns net2
# ip netns exec net2 bash
# ip link set lo up
# ip link set macvlan2 up
# ip addr add 192.168.122.201/24 dev macvlan2
# ip route add default via 192.168.122.1
# /usr/sbin/sshd -o PidFile=/run/sshd-net2.pid
# exit

@skydoctor:

Thanks for sharing the details of your mechanism @mkimuram. In part, this shows the complexity required to ensure that egressing packets from a given pod-type have a static source IP. A native approach would enlist the help of iptables at the node level to SNAT the packets going out. The key thing to think about is what entity creates the right iptables rule - an independent operator or possibly kube-proxy?
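
A minimal sketch of that native, node-level approach (addresses are examples; an operator or kube-proxy would have to keep one such rule per selected pod):

```
# On the node that owns the egress IP: traffic from the selected pod that
# leaves the cluster network is SNATed to the desired static egress source IP.
iptables -t nat -A POSTROUTING -s 10.244.2.7 ! -d 10.244.0.0/16 \
  -j SNAT --to-source 192.168.122.222
```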

@skydoctor:

@thockin: A common requirement in this regard is the need for symmetry: When a LoadBalancer service is used, an external client sends packets to the LoadBalancer IP. When pods that are part of that service initiate packets towards an external server, the packets should go out using the LoadBalancer IP as the source IP.

@mkimuram (PR author) commented Jan 9, 2020:

@skydoctor

> The key thing to think about is what entity creates the right iptables rule - an independent operator or possibly kube-proxy?

Yes. I meant to make a k8s operator, and possibly k8s itself, handle this complexity.
(It's too much work to do manually; the list of commands is just to test that it works.)

> A common requirement in this regard is the need for symmetry: When a LoadBalancer service is used, an external client sends packets to the LoadBalancer IP. When pods that are part of that service initiate packets towards an external server, the packets should go out using the LoadBalancer IP as the source IP.

The external IPs are already used by the gateway server, so we won't be able to assign the same external IP to k8s's load balancer when the above idea is used. However, if we create a per-external-IP forwarder pod and run an ssh client in it to create a remote-forward ssh tunnel, packets from external servers could be sent to client pods via the corresponding external IPs.

In that case, the source IP of the packets won't be the clusterIP of the external server's forwarder pod's service; instead it will be the IP of the per-external-IP forwarder pod. So it won't be symmetric on this point.
Also, doing it dedicates combinations of the external IP and particular ports to this purpose
(without it, only port 22 is used on the gateway server for an external IP).
So we will need to keep track of which external IPs and ports are used by which set of pods, if we do it.

I will consider whether there is a better way.

@mkimuram (PR author):

Let me also share the detailed idea for preserving source IPs for reverse access.

By using a remote ssh tunnel, the source IP of packets from an external server to each pod can be that external server's fwd-pod IP. So, each pod will be able to distinguish which external server sent the packets. (Each pod will be able to access the right external server via the fwd-pod's IP, although the IP might change.)

This won't be a perfect solution, but I couldn't find a good way to send a packet from the service IP.
I'm sharing this to get feedback. Please see figure 3 and the details below.

Note that in this idea, the tunnel is created by using an ssh client in the forwarder pod, instead of creating it in a per-external-IP pod as I mentioned in my previous comment. This is because a remote ssh tunnel doesn't preserve the source IP, so it's too late to find the original source IP after it has been tunneled to the pod network. As a result, iptables rules need to be updated on the gateway server, which would make management complex, but I guess it is still possible if we develop a k8s operator to handle it.

[figure 3]

Set up the same configuration as in #1105 (comment).

[On client1 ($ kubectl exec -it client1 bash)]

  1. Run service on the client1 (HTTP server on port 80)
# python -m SimpleHTTPServer 80
Serving HTTP on 0.0.0.0 port 80 ...

[On client2 ($ kubectl exec -it client2 bash)]

  1. Run service on the client2 (HTTP server on port 80)
# python -m SimpleHTTPServer 80
Serving HTTP on 0.0.0.0 port 80 ...

[On k8s]

  1. Create services for clients
$ kubectl expose pod client1 --name=cl1-service --port=80
$ kubectl expose pod client2 --name=cl2-service --port=80
  2. Confirm the service IP
# kubectl get svc cl1-service cl2-service
NAME          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
cl1-service   ClusterIP   100.94.212.105   <none>        80/TCP    12m
cl2-service   ClusterIP   100.94.16.100    <none>        80/TCP    14s

[On forwarder pod ($ kubectl exec -it fwd-pod bash)]

  1. Create remote ssh tunnels to the clients' service IPs via the gateway node
# EXTIP1=192.168.122.200
# EXTIP2=192.168.122.201
# CL1SVCIP=100.94.212.105
# CL2SVCIP=100.94.16.100
# ssh -f -N -R $EXTIP1:10080:$CL1SVCIP:80 $EXTIP1
# ssh -f -N -R $EXTIP2:10080:$CL2SVCIP:80 $EXTIP2

(To make this work, sshd on the gateway node needs to allow external access for remote forwarding, by setting GatewayPorts to clientspecified in sshd_config.)
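
For reference, that sshd setting could be applied along these lines (path and reload command may differ by distro):

```
# On the gateway node: allow remote forwards to bind non-loopback addresses,
# then reload sshd to pick up the change.
echo "GatewayPorts clientspecified" >> /etc/ssh/sshd_config
systemctl reload sshd
```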

[On Gateway node]

  1. Add iptables rules to forward packets to a particular port depending on the source IP, in namespace net1 (# ip netns exec net1 bash)
# EXTSVRIP=192.168.122.139
# EXTIP1=192.168.122.200

# iptables -A PREROUTING -t nat -m tcp -p tcp --dst $EXTIP1 --src $EXTSVRIP --dport 80 -j DNAT --to-destination $EXTIP1:10080
# iptables -A POSTROUTING -t nat -m tcp -p tcp --dst $EXTSVRIP --dport 10080 -j SNAT --to-source $EXTIP1

# iptables -t nat -nL
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
DNAT       tcp  --  192.168.122.139      192.168.122.200      tcp dpt:80 to:192.168.122.200:10080

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
SNAT       tcp  --  0.0.0.0/0            192.168.122.139      tcp dpt:10080 to:192.168.122.200

  2. Add iptables rules to forward packets to a particular port depending on the source IP, in namespace net2 (# ip netns exec net2 bash)

# EXTSVRIP=192.168.122.139
# EXTIP2=192.168.122.201

# iptables -A PREROUTING -t nat -m tcp -p tcp --dst $EXTIP2 --src $EXTSVRIP --dport 80 -j DNAT --to-destination $EXTIP2:10080
# iptables -A POSTROUTING -t nat -m tcp -p tcp --dst $EXTSVRIP --dport 10080 -j SNAT --to-source $EXTIP2

# iptables -t nat -nL
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
DNAT       tcp  --  192.168.122.139      192.168.122.201      tcp dpt:80 to:192.168.122.201:10080

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
SNAT       tcp  --  0.0.0.0/0            192.168.122.139      tcp dpt:10080 to:192.168.122.201
  • Test
    [On external server]
  1. Access to external IP#1
$ EXTIP1=192.168.122.200
$ curl $EXTIP1
  2. Check source IP for the access on external server (IP will be fwd-pod's IP.)
    (Stdout from SimpleHTTPServer)
10.244.2.9 - - [16/Jan/2020 22:27:54] "GET / HTTP/1.1" 200 -
  3. Access to external IP#2
$ EXTIP2=192.168.122.201
$ curl $EXTIP2
  4. Check source IP for the access on external server (IP will be fwd-pod's IP.)
    (Stdout from SimpleHTTPServer)
10.244.2.9 - - [16/Jan/2020 22:27:54] "GET / HTTP/1.1" 200 -

@mkimuram (PR author):

Just for your information: I've implemented PoC code of the above idea, here.

@fejta-bot:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor):

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nitishm commented Mar 2, 2020:

/remove-lifecycle rotten
