Ingress controller cannot reach itself (Azure-VNet, ACS) #91
Comments
@carlpett Is it still happening for you? Can you please share your cluster config? We can try to repro this.
@sharmasushant Yes, still happens. After some more investigation, it does not only happen for ingresses, but for all services with type LoadBalancer. Here is a repro case:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-reachability-testing
  labels:
    app: nginx-reachability-testing
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-reachability-testing
spec:
  ports:
  - port: 80
    name: http
  type: LoadBalancer
  selector:
    app: nginx-reachability-testing
```

The public IP is reachable from anywhere, except from within the pod that is backing the service. I reported this while waiting for a response on an Azure support ticket, which has since been picked up, so there is an ongoing case there (118011617470683).
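For completeness, here is a minimal sketch of how the symptom can be observed with the manifests above. It assumes they are saved as repro.yaml (a made-up filename), that curl is available inside the container (as in the verification later in this thread), and that <EXTERNAL-IP> stands for whatever kubectl reports for the service:

```sh
# Apply the repro manifests and wait for the service to get an external IP.
kubectl apply -f repro.yaml
kubectl get service nginx-reachability-testing --watch   # note the EXTERNAL-IP

# From outside the cluster the request succeeds:
curl http://<EXTERNAL-IP>/

# From inside the pod that backs the service, the same request times out:
kubectl exec nginx-reachability-testing -- curl -m 10 http://<EXTERNAL-IP>/
```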
The reason for this reported behaviour is that the interface attached to the bridge (azure0) is not configured correctly: it is left with the default hairpin setting. The bridge(8) man page documents the hairpin flag, which controls whether traffic may be sent back out of the port on which it was received; it is off by default.
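As a quick check on an agent node, the hairpin flag of the ports attached to azure0 can be inspected; a minimal sketch (interface names differ per node):

```sh
# Show per-port details for all bridge ports; look for the azure0 ports
# and their "hairpin on/off" flag in the output.
bridge -d link show

# Alternatively, read the flag directly from sysfs: 0 = off (the default), 1 = on.
for port in /sys/class/net/azure0/brif/*; do
    echo "$(basename "$port"): hairpin_mode=$(cat "$port/hairpin_mode")"
done
```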
As a workaround, one can perform the following steps to identify the right interface and alter its hairpin flag.

How to find the interface for a given pod?

First, identify the node on which the pod is running:

```sh
kubectl get pod nginx-reachability-testing -o wide
```

Second, get the interface details from inside the pod/container:

```sh
kubectl exec -it nginx-reachability-testing -- bash
ip addr show eth0
```

Note down the detail, e.g.: 22: eth0@if21

Third, log on to the agent node on which the pod/container is running. As seen above, the pod's network interface has index 22 and is connected to interface 21; the reason is the veth pair that is in use. On the node, run:

```sh
ip addr | grep @if22
```

The interface name returned is the one connected to the bridge (azure0). This is the name to use when changing its hairpin mode.

How to change the hairpin mode?

The hairpin mode for a given interface can easily be altered with a single command:

```sh
sudo bridge link set dev azvethc9e7b2c hairpin on
```

Verification

Get the external IP of the service:

```sh
kubectl get service nginx-reachability-testing
```

Connect to your pod and, from inside it, connect against the LB:

```
root@nginx-reachability-testing:/# curl 13.74.43.81
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
```

Change the hairpin mode back to off and try to connect against the LB again; this will show that the connection is no longer established.
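As an aside, the host-side veth can also be resolved directly from the peer ifindex that the pod exposes in sysfs; a minimal sketch, reusing the illustrative names and indexes from the steps above:

```sh
# Inside the pod: print the ifindex of eth0's veth peer on the node.
kubectl exec nginx-reachability-testing -- cat /sys/class/net/eth0/iflink
# -> 21 in the example above

# On the node the pod runs on: list the interface with that index; it is the
# azure0 bridge port (strip any "@ifNN" suffix before passing the name to
# `bridge link set ... hairpin on`).
ip -o link | awk -F': ' '$1 == 21 {print $2}'
# -> e.g. azvethc9e7b2c@if22
```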
Found the following issue report, which is related to the hairpin mode: kubernetes/kubernetes#53269. The main context is this one.
To allow some kind of automation, I have created this simple script, which can be used as a starting base: https://gist.github.com/malachma/8603d1d6daedceb4320d20b70527f0ac
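The general shape of such automation is a loop over the azure0 bridge ports that switches hairpin mode on where it is still off. This is only a rough sketch of the idea, not the gist itself; it has to run as root on every agent node and be re-run when new pods appear:

```sh
#!/bin/sh
# Enable hairpin mode on every port attached to the azure0 bridge that
# still has it switched off. Run as root on each agent node.
set -eu

BRIDGE=azure0

for port_path in /sys/class/net/"$BRIDGE"/brif/*; do
    port=$(basename "$port_path")
    if [ "$(cat "$port_path/hairpin_mode")" = "0" ]; then
        echo "Enabling hairpin mode on $port"
        bridge link set dev "$port" hairpin on
    fi
done
```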
We have the exact same use case and we experience the same issue. The workaround from @malachma worked for us.
Any updates on this issue being fixed natively in the CNI plugin? While we could use the script above and automate it, this seems like something the plugin should do natively, and it is probably a simple change. I'm happy to look at creating a PR if needed. Thanks.
This issue is fixed in #248.
Hi,
We recently stood up a new Kubernetes cluster with ACS Engine, after previously using the ACS service in the Azure CLI. The new cluster has an issue where ingress controllers (we're using the stock nginx ingress controller) cannot make requests to themselves, i.e. a controller cannot reach its own public IP.
The only difference I've found between the old clusters and the new one is that the new one automatically got the Azure-VNet CNI, whereas the previous ones had no CNI configuration at all (which I guess means kubenet?).
This is a problem because external auth causes the controller to make a request to its own domain. This traffic seems to be dropped, and the requests time out.
It is clearly related to pod networking, since the container host can reach the ingress properly while the pod cannot; a comparison of this kind is sketched below.
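For illustration only (placeholder addresses and pod name, not output from the affected cluster):

```sh
# On the agent node hosting the ingress controller pod, the request succeeds:
curl -I http://<INGRESS-PUBLIC-IP>/

# From inside the ingress controller pod, the same request times out:
kubectl exec <nginx-ingress-controller-pod> -- curl -I -m 10 http://<INGRESS-PUBLIC-IP>/
```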
Is this a known limitation of azure-vnet, or a misconfiguration?