-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make upstreams route only to nodes known to contain ingress pods #160
Comments
In my cluster, kube-proxy do not pass the traffic as expected. |
Can you please elaborate? |
I solved my problem by changing externalTrafficPolicy to Cluster from Local. I think we should add this to the documentation. If someone doesn't change this to Cluster, NLK will not work as expected. I deployed NGINX Ingress through Helm, where Local is the default configuration. When we set this value to 'Local,' if the traffic is directed to a node where no NGINX Ingress controller pod exists, kube-proxy will not forward this traffic to a node where an NGINX Ingress controller does exist. @ciroque |
Nice find, thoughts on where in the documentation this should be? |
This is exactly how NodePort in Kubernetes works, it opens the same high-numbered socket port on all K8s nodes, you can't change that unless you install some non-standard CNI or do some serious modifications to how Kubernetes networking works. This would likely result in a non-standard architecture. You don't know which Kubernetes node the scheduler will choose for the Ingress controller. Again, unless you modify deployment options and move toward a static configuration, also not recommended unless you really know what you are doing. NLK is not aware of the NIC deployment details. It is intentionally designed to look at the nginx-ingress Service object, which represents one or more NIC pods/endpoints. You don't know how someone will build or change their Local Traffic Policy, so it's not defined as a pre-requisite. Some use Cluster, some use Local. We could add a note that Cluster is required. However, that aside, this is addressed outside the cluster with the Plus LB server. Using the "least_time last_byte" LB algorithm on the external Plus server upstream block, will tell Nginx Plus to prefer the fastest kubernetes nodeport, and it will heavily favor the fastest node. The greater the delta in response times of the nodeports, the greater the favor to the fastest worker node. ( you can see this realtime in the Plus dashboard ). This is why least time last byte is an essential setting for the NLK upstream block. And, If you deploy more than one ingress controller replica, or even DaemonSet, then you might have two workers faster than the others, and Nginx Plus will send the majority of the Requests to the 2 fastest workers where NIC is running, also visible in the Plus dashboard. This algorithm is very dynamic, and responds very well to the dynamic, ever-changing environment running in the cluster at all times. Try a load test, and experiment, and you will see it to be true. Not all clusters, workers, ingresses, services, and pods are the same, so I would argue an adaptive LB algorithm is necessary, if your goal is to provide the quickest response time to every single request. |
Is your feature request related to a problem? Please describe
Currently NLK populates the host/port of all nodes in the cluster to the NGINX
upstream
block. If a request goes to a node that does not have a pod running the NGINX Ingress controller,kube-proxy
will take care of getting it there. However, it seems that it would be better to route the request to the ingress-containing node to begin with since we have that information.Describe the solution you'd like
When we receive updates about a new configuration of the cluster/controller, NLK should compare the node list to the locations of the ingress pods, and only send those upstreams to NLK. That means upstreams should change when:
If there are no nodes containing ingress pods, perhaps it is best then to send all notes and dispatch an event about the trouble. That way at least we consistently send traffic to the cluster and don't drop it if we have some bug/race condition in figuring out ingress pod assignments.
Describe alternatives you've considered
The current operation works fine due to kube-proxy picking up the slack. I don't see any other ways of achieving this.
Additional context
The text was updated successfully, but these errors were encountered: