Configure nginx worker timeout #1088

Merged: 1 commit merged from aledbf:worker-timeout into kubernetes:master on Aug 8, 2017

Conversation

@aledbf (Member) commented Aug 8, 2017

No description provided.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 8, 2017

@coveralls commented: Coverage increased (+0.03%) to 44.768% when pulling 106cfca on aledbf:worker-timeout into 27f447f on kubernetes:master.

@aledbf aledbf merged commit 91077a2 into kubernetes:master Aug 8, 2017
@aledbf aledbf deleted the worker-timeout branch August 8, 2017 18:47
@danielfm commented Aug 9, 2017

Looking forward to this! 😄

I'm trying to deploy a WebSocket application to Kubernetes and use NGINX to reverse-proxy the WebSocket connections to my application pods, but I have observed very odd memory usage behavior in my ingress pods ever since I deployed this to production:

[image: graph of ingress pod memory usage climbing]

After struggling for a while, I noticed that the increased memory usage was due to lots of worker processes stuck in the "worker process is shutting down" state, probably due to active WebSocket connections still being handled by those workers:

root       507   490  0 16:09 ?        00:00:00 /usr/bin/dumb-init -v /nginx-ingress-controller --default-backend-service=kube-system/broken-bronco-nginx-ingress-be --configmap=kube-system/broken-bronco-nginx-ingress-conf --ingress-class=nginx-ingress-prd
root       521   507  0 16:09 ?        00:00:38 /nginx-ingress-controller --default-backend-service=kube-system/broken-bronco-nginx-ingress-be --configmap=kube-system/broken-bronco-nginx-ingress-conf --ingress-class=nginx-ingress-prd
root       543   521  0 16:09 ?        00:00:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nobody     562   543  1 16:09 ?        00:01:17 nginx: worker process is shutting down
nobody    4868   543  5 17:45 ?        00:00:08 nginx: worker process
core      5838  5826  0 17:48 pts/0    00:00:00 grep --colour=auto nginx
nobody   10001   543  0 16:34 ?        00:00:23 nginx: worker process is shutting down
nobody   12630   543  2 16:41 ?        00:01:34 nginx: worker process is shutting down
nobody   23713   543  0 17:10 ?        00:00:01 nginx: worker process is shutting down
nobody   24172   543  0 17:11 ?        00:00:05 nginx: worker process is shutting down
nobody   24867   543  0 17:13 ?        00:00:07 nginx: worker process is shutting down
nobody   25864   543  2 17:15 ?        00:00:40 nginx: worker process is shutting down
nobody   30571   543  0 17:28 ?        00:00:11 nginx: worker process is shutting down
nobody   31906   543  4 17:32 ?        00:00:44 nginx: worker process is shutting down
...

Stracing one of those worker processes showed that it was still handling WebSocket traffic:

Process 10001 attached
gettimeofday({1502304533, 387656}, NULL) = 0
epoll_wait(24, {{EPOLLIN|EPOLLOUT, {u32=31428513, u64=31428513}}}, 512, 3940) = 1
gettimeofday({1502304534, 692450}, NULL) = 0
recvfrom(45, "\301\203n\0\3514\\\2\351", 4096, 0, NULL, NULL) = 9
sendto(48, "\301\203n\0\3514\\\2\351", 9, 0, NULL, 0) = 9
epoll_wait(24, {{EPOLLIN|EPOLLOUT, {u32=31431873, u64=31431873}}}, 512, 10000) = 1
gettimeofday({1502304534, 694349}, NULL) = 0
recvfrom(48, "\201\0013", 4096, 0, NULL, NULL) = 3
sendto(45, "\201\0013", 3, 0, NULL, 0)  = 3
epoll_wait(24, {{EPOLLIN|EPOLLOUT, {u32=31431873, u64=31431873}}}, 512, 9998) = 1
gettimeofday({1502304536, 762742}, NULL) = 0
recvfrom(48, "\201~\1R42/quaestio,[\"insert\",{\"text"..., 4096, 0, NULL, NULL) = 342
sendto(45, "\201~\1R42/quaestio,[\"insert\",{\"text"..., 342, 0, NULL, 0) = 342

So, is specifying a worker shutdown timeout (what this PR does) the only way to avoid having worker processes unable to shut down due to active WebSocket connections, or do you guys know other ways of handling this?
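
For reference, a minimal sketch of how this knob could be applied and verified once released, assuming the controller exposes it through its ConfigMap under a key named worker-shutdown-timeout (the key name is an assumption; the underlying nginx directive is worker_shutdown_timeout):

# The ConfigMap name comes from the ps listing above; the key name is an assumption
kubectl -n kube-system patch configmap broken-bronco-nginx-ingress-conf \
  --type merge -p '{"data":{"worker-shutdown-timeout":"300s"}}'

# After the controller picks up the change, the rendered config should contain a line like:
#   worker_shutdown_timeout 300s;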

@aledbf (Member, Author) commented Aug 9, 2017

@danielfm you can use this image, quay.io/aledbf/nginx-ingress-controller:0.173, to test the new parameter.
I'm planning to release a new version before the end of the week.
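
One way to roll that test image out to an existing deployment; the deployment and container names below are guesses based on the release name visible in the ps listing:

kubectl -n kube-system set image deployment/broken-bronco-nginx-ingress \
  nginx-ingress-controller=quay.io/aledbf/nginx-ingress-controller:0.173
kubectl -n kube-system rollout status deployment/broken-bronco-nginx-ingress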

@danielfm commented Aug 9, 2017

@aledbf I just tried this image, but I could not find the worker_shutdown_timeout parameter in the generated configuration.
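
A quick way to double-check whether the directive made it into the running configuration (pod name is a placeholder):

# Grep the config file the master process was started with
kubectl -n kube-system exec <ingress-pod> -- grep worker_shutdown_timeout /etc/nginx/nginx.conf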

@aledbf (Member, Author) commented Aug 9, 2017

Sorry about that. Please use quay.io/aledbf/nginx-ingress-controller:0.174

Edit: this image contains the current master.

@danielfm commented Aug 9, 2017

It did not seem to work for me; I still see several workers stuck in the shutting-down state:

root     17755 17739  0 19:47 ?        00:00:00 /usr/bin/dumb-init /nginx-ingress-controller --default-backend-service=kube-system/broken-bronco-nginx-ingress-be --configmap=kube-system/broken-bronco-nginx-ingress-conf --ingress-class=nginx-ingress-prd
root     17765 17755  0 19:47 ?        00:00:08 /nginx-ingress-controller --default-backend-service=kube-system/broken-bronco-nginx-ingress-be --configmap=kube-system/broken-bronco-nginx-ingress-conf --ingress-class=nginx-ingress-prd
root     17776 17765  0 19:47 ?        00:00:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nobody   18866 17776  0 19:49 ?        00:00:05 nginx: worker process is shutting down
nobody   19466 17776  0 19:51 ?        00:00:01 nginx: worker process is shutting down
nobody   19698 17776  0 19:51 ?        00:00:05 nginx: worker process is shutting down
nobody   20331 17776  0 19:53 ?        00:00:05 nginx: worker process is shutting down
nobody   20947 17776  0 19:54 ?        00:00:03 nginx: worker process is shutting down
nobody   21390 17776  1 19:55 ?        00:00:05 nginx: worker process is shutting down
nobody   22139 17776  0 19:57 ?        00:00:00 nginx: worker process is shutting down
nobody   22251 17776  0 19:57 ?        00:00:01 nginx: worker process is shutting down
nobody   22510 17776  0 19:58 ?        00:00:01 nginx: worker process is shutting down
nobody   22759 17776  0 19:58 ?        00:00:01 nginx: worker process is shutting down
nobody   23038 17776  1 19:59 ?        00:00:03 nginx: worker process is shutting down
nobody   23476 17776  1 20:00 ?        00:00:01 nginx: worker process is shutting down
nobody   23738 17776  1 20:00 ?        00:00:01 nginx: worker process is shutting down
nobody   24026 17776  2 20:01 ?        00:00:02 nginx: worker process is shutting down
nobody   24408 17776  4 20:01 ?        00:00:01 nginx: worker process

Any suggestions on how to find out what's going on?
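
A rough way to quantify what is happening from inside the pod (or on the node), assuming ps and ss are available in the image:

# Count workers that are still draining an old configuration
ps -ef | grep 'worker process is shutting down' | grep -v grep | wc -l

# Show the established connections keeping one of those workers alive (use a PID from the ps output)
ss -tnp | grep 'pid=10001'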

@aledbf (Member, Author) commented Aug 9, 2017

@danielfm how are you testing this?

@danielfm commented Aug 9, 2017

I simply deployed the image you provided and, when the ingress deployment was fully rolled out, I changed the DNS record for this application to shift traffic to the one deployed in Kubernetes.

As soon as I switch the DNS, client applications start connecting to the WebSocket application in Kubernetes via the NGINX ingress load balancer. When I do that, I can almost immediately see the memory usage for the ingress pods going up:

[image: ingress pod memory usage rising after the DNS switch]

And when I log into any machine where an NGINX ingress pod is running, I can see several workers unable to shut down (see the ps -ef listing in my previous comment).

When I rollback the DNS change, the memory usage stabilizes:

[image: ingress pod memory usage stabilizing after the DNS rollback]

I'm also seeing lots of reload requests made by the ingress controller that I'm unable to explain (I expect the configuration to be reloaded, but not this frequently); maybe this gives you some insight:

I0809 19:47:29.748303       6 leaderelection.go:184] successfully acquired lease kube-system/ingress-controller-leader-nginx-ingress-prd
I0809 19:47:30.186002       6 controller.go:419] backend reload required
I0809 19:47:30.439873       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:47:46.152766       6 controller.go:419] backend reload required
I0809 19:47:46.283154       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:48:12.824363       6 controller.go:419] backend reload required
I0809 19:48:12.959181       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:48:46.157339       6 controller.go:419] backend reload required
I0809 19:48:46.286852       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:49:16.150587       6 controller.go:419] backend reload required
I0809 19:49:16.345406       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:49:42.816188       6 controller.go:419] backend reload required
I0809 19:49:42.959675       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:51:12.817276       6 controller.go:419] backend reload required
I0809 19:51:12.947618       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:51:42.854549       6 controller.go:419] backend reload required
I0809 19:51:43.054739       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:53:16.152390       6 controller.go:419] backend reload required
I0809 19:53:16.378867       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:54:44.605780       6 controller.go:419] backend reload required
I0809 19:54:44.850715       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:55:44.543525       6 controller.go:419] backend reload required
I0809 19:55:44.739965       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:57:29.511057       6 controller.go:419] backend reload required
I0809 19:57:29.664816       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:57:42.815175       6 controller.go:419] backend reload required
I0809 19:57:42.893928       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:58:12.815656       6 controller.go:419] backend reload required
I0809 19:58:12.947353       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:58:42.814878       6 controller.go:419] backend reload required
I0809 19:58:42.892335       6 controller.go:428] ingress backend successfully reloaded...
I0809 19:59:12.823478       6 controller.go:419] backend reload required
I0809 19:59:12.954918       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:00:16.150933       6 controller.go:419] backend reload required
I0809 20:00:16.339993       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:00:46.154790       6 controller.go:419] backend reload required
I0809 20:00:46.280207       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:01:12.827212       6 controller.go:419] backend reload required
I0809 20:01:12.962812       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:01:16.149813       6 controller.go:419] backend reload required
I0809 20:01:16.352509       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:01:58.773523       6 controller.go:419] backend reload required
I0809 20:01:58.955574       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:01:59.494158       6 controller.go:419] backend reload required
I0809 20:01:59.652722       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:03:03.463374       6 controller.go:419] backend reload required
I0809 20:03:03.659459       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:03:08.989399       6 controller.go:419] backend reload required
I0809 20:03:09.158166       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:03:55.041454       6 controller.go:419] backend reload required
I0809 20:03:55.175486       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:04:42.592096       6 controller.go:419] backend reload required
I0809 20:04:42.749416       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:04:49.475018       6 controller.go:419] backend reload required
I0809 20:04:49.640061       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:06:05.272085       6 controller.go:419] backend reload required
I0809 20:06:05.377370       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:07:29.501766       6 controller.go:419] backend reload required
I0809 20:07:29.664621       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:08:16.148507       6 controller.go:419] backend reload required
I0809 20:08:16.277656       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:08:42.817247       6 controller.go:419] backend reload required
I0809 20:08:43.051563       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:09:12.872480       6 controller.go:419] backend reload required
I0809 20:09:13.147934       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:09:46.151076       6 controller.go:419] backend reload required
I0809 20:09:46.275197       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:10:46.159073       6 controller.go:419] backend reload required
I0809 20:10:46.281841       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:11:12.873691       6 controller.go:419] backend reload required
I0809 20:11:13.065928       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:12:02.823197       6 controller.go:419] backend reload required
I0809 20:12:03.054718       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:12:34.720417       6 controller.go:419] backend reload required
I0809 20:12:34.867959       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:12:44.543628       6 controller.go:419] backend reload required
I0809 20:12:44.674828       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:17:29.494224       6 controller.go:419] backend reload required
I0809 20:17:29.641063       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:17:46.149205       6 controller.go:419] backend reload required
I0809 20:17:46.269999       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:18:12.862447       6 controller.go:419] backend reload required
I0809 20:18:13.062000       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:19:42.816781       6 controller.go:419] backend reload required
I0809 20:19:42.956621       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:20:12.822210       6 controller.go:419] backend reload required
I0809 20:20:12.956532       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:20:42.822127       6 controller.go:419] backend reload required
I0809 20:20:42.959075       6 controller.go:428] ingress backend successfully reloaded...
I0809 20:21:12.815628       6 controller.go:419] backend reload required
I0809 20:21:12.951984       6 controller.go:428] ingress backend successfully reloaded...
...

Edit: As soon as I kill those workers, the memory usage returns to "normal":

[image: memory usage dropping back to normal after the stuck workers are killed]

@danielfm commented

I seem to have found the culprit in my case: a misconfigured socket.io ping interval/timeout. After adjusting these to match the NGINX read/send timeouts (and the ELB idle timeout, since I'm running this on AWS), this is how things work now:

  1. Clients connect to the WebSocket-based application
  2. Ingress controller reloads the configuration (i.e. when the endpoints for some service change), spawning new workers
  3. The old workers will not be terminated until all open connections handled by them are closed. Since we are talking about WebSockets, these connections might take a long time to finish; the new worker_shutdown_timeout parameter does not have any effect on this, as far as I could tell

From the client PoV, this is nice because we avoid dropping WebSockets at every configuration reload (which can happen quite frequently in larger deployments).

However, from the server PoV, you might end up accumulating workers in the 'shutting down' state (which can cause the elevated memory consumption I showed earlier), depending on how long the WebSocket connections are kept open and how many times the configuration gets reloaded by the ingress controller.

After spending some time on this problem, it seems the only way to mitigate it is to keep the rate of configuration reloads as low as possible. One way of achieving this is to run a dedicated ingress deployment just for this WebSocket app, but that seems a bit overkill.

@aledbf What do you think?

Sorry for hijacking this thread, but it has gone too far already. 😅

@aledbf (Member, Author) commented Aug 10, 2017

Ingress controller reloads the configuration (i.e. when the endpoints for some service change), spawning new workers

Can you increase the log level to 2 (flag --v=2)? That way you can see the diff of the configuration and which service is responsible for the changes. The number of reloads is not normal (I have more than 300 apps and I cannot reproduce that).
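
For the record, a sketch of one way to bump the verbosity on a running deployment; the deployment name and container index are assumptions:

# Appends --v=2 to the controller container's arguments
kubectl -n kube-system patch deployment broken-bronco-nginx-ingress --type json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=2"}]'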

@aledbf (Member, Author) commented Aug 10, 2017

@danielfm are you on Slack (the k8s channel)?

@danielfm commented

@aledbf Yes, my handle is 'danielmartins'.

I've increased the log level and, as far as I can tell, the only thing triggering configuration reloads is changes in the upstreams. (Well, not exactly: the lists of endpoints and servers are apparently identical; the only thing that changed is the order in which they were rendered):

I0810 23:14:47.866939       5 nginx.go:300] NGINX configuration diff
I0810 23:14:47.866963       5 nginx.go:301] --- /tmp/a072836772	2017-08-10 23:14:47.000000000 +0000
+++ /tmp/b304986035	2017-08-10 23:14:47.000000000 +0000
@@ -163,32 +163,26 @@
 
     proxy_ssl_session_reuse on;
 
-    upstream production-chimera-production-pepper-80 {
+    upstream upstream-default-backend {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.71.14:3000 max_fails=0 fail_timeout=0;
-        server 10.2.32.22:3000 max_fails=0 fail_timeout=0;
+        server 10.2.157.13:8080 max_fails=0 fail_timeout=0;
     }
 
-    upstream production-gabarito-production-80 {
+    upstream production-landings-production-80 {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.110.13:3000 max_fails=0 fail_timeout=0;
-        server 10.2.109.195:3000 max_fails=0 fail_timeout=0;
+        server 10.2.82.66:3000 max_fails=0 fail_timeout=0;
+        server 10.2.79.124:3000 max_fails=0 fail_timeout=0;
+        server 10.2.59.21:3000 max_fails=0 fail_timeout=0;
+        server 10.2.45.219:3000 max_fails=0 fail_timeout=0;
     }
 
     upstream production-sisu-production-80 {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.109.177:3000 max_fails=0 fail_timeout=0;
         server 10.2.12.161:3000 max_fails=0 fail_timeout=0;
-    }
-
-    upstream production-lap-production-worker-80 {
-        # Load balance algorithm; empty for round robin, which is the default
-        least_conn;
-        server 10.2.21.37:9292 max_fails=0 fail_timeout=0;
-        server 10.2.65.105:9292 max_fails=0 fail_timeout=0;
+        server 10.2.109.177:3000 max_fails=0 fail_timeout=0;
     }
 
     upstream production-passepartout-production-80 {
@@ -201,61 +195,67 @@
     upstream production-lap-production-80 {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.45.223:8000 max_fails=0 fail_timeout=0;
+        server 10.2.21.36:8000 max_fails=0 fail_timeout=0;
         server 10.2.78.36:8000 max_fails=0 fail_timeout=0;
+        server 10.2.45.223:8000 max_fails=0 fail_timeout=0;
         server 10.2.99.151:8000 max_fails=0 fail_timeout=0;
-        server 10.2.21.36:8000 max_fails=0 fail_timeout=0;
     }
 
-    upstream production-desauth-production-80 {
+    upstream production-chimera-production-pepper-80 {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.79.126:3000 max_fails=0 fail_timeout=0;
-        server 10.2.35.105:3000 max_fails=0 fail_timeout=0;
-        server 10.2.114.143:3000 max_fails=0 fail_timeout=0;
-        server 10.2.50.44:3000 max_fails=0 fail_timeout=0;
-        server 10.2.149.135:3000 max_fails=0 fail_timeout=0;
-        server 10.2.45.155:3000 max_fails=0 fail_timeout=0;
+        server 10.2.71.14:3000 max_fails=0 fail_timeout=0;
+        server 10.2.32.22:3000 max_fails=0 fail_timeout=0;
     }
 
-    upstream production-live-production-80 {
+    upstream production-gabarito-production-80 {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.53.23:5000 max_fails=0 fail_timeout=0;
-        server 10.2.110.22:5000 max_fails=0 fail_timeout=0;
-        server 10.2.35.91:5000 max_fails=0 fail_timeout=0;
-        server 10.2.45.221:5000 max_fails=0 fail_timeout=0;
+        server 10.2.110.13:3000 max_fails=0 fail_timeout=0;
+        server 10.2.109.195:3000 max_fails=0 fail_timeout=0;
     }
 
-    upstream upstream-default-backend {
+    upstream production-chimera-production-80 {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.157.13:8080 max_fails=0 fail_timeout=0;
+        server 10.2.78.26:3000 max_fails=0 fail_timeout=0;
+        server 10.2.59.22:3000 max_fails=0 fail_timeout=0;
+        server 10.2.96.249:3000 max_fails=0 fail_timeout=0;
+        server 10.2.32.21:3000 max_fails=0 fail_timeout=0;
+        server 10.2.114.177:3000 max_fails=0 fail_timeout=0;
+        server 10.2.83.20:3000 max_fails=0 fail_timeout=0;
+        server 10.2.118.111:3000 max_fails=0 fail_timeout=0;
+        server 10.2.26.23:3000 max_fails=0 fail_timeout=0;
+        server 10.2.35.150:3000 max_fails=0 fail_timeout=0;
+        server 10.2.79.125:3000 max_fails=0 fail_timeout=0;
+        server 10.2.157.165:3000 max_fails=0 fail_timeout=0;
     }
 
-    upstream production-landings-production-80 {
+    upstream production-lap-production-worker-80 {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.79.124:3000 max_fails=0 fail_timeout=0;
-        server 10.2.82.66:3000 max_fails=0 fail_timeout=0;
-        server 10.2.45.219:3000 max_fails=0 fail_timeout=0;
-        server 10.2.59.21:3000 max_fails=0 fail_timeout=0;
+        server 10.2.21.37:9292 max_fails=0 fail_timeout=0;
+        server 10.2.65.105:9292 max_fails=0 fail_timeout=0;
     }
 
-    upstream production-chimera-production-80 {
+    upstream production-desauth-production-80 {
         # Load balance algorithm; empty for round robin, which is the default
         least_conn;
-        server 10.2.96.249:3000 max_fails=0 fail_timeout=0;
-        server 10.2.157.165:3000 max_fails=0 fail_timeout=0;
-        server 10.2.114.177:3000 max_fails=0 fail_timeout=0;
-        server 10.2.118.111:3000 max_fails=0 fail_timeout=0;
-        server 10.2.79.125:3000 max_fails=0 fail_timeout=0;
-        server 10.2.78.26:3000 max_fails=0 fail_timeout=0;
-        server 10.2.59.22:3000 max_fails=0 fail_timeout=0;
-        server 10.2.35.150:3000 max_fails=0 fail_timeout=0;
-        server 10.2.32.21:3000 max_fails=0 fail_timeout=0;
-        server 10.2.83.20:3000 max_fails=0 fail_timeout=0;
-        server 10.2.26.23:3000 max_fails=0 fail_timeout=0;
+        server 10.2.114.143:3000 max_fails=0 fail_timeout=0;
+        server 10.2.79.126:3000 max_fails=0 fail_timeout=0;
+        server 10.2.45.155:3000 max_fails=0 fail_timeout=0;
+        server 10.2.35.105:3000 max_fails=0 fail_timeout=0;
+        server 10.2.50.44:3000 max_fails=0 fail_timeout=0;
+        server 10.2.149.135:3000 max_fails=0 fail_timeout=0;
+    }
+
+    upstream production-live-production-80 {
+        # Load balance algorithm; empty for round robin, which is the default
+        least_conn;
+        server 10.2.53.23:5000 max_fails=0 fail_timeout=0;
+        server 10.2.45.221:5000 max_fails=0 fail_timeout=0;
+        server 10.2.35.91:5000 max_fails=0 fail_timeout=0;
+        server 10.2.110.22:5000 max_fails=0 fail_timeout=0;
     }
 
     server {

All the other configuration reloads look like this.

@redbaron commented Mar 9, 2018

@danielfm, maybe it is not too late, but "--sort-backends=true" should help :)
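
For anyone landing here later, the flag is passed as a controller argument like the others; a sketch, assuming it is appended to the controller deployment's args (deployment name is a guess):

# Appends --sort-backends=true so upstream server lists are rendered in a stable order across reloads
kubectl -n kube-system patch deployment broken-bronco-nginx-ingress --type json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--sort-backends=true"}]'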

@danielfm commented Mar 9, 2018

@redbaron Haha, thanks! After I hit this, I noticed the latest version introduced that flag and started using it right away. 😄
