
enabling Session affinity goes to a single pod only #3056

Closed · wstrange opened this issue Sep 7, 2018 · 21 comments
@wstrange commented Sep 7, 2018

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version:

0.18.0 and 0.19.0

Kubernetes version (use kubectl version):

1.10.7

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release): COS
  • Install tools:
  • Others:

Using an external GCP TCP load balancer (L4) as the ingress IP.

What happened:
With session affinity enabled, traffic goes to a single pod only.

What you expected to happen:
Multiple requests (e.g. with curl -vk ...) should be spread across different backends.
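
A quick way to check the distribution (a sketch; the host/path are this cluster's, and each response sets a fresh cookie, as noted below):

# Send ten independent requests (no cookie jar, so no stickiness should apply)
# and record the Set-Cookie value each one gets back.
for i in $(seq 1 10); do
    curl -sk -o /dev/null -D - https://openam.prod.frk8s.net/openam | grep -i '^set-cookie'
done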

How to reproduce it (as minimally and precisely as possible):
Working on a simpler repro...

Anything else we need to know:

  • The service works internally: curl against the internal service name gets spread out to both nodes
  • Both endpoints are alive/responsive
  • The cookie is being set just fine; each curl request results in a new cookie
  • The nginx config looks like it finds both backend endpoints
  • Disabling affinity on the ingress results in traffic being spread out over both nodes

The configuration output is below; the service in question is "openam".

[  
   {  
      "name":"prod-openam-80",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"openam",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":8080
               }
            ],
            "selector":{  
               "app":"openam",
               "release":"openam-prod"
            },
            "clusterIP":"10.0.28.178",
            "type":"ClusterIP",
            "sessionAffinity":"ClientIP",
            "sessionAffinityConfig":{  
               "clientIP":{  
                  "timeoutSeconds":10800
               }
            }
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":80,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.0.19",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         },
         {  
            "address":"10.4.3.22",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"cookie",
         "cookieSessionAffinity":{  
            "name":"INGRESSCOOKIE",
            "hash":"md5",
            "locations":{  
               "openam.prod.frk8s.net":[  
                  "/openam"
               ]
            }
         }
      }
   },
   {  
      "name":"prod-openidm-80",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"openidm",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":8080,
                  "nodePort":30606
               }
            ],
            "selector":{  
               "app":"openidm",
               "release":"openidm-prod"
            },
            "clusterIP":"10.0.24.160",
            "type":"NodePort",
            "sessionAffinity":"None",
            "externalTrafficPolicy":"Cluster"
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":80,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.2.18",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         },
         {  
            "address":"10.4.1.14",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"cookie",
         "cookieSessionAffinity":{  
            "name":"route",
            "hash":"sha1",
            "locations":{  
               "openidm.prod.frk8s.net":[  
                  "/"
               ]
            }
         }
      }
   },
   {  
      "name":"upstream-default-backend",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"http",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":"http"
               }
            ],
            "selector":{  
               "app":"nginx-ingress",
               "component":"default-backend",
               "release":"nginx"
            },
            "clusterIP":"10.0.22.133",
            "type":"ClusterIP",
            "sessionAffinity":"None"
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":0,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.2.4",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"",
         "cookieSessionAffinity":{  
            "name":"",
            "hash":""
         }
      }
   }
]
@wstrange (Author)

Given that we are using a GCP L4 TCP load balancer, is it possible that the hashing algorithm is using the IP of the GCP load balancer instead of the client's? Would this explain why it always goes to the same pod?
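
One thing worth checking is the client address the controller sees for each request: if it is always the load balancer's address rather than the real client's, any client-IP-based decision would collapse onto one upstream. A sketch (the deployment name below depends on the Helm release and is an assumption):

# The default access log prints the remote address as the first field;
# if every line shows the GCP LB's address, the client IP never reaches nginx.
kubectl -n nginx logs deploy/nginx-nginx-ingress-controller --tail=20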

@wstrange (Author) commented Sep 10, 2018

For reference, this is how we are installing the helm chart:

helm install --namespace nginx --name nginx  \
    --set rbac.create=true \
    --set controller.service.loadBalancerIP=$IP \
    --set controller.publishService.enabled=true \
    --set controller.stats.enabled=true \
    --set controller.service.externalTrafficPolicy=Local \
    --set controller.service.type=LoadBalancer \
    stable/nginx-ingress

If we set the image version back to < 0.18.0, we get load balanced requests.
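
Pinning the image back can be done through the same chart, e.g. (a sketch; controller.image.tag is the chart value, and 0.17.1 is just an example tag):

helm upgrade nginx stable/nginx-ingress --namespace nginx \
    --reuse-values \
    --set controller.image.tag=0.17.1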

@ElvinEfendi (Member)

@wstrange can you post a minimal Ingress manifest to reproduce this?

@wstrange (Author) commented Sep 12, 2018

I'm trying to replicate this. It looks like it does not happen with HTTP; it seems to be something to do with HTTPS/SSL. I'll keep testing.

Update: can't replicate it yet with a simple test-headers app, even over SSL. Sigh...

@StaffanSvensson-playtech

We've also encountered exactly the same symptoms on two separate occasions in the last two weeks, where all our load goes to a single pod when using session affinity. We have not experienced this in versions prior to 0.18.0, from what I remember.

Were there any changes to how session affinity is handled in later versions? I can't seem to find anything about it in the release notes.

We are currently also unable to reproduce this, so it's hard to find the root cause. Any ideas what might be the issue here?

@wstrange (Author)

We cannot replicate this with a simple echo headers application, but we see it in a more complex deployment of our Java application.

What is the logic used to calculate which backend pod a session is steered to? Knowing that might help us narrow down how this happens.
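
As a rough mental model (a simplified bash sketch only, not the controller's actual implementation): cookie-based affinity hashes the cookie value and maps it onto the endpoint list, so every request carrying the same cookie should land on the same pod, while requests with no cookie should be balanced normally.

# Hypothetical illustration: map an affinity cookie onto a fixed endpoint list.
cookie="example-INGRESSCOOKIE-value"
endpoints=("10.4.0.19:8080" "10.4.3.22:8080")
# Take the first 32 bits of the md5 of the cookie value and reduce it
# modulo the number of endpoints to pick an index.
hash=$(printf '%s' "$cookie" | md5sum | cut -c1-8)
index=$(( 0x$hash % ${#endpoints[@]} ))
echo "requests carrying this cookie pin to ${endpoints[$index]}"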

@svenbs commented Sep 24, 2018

We have the same problem with a 3-pod deployment and HTTP load balancing. Sometimes (not always) one pod does not receive any HTTP traffic; it is sent to one of the remaining pods instead.
We receive 3 different INGRESSCOOKIEs, but two of them proxy to the same pod. Even when no cookie is set on the request, the pod in question receives no requests.

We assume the problem here is the same and was introduced by the dynamic configuration of backends in https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.18.0.
Disabling this feature by adding --enable-dynamic-configuration=false to the args section solved the issue for us.

Of course, this is just a workaround; we would like to solve the underlying issue with the Lua balancer.
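
For anyone installing via the Helm chart shown earlier in this thread, one way to apply the workaround (a sketch, assuming the chart's controller.extraArgs value):

helm upgrade nginx stable/nginx-ingress --namespace nginx \
    --reuse-values \
    --set controller.extraArgs.enable-dynamic-configuration=false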

@floriantraber

We are experiencing the same problem (on 0.19.0 and 0.20.0).

Thank you @svenbs for suggesting disabling dynamic configuration. Using --enable-dynamic-configuration=false gives us the desired results.

We did, however, hit an issue where the generated cookie's domain differs with dynamic configuration enabled: with dynamic configuration the domain of the cookie is .somedomain.com, while without it the domain is somedomain.com. Having these two cookies results in undesired behavior (see the attached screenshot).
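
The difference is easy to see by comparing the Set-Cookie header in both modes (a sketch; replace the URL with your own host):

# With dynamic configuration the cookie domain comes back as .somedomain.com,
# without it as somedomain.com.
curl -sk -o /dev/null -D - https://somedomain.com/ | grep -i '^set-cookie'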

@wstrange (Author)

Given that others are seeing this issue, and it seems hard to reproduce with a simple echoheaders sample, is there any way to log more debug/diagnostic information showing how the dynamic configuration module arrives at its backend pod decisions?

I am guessing there is some timing issue, i.e. some pod is ready before the others, or briefly reports as not live, etc.

wajihahmed pushed a commit to wajihahmed/forgeops that referenced this issue Oct 23, 2018
wajihahmed added a commit to ForgeRock/forgeops that referenced this issue Oct 23, 2018
* Added workaround for bug kubernetes/ingress-nginx#3056

* Changes to support CLOUD-855.  Note the change in location of the keystore and password store.  They are not directly picked up from the secrets mounts.
@ElvinEfendi (Member)

Anyone having this issue, please try:

quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev

@ElvinEfendi (Member)

/close

@k8s-ci-robot (Contributor)

@ElvinEfendi: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nelsonfassis

@ElvinEfendi To which tag will this fix be deployed?
--enable-dynamic-configuration=false fixed the problem for us on 0.18, but I'm afraid that upgrading to a newer version without the fix would break the deployment again.

@ElvinEfendi (Member)

@ElvinEfendi To which tag will this fix be deployed?

@nelsonfassis it will be included in 0.21.0

@wstrange (Author) commented Dec 6, 2018

We are still seeing this issue on 0.21.0

Anyone else?

@StaffanSvensson-playtech commented Jan 21, 2019

Yes, we have the same issue on 0.22.0 as well.

@svenbs commented Jan 24, 2019

We're still seeing this on 0.21.0.

@qi-min commented Feb 28, 2019

Me too; I see the same issue on 0.22.0.
When I created the ingress, everything was OK, but after a few minutes (doing nothing) nginx started routing all requests to the same pod.
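
To catch the moment routing collapses, it can help to watch which pod answers over time (a sketch; it assumes an echo-style backend that reports its pod hostname, which is hypothetical here):

# Print a timestamp plus the answering pod every 30 seconds; when the bug
# kicks in, the hostname stops changing even though no cookie is sent.
while true; do
    printf '%s ' "$(date -u +%H:%M:%S)"
    curl -sk http://echo.example.test/ | grep -i '^hostname'
    sleep 30
done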

@ElvinEfendi (Member)

Can you try the latest version, quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0, and see whether this is still an issue?

@qi-min commented Mar 1, 2019

@ElvinEfendi After updating to 0.23.0, I do not see this problem for now. I really appreciate your suggestion.

@ElvinEfendi (Member)

@m7luffy you are welcome! In that case the bug was most likely related to #3809 (comment), which got fixed in 0.23.0.
