
enabling Session affinity goes to a single pod only #3056

Closed · wstrange opened this issue Sep 7, 2018 · 21 comments
@wstrange commented Sep 7, 2018

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version:

0.18.0 and 0.19.0

Kubernetes version (use kubectl version):

1.10.7

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release): COS
  • Install tools:
  • Others:

Using an external GCP TCP load balancer (L4) as the ingress IP.

What happened:
With session affinity enabled, traffic goes to a single pod only.

What you expected to happen:
Multiple requests (e.g. with curl -vk ...) should be spread across different backends.
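
A quick way to check the distribution (a sketch; the host/path are this cluster's, and each response sets a fresh cookie, as noted below):

# Send ten independent requests (no cookie jar, so no stickiness should apply)
# and record the Set-Cookie value each one gets back.
for i in $(seq 1 10); do
    curl -sk -o /dev/null -D - https://openam.prod.frk8s.net/openam | grep -i '^set-cookie'
done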

How to reproduce it (as minimally and precisely as possible):
Working on a simpler repro...

Anything else we need to know:

  • The service works internally: curl against the internal service name gets spread out to both nodes
  • Both endpoints are alive/responsive
  • The cookie is being set just fine; each curl request results in a new cookie
  • The nginx config looks like it finds both backend endpoints
  • Disabling affinity on the ingress results in traffic being spread out over both nodes

The configuration output is below; the service in question is "openam".

[  
   {  
      "name":"prod-openam-80",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"openam",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":8080
               }
            ],
            "selector":{  
               "app":"openam",
               "release":"openam-prod"
            },
            "clusterIP":"10.0.28.178",
            "type":"ClusterIP",
            "sessionAffinity":"ClientIP",
            "sessionAffinityConfig":{  
               "clientIP":{  
                  "timeoutSeconds":10800
               }
            }
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":80,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.0.19",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         },
         {  
            "address":"10.4.3.22",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"cookie",
         "cookieSessionAffinity":{  
            "name":"INGRESSCOOKIE",
            "hash":"md5",
            "locations":{  
               "openam.prod.frk8s.net":[  
                  "/openam"
               ]
            }
         }
      }
   },
   {  
      "name":"prod-openidm-80",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"openidm",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":8080,
                  "nodePort":30606
               }
            ],
            "selector":{  
               "app":"openidm",
               "release":"openidm-prod"
            },
            "clusterIP":"10.0.24.160",
            "type":"NodePort",
            "sessionAffinity":"None",
            "externalTrafficPolicy":"Cluster"
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":80,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.2.18",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         },
         {  
            "address":"10.4.1.14",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"cookie",
         "cookieSessionAffinity":{  
            "name":"route",
            "hash":"sha1",
            "locations":{  
               "openidm.prod.frk8s.net":[  
                  "/"
               ]
            }
         }
      }
   },
   {  
      "name":"upstream-default-backend",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"http",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":"http"
               }
            ],
            "selector":{  
               "app":"nginx-ingress",
               "component":"default-backend",
               "release":"nginx"
            },
            "clusterIP":"10.0.22.133",
            "type":"ClusterIP",
            "sessionAffinity":"None"
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":0,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.2.4",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"",
         "cookieSessionAffinity":{  
            "name":"",
            "hash":""
         }
      }
   }
]
@wstrange (Author)

Given that we are using a GCP L4 TCP load balancer, is it possible that the hashing algorithm is using the IP of the GCP load balancer instead of the client's? Would this explain why it always goes to the same pod?
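
One thing worth checking is the client address the controller sees for each request: if it is always the load balancer's address rather than the real client's, any client-IP-based decision would collapse onto one upstream. A sketch (the deployment name below depends on the Helm release and is an assumption):

# The default access log prints the remote address as the first field;
# if every line shows the GCP LB's address, the client IP never reaches nginx.
kubectl -n nginx logs deploy/nginx-nginx-ingress-controller --tail=20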

@wstrange (Author) commented Sep 10, 2018

For reference, this is how we are installing the helm chart:

helm install --namespace nginx --name nginx  \
    --set rbac.create=true \
    --set controller.service.loadBalancerIP=$IP \
    --set controller.publishService.enabled=true \
    --set controller.stats.enabled=true \
    --set controller.service.externalTrafficPolicy=Local \
    --set controller.service.type=LoadBalancer \
    stable/nginx-ingress

If we set the image version back to < 0.18.0, we get load balanced requests.
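
Pinning the image back can be done through the same chart, e.g. (a sketch; controller.image.tag is the chart value, and 0.17.1 is just an example tag):

helm upgrade nginx stable/nginx-ingress --namespace nginx \
    --reuse-values \
    --set controller.image.tag=0.17.1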

@ElvinEfendi (Member)

@wstrange can you post a minimal Ingress manifest to reproduce this?

@wstrange (Author) commented Sep 12, 2018

I'm trying to replicate this. It looks like it does not happen with HTTP; it seems to be something to do with HTTPS/SSL. I'll keep testing.

Update: can't replicate it yet with a simple test-headers app, even over SSL. Sigh...

@StaffanSvensson-playtech

We've also encountered exactly the same symptoms on two separate occasions in the last two weeks, where all our load goes to a single pod when using session affinity. We have not experienced this in versions prior to 0.18.0, from what I remember.

Were there any changes to how session affinity is handled in later versions? I can't seem to find anything about it in the release notes.

We are currently also unable to reproduce this, so it's hard to find the root cause. Any ideas what might be the issue here?

@wstrange (Author)

We cannot replicate this with a simple echo headers application, but we see it in a more complex deployment of our Java application.

What is the logic used to calculate which backend pod a session is steered to? Knowing that might help us narrow down how this happens.
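
As a rough mental model (a simplified bash sketch only, not the controller's actual implementation): cookie-based affinity hashes the cookie value and maps it onto the endpoint list, so every request carrying the same cookie should land on the same pod, while requests with no cookie should be balanced normally.

# Hypothetical illustration: map an affinity cookie onto a fixed endpoint list.
cookie="example-INGRESSCOOKIE-value"
endpoints=("10.4.0.19:8080" "10.4.3.22:8080")
# Take the first 32 bits of the md5 of the cookie value and reduce it
# modulo the number of endpoints to pick an index.
hash=$(printf '%s' "$cookie" | md5sum | cut -c1-8)
index=$(( 0x$hash % ${#endpoints[@]} ))
echo "requests carrying this cookie pin to ${endpoints[$index]}"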

@svenbs commented Sep 24, 2018

We have the same problem with a 3-pod deployment and HTTP load balancing. Sometimes (not always) one pod does not receive any HTTP traffic; it is sent to one of the remaining pods instead.
We receive 3 different INGRESSCOOKIEs, but two of them proxy to the same pod. Even when no cookie is set on the request, the pod in question receives no requests.

We assume the problem here is the same and was introduced by the dynamic configuration of backends in https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.18.0.
Disabling this feature by adding --enable-dynamic-configuration=false to the args section solved the issue for us.

Of course, this is just a workaround; we would like to solve the underlying issue with the Lua balancer.
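
For anyone installing via the Helm chart shown earlier in this thread, one way to apply the workaround (a sketch, assuming the chart's controller.extraArgs value):

helm upgrade nginx stable/nginx-ingress --namespace nginx \
    --reuse-values \
    --set controller.extraArgs.enable-dynamic-configuration=false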

@floriantraber

We are experiencing the same problem (on 0.19.0 and 0.20.0).

Thank you @svenbs for suggesting disabling dynamic configuration. Using --enable-dynamic-configuration=false gives us the desired results.

We did, however, hit an issue where the generated cookie's domain differs with dynamic configuration enabled: with dynamic configuration the domain of the cookie is .somedomain.com, while without it the domain is somedomain.com. Having these two cookies results in undesired behavior (see the attached screenshot).
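
The difference is easy to see by comparing the Set-Cookie header in both modes (a sketch; replace the URL with your own host):

# With dynamic configuration the cookie domain comes back as .somedomain.com,
# without it as somedomain.com.
curl -sk -o /dev/null -D - https://somedomain.com/ | grep -i '^set-cookie'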

@wstrange (Author)

Given that others are seeing this issue, and it seems hard to reproduce with a simple echoheaders sample, is there any way to log more debug/diagnostic information showing how the dynamic configuration module arrives at its backend pod decisions?

I am guessing there is some timing issue, i.e. some pod is ready before the others, or briefly reports as not live, etc.

wajihahmed pushed a commit to wajihahmed/forgeops that referenced this issue Oct 23, 2018
wajihahmed added a commit to ForgeRock/forgeops that referenced this issue Oct 23, 2018
* Added workaround for bug kubernetes/ingress-nginx#3056

* Changes to support CLOUD-855.  Note the change in location of the keystore and password store.  They are not directly picked up from the secrets mounts.
@ElvinEfendi (Member)

Anyone having this issue, please try:

quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev

@ElvinEfendi (Member)

/close

@k8s-ci-robot (Contributor)

@ElvinEfendi: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nelsonfassis

@ElvinEfendi To which tag will this fix be deployed?
--enable-dynamic-configuration=false fixed the problem for us on 0.18, but I'm afraid that upgrading to a newer version without the fix would break the deployment again.

@ElvinEfendi (Member)

@ElvinEfendi To which tag will this fix be deployed?

@nelsonfassis it will be included in 0.21.0

@wstrange (Author) commented Dec 6, 2018

We are still seeing this issue on 0.21.0

Anyone else?

@StaffanSvensson-playtech commented Jan 21, 2019

Yes, we have the same issue on 0.22.0 as well.

@svenbs commented Jan 24, 2019

We're still seeing this on 0.21.0.

@qi-min commented Feb 28, 2019

Me too; I see the same issue on 0.22.0.
When I created the ingress, everything was OK, but after a few minutes (doing nothing) nginx started routing all requests to the same pod.
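
To catch the moment routing collapses, it can help to watch which pod answers over time (a sketch; it assumes an echo-style backend that reports its pod hostname, which is hypothetical here):

# Print a timestamp plus the answering pod every 30 seconds; when the bug
# kicks in, the hostname stops changing even though no cookie is sent.
while true; do
    printf '%s ' "$(date -u +%H:%M:%S)"
    curl -sk http://echo.example.test/ | grep -i '^hostname'
    sleep 30
done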

@ElvinEfendi (Member)

Can you try the latest version, quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0, and see whether this is still an issue?

@qi-min commented Mar 1, 2019

@ElvinEfendi After updating to 0.23.0, I do not see this problem for now. I really appreciate your suggestion.

@ElvinEfendi (Member)

@m7luffy you are welcome! In that case the bug was most likely related to #3809 (comment), which got fixed in 0.23.0.
