
Support for draining node during pod termination #1473

Closed
gavinbunney opened this issue Apr 26, 2019 · 36 comments
Labels
legacy (Problems that require legacy mode to work around), pinned (Issue cannot be marked stale)
Milestone
2022 Cycle 1
Comments

gavinbunney commented Apr 26, 2019

Please describe your use case / problem.

We have a legacy, very stateful application (Tomcat/Java) which needs sticky sessions. When we deploy new versions of our applications, or a scale-down event happens, we need to stop sending new connections to a server while continuing to send bound sessions to the old server. Please note: this is not referring to in-flight requests; we need the active Tomcat sessions to expire, which normally takes about an hour.

We have this working currently in Kubernetes with HAProxy-Ingress by setting its drain-support flag, which drains a pod when it transitions to Terminating, but keeps existing sessions attached to that pod. We then have a preStop hook which blocks the shutdown until users have finished their sessions.
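
For reference, a minimal sketch of the haproxy-ingress side of this (the ConfigMap name and namespace are illustrative; drain-support is the flag mentioned above):

apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-ingress
  namespace: ingress-controller
data:
  # Keep Terminating pods in the backend so existing sticky sessions can
  # still reach them; only new sessions are steered away.
  drain-support: "true"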

When looking at Ambassador to fulfill this ingress role, we see the pod being removed from routing as soon as it transitions to Terminating. This was tested with a simple application returning the hostname of the pod:

  • Establish two sessions against different pods: Pod A and Pod B
  • Scale the deployment down to 1: Pod A moves to Terminating, Pod B stays active
  • The session bound to Pod A immediately switches over to the remaining pod, i.e. the Pod A session is dropped

Describe the solution you'd like

We'd love for Ambassador to drain a pod when it is Terminating: keep existing sessions attached to it while routing new connections to other active pods.

Additional context

Looking at the Kubernetes docs, it looks like pods are dropped from the Service's endpoints as soon as the pod moves to Terminating. As per: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

3. Pod shows up as “Terminating” when listed in client commands

4. (simultaneous with 3) When the Kubelet sees that a Pod has been marked as terminating because the time in 2 has been set, it begins the pod shutdown process.
...
5. (simultaneous with 3) Pod is removed from endpoints list for service, and are no longer considered part of the set of running pods for replication controllers. Pods that shutdown slowly cannot continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.

Test Details

Ambassador version: 0.60.1

Ambassador Service Configuration

getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: KubernetesEndpointResolver
      name: my-resolver
      ---
      apiVersion: ambassador/v1
      kind:  Module
      name:  ambassador
      config:
        resolver: my-resolver
        load_balancer:
          policy: round_robin

Ambassador Target Service Configuration:

apiVersion: v1
kind: Service
metadata:
  name: sticky
  labels:
    app: sticky
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind:  Mapping
      name:  sticky_mapping
      prefix: /sticky/
      service: sticky
      resolver: my-resolver
      load_balancer:
        policy: maglev
        cookie:
          name: sticky-cookie
          ttl: 300s
spec:
  ports:
  - name: http
    port: 80
  selector:
    app: sticky
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sticky
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sticky
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: sticky
    spec:
      terminationGracePeriodSeconds: 120
      containers:
      - name: hello
        image: nginxdemos/hello
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sleep
              - '60'
        ports:
        - containerPort: 80
@gavinbunney (Author) commented:

Related issue for nginx ingress controller: nginxinc/kubernetes-ingress#267

richarddli (Contributor) commented Apr 29, 2019

Somewhat related use case, cc @AaronTriplett:

Could someone answer the following question regarding the use of ring_hash load-balancing with a header? If we’re using a header (ex: user-id) to create session affinity between a specific user and a pod, is the affinity immediately cleared once that pod is downed by k8s (in case of a failure or new deployment)?

We’re using multiple long-lived gRPC connections for each user (chat app) and have the idea of using session affinity to ensure any connections for a specific user end up on the same pod. When set up in this fashion, it would allow those server/pod instances to reroute data to the appropriate pod which is affinitized to a specific user, by setting the user-id header to the “destination” user and essentially calling their own route through Ambassador.

It seems like it’s worth a try, but I want to ensure that if a pod is downed, all of the user-id header values affinitized to said pod get evicted and the next incoming request with that header is forwarded and affinitized to a new pod.

Aaron Triplett [11:45 PM]
Looks like I found the answer in an envoy issue: envoyproxy/envoy#2819
Long-lived connections would stay connected but new requests would hit a new pod. Split-brain issue, and likely not a good idea until Envoy has connection termination support.
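
For context, header-based affinity in an Ambassador Mapping looks roughly like this (a sketch; the Mapping name, prefix, service, and header are illustrative, and an endpoint resolver such as the my-resolver defined in the issue description is assumed):

apiVersion: ambassador/v1
kind: Mapping
name: chat_mapping
prefix: /chat/
service: chat
resolver: my-resolver
load_balancer:
  # hash on the user-id header so a given user's requests stick to one endpoint
  policy: ring_hash
  header: user-id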

@AaronTriplett (Contributor) commented:

I've also opened this on the envoy side: envoyproxy/envoy#6730

@AaronTriplett (Contributor) commented:

Wanted to note this needs to happen on any host change: scaling up, scaling down, and pod termination or failure.

concaf self-assigned this May 8, 2019
kflynn added this to the parc-güell milestone May 9, 2019

kflynn (Member) commented May 9, 2019

I'm estimating this at 5 story points because it involves Envoy, Kubernetes, and Ambassador. Honestly, fitting this into parc-güell may be a bit aggressive.

kflynn modified the milestones: parc-güell, pedrera May 13, 2019
kflynn modified the milestones: pedrera, sagrada-familia Jun 6, 2019

concaf (Contributor) commented Jun 9, 2019

I'm currently working on this, and it is at a state where the terminating pods have made their way to Envoy, but indicating to Envoy that this connection is being drained is proving to be a problem.

I've opened an issue in Envoy regarding this at envoyproxy/envoy#7218

Also, from Slack:

I've tried the following approaches:

Whenever a pod goes into Terminating, Ambassador captures it and puts the terminating pod's IP back into the endpoint set, with one of the following (a sketch of both is below):

1. setting `load_balancing_weight: 1` (allowed values are 1-128) - but this does not work as expected! New requests are not routed to this IP, but the older sticky sessions also do not make their way to the terminating pod. I _thought_ this was working, but scaling the pods to more than 2 busted this myth. I'm surprised, as this is the implementation used by haproxy-ingress - but Envoy behaves differently.

2. setting `'health_status': 'DRAINING'` for the endpoint (see https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/health_check.proto#envoy-api-enum-core-healthstatus). This makes Envoy believe that the endpoint is unhealthy, so no requests are routed to it, but again, the older sticky sessions do not persist.
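
For illustration, this is roughly what those two knobs look like on the Envoy side, as part of an EDS ClusterLoadAssignment (cluster name, pod IP, and port are made up):

cluster_name: sticky
endpoints:
- lb_endpoints:
  - endpoint:
      address:
        socket_address:
          address: 10.1.2.3   # terminating pod's IP (example)
          port_value: 80
    # approach 1: keep the endpoint but give it the minimum weight
    load_balancing_weight: 1
    # approach 2: mark the endpoint as draining instead of weighting it down
    health_status: DRAINING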

I've tested with websocket connections as well as normal ones, but no luck. Now, there could be something wrong with my testing as well. In subsequent `curl` requests from the command line, `curl` does not re-use the same connection again - which I think is fine as long as the sticky conditions are met, i.e. the right cookie is set. Is that right?

Also, I've noticed that when a pod with a preStop hook preventing it from dying goes into the Terminating state, the connection between the host and the pod remains intact even though Ambassador (and hence Envoy) removes the host from the endpoint list - which makes me think that this issue is more about subsequent sticky requests than about a single persistent connection.

Any leads/pointers appreciated.

stale bot commented Aug 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Aug 17, 2019

@richarddli (Contributor) commented:

not stale

stale bot removed the stale label Aug 17, 2019

stale bot commented Oct 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Oct 16, 2019

concaf (Contributor) commented Oct 17, 2019

not stale

stale bot removed the stale label Oct 17, 2019

stale bot commented Dec 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Dec 16, 2019

jwm (Contributor) commented Dec 16, 2019

not stale

stale bot removed the stale label Dec 16, 2019

jwm (Contributor) commented Apr 16, 2020

not stale

stale bot removed the stale label Apr 16, 2020

stale bot commented Jun 15, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jun 15, 2020

@daveharding commented:

not stale

stale bot removed the stale label Jun 15, 2020

@hugocarrasco commented:

Hi, would this issue also cover other active connections, like draining TCP connections?

@imranismail (Contributor) commented:

Contour already has this, and their implementation seems pretty solid: projectcontour/contour#145

@psalaberria002 (Contributor) commented:

Any updates on this one? I see Contour has added a shutdown manager sidecar for handling clean shutdowns in https://github.com/projectcontour/contour/pull/2227/files. I tried to add a preStop hook for draining connections and failing Envoy healthchecks, but 8877/ambassador/v0/check_ready was still reporting 200 OK, while 8001/ready was reporting 503s (as it should). I also tried pointing the health checks at 8001/ready, but it seems Envoy's admin port does not bind to 0.0.0.0.

concaf added the pinned (Issue cannot be marked stale) label Jul 31, 2020

@rbtcollins commented:

We have a patch we'll put up soon for this in legacy mode, but non-legacy is looking a bit tricky, because dgroup makes assumptions - in particular that there are no actions accessible to clients of dgroup during graceful shutdowns: https://github.com/datawire/dlib/blob/master/dgroup/group.go#L294

One possible way is to permit fallible, user-supplied calls to be called back by the graceful shutdown logic. But dgroup also explicitly doesn't support dependencies (https://github.com/datawire/dlib/blob/master/dgroup/group.go#L19). I note that ambex similarly needs to manage TERM logic and doesn't use dgroup.

Another possibility would be a config option that disables signal handling for TERM, allowing the TERM logic to be written outside of dgroup, rather than the all-or-nothing signal handling present today.

Some guidance on this would be needed before we start investing in coding.

LukeShu (Contributor) commented Nov 9, 2021

@rbtcollins

Yeah, I've been growing increasingly dissatisfied with dgroup's signal handling. Not so much as an excuse, but as explanation: it's more or less exactly what the amb-sidecar part of AES needed for signal handling, since dgroup was originally factored out from AES.

That said, I'm not sure I follow what you need dgroup to do that it won't let you. The way to have graceful shutdown code would be to have a goroutine that just blocks on <-ctx.Done() before doing anything:

grp.Go("drainer", func(ctx context.Context) error {
	<-ctx.Done() // wait for graceful shutdown to start
	ctx = dcontext.HardContext(ctx)

	// graceful shutdown code goes here

	return nil
})

That said, if you're still unsure but you've got a working patch for legacy mode, I'd say to go ahead and just PR that to release/v1.14 (legacy mode is "gone" in master and release/v2.0). From there, I'd happily do/assist in porting it over to non-legacy mode and v2.

@rbtcollins commented:

Ah, thanks for the hint. Yes, we'll get the legacy patch up in a day or two, just sorting attribution stuff internally and validating the most recent forward port - it's currently against master, since all the code is still there as of late last week, but we'll see :)

LukeShu (Contributor) commented Nov 10, 2021

A lot of the legacy-mode code is still on master, but a lot of it has been deleted (in particular: cmd/watt and pkg/supervisor).

@rbtcollins commented:

@LukeShu the problem is that dexec.CommandContext kills things off when the graceful shutdown starts, which is the opposite of graceful when dealing with longer-lived sockets in a subprocess.

We could perhaps layer in a context.Background():

    group.Go("envoy", func(ctx context.Context) error {
        // Run Envoy under a child of context.Background() so that dexec
        // does not kill it as soon as the group's graceful shutdown starts.
        childCtx, cancel := context.WithCancel(context.Background())
        defer cancel()
        go runEnvoy(childCtx, envoyHUP)
        <-ctx.Done() // graceful shutdown has started
        ctx = dcontext.HardContext(ctx)
        // ... graceful drain of envoy ...
        cancel()
        return nil
    })

The same might also need to be done for ambex, to ensure that Envoy doesn't get configuration errors during draining.

Thoughts?

@rbtcollins commented:

Righto, PR up at #3926

@rbtcollins commented:

@LukeShu ping [on the PR and the golang thoughts here]

@marianafranco commented:

Any updates on this one? We were seeing 5XXs during pod restarts, which we were able to fix by calling localhost:8001/drain_listeners?graceful in a preStop lifecycle hook, but we would prefer to have this handled/fixed by Ambassador. We are using AMBASSADOR_LEGACY_MODE.
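
A minimal sketch of that preStop hook on the Ambassador container (using curl in the image and the 30s sleep are assumptions; 8001 is Envoy's admin port):

lifecycle:
  preStop:
    exec:
      command:
      - sh
      - -c
      # Ask Envoy's admin interface to gracefully drain listeners, then give
      # in-flight requests time to finish before the container is stopped.
      - curl -s -X POST 'http://localhost:8001/drain_listeners?graceful' && sleep 30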

KatieLo commented Nov 30, 2021

Hey @marianafranco and thread, apologies for the delay, the team will get back to you on this shortly!

@rbtcollins commented:

@marianafranco you could use our patch from #3926 for legacy mode. We haven't got a patch for non-legacy together yet.

khussey added the legacy (Problems that require legacy mode to work around) label Dec 2, 2021
concaf removed their assignment Dec 3, 2021

marianafranco commented Dec 4, 2021

@rbtcollins Thanks for this fix! I tested it in one of our staging environments and it's working (no 5XXs during pod restarts/termination) when AMBASSADOR_LEGACY_MODE=true is set.

Unfortunately, I was mistaken and we are not using AMBASSADOR_LEGACY_MODE in production today. Do you know what the downside would be of moving back to AMBASSADOR_LEGACY_MODE vs the default mode (we are using v1.14.2)?

@rbtcollins commented:

> Do you know what the downside would be of moving back to AMBASSADOR_LEGACY_MODE vs the default mode (we are using v1.14.2)?

I don't know - needless to say we're using legacy mode just fine :).

khussey added this to the 2022 Cycle 1 milestone Jan 4, 2022

@mohitreddy1996 commented:

Hi, any updates on whether this is supported in the non-legacy versions?

If not, I'm curious to understand whether switching to legacy mode will have any consequences.

@cindymullins-dw (Contributor) commented:

Hi @mohitreddy1996, the docs indicate that AMBASSADOR_LEGACY_MODE is not recommended when performance is critical. Closing, as it appears the original issue has been addressed.

@goodgravy commented:

@cindymullins-dw I'm not sure this has been addressed…

The patch @rbtcollins shared is for legacy mode only, and as you point out legacy mode is not recommended if one needs high performance.
