
Cleanly draining envoy with DaemonSet/NLB deployment #145

Closed
cmaloney opened this issue Jan 13, 2018 · 18 comments · Fixed by #2227
Labels
area/deployment Issues or PRs related to deployment tooling or infrastructure. blocked/needs-product Categorizes the issue or PR as blocked because it needs a decision from product management. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@cmaloney

Sometimes it's necessary to re-deploy newer versions of contour / envoy into the cluster, which involves killing off the running containers and replacing them with new ones.

If I were running with an AWS ALB or classic load balancer, I could probably add HTTP health checks against the envoy health check endpoint, plus pod lifecycle hooks to make envoy start failing its health check, so it can be cleanly removed from the load balancer without losing traffic.

To get HTTP/2 and minimal hops, my deployment uses the DaemonSet / NLB deployment of contour and envoy. The NLB does TCP forwarding to contour, and it currently health checks the same port it forwards traffic to (8080 or 8443). When I want or need to replace envoy, the pod is killed and envoy stops listening, but not far enough in advance for the AWS NLB to fail its health checks and redirect traffic to the other hosts, so some traffic is lost.

I'd like to help close this gap so that envoy and contour can be upgraded safely and cleanly without loss of traffic.

Two thoughts on how to do this:

  1. Figure out how to replace the pods using envoy's "hot restart" machinery (some form of https://www.envoyproxy.io/docs/envoy/latest/operations/hot_restarter) without dropping any traffic
  2. Teach contour to start an additional listener on a configurable port which acts as the "healthcheck" listener. All AWS NLBs get configured to use that port to decide whether a given instance should receive traffic. When contour receives a SIGTERM as part of its lifecycle hooks (https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods), it shuts down this extra listener, which should cause health checks to fail and the instance to be removed from new traffic through the AWS NLB. After a configurable number of seconds from receiving SIGTERM, contour should put envoy into its "drain down" state, where it stops accepting incoming connections and drains every existing connection. Once the pod's grace period finally runs out, or envoy is fully drained, the pod is killed and any remaining connections are cut off non-gracefully.

I think the second option is easier to implement and get working correctly in all cases than the first. With the first, I see potential issues with the envoy<->contour communication conflicting between "new" and old pods, as well as with how to cleanly roll out a replacement of an existing DaemonSet.
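To make option 2 concrete, here is a rough sketch of how the Kubernetes side might be wired up. The --healthcheck-listener-port flag below is purely hypothetical (contour has no such flag); everything else is standard DaemonSet pod spec configuration:

# Hypothetical sketch of option 2: contour exposes a healthcheck-only listener that
# the AWS NLB target group health check points at; contour would close this listener
# on SIGTERM so the node drops out of rotation before envoy starts draining.
spec:
  terminationGracePeriodSeconds: 120          # upper bound on the whole drain
  containers:
  - name: contour
    args:
    - serve
    - --healthcheck-listener-port=8002        # hypothetical flag, illustration only
    ports:
    - name: healthcheck
      containerPort: 8002
      hostPort: 8002                          # NLB health checks target this port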

@davecheney
Contributor

Thanks for raising this issue. There's a PR in flight for adding a healthz endpoint to envoy and contour, #135.

@davecheney davecheney added kind/feature Categorizes issue or PR as related to a new feature. good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jan 13, 2018
@cmaloney
Author

Healthz alone wouldn't solve it with an NLB, though, since only TCP health checks are available for a TCP NLB, but healthz on an extra port could definitely work.

@davecheney davecheney added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. blocked/needs-product Categorizes the issue or PR as blocked because it needs a decision from product management. and removed good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 12, 2019
@davecheney
Contributor

This issue has been p1 without a milestone for over a year. I need to get product input before we can prioritise it.

@davecheney davecheney added this to the 1.0.0.beta.1 milestone Jun 18, 2019
@stevesloka
Member

We do now have the preStop hooks enabled. When an Envoy pod is starting to shut down, the hook should fire telling Envoy to drain all connections: https://github.com/heptio/contour/blob/master/examples/ds-hostnet-split/03-envoy.yaml#L60-L63

@rochacon
Contributor

rochacon commented Jul 18, 2019

@stevesloka I've been testing with this recently and it seems like only telling Envoy to stop accepting connections is not enough, since Kubernetes will send the SIGTERM immediately after running the preStop hook.

I've somewhat found a stable solution with this configuration changes:

  • Configured terminationGracePeriodSeconds: 60 for the Envoy pod. This is the drain limit I want to support in my cluster.
  • Added --drain-time-s 60 to the Envoy container (the default is 600).
  • In the Envoy container, configured readinessProbe.periodSeconds: 3, resulting in more frequent readiness checks.
  • Added sleep 20 before and sleep 40 after the Envoy preStop command [1]. The initial sleep is to give time between rejecting connections and the propagation of the pod's Terminating status. Then we tell Envoy to start rejecting connections by failing health checks. The final sleep holds SIGTERM until the pod is safely out of the Service Endpoints and the external LB health checks.
  • Configured Service.spec.externalTrafficPolicy: Local. This configures a kubelet-managed port (service.spec.healthCheckNodePort) that returns HTTP 200 if the Envoy pod is running on that node. @cmaloney this config has the behavior you described in Option 2; it works for both LoadBalancer and NodePort services, and it is an easy way to avoid the extra network hop and keep the original connection IP.
  • Split the Contour and Envoy pods.
  • Reduced the AWS LB health check intervals to the minimum. IIRC, I'm using 2 unhealthy checks as the threshold with a 5s period. The AWS LB target group draining settings must be less than or equal to your draining timeout (in my case, 60s).

With this I managed to get zero-downtime Envoy rollouts on top of an AWS NLB in a few simple tests (running hey against the NLB; the service behind Envoy is a pod that sleeps for 1s before returning the HTTP response).

[1]

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 20 ; wget -qO- --post-data='' http://127.0.0.1:9001/healthcheck/fail; sleep 40"]

@stevesloka
Member

Fantastic notes @rochacon!! Would you like to contribute this back as a doc or update our examples? If not, I'd like to add this to the repo to help out others.

@cmaloney do you see this as maybe solving your original issue?

@rochacon
Contributor

@stevesloka sure, I'm up for it. I didn't do it before because I think it is still a bit ugly and I want to do more tests.

@cmaloney
Author

I'm not using contour in my current environments (I'm now at a different company than where I was using it), so I don't really have the pieces to test this.

@rochacon
Contributor

I've been testing the mentioned configurations in an attempt to clean up and confirm the minimum required steps; I still need a few more checks before submitting the patch.

During the tests, I found out that NLBs provisioned through Kubernetes Service objects have fixed health check settings (30s interval, 3-unhealthy threshold, tested on v1.12.10); the current annotations for customizing this behavior are ignored, which I believe is due to the alpha/beta state of the NLB integration. I'm not sure whether this has improved in more recent Kubernetes versions or when using the external cloud-provider. Configuring the Envoy service as a NodePort and managing the NLB externally is one way to customize these settings (I use this setup because I change Envoy pools and prefer to avoid replacing the NLB, swapping DNS, waiting for propagation, etc.).

Right now, I've achieved good results with the following settings on top of current examples:

  • Set terminationGracePeriodSeconds: 120 on the Envoy pod
  • Add sleep 118 after the Envoy /healthcheck/fail notification [1]
  • Add --drain-time-s 60 to the Envoy container (my desired request time limit is 60s, regardless of Envoy rollouts)

This gives Kubernetes components and the NLB enough time to notice the unready status and remove the node/pod from the target group, and it delays the SIGTERM that will shut down Envoy. It also follows Lyft's rollout practices as described in envoyproxy/envoy#1990 (comment).

[1]

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "wget -qO- --post-data='' http://127.0.0.1:9001/healthcheck/fail; sleep 118"]

@alexbrand
Contributor

I have been experimenting with this as well, and I have a proposal for properly draining Envoy that builds on @rochacon's suggestion.

Currently in master, we have a preStop hook in the Envoy container that POSTs to the /healthcheck/fail endpoint. This tells Envoy that it needs to start draining connections. In addition, the failing healthcheck triggers the readiness probe to fail, which in turn tells the system to stop sending traffic to the Envoy that is shutting down.

The problem with the current preStop hook is that it does not wait until Envoy has drained all the connections. To address this, I think we could leverage the stats endpoint to create a while loop that sleeps until all the connections have been drained. Once the connections have been drained, the preStop hook returns and lets Kubernetes continue the pod shutdown process.

Curious what everyone thinks of this idea. I have a PoC that I'd be happy to contribute.

@davecheney
Contributor

davecheney commented Aug 22, 2019 via email

@youngnick
Member

I really like the stats watch idea; it's much better than guessing at timeouts. I'd be very interested to see the PoC.

@alexbrand
Contributor

Yeah, I totally agree.

What I ended up doing was adding a sidecar to handle this. The sidecar sends the POST request to Envoy, and then starts watching the stats endpoint. Once the drain is complete, it creates a file in the already-existing config volume which signals Envoy that the drain is complete.

      - name: graceful-terminator
        image: busybox
        command:
          - ash
          - -c
          - while [[ ! -f /config/drain-complete ]]; do sleep 1; done; echo "Drain Complete"
        lifecycle:
          preStop:
            exec:
              command:
                - ash
                - -c
                - >
                  wget -q -O- --post-data "" localhost:9001/healthcheck/fail;
                  while [[ $(wget -q -O- localhost:9001/stats | grep http.ingress_http.downstream_cx_active | awk '{print $2}') != 0 ]];
                  do echo "waiting until active connections are drained"; sleep 1; done;
                  touch /config/drain-complete;
        volumeMounts:
          - mountPath: /config
            name: contour-config

The envoy preStop hook waits until the file appears:

        lifecycle:
          preStop:
            exec:
              command:
              - bash
              - -c
              - while [[ ! -f /config/drain-complete ]]; do echo "waiting for drain confirmation"; sleep 1; done;

The end result is:

  1. Envoy remains running until the preStop hook exits, letting requests/connections drain gracefully
  2. Envoy's preStop hook exits when it receives the signal from the termination sidecar
  3. The termination sidecar sends the signal when the number of active connections is zero
  4. No new connections are established because the readinessProbe is failing

All of this is governed by the terminationGracePeriodSeconds parameter in the pod spec.

Would love to upstream this if we think it is a reasonable approach.

@stevesloka
Member

Would a small Go program make this implementation simpler? I like the idea, but I worry about all the code inline in the YAML definition. We could add tests, etc. to the Go code, and maybe open it up for other types of hooks that we might need in the future to manage Envoy's state.

@alexbrand
Contributor

That is certainly another option!

@youngnick
Member

I like the use of the config volume for signal-passing; that's neat.

I agree that a small Go program that handles SIGTERM and then watches the Envoy stats until it's drained would be very clean, but doing it in bash is definitely a good start.

@davecheney
Contributor

davecheney commented Aug 28, 2019 via email

@davecheney davecheney modified the milestones: 1.0.0-beta.1, 1.0.0-rc.1 Sep 5, 2019
@davecheney davecheney removed this from the 1.0.0-rc.1 milestone Sep 29, 2019
@davecheney davecheney added this to the 1.0.0-rc.2 milestone Sep 29, 2019
@youngnick
Member

Next steps for this issue:

  • Investigate using a Go program (or a Contour subcommand) for draining Envoy.
  • Investigate manipulating LDS tables for draining too.

On to the backlog for now.
