diff --git a/examples/k8s-health-check/README.md b/examples/k8s-health-check/README.md
index f5560684b..0413bdf78 100644
--- a/examples/k8s-health-check/README.md
+++ b/examples/k8s-health-check/README.md
@@ -18,17 +18,18 @@ localhost with three endpoints:
 - `/startup`: Returns 200 status when the proxy has finished starting up.
 Otherwise returns 503 status.
 
-- `/readiness`: Returns 200 status when the proxy has started, has available
-connections if max connections have been set with the `--max-connections`
-flag, and when the proxy can connect to all registered instances. Otherwise,
-returns a 503 status. Optionally supports a min-ready query param (e.g.,
-`/readiness?min-ready=3`) where the proxy will return a 200 status if the
-proxy can connect successfully to at least min-ready number of instances. If
-min-ready exceeds the number of registered instances, returns a 400.
-
 - `/liveness`: Always returns 200 status. If this endpoint is not responding,
 the proxy is in a bad state and should be restarted.
 
+- `/readiness`: Returns 200 status when the proxy has started, has available
+  connections if max connections have been set with the `--max-connections`
+  flag, and when the proxy can connect to all registered instances. Otherwise,
+  returns a 503 status. Optionally supports a min-ready query param (e.g.,
+  `/readiness?min-ready=3`) where the proxy will return a 200 status if the
+  proxy can connect successfully to at least min-ready number of instances. If
+  min-ready exceeds the number of registered instances, returns a 400.
+
+
 To configure the address, use `--http-address`. To configure the port, use
 `--http-port`.
 
@@ -39,41 +40,41 @@ To configure the address, use `--http-address`. To configure the port, use
 # Recommended configurations for health check probes.
 # Probe parameters can be adjusted to best fit the requirements of your application.
 # For details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
-livenessProbe:
-  httpGet:
-    path: /liveness
-    port: 9090
-  # Number of seconds after the container has started before the first probe is scheduled. Defaults to 0.
-  # Not necessary when the startup probe is in use.
-  initialDelaySeconds: 0
-  # Frequency of the probe. Defaults to 10.
-  periodSeconds: 10
-  # Number of seconds after which the probe times out. Defaults to 1.
-  timeoutSeconds: 5
-  # Number of times the probe is allowed to fail before the transition from healthy to failure state.
-  # Defaults to 3.
-  failureThreshold: 1
-readinessProbe:
-  httpGet:
-    path: /readiness
-    port: 9090
-  initialDelaySeconds: 0
-  periodSeconds: 10
-  timeoutSeconds: 5
-  # Number of times the probe must report success to transition from failure to healthy state.
-  # Defaults to 1 for readiness probe.
-  successThreshold: 1
-  failureThreshold: 1
 startupProbe:
-  httpGet:
-    path: /startup
-    port: 9090
-  periodSeconds: 1
-  timeoutSeconds: 5
-  failureThreshold: 20
+  # We recommend adding a startup probe to the proxy sidecar
+  # container. This ensures that service traffic is routed to
+  # the pod only after the proxy has successfully started.
+  httpGet:
+    path: /startup
+    port: 9090
+  periodSeconds: 1
+  timeoutSeconds: 5
+  failureThreshold: 20
+livenessProbe:
+  # We recommend adding a liveness probe to the proxy sidecar container.
+  httpGet:
+    path: /liveness
+    port: 9090
+  # Number of seconds after the container has started before the first probe is scheduled. Defaults to 0.
+  # Not necessary when the startup probe is in use.
+  initialDelaySeconds: 0
+  # How often (in seconds) to perform the probe.
+  periodSeconds: 60
+  # Number of seconds after which the probe times out.
+  timeoutSeconds: 30
+  # Number of times the probe is allowed to fail before the transition
+  # from healthy to failure state.
+  #
+  # If periodSeconds = 60, 5 tries will result in five minutes of
+  # checks. The proxy starts to refresh a certificate five minutes
+  # before its expiration. If those five minutes lapse without a
+  # successful refresh, the liveness probe will fail and the container will be
+  # restarted.
+  failureThreshold: 5
+# We do not recommend adding a readiness probe under most circumstances.
 ```
 
-2. Add `-use_http_health_check` and `-health-check-port` (optional) to your
+2. Add `--http-address` and `--http-port` (optional) to your
 proxy container configuration under `command: `.
 
 > [proxy_with_http_health_check.yaml](proxy_with_http_health_check.yaml#L53-L76)
@@ -103,3 +104,83 @@ args:
   - "--port="
   - ""
 ```
+
+### Readiness Health Check Configuration
+
+For most common use cases, adding a readiness health check to the proxy sidecar
+container is unnecessary. An improperly configured readiness check can degrade
+the application's availability.
+
+The proxy readiness probe fails when (1) the proxy has used all its available
+concurrent connections to a database, (2) the network connection
+to the database is interrupted, or (3) the database server is unavailable due
+to a maintenance operation. These are transient states that usually resolve
+within a few seconds.
+
+Most applications are resilient to transient database connection failures and
+do not need to be restarted. We recommend adding a readiness check to the
+application container instead of the proxy container. The application can be
+programmed to report whether it is ready to receive requests, and the health check
+can be tuned to restart the pod when the application is permanently stuck.
+
+You should use the proxy container's readiness probe when these circumstances
+should cause k8s to terminate the entire pod:
+
+- The proxy can't connect to the database instances.
+- All connections allowed by `--max-connections` are in use.
+
+When you do use the proxy container's readiness probe, be sure to set the
+`failureThreshold` and `periodSeconds` to avoid restarting the pod on frequent
+transient failures.
+
+### Readiness Health Check Examples
+
+The DBA team performs database failover drills without notice. A
+batch job should fail if it cannot connect to the database for 3 minutes.
+Set the readiness check so that the pod will be terminated after 3 minutes
+of consecutive readiness check failures (6 failed readiness checks taken every 30
+seconds: 6 x 30 sec = 3 minutes).
+
+```yaml
+readinessProbe:
+  httpGet:
+    path: /readiness
+    port: 9090
+  initialDelaySeconds: 30
+  # 30 sec period x 6 failures = 3 min until the pod is terminated
+  periodSeconds: 30
+  failureThreshold: 6
+  timeoutSeconds: 10
+  successThreshold: 1
+```
+
+
+A web application has a database connection pool leak and the
+engineering team can't find the root cause. To keep the system running,
+the application should be automatically restarted if it consumes 50 connections
+for more than 1 minute.
+
+```yaml
+    containers:
+    - name: my-application
+      image: gcr.io/my-container/my-application:1.1
+    - name: cloud-sql-proxy
+      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.0
+      args:
+        # Set the --max-connections flag to 50
+        - "--max-connections"
+        - "50"
+        - "--port="
+        - ""
+# ...
+      readinessProbe:
+        httpGet:
+          path: /readiness
+          port: 9090
+        initialDelaySeconds: 10
+        # 5 sec period x 12 failures = 60 sec until the pod is terminated
+        periodSeconds: 5
+        failureThreshold: 12
+        timeoutSeconds: 5
+        successThreshold: 1
+```
diff --git a/examples/k8s-health-check/proxy_with_http_health_check.yaml b/examples/k8s-health-check/proxy_with_http_health_check.yaml
index fa96fea5c..450661b19 100644
--- a/examples/k8s-health-check/proxy_with_http_health_check.yaml
+++ b/examples/k8s-health-check/proxy_with_http_health_check.yaml
@@ -99,7 +99,26 @@ spec:
         # Recommended configurations for health check probes.
         # Probe parameters can be adjusted to best fit the requirements of your application.
         # For details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
+        startupProbe:
+          # The /startup probe returns OK when the proxy is ready to receive
+          # connections from the application. In this example, k8s will check
+          # once a second for 20 seconds.
+          #
+          # We strongly recommend adding a startup probe to the proxy sidecar
+          # container. This ensures that service traffic is routed to
+          # the pod only after the proxy has successfully started.
+          httpGet:
+            path: /startup
+            port: 9090
+          periodSeconds: 1
+          timeoutSeconds: 5
+          failureThreshold: 20
         livenessProbe:
+          # The /liveness probe returns OK as soon as the proxy application has
+          # begun its startup process and continues to return OK until the
+          # process stops.
+          #
+          # We recommend adding a liveness probe to the proxy sidecar container.
           httpGet:
             path: /liveness
             port: 9090
@@ -120,23 +139,22 @@ spec:
           # restarted.
           failureThreshold: 5
         readinessProbe:
+          # The /readiness probe returns OK when the proxy can establish
+          # new connections to its databases.
+          #
+          # Please use the readiness probe on the proxy sidecar with caution.
+          # An improperly configured readiness probe can cause unnecessary
+          # interruption to the application. See README.md for more details.
           httpGet:
             path: /readiness
             port: 9090
-          initialDelaySeconds: 0
+          initialDelaySeconds: 10
           periodSeconds: 10
-          timeoutSeconds: 5
+          timeoutSeconds: 10
           # Number of times the probe must report success to transition from failure to healthy state.
           # Defaults to 1 for readiness probe.
           successThreshold: 1
-          failureThreshold: 1
-        startupProbe:
-          httpGet:
-            path: /startup
-            port: 9090
-          periodSeconds: 1
-          timeoutSeconds: 5
-          failureThreshold: 20
+          failureThreshold: 6
       volumes:
       - name: 
         secret:
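Every probe timing in this change follows the same rule of thumb used in the README comments: a probe that keeps failing is acted on after roughly `periodSeconds` x `failureThreshold` seconds. The following minimal Python sketch checks the numbers in these examples. `ProbeTiming` and `failure_window_seconds` are hypothetical names chosen for illustration and are not part of the proxy or any Kubernetes client library. The estimate also ignores `initialDelaySeconds` and `timeoutSeconds`, so treat it as an approximation and not an exact bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeTiming:
    """Hypothetical helper mirroring two Kubernetes probe fields.

    Only periodSeconds and failureThreshold are modelled; this is an
    illustration, not a Kubernetes client type.
    """

    period_seconds: int
    failure_threshold: int

    def failure_window_seconds(self) -> int:
        """Rough time a probe must keep failing before Kubernetes acts on it.

        Uses the README's rule of thumb: periodSeconds x failureThreshold.
        initialDelaySeconds and timeoutSeconds are deliberately ignored.
        """
        return self.period_seconds * self.failure_threshold


# Probe settings copied from the examples in this change.
EXAMPLES = {
    "startup": ProbeTiming(period_seconds=1, failure_threshold=20),
    "liveness": ProbeTiming(period_seconds=60, failure_threshold=5),
    "readiness (failover drill)": ProbeTiming(period_seconds=30, failure_threshold=6),
    "readiness (connection leak)": ProbeTiming(period_seconds=5, failure_threshold=12),
}

if __name__ == "__main__":
    # Print one line per probe with its approximate failure window.
    for name, timing in EXAMPLES.items():
        print(f"{name:30} {timing.failure_window_seconds():>4} s")
```

Running the script prints one line per probe. For example, the failover-drill readiness probe reports 180 s, which matches the 3-minute window described in the README.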