doc: Improve examples for k8s readiness probes. (#1759)
This reorganizes and adds explanation to the startup and readiness probes.

This also updates our recommendation: under most circumstances, use only a startup probe and a liveness probe, and do not add a readiness probe.

Fixes #1757
hessjcg authored Apr 19, 2023
1 parent ec56eba commit cb4f1e7
Showing 2 changed files with 149 additions and 50 deletions.
161 changes: 121 additions & 40 deletions examples/k8s-health-check/README.md
@@ -18,17 +18,18 @@ localhost with three endpoints:
- `/startup`: Returns 200 status when the proxy has finished starting up.
Otherwise returns 503 status.

- `/readiness`: Returns 200 status when the proxy has started, has available
connections if max connections have been set with the `--max-connections`
flag, and when the proxy can connect to all registered instances. Otherwise,
returns a 503 status. Optionally supports a min-ready query param (e.g.,
`/readiness?min-ready=3`) where the proxy will return a 200 status if the
proxy can connect successfully to at least min-ready number of instances. If
min-ready exceeds the number of registered instances, returns a 400.

- `/liveness`: Always returns 200 status. If this endpoint is not responding,
the proxy is in a bad state and should be restarted.

- `/readiness`: Returns 200 status when the proxy has started, has available
connections if max connections have been set with the `--max-connections`
flag, and when the proxy can connect to all registered instances. Otherwise,
returns a 503 status. Optionally supports a min-ready query param (e.g.,
`/readiness?min-ready=3`) where the proxy will return a 200 status if the
proxy can connect successfully to at least min-ready number of instances. If
min-ready exceeds the number of registered instances, returns a 400.


To configure the address, use `--http-address`. To configure the port, use
`--http-port`.
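
As a rough illustration of how these flags and endpoints fit together, the fragment below sketches a proxy sidecar that serves the health check endpoints on port 9090, with a readiness probe that uses the `min-ready` query parameter. It assumes that the proxy's `--health-check` flag enables the endpoints and that the proxy must listen on `0.0.0.0` for the kubelet to reach it. The `<PROXY_IMAGE>` placeholder and the value `min-ready=1` are illustrative, not recommendations.

```yaml
# Sketch only: flag values and min-ready=1 are assumptions for illustration.
containers:
  - name: cloud-sql-proxy
    image: <PROXY_IMAGE>
    args:
      # Assumed: --health-check enables /startup, /liveness, and /readiness.
      - "--health-check"
      # Listen on all interfaces so the kubelet can reach the endpoints.
      - "--http-address=0.0.0.0"
      - "--http-port=9090"
      - "<INSTANCE_CONNECTION_NAME>"
    readinessProbe:
      httpGet:
        # Reports ready when at least one registered instance is reachable.
        path: /readiness?min-ready=1
        port: 9090
```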

@@ -39,41 +40,41 @@ To configure the address, use `--http-address`. To configure the port, use
# Recommended configurations for health check probes.
# Probe parameters can be adjusted to best fit the requirements of your application.
# For details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
livenessProbe:
  httpGet:
    path: /liveness
    port: 9090
  # Number of seconds after the container has started before the first probe is scheduled. Defaults to 0.
  # Not necessary when the startup probe is in use.
  initialDelaySeconds: 0
  # Frequency of the probe. Defaults to 10.
  periodSeconds: 10
  # Number of seconds after which the probe times out. Defaults to 1.
  timeoutSeconds: 5
  # Number of times the probe is allowed to fail before the transition from healthy to failure state.
  # Defaults to 3.
  failureThreshold: 1
readinessProbe:
  httpGet:
    path: /readiness
    port: 9090
  initialDelaySeconds: 0
  periodSeconds: 10
  timeoutSeconds: 5
  # Number of times the probe must report success to transition from failure to healthy state.
  # Defaults to 1 for readiness probe.
  successThreshold: 1
  failureThreshold: 1
startupProbe:
  httpGet:
    path: /startup
    port: 9090
  periodSeconds: 1
  timeoutSeconds: 5
  failureThreshold: 20
  # We recommend adding a startup probe to the proxy sidecar
  # container. This will ensure that service traffic will be routed to
  # the pod only after the proxy has successfully started.
  httpGet:
    path: /startup
    port: 9090
  periodSeconds: 1
  timeoutSeconds: 5
  failureThreshold: 20
livenessProbe:
  # We recommend adding a liveness probe to the proxy sidecar container.
  httpGet:
    path: /liveness
    port: 9090
  # Number of seconds after the container has started before the first probe is scheduled. Defaults to 0.
  # Not necessary when the startup probe is in use.
  initialDelaySeconds: 0
  # Frequency of the probe.
  periodSeconds: 60
  # Number of seconds after which the probe times out.
  timeoutSeconds: 30
  # Number of times the probe is allowed to fail before the transition
  # from healthy to failure state.
  #
  # If periodSeconds = 60, 5 tries will result in five minutes of
  # checks. The proxy starts to refresh a certificate five minutes
  # before its expiration. If those five minutes lapse without a
  # successful refresh, the liveness probe will fail and the pod will be
  # restarted.
  failureThreshold: 5
# We do not recommend adding a readiness probe under most circumstances
```

2. Add `-use_http_health_check` and `-health-check-port` (optional) to your
2. Add `--http-address` and `--http-port` (optional) to your
proxy container configuration under `command:`.
> [proxy_with_http_health_check.yaml](proxy_with_http_health_check.yaml#L53-L76)
@@ -103,3 +104,83 @@ args:
- "--port=<DB_PORT>"
- "<INSTANCE_CONNECTION_NAME>"
```
### Readiness Health Check Configuration

For most common usage, adding a readiness healthcheck to the proxy sidecar
container is unnecessary. An improperly configured readiness check can degrade
the application's availability.

The proxy readiness probe fails when (1) the proxy has used all its available
concurrent connections to a database, (2) the network connection
to the database is interrupted, or (3) the database server is unavailable due
to a maintenance operation. These are transient states that usually resolve
within a few seconds.

Most applications are resilient to transient database connection failures, and
do not need to be restarted. We recommend adding a readiness check to the
application container instead of the proxy container. The application can be
programmed to report whether it is ready to receive requests, and the healthcheck
can be tuned to restart the pod when the application is permanently stuck.

You should use the proxy container's readiness probe when these circumstances
should cause k8s to terminate the entire pod:

- The proxy can't connect to the database instances.
- The maximum number of connections is in use.

When you do use the proxy container's readiness probe, set `failureThreshold`
and `periodSeconds` high enough that frequent transient failures do not restart
the pod.
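
The recommendation above, a readiness check on the application container rather than on the proxy, might look like the following sketch. The application's `/healthz` path, port `8080`, and its probe timings are hypothetical and stand in for whatever readiness endpoint your application exposes. The image names and the startup probe values are taken from the examples in this README.

```yaml
# Sketch only: /healthz, port 8080, and the application probe timings are hypothetical.
containers:
  - name: my-application
    image: gcr.io/my-container/my-application:1.1
    # The application reports its own readiness, including whether it
    # can currently reach the database through the proxy.
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
  - name: cloud-sql-proxy
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.0
    # Startup probe only in this fragment; no readiness probe on the proxy.
    startupProbe:
      httpGet:
        path: /startup
        port: 9090
      periodSeconds: 1
      timeoutSeconds: 5
      failureThreshold: 20
```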

### Readiness Health Check Examples

The DBA team performs database failover drills without notice. A
batch job should fail if it cannot connect to the database for 3 minutes.
Set the readiness check so that the pod will be terminated after 3 minutes
of consecutive readiness check failures (6 failed readiness checks taken every 30
seconds, 6 x 30 sec = 3 minutes).

```yaml
readinessProbe:
  httpGet:
    path: /readiness
    port: 9090
  initialDelaySeconds: 30
  # 30 sec period x 6 failures = 3 min until the pod is terminated
  periodSeconds: 30
  failureThreshold: 6
  timeoutSeconds: 10
  successThreshold: 1
```


A web application has a database connection pool leak and the
engineering team can't find the root cause. To keep the system running,
the application should be automatically restarted if it consumes 50 connections
for more than 1 minute.

```yaml
containers:
- name: my-application
  image: gcr.io/my-container/my-application:1.1
- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.0
  args:
    # Set the --max-connections flag to 50
    - "--max-connections"
    - "50"
    - "--port=<DB_PORT>"
    - "<INSTANCE_CONNECTION_NAME>"
  # ...
  readinessProbe:
    httpGet:
      path: /readiness
      port: 9090
    initialDelaySeconds: 10
    # 5 sec period x 12 failures = 60 sec until the pod is terminated
    periodSeconds: 5
    failureThreshold: 12
    timeoutSeconds: 5
    successThreshold: 1
```
38 changes: 28 additions & 10 deletions examples/k8s-health-check/proxy_with_http_health_check.yaml
@@ -99,7 +99,26 @@ spec:
        # Recommended configurations for health check probes.
        # Probe parameters can be adjusted to best fit the requirements of your application.
        # For details, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
        startupProbe:
          # The /startup probe returns OK when the proxy is ready to receive
          # connections from the application. In this example, k8s will check
          # once a second for 20 seconds.
          #
          # We strongly recommend adding a startup probe to the proxy sidecar
          # container. This will ensure that service traffic will be routed to
          # the pod only after the proxy has successfully started.
          httpGet:
            path: /startup
            port: 9090
          periodSeconds: 1
          timeoutSeconds: 5
          failureThreshold: 20
        livenessProbe:
          # The /liveness probe returns OK as soon as the proxy application has
          # begun its startup process and continues to return OK until the
          # process stops.
          #
          # We recommend adding a liveness probe to the proxy sidecar container.
          httpGet:
            path: /liveness
            port: 9090
@@ -120,23 +139,22 @@ spec:
          # restarted.
          failureThreshold: 5
        readinessProbe:
          # The /readiness probe returns OK when the proxy can establish
          # new connections to its databases.
          #
          # Please use the readiness probe on the proxy sidecar with caution.
          # An improperly configured readiness probe can cause unnecessary
          # interruption to the application. See README.md for more detail.
          httpGet:
            path: /readiness
            port: 9090
          initialDelaySeconds: 0
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          timeoutSeconds: 10
          # Number of times the probe must report success to transition from failure to healthy state.
          # Defaults to 1 for readiness probe.
          successThreshold: 1
          failureThreshold: 1
        startupProbe:
          httpGet:
            path: /startup
            port: 9090
          periodSeconds: 1
          timeoutSeconds: 5
          failureThreshold: 20
          failureThreshold: 6
      volumes:
      - name: <YOUR-SA-SECRET-VOLUME>
        secret: