VC-36444: Add a health check and liveness probe #580

maelvls · 2024-10-01T13:36:21Z

Why? Customers don’t like the fact that OpenShift shows warnings whenever they install Venafi Kubernetes Agent. These warnings say that this pod doesn’t have a liveness and a readiness configuration in the Pod specification, and that the best practice is to have one for each pod.

What? The idea isn’t to define what “ready” means in the agent. The agent doesn’t receive traffic, so it wouldn’t make sense to configure a readiness probe (readiness probes are used by Kubernetes to know when to start routing traffic to that pod). And regarding the liveness probe, although it could be useful to indicate that the agent isn’t able to push data to the API, defining a “real” liveness probe is out of scope for this ticket. Instead, the idea is to implement a “ping pong” liveness and readiness probe so that OpenShift customers stop seeing warnings.

Note to the reviewer: I've chosen to combine the metrics endpoint (/metrics) with the readiness and liveness probes because I don't think there will be problems with combining the three.

Testing

I've used the tenant https://ven-tlspk.venafi.cloud/. To access the API key, use the user [email protected] and the password is visible in the page Production Accounts (private to Venafi). Then go to the settings and find the API key.

export APIKEY=...

Then, run:

venctl iam service-account agent create --name "$USER temp" \
  --vcp-region US \
  --output json \
  --owning-team $(curl -sS https://api.venafi.cloud/v1/teams -H "tppl-api-key: $APIKEY" | jq '.teams[0].id') \
  --output-file /tmp/agent-credentials.json \
  --api-key $APIKEY
make oci-push-preflight oci_preflight_image_name=ttl.sh/mael/venafi-agent oci_preflight_image_tag=v0.0.0-dev oci_platforms=linux/arm64
make helm-chart oci_preflight_image_tag=v0.0.0-dev helm_chart_version=0.0.0-dev oci_preflight_image_name=ttl.sh/mael/venafi-agent
helm push _bin/scratch/image/venafi-kubernetes-agent-0.0.0-dev.tgz oci://ttl.sh/mael/charts
helm upgrade -i -n venafi --create-namespace venafi-kubernetes-agent oci://ttl.sh/mael/charts/venafi-kubernetes-agent --version 0.0.0-dev \
  --set config.clusterName="$USER temp" --set config.clientId="$(jq -r .private_key /tmp/agent-credentials.json)"
kubectl create secret generic -n venafi agent-credentials --from-literal=privatekey.pem="$(jq -r .private_key /tmp/agent-credentials.json)" \
  --dry-run=client -o yaml | kubectl apply -f -

You should see that the pod now has a liveness and readiness probe:

$ k get -n venafi pod -l app.kubernetes.io/name=venafi-kubernetes-agent -oyaml | yq '.items[].spec.containers[] | with_entries(select(.key | test(".*Probe.*")))'  
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 8081
    scheme: HTTP
  initialDelaySeconds: 15
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 1
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /readyz
    port: 8081
    scheme: HTTP
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

To check that the server works, I used telepresence (can't rely on k proxy since it doesn't have a service, can't reach the pod IP since I'm on macOS)

curl -fL https://app.getambassador.io/download/tel2oss/releases/download/v2.19.1/telepresence-darwin-arm64 -o ~/bin/telepresence
telepresence helm install
telepresence connect -n venafi

Then:

$ IP=$(k get -n venafi pod -l app.kubernetes.io/name=venafi-kubernetes-agent -oyaml | yq '.items[].status.podIP')

$ curl -D/dev/stderr http://$IP:8081/healthz
HTTP/1.1 200 OK
Date: Tue, 01 Oct 2024 14:17:50 GMT
Content-Length: 0

$ curl -D/dev/stderr http://$IP:8081/readyz
HTTP/1.1 200 OK
Date: Tue, 01 Oct 2024 14:18:13 GMT
Content-Length: 0

The pod should be ready:

$ k get -n venafi pod -l app.kubernetes.io/name=venafi-kubernetes-agent
NAME                                       READY   STATUS    RESTARTS        AGE
venafi-kubernetes-agent-669d9bc4d4-cmtlh   1/1     Running   19 (9m8s ago)   4h2m

pkg/agent/run.go

inteon · 2024-10-02T07:37:30Z

@maelvls do we also need the readiness probe?
The error message that we were seeing in OpenShift complains about missing health checks. I think that the liveness probe alone might be enough.

maelvls · 2024-10-03T07:34:16Z

@maelvls do we also need the readiness probe? The error message that we were seeing in OpenShift complains about missing health checks. I think that the liveness probe alone might be enough.

You are right, we both need the readiness and liveness probe. I've already added both to the deployment, so we should be good to go.

maelvls · 2024-10-03T07:35:44Z

By the way, how did you get this screenshot? Do you have an OpenShift cluster around? I'd be interested to test this new feature with a real OpenShift cluster.

inteon · 2024-10-03T07:53:57Z

By the way, how did you get this screenshot? Do you have an OpenShift cluster around? I'd be interested to test this new feature with a real OpenShift cluster.

I copied it from the Jira ticket 😄

wallrj · 2024-11-20T16:19:03Z

@maelvls do we also need the readiness probe? The error message that we were seeing in OpenShift complains about missing health checks. I think that the liveness probe alone might be enough.

You are right, we both need the readiness and liveness probe. I've already added both to the deployment, so we should be good to go.

Surely you were talking across purposes here. Tim said "I think that the liveness probe alone might be enough."

I do not think a readiness probe was required.

No health endpoint #439 (comment)

helm: add a health check and liveness probe

2d6a62f

inteon reviewed Oct 1, 2024

View reviewed changes

pkg/agent/run.go Outdated Show resolved Hide resolved

healthcheck: don't depend on --prometheus, always run healthcheck server

b2cf1cc

inteon approved these changes Oct 3, 2024

View reviewed changes

maelvls merged commit 21c0274 into master Oct 3, 2024
2 checks passed

maelvls deleted the VC-36444-health-check branch October 3, 2024 08:23

wallrj mentioned this pull request Nov 20, 2024

No health endpoint #439

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VC-36444: Add a health check and liveness probe #580

VC-36444: Add a health check and liveness probe #580

maelvls commented Oct 1, 2024 •

edited

Loading

inteon commented Oct 2, 2024

maelvls commented Oct 3, 2024

maelvls commented Oct 3, 2024

inteon commented Oct 3, 2024

wallrj commented Nov 20, 2024

VC-36444: Add a health check and liveness probe #580

VC-36444: Add a health check and liveness probe #580

Conversation

maelvls commented Oct 1, 2024 • edited Loading

inteon commented Oct 2, 2024

maelvls commented Oct 3, 2024

maelvls commented Oct 3, 2024

inteon commented Oct 3, 2024

wallrj commented Nov 20, 2024

maelvls commented Oct 1, 2024 •

edited

Loading