Readiness check behaves wrongly #3233

Closed
SvenW opened this issue Jun 7, 2023 · 14 comments · Fixed by #3276

SvenW commented Jun 7, 2023

Describe the bug
The default readiness check behaves strangely. If the router gets a SIGINT/SIGTERM it cleans up all remaining connections, but it keeps responding {"status":"UP"} throughout the whole termination process, so it keeps receiving new connections that it won't be able to terminate in time before being killed by Kubernetes. That is fine for the liveness probe, but not for the readiness probe.

To Reproduce
Steps to reproduce the behavior:

  1. Setup services '...'
  2. Look at the /health endpoint
  3. See that it reports an UP status even though it has received a SIGINT/SIGTERM

Expected behavior
As soon as the router gets a termination signal it should stop answering {"status":"UP"} on readiness probe checks. It's still fine to do so on the liveness probe. It would also be nice to be able to configure how long this termination grace period should be in the YAML.

Additional context
This is when running on GKE v1.24.12
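
For reference, the probes I'm talking about are plain HTTP checks against the router's /health endpoint, along these lines (just a sketch; the port and exact probe settings are assumptions based on the default health_check listener):

```
# Sketch of the probes being discussed; port 8088 is an assumption based on
# the router's default health_check listener.
livenessProbe:
  httpGet:
    path: /health
    port: 8088
readinessProbe:
  httpGet:
    path: /health
    port: 8088
```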


Meemaw commented Jun 7, 2023

You can already configure terminationGracePeriodSeconds:

terminationGracePeriodSeconds: 30
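
For clarity, that's a field on the pod spec, so in a Deployment it goes roughly here (the container name and image tag are just placeholders):

```
# Illustrative placement only; terminationGracePeriodSeconds sits on the pod
# spec, next to containers.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: router
          image: ghcr.io/apollographql/router:v1.x  # placeholder tag
```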


SvenW commented Jun 8, 2023

> You can already configure terminationGracePeriodSeconds:
>
> terminationGracePeriodSeconds: 30

Yes, but unfortunately that's not sufficient. For the readiness check to behave correctly, both that value and how long the router itself spends draining connections need to be configurable. We need to keep responding to requests while the LB is still removing us, and only after that removal stop serving new requests and drain the old connections.


SvenW commented Jun 12, 2023

I've worked around this by adding a shell script that catches the signals and creates a file when the router is shutting down. The readiness probe then checks whether that file exists to determine if the pod is still ready. This works, but it's not especially nice :(


garypen commented Jun 15, 2023

I agree this isn't a great situation. I've got a WIP PR which addresses this problem and I've bumped the priority to P1. My hope is that I'll find some time to fix this soon.

garypen self-assigned this Jun 15, 2023

voslartomas commented Jun 21, 2023

@SvenW Could you share what exactly you did to overcome this issue, please?

I have tried adding a `touch /tmp/shutdown` in a preStop hook and then checking for that file in the readiness probe, but I still see 502 errors when deploying a new version of the router.


SvenW commented Jun 21, 2023

> @SvenW Could you share what exactly you did to overcome this issue, please?
>
> I have tried adding a `touch /tmp/shutdown` in a preStop hook and then checking for that file in the readiness probe, but I still see 502 errors when deploying a new version of the router.

Sure. Disregard the ugliness in my little bash script here:

#!/bin/bash

# This first trap is superseded by the trap set a couple of lines below.
trap 'kill -TERM $PID' TERM INT
# Start the router wrapper in the background and remember its PID.
exec /dist/router_wrapper.sh &
PID=$!
# On SIGTERM/SIGINT, create the shutdown marker file ($SHUTDOWN_FILE points at
# the file the readiness probe checks, i.e. /tmp/shutdown).
trap 'touch $SHUTDOWN_FILE' TERM INT
# Returns when a trapped signal arrives (or the router exits on its own).
wait $PID
# Give the LB time to stop routing traffic to us; keep this shorter than
# terminationGracePeriodSeconds.
sleep 20
trap - TERM INT
# Now forward the shutdown to the router so it drains its remaining connections.
kill -SIGINT $PID
wait $PID
EXIT_STATUS=$?

For it to work I had to send a SIGINT to the router PID; if I didn't, it never received the termination signal as it should. In the router's deployment.yaml I also had to rewrite the readiness probe to a shell-based one like this:

readinessProbe:
  exec:
    command:
    - sh
    - -c
    - "if [ -f /tmp/shutdown ]; then exit 1; else exit 0; fi;"

@voslartomas

That makes sense, that is actually beautiful, thanks!

@voslartomas

@SvenW And do you have any special configuration in deployment?


SvenW commented Jun 21, 2023

> @SvenW And do you have any special configuration in deployment?

I just use the default terminationGracePeriodSeconds, which is 30s, and for the probe I've used the default values for now.

@voslartomas

All right, one more question: why do you have that 20s sleep?


SvenW commented Jun 21, 2023

> All right, one more question: why do you have that 20s sleep?

It's something less than the terminationGracePeriodSeconds, so we have a little time to report that we're not ready before the router drains the currently running requests for us.

@voslartomas

> All right, one more question: why do you have that 20s sleep?
>
> It's something less than the terminationGracePeriodSeconds, so we have a little time to report that we're not ready before the router drains the currently running requests for us.

But the first command, trap 'kill -TERM $PID' TERM INT, will send a TERM signal to the router as soon as the script receives one, right? What is that for? I thought we wanted to catch the signal, first create the shutdown file, wait 20s, and only then send the TERM signal to the router? Or am I missing something?


SvenW commented Jun 21, 2023

> All right, one more question: why do you have that 20s sleep?
>
> It's something less than the terminationGracePeriodSeconds, so we have a little time to report that we're not ready before the router drains the currently running requests for us.
>
> But the first command, trap 'kill -TERM $PID' TERM INT, will send a TERM signal to the router as soon as the script receives one, right? What is that for? I thought we wanted to catch the signal, first create the shutdown file, wait 20s, and only then send the TERM signal to the router? Or am I missing something?

The script does catch the TERM signal: the second trap replaces the first, so receiving TERM/INT only touches the shutdown file, and we then send SIGINT to the PID after the 20s sleep. You can test it locally by building your Dockerfile and sending a SIGINT or SIGTERM to the container to validate that it works as it should. If you have ideas for improving the bash script I would really appreciate it.

@voslartomas

All right, it works! I just had to add a preStop hook with a 10 second sleep for some reason; after that, 0 502 errors on deployment. Thanks!
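
For anyone hitting the same thing, the preStop hook is roughly this (10 seconds is just the value that happened to work for me):

```
# Rough shape of the preStop hook that removed the remaining 502s.
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
```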

garypen added a commit that referenced this issue Jun 30, 2023
The router is a complex beast and it's not entirely straightforward to
determine if a router is live/ready.

From a Kubernetes point of view, a service is:

- live [if it isn't deadlocked](https://www.linkedin.com/posts/llarsson_betterdevopsonkubernetes-devops-devsecops-activity-7018587202121076736-LRxE)
- ready if it is able to start accepting traffic

(For more details:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)

Our current health check isn't actually doing either of these things. It returns a payload which indicates whether the router is "healthy" or not, and it always returns a hard-coded "UP" with HTTP status code 200. As long as it is serving traffic it appears "healthy".

To address this we need:

1. A mechanism within the router to define whether it is ready or live.
2. A new interface whereby an interested party may determine if the
router is ready or live.

Both of these questions are significantly complicated by the fact that the router is architected to maintain connectivity in the face of changing configuration, schemas, etc. Multiple interested parties may be operating against the same router with different states (via different connections) and thus could, in theory, get different answers. However, I'm working on the assumption that, for k8s control purposes, a client would only be interested in the current (latest) state of the router, so that's what we'll track.

---

1. How do we know if the router is ready or live?

Ready

- Not currently processing a change to its set of inputs
- Is in state Running
- Resources available (open files limit not reached) (although this removes a pod from service, which would be bad)
- Is live

Live

- Health check enabled and responding

2. Extending our existing health interface

If we add support for query parameters named "ready" and "live" to our
existing health endpoint, then we can provide a backwards compatible
interface which leverages the existing "/health" endpoint. We would
support both POST and GET for maximum compatibility.

Sample queries:

```
curl -XPOST "http://localhost:8088/health?ready" OR curl  "http://localhost:8088/health?ready"

curl -XPOST "http://localhost:8088/health?live" OR curl "http://localhost:8088/health?live"
```

You could only query for one at a time (which I think is fine).

Note: Our health check returns a 200 with a JSON payload. That isn't easy to process in k8s, especially when you can't install useful utilities (like curl, jq, etc.) into your container image because of security and CVE concerns. When the router is not live or ready, we now return a 503, which k8s interprets as a probe failure.
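
As an illustration only (not part of this change), probes could then consume the endpoint roughly like this; the port comes from the sample queries above and the timing values are placeholders:

```
# Illustrative probe configuration; port from the sample queries above,
# timing values are placeholders.
livenessProbe:
  httpGet:
    path: "/health?live"
    port: 8088
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: "/health?ready"
    port: 8088
  periodSeconds: 5
```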

Fixes #3233 

<!-- start metadata -->

**Checklist**

Complete the checklist (and note appropriate exceptions) before a final
PR is raised.

- [x] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]: It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding
Manual Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or
ask for it to be labeled) as `manual test`

---------

Co-authored-by: Bryn Cooke <[email protected]>
Co-authored-by: Pierre Tondereau <[email protected]>
Co-authored-by: Avery Harnish <[email protected]>
Co-authored-by: bryn <[email protected]>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Geoffroy Couprie <[email protected]>
Co-authored-by: Simon Sapin <[email protected]>
Co-authored-by: o0Ignition0o <[email protected]>
Co-authored-by: Jesse Rosenberger <[email protected]>
Co-authored-by: Jeremy Lempereur <[email protected]>
Co-authored-by: Dylan Anthony <[email protected]>
BrynCooke added a commit that referenced this issue Jul 12, 2023