
[bitnami/redis] Improve sentinel prestop hook to prevent service interruption #6080

Merged: 4 commits merged into bitnami:master on Apr 23, 2021

Conversation

@Gregy (Contributor) commented Apr 12, 2021

This improves on #5528 by checking and waiting, during termination of both the redis and the sentinel container, until the failover is finished. This completely eliminates the momentary service interruption during rollouts that still happens with just #5528 applied.

Benefits

This eliminates the few seconds of downtime that happen with v13.0.1 because the redis container terminates before the sentinel prestop hook finishes running (in Kubernetes, the containers of a pod are stopped concurrently).

I have also replaced a hard-coded masterSet config value in the prestop script with a proper config reference. This problem was mentioned by @srueg in this comment: #5528 (comment)
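In outline, the hooks work roughly like the sketch below. This is not the chart's actual code from configmap-scripts.yaml: the helper names, the `REDIS_SENTINEL_PORT` variable and the `.Values.sentinel.masterSet` path are assumptions; only the `sentinel failover` / `get-master-addr-by-name` commands and the two-container wait come from the description above.

```yaml
# Illustrative sketch of the two prestop hooks (file and helper names assumed)
prestop-sentinel.sh: |
  #!/bin/bash
  # If this pod still holds the master, ask sentinel to fail over and wait
  # until another node has taken over before letting the container stop.
  myip="$(hostname -i)"
  master_ip() {
      redis-cli -p "$REDIS_SENTINEL_PORT" sentinel get-master-addr-by-name "{{ .Values.sentinel.masterSet }}" | head -n 1
  }
  if [[ "$(master_ip)" == "$myip" ]]; then
      redis-cli -p "$REDIS_SENTINEL_PORT" sentinel failover "{{ .Values.sentinel.masterSet }}"
      while [[ "$(master_ip)" == "$myip" ]]; do sleep 1; done
  fi
prestop-redis.sh: |
  #!/bin/bash
  # The same wait on the redis container, so it does not terminate while the
  # sentinel hook in the sibling container is still mid-failover.
  myip="$(hostname -i)"
  while [[ "$(redis-cli -p "$REDIS_SENTINEL_PORT" sentinel get-master-addr-by-name "{{ .Values.sentinel.masterSet }}" | head -n 1)" == "$myip" ]]; do
      sleep 1
  done
```

Note that a bare loop like this is open-ended; bounding the wait is exactly what the grace-period discussion later in the thread addresses.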

Possible drawbacks

This code can make pod deletes (and rollouts) slower by a few seconds.

Applicable issues

Additional information

Checklist

  • Chart version bumped in Chart.yaml according to semver.
  • Variables are documented in the README.md
  • Title of the PR starts with chart name (e.g. [bitnami/chart])

@miguelaeh (Contributor) left a comment


Thank you very much for the PR!
Please take a look at my comments

6 review comments on bitnami/redis/templates/configmap-scripts.yaml (outdated, resolved)
@Gregy (Contributor, Author) commented Apr 16, 2021

I have reworked the waiting logic to use retry_while instead of loops, like you suggested @miguelaeh. Can you please check if the new approach is acceptable?
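The reworked wait presumably looks something like the following, assuming the `retry_while` helper shipped in the Bitnami images' `libos.sh` (signature roughly `retry_while "<command>" <retries> <sleep_seconds>`, returning non-zero if the check never succeeds); the other names are carried over from the earlier sketch and remain assumptions.

```yaml
prestop-sentinel.sh: |
  #!/bin/bash
  . /opt/bitnami/scripts/libos.sh

  failover_finished() {
      # Succeeds once sentinel reports a master other than this pod.
      local master_ip
      master_ip="$(redis-cli -p "$REDIS_SENTINEL_PORT" sentinel get-master-addr-by-name "{{ .Values.sentinel.masterSet }}" | head -n 1)"
      [[ "$master_ip" != "$(hostname -i)" ]]
  }

  if ! failover_finished; then
      redis-cli -p "$REDIS_SENTINEL_PORT" sentinel failover "{{ .Values.sentinel.masterSet }}"
      # Bounded wait (20 retries, 1s apart) instead of an open-ended while loop.
      retry_while "failover_finished" 20 1
  fi
```

Because retry_while gives up once its retry budget is exhausted, the hook can no longer hang the pod indefinitely.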

@miguelaeh (Contributor) left a comment


Thank you very much for implementing the changes!

> But if there is a way to get the current value of the termination grace period we could cap this process to that minus 5s or something like that. That would make sure we exit quickly enough even when someone changes the grace period (but I do not think this chart exposes the grace period as a variable to be changed)

About this, we could just add it. You only need to add it to the values.yaml, the README.md and the statefulset. Then you will be able to use it just like {{ .Values.terminationGracePeriodSeconds }}.
After that, the chart version must be bumped as a minor release instead of a patch.
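For reference, exposing the parameter could look something like this (a sketch only; the exact file layout and description wording are assumptions):

```yaml
# values.yaml (sketch)
## Grace period Kubernetes gives the pods to shut down, in seconds
terminationGracePeriodSeconds: 30

# templates/statefulset.yaml (sketch), inside the pod template
spec:
  template:
    spec:
      terminationGracePeriodSeconds: {{ .Values.terminationGracePeriodSeconds }}
```

Per the checklist above, the new parameter also needs a row in the README.md parameters table.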

@Gregy (Contributor, Author) commented Apr 19, 2021

Thanks for the tip. Implemented.

@github-actions (bot) commented Apr 19, 2021

Great PR! Please pay attention to the following items before merging:

Files matching bitnami/*/values.yaml:

  • Is the PR adding a new container? Please reviewer, add it to the models (internal process)
  • Is the PR adding a new parameter? Please, ensure it’s documented in the README.md

This is an automatically generated QA checklist based on modified files

@Gregy (Contributor, Author) commented Apr 19, 2021

Another PR (#6146) containing the hard-coded masterSet value fix has been merged. I will remove the first commit from this PR to resolve the conflicts. The rest of this PR should still be merged after a successful review.

@juan131 (Contributor) left a comment


We're doing a major refactoring of the chart at #6102.
I will block this PR until the refactoring is completed.

@Gregy (Contributor, Author) commented Apr 19, 2021

Juan, could you please explain why work on the next major version should prevent a bugfix PR like this one from being merged? This PR doesn't introduce any backward compatibility breaks and would be useful to people using v13 of this chart.

@juan131 (Contributor) commented Apr 20, 2021

Hi @Gregy

The major refactoring PR was merged. Could you please rebase your branch from master and update the PR? Then please ping me and I'll be glad to review it.

Thanks in advance.

@Gregy (Contributor, Author) commented Apr 20, 2021

Wow, that was quick. Thanks for pinging me, @juan131.

Rebase done. Ready for review.

@juan131 (Contributor) left a comment


Sorry, I won't have time to check this properly today.
I'll check it tomorrow morning.

Comment on lines 74 to 83
| Name | Description | Value |
|---------------------------------|----------------------------------------------------|-----------------|
| `kubeVersion` | Override Kubernetes version | `nil` |
| `nameOverride` | String to partially override common.names.fullname | `nil` |
| `fullnameOverride` | String to fully override common.names.fullname | `nil` |
| `commonLabels` | Labels to add to all deployed objects | `{}` |
| `commonAnnotations` | Annotations to add to all deployed objects | `{}` |
| `clusterDomain` | Kubernetes cluster domain name | `cluster.local` |
| `extraDeploy` | Array of extra objects to deploy with the release | `[]` |
| `terminationGracePeriodSeconds` | Define termination grace period for all pods | `30` |
Contributor:

Shouldn't we be able to define different grace periods depending on the component type?


Just curious, did you use readmenator to generate the tables automatically?

@Gregy (Contributor, Author):

I don't think so. The termination grace period is configured for the whole pod, not for the individual containers. We could theoretically make the setting separate for sentinels, masters and replicas, but when sentinels are used there are no separate masters and replicas, and it doesn't make much sense to me to have a different grace period for master vs replica. So I am for keeping the setting global. But I can separate them if you wish.

No, I don't even have access to that repo. I have just modified the file with some help from a markdown table generator.

Contributor:

> It doesn't make much sense to me to have a different grace period for master vs replica. So I am for keeping the setting global. But I can separate them if you wish.

I agree it's very unlikely you'd want to set this to a different value on master nodes than on replica nodes. That said, it seems this is the approach we followed on other charts, and I'd follow it for consistency. See:

> No, I don't even have access to that repo. I have just modified the file with some help from a markdown table generator.

Crap, I thought this was already public. Sorry for the noise.

@Gregy (Contributor, Author) commented Apr 22, 2021

Ok, I have separated the configs.
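The separated configuration presumably ends up looking something like the following; the key names and the set of components that get their own value are assumptions, not taken from the actual diff:

```yaml
# values.yaml (sketch of the per-component split)
master:
  ## Grace period for master pods, in seconds
  terminationGracePeriodSeconds: 30
replica:
  ## Grace period for replica pods, in seconds
  terminationGracePeriodSeconds: 30
```

Each statefulset template would then read its own value instead of a single global `terminationGracePeriodSeconds`.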

This improves on bitnami#5528 by checking and waiting until the failover is
finished on both the redis and the sentinel container. This completely
eliminates momentary service interruption during rollouts.

As we cannot guarantee the failover will be successful, the wait time
is capped at the termination grace period minus 10s.
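One way to implement that cap is to compute the retry budget at template time; `sub` is a standard Sprig function available in Helm templates, while the value path and the `failover_finished` helper are carried over from the earlier sketches and remain assumptions:

```yaml
prestop-sentinel.sh: |
  #!/bin/bash
  . /opt/bitnami/scripts/libos.sh
  # failover_finished is the check from the earlier sketch.
  # Checks run 1s apart, so the retry count is the grace period minus a 10s
  # safety margin and the hook returns before Kubernetes force-kills the pod.
  if ! failover_finished; then
      redis-cli -p "$REDIS_SENTINEL_PORT" sentinel failover "{{ .Values.sentinel.masterSet }}"
      retry_while "failover_finished" {{ sub .Values.replica.terminationGracePeriodSeconds 10 }} 1
  fi
```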

@Gregy requested a review from juan131, April 22, 2021 10:52
@Gregy (Contributor, Author) commented Apr 22, 2021

Ouch, thanks for catching that!

@miguelaeh (Contributor) left a comment


Thank you for all the changes, @Gregy.
LGTM!
We have to wait for the internal CI to update the image tags.

@bitnami-bot (Contributor) commented:

I have just updated the bitnami images with the latest known immutable tags:

  • "docker.io/bitnami/redis:6.2.2-debian-10-r3"
  • "docker.io/bitnami/redis-exporter:1.22.0-debian-10-r0"
  • "docker.io/bitnami/redis-sentinel-exporter:1.7.1-debian-10-r122"
  • "docker.io/bitnami/redis-sentinel:6.2.2-debian-10-r2"
  • "docker.io/bitnami/bitnami-shell:10"
  • "docker.io/bitnami/bitnami-shell:10"

@bitnami-bot merged commit 943c301 into bitnami:master on Apr 23, 2021
@Gregy (Contributor, Author) commented Apr 26, 2021

Awesome. Thank you very much, @miguelaeh.
