[bitnami/redis] Improve sentinel prestop hook to prevent service interruption #6080
Conversation
Thank you very much for the PR!
Please take a look at my comments
I have reworked the waiting logic to use
Thank you very much for implementing the changes!
But if there is a way to get the current value of the termination grace period, we could cap this process to that minus 5s or something like that. That would make sure we exit quickly enough even when someone changes the grace period (but I do not think this chart currently exposes the grace period as a configurable value).
About this, we could just add it. We only need to add it to values.yaml, README.md and the statefulset; then you will be able to use it just like `{{ .Values.terminationGracePeriodSeconds }}`.
After that, the chart version must be bumped as a minor instead of a patch.
Thanks for the tip. Implemented.
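For reference, a minimal sketch of the wiring described above, assuming a top-level value name and a single statefulset template (the chart's actual files may differ):

```yaml
# values.yaml (sketch): expose the grace period as a chart value
terminationGracePeriodSeconds: 30
```

```yaml
# templates/statefulset.yaml (sketch): reference the value in the pod spec
spec:
  template:
    spec:
      # Gives the prestop hook a known upper bound to wait for the Sentinel failover
      terminationGracePeriodSeconds: {{ .Values.terminationGracePeriodSeconds }}
```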
Great PR! Please pay attention to the following items before merging: Files matching
This is an automatically generated QA checklist based on modified files.
Another PR (#6146) containing the hard-coded masterSet value fix has been merged. I will remove the first commit from this PR to resolve the conflicts. The rest of this PR should still be merged after a successful review.
We're doing a major refactoring of the chart at #6102
I will block this PR until the refactoring is completed.
Juan, could you please explain why work on the next major version should prevent a bugfix PR like this from being merged? This PR doesn't introduce any backward-compatibility breaks and would be useful to people using v13 of this chart.
Hi @Gregy, the major refactoring PR was merged; could you please rebase your branch from master and update the PR? Then, please ping me and I'll be glad to review it. Thanks in advance.
Wow, that was quick. Thanks for pinging me @juan131. Rebase done. Ready for review.
Sorry, I won't have time to check this properly today
I'll check it tomorrow morning
bitnami/redis/README.md (outdated)

| Name | Description | Value |
|---------------------------------|----------------------------------------------------|-----------------|
| `kubeVersion` | Override Kubernetes version | `nil` |
| `nameOverride` | String to partially override common.names.fullname | `nil` |
| `fullnameOverride` | String to fully override common.names.fullname | `nil` |
| `commonLabels` | Labels to add to all deployed objects | `{}` |
| `commonAnnotations` | Annotations to add to all deployed objects | `{}` |
| `clusterDomain` | Kubernetes cluster domain name | `cluster.local` |
| `extraDeploy` | Array of extra objects to deploy with the release | `[]` |
| `terminationGracePeriodSeconds` | Define termination grace period for all pods | `30` |
Shouldn't we be able to define different grace periods depending on the component type?
Just curious, did you use readmenator to generate the tables automatically?
I don't think so. The termination grace period is configured for the whole pod, not for individual containers. We could theoretically make the setting separate for sentinels, masters and replicas, but when sentinel is used there are no separate master and replica pods, and it doesn't make much sense to me to have a different grace period for master vs replica. So I am for keeping the setting global. But I can separate them if you wish.
No, I don't even have access to that repo. I have just modified the file with some help from a markdown table generator.
It doesn't make much sense to me to have a different grace period for master vs replica. So I am for keeping the setting global. But I can separate them if you wish.
I agree it's very unlikely you want to set a value for this setting on master nodes different from the one you set on replicas nodes. That said, it seems this is the approach we followed on other charts, and I'd follow it for consistency. See:
- https://github.com/bitnami/charts/blob/master/bitnami/metallb/templates/controller/deployment.yaml#L34
- https://github.com/bitnami/charts/blob/master/bitnami/metallb/templates/speaker/daemonset.yaml#L3
No, I don't even have access to that repo. I have just modified the file with some help from a markdown table generator.
Crap, I thought this was already public. Sorry for the noise.
Ok, I have separated the configs.
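To illustrate the separated layout (the key names here mirror the per-component pattern of other Bitnami charts but are assumptions; the chart's values.yaml is authoritative):

```yaml
# values.yaml (sketch): one grace period per component instead of a single global value
master:
  terminationGracePeriodSeconds: 30
replica:
  terminationGracePeriodSeconds: 30
```

Each template then reads its own value, e.g. `{{ .Values.master.terminationGracePeriodSeconds }}` in the master statefulset and `{{ .Values.replica.terminationGracePeriodSeconds }}` in the replica one.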
This improves on bitnami#5528 by checking and waiting until the failover is finished on both the redis and the sentinel container. This completely eliminates momentary service interruption during rollouts. As we cannot guarantee the failover will be successful, the wait time is capped at the termination grace period minus 10s.
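As a rough sketch of that approach (not the chart's actual script: the value path, port and master-set name below are placeholders), the redis container's preStop hook could wait for the failover like this:

```yaml
# Sketch: wait for the Sentinel failover before letting the redis container stop,
# capped at terminationGracePeriodSeconds minus 10s so the pod still exits in time.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/bash
        - -c
        - |
          timeout={{ sub .Values.replica.terminationGracePeriodSeconds 10 }}
          while [ "$timeout" -gt 0 ]; do
            # "mymaster" is a placeholder; the chart uses its configured master set name
            master="$(redis-cli -p 26379 sentinel get-master-addr-by-name mymaster | head -n 1)"
            # Stop waiting once Sentinel no longer reports this pod as the master
            if [ "$master" != "$(hostname -i)" ]; then
              break
            fi
            sleep 1
            timeout=$((timeout - 1))
          done
```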
Ouch, thanks for catching that!
Thank you for all the changes @Gregy,
LGTM!
We have to wait for the internal CI to update the image tags.
Signed-off-by: Bitnami Containers <[email protected]>
I have just updated the bitnami images with the latest known immutable tags:
Awesome. Thank you very much Miguelaeh.
This improves on #5528 by checking and waiting, during termination of both the redis and the sentinel containers, until the failover is finished. This completely eliminates the momentary service interruption during rollouts that happens with just #5528 applied.
Benefits
This eliminates the few seconds of downtime that happen with v13.0.1 because the redis container terminates before the sentinel prestop hook finishes running (in Kubernetes, the containers of a pod are stopped concurrently).
I have also replaced a hard-coded masterSet config value in the prestop script with a proper config reference (see the sketch below). This problem was mentioned by @srueg in this comment: #5528 (comment)
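The fix boils down to templating the configured master set name into the prestop script instead of hard-coding it. A sketch (the `sentinel.masterSet` value name and the failover command are used here only for illustration):

```yaml
# Sketch: use the chart's configured master set name rather than a literal "mymaster"
lifecycle:
  preStop:
    exec:
      command:
        - /bin/bash
        - -c
        - redis-cli -p 26379 sentinel failover {{ .Values.sentinel.masterSet }}
```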
Possible drawbacks
This code can make pod deletes (and rollouts) slower by a few seconds.
Applicable issues
Additional information
Checklist
- Chart version bumped in `Chart.yaml` according to semver.
- Title of the PR starts with chart name (e.g. `[bitnami/chart]`)