Refine deployment rollouts #1222
Conversation
There are cases when rolling out a new Deployment takes longer than the default timeout of 120s. For instance, when a Deployment has multiple replicas, each replica starts on a separate node, and the Deployment specifies new images, then just pulling these images for each replica may take longer than the default 120s. Multiplying the default timeout by the number of replicas should generally give all replicas enough time to start.
Proper waiting is already performed earlier during `Deployment{apply: yes, wait: yes}` - https://github.com/ansible-collections/kubernetes.core/blob/e6ac87409830b6c698b05dba875cca87d45ea761/plugins/module_utils/k8s/waiter.py#L27. Also, not every Deployment change produces new ReplicaSets/Pods. For example, changing Deployment labels won't cause a new rollout, but will cause the `until` loop to be invoked unnecessarily (when replicas=1).
Make AWX Pod variable to be calculated respecting `creationTimestamp` and `deletionTimestamp`
Do not consider Pods marked for deletion when calculating tower_pod, to address the replica scale-down case, where normally the most recently spawned Pods are the ones picked for removal, as well as the case when the operator kicks off while some old replicas are still terminating. Respect `creationTimestamp` to make sure that the newest Pod is taken after the Deployment is applied, when Pods from multiple ReplicaSets (the old RS and the new RS) could be running simultaneously while the rollout is happening.
With the previous approach, not all changes to the associated (mounted) ConfigMaps/Secrets caused the Deployment to be rolled out, while the Deployment could also be rolled out unnecessarily during e.g. Ingress or Service changes (which do not require Pod restarts). The previously existing Pod removal (state: absent) was incomplete, as other Pods continued to exist, and it is also no longer needed with this commit thanks to the added Pod annotations. The added Deployment Pod annotations now cause a new ReplicaSet version to be rolled out whenever there is a change in the associated ConfigMaps or Secrets referenced in the annotations, effectively replacing the previously existing Pods in accordance with the Deployment `strategy` (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#deploymentstrategy-v1-apps, `RollingUpdate`). This implementation is quite standard and widely used in Helm workflows - https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
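For illustration, a minimal sketch of the annotation pattern in the Deployment's pod template (only the two resource templates visible in the diff further down are listed here; the operator's actual template list and indentation may differ):

```yaml
# Illustrative Jinja2 fragment for the Deployment pod template annotations.
# The template list is only partial; the real template covers the mounted
# ConfigMaps/Secrets whose changes should trigger a rollout.
spec:
  template:
    metadata:
      annotations:
        {% for template in [
            "secrets/app_credentials",
            "storage/persistent",
        ] %}
        checksum-{{ template | replace('/', '-') }}: "{{ lookup('template', template + '.yaml.j2') | md5 }}"
        {% endfor %}
```

Because the annotation values are checksums of the rendered resource templates, any change to those resources changes the pod template, and the Deployment controller rolls out a new ReplicaSet according to its `strategy`.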
@stanislav-zaprudskiy thanks for splitting out this work from the other PR. We'll get some eyes on this PR shortly
@@ -13,9 +13,17 @@
      - status.phase=Running
  register: tower_pod

- name: Set the resource pod as a variable.
This will prevent us from grabbing a pod that is in terminating state. It also ensures that only one pod is grabbed (the oldest).
Just adding the corresponding commit message in case it got missed
commit b3a7436
Make AWX Pod variable to be calculated respecting `creationTimestamp` and `deletionTimestamp`
Do not consider Pods marked for deletion when calculating tower_pod, to address the replica scale-down case, where normally the most recently spawned Pods are the ones picked for removal, as well as the case when the operator kicks off while some old replicas are still terminating. Respect `creationTimestamp` to make sure that the newest Pod is taken after the Deployment is applied, when Pods from multiple ReplicaSets (the old RS and the new RS) could be running simultaneously while the rollout is happening.
I've also encountered a couple of cases when a `Terminating` pod was picked up, and running an `awx-manage` command on it during the later tasks was not possible (the pod was gone already, etc.).
Here indeed the oldest pod is taken, while later in the code the newest (the one most recently created) is used.
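For reference, a minimal sketch of that selection (the exact expression in the PR may differ; the filters below are illustrative, assuming `tower_pod` was registered from a `k8s_info` lookup as in the hunk above):

```yaml
# Sketch: drop Pods already marked for deletion, then take the oldest of the rest.
- name: Set the resource pod as a variable.
  set_fact:
    tower_pod: >-
      {{ tower_pod['resources']
         | rejectattr('metadata.deletionTimestamp', 'defined')
         | sort(attribute='metadata.creationTimestamp')
         | first | default({}) }}
```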
@@ -210,21 +250,10 @@
    apply: yes
    definition: "{{ lookup('template', 'deployments/deployment.yaml.j2') }}"
    wait: yes
    wait_timeout: "{{ 120 * replicas or 120 }}"
@stanislav-zaprudskiy could you expand on why you added this?
Adding the corresponding commit message just in case it got missed
commit e589ceb
When applying Deployment wait up to (timeout * replicas)
There are cases when rolling out a new Deployment takes longer than the default timeout of 120s. For instance, when a Deployment has multiple replicas, each replica starts on a separate node, and the Deployment specifies new images, then just pulling these images for each replica may take longer than the default 120s. Multiplying the default timeout by the number of replicas should generally give all replicas enough time to start.
The corresponding parameter `wait: yes` (already provided) causes the task to wait until the Deployment pods are ready, and with the default 120 seconds it could fail depending on the overall k8s and AWX configuration, causing the operator run to fail and be started again. Starting a new run won't solve the problem, however, as it would run into the same timeout issue again and again until the new pods are ready - so increasing the maximum waiting time up front saves further failures and restarts.
Just to add, the Deployment's rollout strategy configuration could also increase the time needed for the new pods to become ready.
There could be cases when scheduling new pods is not possible (e.g. due to a lack of resources, a wrong image configuration, etc.) - in which case the operator would be stuck waiting unnecessarily. A lower timeout value would make users aware of such problems earlier, but in multi-replica AWX configurations running in multi-node clusters where image caches aren't generally available, lower timeout values produce too many false-positive failures of operator runs.
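For context, the hunk above is part of a task that, put together, reads roughly as follows (the task name and module invocation are assumptions for illustration; only the shown parameters come from the diff). With e.g. `replicas: 3` the expression evaluates to 360 seconds, and it falls back to 120 when `replicas` is 0:

```yaml
# Sketch of the apply task; task name and module are assumed.
- name: Apply the AWX deployment and wait for its pods to become ready
  kubernetes.core.k8s:
    apply: yes
    definition: "{{ lookup('template', 'deployments/deployment.yaml.j2') }}"
    wait: yes
    # 120s per replica, falling back to 120s when replicas evaluates to 0
    wait_timeout: "{{ 120 * replicas or 120 }}"
```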
@@ -40,10 +40,10 @@

- name: Set secret key secret
  set_fact:
    __secret_key_secret: '{{ _generated_secret_key["resources"] | default([]) | length | ternary(_generated_secret_key, _secret_key_secret) }}'
    secret_key: '{{ _generated_secret_key["resources"] | default([]) | length | ternary(_generated_secret_key, _secret_key_secret) }}'
I want to do some testing around this before merging. cc @TheRealHaoLiu
"secrets/app_credentials", | ||
"storage/persistent", | ||
] %} | ||
checksum-{{ template | replace('/', '-') }}: "{{ lookup('template', template + '.yaml.j2') | md5 }}" |
❤️ This is a great trick! This is an elegant solution to the issue of deployments not being cycled when changes are made to the ConfigMap. @TheRealHaoLiu is impressed too.
@stanislav-zaprudskiy Thank you for this PR, it is exemplary. 🏆
SUMMARY
This is groundwork which originated in and got split out from #1193, as it is beneficial independently of that PR. It addresses some unintended behavior and edge cases during `Deployment` rollouts, making them more robust and reliable. It's best reviewed on a per-commit basis, with details and rationale available in the corresponding commit messages.
ISSUE TYPE
ADDITIONAL INFORMATION