
Refine deployment rollouts #1222

Merged

Conversation

stanislav-zaprudskiy
Contributor

SUMMARY

This is groundwork which originated in and got split out from #1193, as it is beneficial independently of that PR. It addresses some unintended behavior and edge cases during Deployment rollouts, making them more robust and reliable. It is best reviewed on a per-commit basis, with details and rationale available in the corresponding commit messages.

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
ADDITIONAL INFORMATION

There are cases when bringing up a new Deployment may take longer than the
default timeout of 120s.
For instance, when a Deployment has multiple replicas, each replica
starts on a separate node, and the Deployment specifies new images, then
just pulling these new images for each replica may take longer than the
default timeout of 120s.

Having the default timeout multiplied by the number of replicas should
generally provide enough time for all replicas to start.
Proper waiting is already performed earlier during Deployment {apply: yes, wait: yes} -
https://github.com/ansible-collections/kubernetes.core/blob/e6ac87409830b6c698b05dba875cca87d45ea761/plugins/module_utils/k8s/waiter.py#L27.

Also, not every Deployment change produces new RS/Pods. For example,
changing Deployment labels won't cause a new rollout, but will cause the
`until` loop to be invoked unnecessarily (when replicas=1).
Make AWX Pod variable to be calculated respecting `creationTimestamp` and `deletionTimestamp`

Do not consider Pods marked for deletion when calculating tower_pod, to
address the replica scale-down case - where the most recently spawned Pods
are normally the ones taken for removal - as well as the case when the
operator kicks off while some old replicas are still terminating.

Respect `creationTimestamp` to make sure that the newest Pod is taken
after the Deployment is applied, since multiple RS Pods (from the old
RS and the new RS) could be running simultaneously while the rollout is
happening.

With the previous approach, not all changes to the associated (mounted)
CMs/Secrets caused the Deployment to be rolled out, while the Deployment
could also have been rolled out unnecessarily during e.g. Ingress or Service
changes (which do not require Pod restarts).

The previously existing Pod removal (state: absent) was incomplete, as
other Pods continued to exist; it is also no longer needed with this
commit's change, thanks to the added Pod annotations.

The added Deployment Pod annotations now cause a new ReplicaSet
version to be rolled out, effectively replacing the previously existing
Pods in accordance with the Deployment `strategy`
(https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#deploymentstrategy-v1-apps,
`RollingUpdate`) whenever there is a change in the associated CMs or
Secrets referenced in the annotations. This implementation is quite standard
and widely used in Helm workflows -
https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
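
As an illustration of the pattern, here is a minimal sketch of how such checksum annotations can be rendered in a Deployment Jinja2 template; the template list below is a placeholder rather than the exact list used in this PR:

# Any change to a rendered ConfigMap/Secret template changes its md5 checksum,
# which changes the Pod template and therefore triggers a new ReplicaSet rollout.
spec:
  template:
    metadata:
      annotations:
{% for template in [
    "configmaps/config",
    "secrets/app_credentials",
  ] %}
        checksum-{{ template | replace('/', '-') }}: "{{ lookup('template', template + '.yaml.j2') | md5 }}"
{% endfor %}

The key design point is that the checksums live in the Pod template's annotations rather than in the Deployment's own metadata, so changes roll the Pods according to the configured `RollingUpdate` strategy.
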
@fosterseth
Member

@stanislav-zaprudskiy thanks for splitting out this work from the other PR. We'll get some eyes on this PR shortly

@@ -13,9 +13,17 @@
- status.phase=Running
register: tower_pod

- name: Set the resource pod as a variable.
Member

This will prevent us from grabbing a pod that is in terminating state. It also ensures that only one pod is grabbed (the oldest).

Contributor Author

Just adding the corresponding commit message in case it got missed

commit b3a7436

Make AWX Pod variable to be calculated respecting `creationTimestamp` and `deletionTimestamp`

Do not consider Pods marked for deletion when calculating tower_pod, to
address the replica scale-down case - where the most recently spawned Pods
are normally the ones taken for removal - as well as the case when the
operator kicks off while some old replicas are still terminating.

Respect `creationTimestamp` to make sure that the newest Pod is taken
after the Deployment is applied, since multiple RS Pods (from the old
RS and the new RS) could be running simultaneously while the rollout is
happening.

I've also encountered a couple of cases where a Terminating pod was picked up, and running an awx-manage command on it in later tasks was not possible (the pod was already gone, etc.).

Here indeed the oldest pod is taken, while later in the code the newest (the one most recently created) is used.
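
For illustration, a selection along these lines (a sketch only, not necessarily the exact task from this PR) can be expressed with Jinja2 filters, dropping Pods that already carry a `deletionTimestamp` and sorting the remainder by `creationTimestamp` so that the first element is the oldest:

# Sketch: tower_pod is assumed to be registered from a kubernetes.core.k8s_info
# query for Running Pods. Terminating Pods typically still report
# status.phase=Running but already have metadata.deletionTimestamp set,
# so they are filtered out here; the remaining Pods are sorted oldest-first.
- name: Set the resource pod name as a variable
  ansible.builtin.set_fact:
    tower_pod_name: >-
      {{ tower_pod['resources']
         | rejectattr('metadata.deletionTimestamp', 'defined')
         | sort(attribute='metadata.creationTimestamp')
         | map(attribute='metadata.name')
         | first
         | default('') }}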

@@ -210,21 +250,10 @@
apply: yes
definition: "{{ lookup('template', 'deployments/deployment.yaml.j2') }}"
wait: yes
wait_timeout: "{{ 120 * replicas or 120 }}"
Member

@stanislav-zaprudskiy could you expand on why you added this?

Contributor Author

Adding the corresponding commit message just in case it got missed

commit e589ceb

When applying Deployment wait up to (timeout * replicas)

There are cases when bringing up a new Deployment may take longer than the
default timeout of 120s.
For instance, when a Deployment has multiple replicas, each replica
starts on a separate node, and the Deployment specifies new images, then
just pulling these new images for each replica may take longer than the
default timeout of 120s.

Having the default timeout multiplied by the number of replicas should
generally provide enough time for all replicas to start.

The corresponding parameter wait: yes (already provided) causes the task to wait until the Deployment's pods are ready; with the default 120 seconds it could time out depending on the overall k8s and AWX configuration, causing the operator run to fail and be started again. Starting a new run won't solve the problem, however, as it would hit the same timeout again and again until the new pods are ready - so increasing the maximum waiting time up front avoids further failures and restarts.

Just to add, the Deployment's rollout strategy configuration could also increase the time needed for the new pods to become ready.

There could also be cases when scheduling new pods is not possible (e.g. due to lack of resources, a wrong image configuration, etc.), in which case the operator would be stuck waiting unnecessarily. A lower timeout value would make users aware of the problem earlier - but in multi-replica AWX configurations running in multi-node clusters, where image caches aren't generally available, lower timeout values produce too many false-positive failures of operator runs.
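
For reference, here is a hedged sketch of what the apply task with the scaled timeout can look like; the task name and the kubernetes.core.k8s module invocation are assumptions on my part, only the parameters mirror the hunk above:

# Sketch only: wait up to 120s per replica for the applied Deployment's Pods
# to become ready instead of the flat 120s default; with replicas undefined
# or 0, the expression "120 * replicas or 120" falls back to 120.
- name: Apply deployment resources
  kubernetes.core.k8s:
    apply: yes
    definition: "{{ lookup('template', 'deployments/deployment.yaml.j2') }}"
    wait: yes
    wait_timeout: "{{ 120 * replicas or 120 }}"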

@@ -40,10 +40,10 @@

- name: Set secret key secret
set_fact:
__secret_key_secret: '{{ _generated_secret_key["resources"] | default([]) | length | ternary(_generated_secret_key, _secret_key_secret) }}'
secret_key: '{{ _generated_secret_key["resources"] | default([]) | length | ternary(_generated_secret_key, _secret_key_secret) }}'
Member

I want to do some testing around this before merging. cc @TheRealHaoLiu

@TheRealHaoLiu merged commit f042cb3 into ansible:devel Feb 28, 2023
"secrets/app_credentials",
"storage/persistent",
] %}
checksum-{{ template | replace('/', '-') }}: "{{ lookup('template', template + '.yaml.j2') | md5 }}"
@rooftopcellist
Member
Feb 28, 2023

❤️ This is a great trick! This is an elegant solution to the issue of deployments not being cycled when changes are made to the ConfigMap. @TheRealHaoLiu is impressed too.

@rooftopcellist
Member

@stanislav-zaprudskiy Thank you for this PR, it is exemplary. 🏆
I have learned a lot from your contributions, and hope to learn more!
