fix: Change pod readiness check mechanism #249
Conversation
tasks/change_config.yml (Outdated)

  args:
    executable: /bin/bash
- failed_when: "all_pods_ready.rc not in [ 0, 1 ]"
+ failed_when: "all_pods_ready.rc != 0"
I am not sure why return code 1 was considered OK here, so I made this change. If there is something I'm not seeing, please comment. @MonolithProjects
It was like that because if grep found no matches (all pods in Ready state), the command would return 1 and the task would fail. But with your approach it's fine to change this. I guess now you can even remove this line.
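To illustrate the exit-code behavior being discussed (a minimal sketch with a hypothetical pod listing, not real output from this cluster): grep exits 0 when it matches at least one line, 1 when it matches nothing, and greater than 1 on error. The old task grepped the pod list for not-ready pods, so rc 1 ("no unready pods found") was actually the success case, which is why the original `failed_when` allowed both 0 and 1.

```shell
# Hypothetical "kubectl get pods" style listing; 1/1 means all
# containers in the pod are ready.
pods='pod-a   1/1   Running
pod-b   1/1   Running'

# Invert-match for lines WITHOUT 1/1 (i.e. not-ready pods).
# Every line matches 1/1, so nothing is printed and grep exits 1.
echo "$pods" | grep -v '1/1'
echo "grep rc: $?"   # prints "grep rc: 1"
```

With a phase-based check that exits 0 on success, the relaxed `failed_when` is no longer needed, as the thread concludes.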
Removed
Another question I have regarding this task: why are pods running in kube-system exempt from the check (metadata.namespace!=kube-system)?
tasks/rolling_restart.yml (Outdated)

  args:
    executable: /bin/bash
- failed_when: "all_pods_ready.rc not in [ 0, 1 ]"
+ failed_when: "all_pods_ready.rc != 0"
This is not needed anymore.
Removed
Hmm, actually this is something I overlooked. It does not make much sense to me to exclude the pods in this namespace from the check.
Thanks for the review, I made the changes that were requested. I don't work much with GitHub, so I'm not sure if I should resolve the changes you requested. Cheers!
That's fine, I will do it. Thanks!
LGTM
Description
If needed I can open an issue for this as well. The following happens: the check that all pods are ready fails in some of my clusters because grep matches pods it is not supposed to, for example:

In the situation above I have pods whose names contain init, so the check never passes. I therefore changed the task to inspect the pod metadata itself and determine its phase: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase.
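A phase-based check along these lines could be expressed with kubectl's field selectors. This is a hedged sketch, not necessarily the PR's exact command, and it requires kubectl against a live cluster:

```shell
# Sketch under assumptions: list pods whose .status.phase is neither
# Running nor Succeeded. An empty result means no pod is still in the
# Pending, Failed, or Unknown phase.
kubectl get pods --all-namespaces --no-headers \
  --field-selector=status.phase!=Running,status.phase!=Succeeded
```

One caveat worth noting: phase Running does not by itself guarantee readiness (a pod's containers can be running but failing their readiness probes), so a stricter variant would also inspect the pod's Ready condition.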
Type of change
How Has This Been Tested?
Tested on Ubuntu 22.04, RKE2 v1.27.12+rke2r1 on a dev cluster and one production cluster where the problems were happening.