
fix: Change pod readiness check mechanism #249

Merged: 4 commits into lablabs:main on Oct 12, 2024

Conversation

@lukapetrovic-git (Contributor) commented Sep 16, 2024

Description

If needed, I can open an issue for this as well. The following happens:

The check that all pods are ready fails in some of my clusters because grep matches pods that it is not supposed to, for example:

[screenshot: kubectl get pods output where pods with "init" in their names are matched by the check]

In the situation above I have pods whose names contain "init", so the check never passes. I therefore changed the task to inspect the metadata of the pod itself and determine its phase (https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase); see the sketch below.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Small minor change not affecting the Ansible Role code (GitHub Actions Workflow, Documentation etc.)

How Has This Been Tested?

Tested on Ubuntu 22.04 with RKE2 v1.27.12+rke2r1, on a dev cluster and on one production cluster where the problem was occurring.

@lukapetrovic-git marked this pull request as ready for review on September 16, 2024, 14:14
@lukapetrovic-git changed the title from "Change pod readiness check mechanism" to "fix: Change pod readiness check mechanism" on Sep 16, 2024
```diff
   args:
     executable: /bin/bash
-  failed_when: "all_pods_ready.rc not in [ 0, 1 ]"
+  failed_when: "all_pods_ready.rc != 0"
```
Contributor Author

I am not sure why return code 1 was considered OK here, so I made this change. If there is something I'm not seeing, please comment. @MonolithProjects

Collaborator

It was like that because if grep found no matches (all pods in Ready state), the command would return 1 and the task would fail. But with your approach it's fine to change this. I guess now you can even remove this line.

Contributor Author
Removed

@lukapetrovic-git (Contributor, Author) commented Sep 27, 2024

Another question I have regarding this task: why are pods running in kube-system exempt from the check (metadata.namespace!=kube-system)?
One example: when the RKE2 service is restarted, in my case the Cilium pods also get restarted; they run in the kube-system namespace and are crucial to the functioning of the cluster as a whole. Cheers!


```diff
   args:
     executable: /bin/bash
-  failed_when: "all_pods_ready.rc not in [ 0, 1 ]"
+  failed_when: "all_pods_ready.rc != 0"
```
Collaborator
This is not needed anymore

Contributor Author

Removed

@MonolithProjects (Collaborator)

> Another question I have regarding this task: why are pods running in kube-system exempt from the check (metadata.namespace!=kube-system)? One example: when the RKE2 service is restarted, in my case the Cilium pods also get restarted; they run in the kube-system namespace and are crucial to the functioning of the cluster as a whole. Cheers!

Hmm, actually this is something I overlooked. It does not make much sense to me to exclude the pods in this namespace from the check.

tasks/change_config.yml (review thread outdated, resolved)
tasks/rolling_restart.yml (review thread outdated, resolved)
@MonolithProjects added the bug label on Oct 6, 2024
@lukapetrovic-git (Contributor, Author)

Thanks for the review, I made the changes that were requested. I don't work much with GitHub, so I'm not sure whether I should resolve the review conversations myself. Cheers!

@MonolithProjects (Collaborator)

> Thanks for the review, I made the changes that were requested. I don't work much with GitHub, so I'm not sure whether I should resolve the review conversations myself. Cheers!

That's fine, I will do it. Thanks!

@MonolithProjects (Collaborator) left a review comment
LGTM

@MonolithProjects merged commit a77f6fd into lablabs:main on Oct 12, 2024
5 checks passed