Check health on control-plane before acting on cluster
Zempashi committed Sep 15, 2023
1 parent c06ee1f commit d494c9d
Showing 4 changed files with 31 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/variables.md
@@ -12,6 +12,7 @@ For hooks where a variable-per-hook is exposed, see [hooks && plugins](hooks_and
| apiserver_manifest | control plane | "/etc/kubernetes/manifests/kube-apiserver.yaml" | filename to stat for presence in the process to discover already running control-plane |
| cluster_config | control plane | {} | config to be used by kubeadm for the `kind: ClusterConfiguration` |
| control_plane_endpoint | control plane | "" (let kubeadm default) | sets the "controlPlaneEndpoint" entry of the cluster_config. Can also be set as part of the cluster_config. Defaults to empty, but ansible-kubeadm fails if it is not set on a cluster with multiple control-plane nodes |
| cp_health_check_bypass | control plane | false | Skip the control-plane health check that runs before acting on the cluster |
| enable_kubeadm_patches | control plane | true | Deploy patches and pass `kubeadm_patch_dir` to kubeadm so that the patches are applied |
| kube_control_plane_cidr | control plane | "" (let kubeadm default) | CIDR (eg "192.168.99.0/24") filter addresses for `_etcd_metrics_bind_address`, `_kube_apiserver_advertise_address`, `_kube_controller_manager_bind_address`, `_kube_scheduler_bind_address` |
| kube_apiserver_advertise_cidr | control plane | "" (let kubeadm default) | CIDR (eg "192.168.99.0/24") filter the advertise address to `_kube_apiserver_advertise_address` (override `kube_control_plane_cidr`) |
1 change: 1 addition & 0 deletions roles/preflight_check_cp/defaults/main.yml
@@ -2,6 +2,7 @@
_config_upgrade_reasons: {}
_failure_reasons: {}
_upgrade_reasons: {}
cp_health_check_bypass: false

kube_version:
default_kube_version: '1.19'
24 changes: 24 additions & 0 deletions roles/preflight_check_cp/tasks/check_control_plane_health.yml
@@ -0,0 +1,24 @@
---
# Fetch every node object so the next task can inspect control-plane readiness.
- name: 'Get all nodes if cluster is running'
  command: kubectl get nodes -o yaml
  changed_when: false
  check_mode: false
  register: _all_nodes_yaml
  environment:
    KUBECONFIG: '{{ kubeconfig_admin }}'

# Record a failure reason when any control-plane node has a Ready condition other than "True".
- name: 'Check control-plane health'
  set_fact:
    _failure_reasons: >-
      {%- set all_nodes = (_all_nodes_yaml.stdout|from_yaml)['items']
            |selectattr("metadata.labels.node-role\.kubernetes\.io/control-plane", "defined") -%}
      {%- if all_nodes|map(attribute="status.conditions")
            |map("selectattr", "type", "eq", "Ready")
            |map("first")
            |rejectattr("status", "eq", "True")
            |list|length != 0 -%}
        {%- set _ = _failure_reasons.update(dict(
              cp_health = "Some control plane nodes are not healthy")) -%}
      {%- endif -%}
      {{ _failure_reasons }}
  when: not cp_health_check_bypass|bool
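
For readers less used to chained Jinja filters, the selection done by the 'Check control-plane health' task can be sketched in plain Python. This is an illustrative equivalent, not code from the commit: the function name `unhealthy_control_planes` is hypothetical, the input is assumed to be the already-parsed output of `kubectl get nodes`, and a node with no `Ready` condition at all is assumed to count as unhealthy (the Jinja expression does not spell out that case).

```python
from typing import Any

# Label the task uses to recognise control-plane nodes.
CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"


def unhealthy_control_planes(node_list: dict[str, Any]) -> list[str]:
    """Return the names of control-plane nodes whose Ready condition is not "True".

    `node_list` is the parsed `kubectl get nodes` output (a dict with an
    "items" list). Hypothetical helper, for illustration only.
    """
    unhealthy = []
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        # Same role as selectattr(..., "defined"): keep labelled nodes only.
        if CONTROL_PLANE_LABEL not in labels:
            continue
        conditions = node.get("status", {}).get("conditions", [])
        # Same role as selectattr("type", "eq", "Ready") followed by first.
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # Assumption: a missing Ready condition is treated as unhealthy.
        if ready is None or ready.get("status") != "True":
            unhealthy.append(node["metadata"]["name"])
    return unhealthy
```

A non-empty result corresponds to the case where the task adds the `cp_health` entry to `_failure_reasons`.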
5 changes: 5 additions & 0 deletions roles/preflight_check_cp/tasks/main.yml
@@ -32,3 +32,8 @@
- import_tasks: check_version.yml

- import_tasks: check_control_plane_endpoint.yml

- import_tasks: check_control_plane_health.yml
  when:
    - groups.cp_running|default([])|length > 0
    - not cp_health_check_bypass|bool
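
The two `when` entries are ANDed by Ansible, and the `bool` filter is applied before `not`. A rough Python reading of that gate is sketched below. Both function names are hypothetical, and `to_bool` only approximates Ansible's `bool` filter for the common string spellings; it is not a copy of it.

```python
def to_bool(value) -> bool:
    """Rough approximation of Ansible's `bool` filter (assumption, not the real code)."""
    if isinstance(value, str):
        return value.lower() in ("yes", "on", "1", "true")
    return bool(value)


def should_check_health(groups: dict, cp_health_check_bypass=False) -> bool:
    """Mirror the `when` list: every condition must hold for the import to run."""
    # groups.cp_running|default([])|length > 0
    running = groups.get("cp_running", [])
    # not cp_health_check_bypass|bool
    return len(running) > 0 and not to_bool(cp_health_check_bypass)
```

Under this reading, the health check is skipped both on a fresh install (no host in `cp_running`) and whenever the bypass variable is truthy.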
