[CPDEV-96588] Kubemarine cluster reconfigure procedure #592

Merged: 17 commits, Feb 19, 2024

@ilia1243 ilia1243 commented Feb 1, 2024

Description

  • Implement the cluster reconfigure procedure.
    • For now, only kubeadm-managed properties can be reconfigured.
  • Switch all previous reconfiguring logic to the new procedure.
  • Extend check_paas to check consistency between kubeadm-managed ConfigMaps, configurations on disk (manifests, kubelet config), and the inventory.
  • Fix bug: a new balancer was not added to the kube-apiserver cert SANs.

Fixes #530
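
To illustrate the SANs bug, here is a minimal sketch of how missing SANs could be detected after a balancer is added. It is not Kubemarine code: the node dictionary shape, the set of node fields treated as SANs, and the function names `required_sans` and `missing_sans` are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, List


def required_sans(nodes: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect names and addresses of control plane and balancer nodes.

    Assumption: these are the values expected in the kube-apiserver cert SANs.
    """
    sans: List[str] = []
    for node in nodes:
        roles = node.get("roles", [])
        if "control-plane" in roles or "balancer" in roles:
            for key in ("name", "internal_address", "address"):
                value = node.get(key)
                if value and value not in sans:
                    sans.append(value)
    return sans


def missing_sans(cert_sans: Iterable[str], nodes: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the required SANs that the current certificate does not contain."""
    present = set(cert_sans)
    return [san for san in required_sans(nodes) if san not in present]
```

A non-empty result from `missing_sans` would mean the certificate has to be rewritten, which is what the procedure does through apiServer.certSANs.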

Solution

  • Support reconfiguring the following sections:
    • services.kubeadm.apiServer
    • services.kubeadm.apiServer.certSANs
    • services.kubeadm.scheduler
    • services.kubeadm.controllerManager
    • services.kubeadm.etcd.local.extraArgs
    • services.kubeadm_kubelet (only a very restricted set of properties)
    • services.kubeadm_kube-proxy
    • services.kubeadm_patches
  • Main flow is implemented in kubemarine/kubernetes/components.py
    • Only the components whose sections are specified in the procedure inventory are reconfigured.
    • Cluster configuration is generated from the inventory and uploaded as ConfigMaps using kubeadm upload-config.
    • Control plane manifests and kubelet configuration are generated from ConfigMaps using the appropriate kubeadm phase.
    • kube-proxy ConfigMap is manually merged with the inventory.
    • Special support for writing a new certificate using apiServer.certSANs and kubeadm init phase certs apiserver.
    • Previous files (manifests, kubelet configuration) are backed up and compared with the newly generated files.
    • By default, components are restarted only if changes are detected.
    • The components are reconfigured on each relevant node one by one, starting from control plane nodes,
      proceeding to the next node only after the components are considered up and ready on the reconfigured node.
    • If changes in kubelet configuration are detected, all components are restarted, and the procedure waits for their readiness.
    • A working kube-apiserver is not required to reconfigure control plane components, but is required to reconfigure kubelet and kube-proxy.
  • Reworked many procedures to use functionality of components.py.
    • manage_pss, manage_psp
    • deploy.kubernetes.audit task of installation procedure.
    • Some reconfiguration performed before the Kubernetes upgrade in the upgrade procedure.
    • cert_renew.
  • Extended PaaS check
    • control_plane.configuration_status generates manifests in dry run mode and compares them with the persisted manifests.
      • Supports etcd and kubeadm_patches.
    • The previous services.kubelet.configuration check is renamed to services.kubelet.pid_max.
    • Added a new services.kubelet.configuration check that
      1. generates kubelet configs in dry run mode and compares with configurations on disk
      2. generates kubelet-config ConfigMap and compares with current ConfigMap on the cluster.
    • Added custom merging and comparing of kubelet-config and kube-proxy ConfigMaps
      (new services.kube-proxy.configuration check)
    • All checks now take into account services.kubeadm_patches.
  • deploy.kubernetes.init task of add_node procedure now reconfigures cert SANs of kube-apiserver.
  • Added waiting for kube-proxy pods on worker nodes in many procedures.
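
The rolling flow described in the list above (back up, regenerate, compare, restart only on change, proceed node by node) can be sketched roughly as follows. This is a minimal illustration and not the code in kubemarine/kubernetes/components.py: the `Node` class, the `regenerate` and `wait_ready` callables, the `force_restart` flag, and `reconfigure_nodes` itself are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; not a Kubemarine class."""
    name: str
    is_control_plane: bool
    files: Dict[str, str] = field(default_factory=dict)    # path -> current content
    backups: Dict[str, str] = field(default_factory=dict)  # path -> previous content


def reconfigure_nodes(
    nodes: List[Node],
    regenerate: Callable[[Node], Dict[str, str]],
    wait_ready: Callable[[Node], bool],
    force_restart: bool = False,
) -> List[str]:
    """Reconfigure nodes one by one, control planes first.

    Returns the names of nodes whose components were restarted.
    Raises RuntimeError as soon as a node does not become ready,
    so the remaining nodes are left untouched.
    """
    restarted: List[str] = []
    # sorted() is stable, so the original order is kept within each group.
    for node in sorted(nodes, key=lambda n: not n.is_control_plane):
        node.backups = dict(node.files)        # back up the previous files
        node.files = regenerate(node)          # store the newly generated files
        changed = node.files != node.backups   # compare old and new content
        if changed or force_restart:
            restarted.append(node.name)        # a real restart would happen here
        # Readiness is awaited even when nothing changed.
        if not wait_ready(node):
            raise RuntimeError(f"components on {node.name} are not ready")
    return restarted
```

Failing on the first node that does not become ready keeps a broken configuration from spreading to the rest of the cluster, which matches the expected result of TestCase 5.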

API Changes

  • manage_pss procedure: the delete_default_pss and apply_default_pss tasks are replaced with a single manage_pss task.
  • manage_psp procedure: the reconfigure_oob and reconfigure_plugin tasks are replaced with a single reconfigure_psp task.

Regression Testing

  • manage_pss, manage_psp - cases that need to reconfigure the kube-apiserver.
  • cert_renew: renewing Kubernetes certs should restart all components.
  • Upgrade from v1.27 to v1.29 with enabled PSS.
  • Reconfiguring of audit using install --tasks deploy.kubernetes.audit.

Test Cases

TestCase 1

Check reconfiguring of all kubeadm-managed properties

  • CRI: docker / containerd
  • Different operating systems
  • Different Kubernetes versions, at least v1.23, v1.25+
  • All-in-one scheme, and a scheme with two or more dedicated workers and control planes.

Steps:

  1. Run the reconfigure procedure, providing new properties for all Kubernetes components
    (etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet, kube-proxy),
    with different patches.

ER: All components are reconfigured and restarted on all relevant nodes one by one.

TestCase 2

Reconfigure some supported kubeadm-managed properties.

Steps:

  1. Specify only a subset of the supported sections in the procedure inventory.

ER: Only affected components are reconfigured.
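
The selection behind this expected result can be sketched as a lookup from inventory sections to components. The section names come from the Solution list, but the mapping and the helper names `has_section` and `affected_components` are assumptions for illustration only. services.kubeadm_patches is left out because the description does not say how patches map to components.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from procedure inventory sections to components.
# The mapping itself is an assumption, not taken from Kubemarine.
SECTION_TO_COMPONENT: Dict[Tuple[str, ...], str] = {
    ("services", "kubeadm", "apiServer"): "kube-apiserver",
    ("services", "kubeadm", "scheduler"): "kube-scheduler",
    ("services", "kubeadm", "controllerManager"): "kube-controller-manager",
    ("services", "kubeadm", "etcd"): "etcd",
    ("services", "kubeadm_kubelet"): "kubelet",
    ("services", "kubeadm_kube-proxy"): "kube-proxy",
}


def has_section(inventory: Dict[str, Any], path: Tuple[str, ...]) -> bool:
    """Return True if the nested key path exists, even if its value is empty."""
    current: Any = inventory
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


def affected_components(procedure_inventory: Dict[str, Any]) -> List[str]:
    """List the components whose sections are present in the procedure inventory."""
    return [
        component
        for path, component in SECTION_TO_COMPONENT.items()
        if has_section(procedure_inventory, path)
    ]
```

Because the check is based on presence rather than content, an empty section still selects its component, which is the situation exercised in TestCase 4.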

TestCase 3

Run procedure without changes.

Steps:

  1. In the procedure inventory, configure properties that are identical to those already configured in cluster.yaml.

ER: Components are not reconfigured, not restarted, but pods are waited for.

TestCase 4

Run procedure with empty sections.

Steps:

  1. Change cluster.yaml manually, and configure empty sections in the procedure inventory.

ER: Only affected components are reconfigured.

TestCase 5

Configure new properties that break some components.

Steps:

  1. Install cluster with more than one control plane or worker.
  2. Configure new properties that break some components and run kubemarine reconfigure.

ER: The procedure fails after the components on the first node fail to start.

TestCase 6

TC from issue #586

TestCase 7

PaaS check etcd configuration.

Steps:

  1. Install cluster >= v1.26
  2. Add some arguments manually to /etc/kubernetes/manifests/etcd.yaml
  3. Run check_paas --tasks control_plane.configuration_status

ER: Test checks that the manifest is not consistent with the inventory and kubeadm-config ConfigMap.
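
A comparison of this kind can be sketched with a plain text diff between the persisted manifest and the one generated in dry run mode. How the check actually compares manifests is not described in this PR; the function name `manifest_diff` and the use of `difflib` are assumptions for illustration.

```python
import difflib
from typing import List


def manifest_diff(persisted: str, generated: str, name: str) -> List[str]:
    """Return unified diff lines between the on-disk and generated manifest.

    An empty list means the two texts are identical.
    """
    return list(
        difflib.unified_diff(
            persisted.splitlines(),
            generated.splitlines(),
            fromfile=f"{name} (on disk)",
            tofile=f"{name} (generated)",
            lineterm="",
        )
    )
```

With this sketch, an argument added manually to the manifest on disk shows up as a removed line in the diff, so the check can report both that the manifest is inconsistent and what differs.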

TestCase 8

PaaS check services.kubeadm_patches configuration.

Steps:

  1. Install cluster, and then add some patches to services.kubeadm_patches (but not on the cluster).
  2. Run check_paas --tasks control_plane.configuration_status,services.kubelet.configuration

ER: Test checks that the manifests and kubelet config are not consistent with the inventory.

TestCase 9

PaaS check services.kubeadm_kubelet configuration.

Steps:

  1. Install cluster, and then add some properties manually to /var/lib/kubelet/config.yaml
  2. Run check_paas --tasks services.kubelet.configuration

ER: Test checks that config.yaml, kubelet-config ConfigMap, and inventory are not consistent.

TestCase 10

PaaS check kubelet-config ConfigMap.

Steps:

  1. Install cluster of version >= v1.26.
  2. Add / change / delete some properties in services.kubeadm_kubelet (but not on the cluster).
  3. Run check_paas --tasks services.kubelet.configuration

ER: Test checks that the ConfigMap is not consistent with the inventory.

TestCase 11

PaaS check kube-proxy ConfigMap.

Steps:

  1. Install cluster of any version.
  2. Add / change some properties in services.kubeadm_kube-proxy (but not on the cluster).
  3. Run check_paas --tasks services.kube-proxy.configuration

ER: Test checks that the ConfigMap is not consistent with the inventory.
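
One way to picture the merge-and-compare approach for the kube-proxy ConfigMap is a recursive merge of the inventory section over the current ConfigMap content, followed by an equality check. This is a sketch under assumptions: `deep_merge` and `is_consistent` are hypothetical helpers, and the actual merge rules used by Kubemarine may differ.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with override merged recursively into base."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def is_consistent(configmap: Dict[str, Any], inventory_section: Dict[str, Any]) -> bool:
    """The ConfigMap is consistent if merging the inventory changes nothing."""
    return deep_merge(configmap, inventory_section) == configmap
```

A merge-based comparison like this detects added or changed properties, but not properties deleted from the inventory, since a missing key leaves the ConfigMap value in place.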

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • Integration CI passed
  • Unit tests. If yes, list the new/changed tests with a brief description
  • There are no merge conflicts

Unit tests

test_reconfigure.py - enrichment, reconfiguration of Kubernetes components
test_add_node.py - to cover issue #530
test_group.py - new NodeGroup.wait_commands_successful method
test_inventory.py - PSS / PSP enrichment
test_kubernetes_components.py - cover some functionality of kubernetes/components.py
test_manage_psp.py / test_manage_pss.py - Manage PSS / PSP enrichment and reconfiguration of kube-apiserver
test_upgrade.py - reconfiguration of kube-apiserver, kube-proxy during upgrade from v1.27 to v1.29

@ilia1243 ilia1243 marked this pull request as draft February 1, 2024 14:59
@ilia1243 ilia1243 force-pushed the feature/kubeadm_reconfigure branch from 96a26c4 to 5b7ce0d Compare February 1, 2024 15:19
@ilia1243 ilia1243 force-pushed the feature/kubeadm_reconfigure branch 6 times, most recently from 338befe to 495a513 Compare February 7, 2024 09:59
@ilia1243 ilia1243 added the bug Something isn't working label Feb 7, 2024
@ilia1243 ilia1243 requested review from alexarefev and n549 February 7, 2024 14:51
@ilia1243 ilia1243 force-pushed the feature/kubeadm_reconfigure branch from 495a513 to 0c8c654 Compare February 8, 2024 07:11
@ilia1243 ilia1243 force-pushed the feature/kubeadm_reconfigure branch from 4fea911 to 83bbbe0 Compare February 9, 2024 10:09
@ilia1243 ilia1243 changed the title [CPDEV-96588] Draft: Kubemarine cluster reconfigure procedure [CPDEV-96588] Kubemarine cluster reconfigure procedure Feb 9, 2024
@ilia1243 ilia1243 marked this pull request as ready for review February 9, 2024 10:30
@ilia1243 ilia1243 force-pushed the feature/kubeadm_reconfigure branch 3 times, most recently from 0c44999 to b1dd442 Compare February 16, 2024 14:43
@ilia1243 ilia1243 requested a review from shmo1218 February 16, 2024 14:49
@ilia1243 ilia1243 force-pushed the feature/kubeadm_reconfigure branch from b1dd442 to 789567b Compare February 19, 2024 08:22
@koryaga koryaga merged commit 6073239 into main Feb 19, 2024
41 checks passed
@koryaga koryaga deleted the feature/kubeadm_reconfigure branch February 19, 2024 12:47
Labels
bug Something isn't working feature
Development

Successfully merging this pull request may close these issues.

New balancer is not added to kube-apiserver cert SANs
4 participants