[CPDEV-96588] Kubemarine cluster reconfigure procedure (#592)
* Fix and optimize kubeadm_patches JSON schema

* Add JSON schema for new reconfigure procedure and prepare implementation stub

* Implement enrichment and finalization

* kubeadm reconfigure implementation

* Avoid deleting pods; wait for the containers in pods to be refreshed

* Rework all existing code to use new API to reconfigure Kubernetes components

* Write new API server certificates if balancer is added

* Add documentation

* Extend `control_plane.configuration_status` PaaS check

Add services.kubelet.config PaaS check.

Generate manifests and kubelet config in dry-run mode and compare them with the stored configs.

Added custom merging and comparing of kubelet-config and kube-proxy ConfigMaps

Added generation of kubelet-config in dry-run mode for Kubernetes >= 1.26

Rework `kubernetes.admission` check.

* Rework flow to reconfigure all the components on one node before going to the next ones.

* linter fixes

* Add more unit tests

* Fix bug in components.wait_for_pods

Previously, a balancer or worker node could be chosen to run kubectl.

* Change order of tasks when disabling PSP

* Do not check consistency of kubelet-config ConfigMap for Kubernetes < v1.26

* More unit tests for components.py

* Changes to adapt to new procedure inventory merging

---------

Co-authored-by: sekr0614 <[email protected]>
ilia1243 and koryaga authored Feb 19, 2024
1 parent 0b51aa9 commit 6073239
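The commit message describes two behaviours of the reworked reconfigure flow. All components on one node are reconfigured before the procedure moves to the next node, and pods are not deleted; instead the procedure waits for their containers to be refreshed. The following is a minimal Python sketch of that loop. It assumes that a refresh can be detected as a change of container ID. `Node`, `reconfigure_node_by_node`, and the `apply_config`/`container_id` callables are hypothetical stand-ins, not Kubemarine's API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical node model: a name plus the components it runs."""
    name: str
    components: List[str] = field(default_factory=list)


def reconfigure_node_by_node(
    nodes: List[Node],
    apply_config: Callable[[Node, str], None],
    container_id: Callable[[Node, str], str],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
) -> List[str]:
    """Reconfigure every component of one node, waiting for each container
    to be refreshed, before moving on to the next node."""
    done: List[str] = []
    for node in nodes:
        for component in node.components:
            before = container_id(node, component)
            apply_config(node, component)  # no pod deletion here
            deadline = time.monotonic() + timeout
            # Assumption: a refreshed container reports a new container ID.
            while container_id(node, component) == before:
                if time.monotonic() > deadline:
                    raise TimeoutError(f"{component} on {node.name} was not refreshed")
                time.sleep(poll_interval)
            done.append(f"{node.name}/{component}")
    return done


# Usage with a fake cluster: applying a config bumps the container ID at once.
ids: Dict[str, int] = {}


def fake_apply(node: Node, component: str) -> None:
    key = f"{node.name}/{component}"
    ids[key] = ids.get(key, 0) + 1


def fake_id(node: Node, component: str) -> str:
    return str(ids.get(f"{node.name}/{component}", 0))


nodes = [Node("control-plane-1", ["kube-apiserver", "kubelet"]), Node("worker-1", ["kubelet"])]
print(reconfigure_node_by_node(nodes, fake_apply, fake_id, poll_interval=0.0))
# ['control-plane-1/kube-apiserver', 'control-plane-1/kubelet', 'worker-1/kubelet']
```

In this sketch, watching the container ID instead of deleting pods leaves the pod objects in place. The cost is that a component that never restarts is detected only through the timeout.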
Showing 44 changed files with 3,531 additions and 1,036 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -16,13 +16,14 @@ Kubemarine is an open source, lightweight and powerful management tool built for
- [upgrade](documentation/Maintenance.md#upgrade-procedure)
- [backup](documentation/Maintenance.md#backup-procedure)
- [restore](documentation/Maintenance.md#restore-procedure)
+- [reconfigure](documentation/Maintenance.md#reconfigure-procedure)
- [check_iaas](documentation/Kubecheck.md#iaas-procedure)
- [check_paas](documentation/Kubecheck.md#paas-procedure)
- [migrate_kubemarine](documentation/Maintenance.md#kubemarine-migration-procedure)
- [manage_psp](documentation/Maintenance.md#manage-psp-procedure)
- [manage_pss](documentation/Maintenance.md#manage-pss-procedure)
- [cert_renew](documentation/Maintenance.md#certificate-renew-procedure)
-- [migrate_cri](documentation/Maintenance.md#migration-cri-procedure)
+- [migrate_cri](documentation/Maintenance.md#cri-migration-procedure)
- [Single cluster inventory](documentation/Installation.md#configuration) for all operations, highly customizable
- Default values of all parameters in configurations with a minimum of required parameters
- [Control planes balancing](documentation/Installation.md#full-ha-scheme) with external balancers and VRRP
60 changes: 37 additions and 23 deletions documentation/Installation.md
@@ -505,7 +505,7 @@ For more information about the structure of the inventory and how to specify the
* [Minimal All-in-one Inventory Example](../examples/cluster.yaml/allinone-cluster.yaml) - It provides the minimum set of parameters for deploying All-in-one scheme.
* [Minimal Mini-HA Inventory Example](../examples/cluster.yaml/miniha-cluster.yaml) - It provides the minimum set of parameters for deploying Mini-HA scheme.
-#### Inventory validation
+#### Inventory validation
When configuring the inventory, you can use your favorite IDE supporting YAML validation by JSON schema.
JSON schema for inventory file can be used by [URL](../kubemarine/resources/schemas/cluster.json?raw=1).
@@ -1094,23 +1094,25 @@ In the `services.kubeadm` section, you can override the original settings for ku
For more information about these settings, refer to the official Kubernetes documentation at [https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file](https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file).
By default, the installer uses the following parameters:

-|Parameter| Default Value |
-|---|----------------------------------------------------------|
-|kubernetesVersion| `v1.26.11` |
-|controlPlaneEndpoint| `{{ cluster_name }}:6443` |
-|networking.podSubnet| `10.128.0.0/14` for IPv4 or `fd02::/48` for IPv6 |
-|networking.serviceSubnet| `172.30.0.0/16` for IPv4 or `fd03::/112` for IPv6 |
-|apiServer.certSANs| List with all nodes internal IPs, external IPs and names |
-|apiServer.extraArgs.enable-admission-plugins| `NodeRestriction` |
-|apiServer.extraArgs.profiling| `false` |
-|apiServer.extraArgs.audit-log-path| `/var/log/kubernetes/audit/audit.log` |
-|apiServer.extraArgs.audit-policy-file| `/etc/kubernetes/audit-policy.yaml` |
-|apiServer.extraArgs.audit-log-maxage| `30` |
-|apiServer.extraArgs.audit-log-maxbackup| `10` |
-|apiServer.extraArgs.audit-log-maxsize| `100` |
-|scheduler.extraArgs.profiling| `false` |
-|controllerManager.extraArgs.profiling| `false` |
-|controllerManager.extraArgs.terminated-pod-gc-threshold| `1000` |
+| Parameter                                             | Default Value                                            | Description                                                                                      |
+|-------------------------------------------------------|----------------------------------------------------------|--------------------------------------------------------------------------------------------------|
+|kubernetesVersion                                      | `v1.26.11`                                               |                                                                                                  |
+|controlPlaneEndpoint                                   | `{{ cluster_name }}:6443`                                |                                                                                                  |
+|networking.podSubnet                                   | `10.128.0.0/14` for IPv4 or `fd02::/48` for IPv6         |                                                                                                  |
+|networking.serviceSubnet                               | `172.30.0.0/16` for IPv4 or `fd03::/112` for IPv6        |                                                                                                  |
+|apiServer.certSANs                                     | List with all nodes internal IPs, external IPs and names | Custom SANs are only appended to, but do not override the default list                           |
+|apiServer.extraArgs.enable-admission-plugins           | `NodeRestriction`                                        | `PodSecurityPolicy` plugin is added if [Admission psp](#admission-psp) is enabled                |
+|apiServer.extraArgs.feature-gates                      |                                                          | `PodSecurity=true` is added for Kubernetes < v1.28 if [Admission pss](#admission-pss) is enabled |
+|apiServer.extraArgs.admission-control-config-file      | `/etc/kubernetes/pki/admission.yaml`                     | Provided default value **overrides** custom value if [Admission pss](#admission-pss) is enabled. |
+|apiServer.extraArgs.profiling                          | `false`                                                  |                                                                                                  |
+|apiServer.extraArgs.audit-log-path                     | `/var/log/kubernetes/audit/audit.log`                    |                                                                                                  |
+|apiServer.extraArgs.audit-policy-file                  | `/etc/kubernetes/audit-policy.yaml`                      |                                                                                                  |
+|apiServer.extraArgs.audit-log-maxage                   | `30`                                                     |                                                                                                  |
+|apiServer.extraArgs.audit-log-maxbackup                | `10`                                                     |                                                                                                  |
+|apiServer.extraArgs.audit-log-maxsize                  | `100`                                                    |                                                                                                  |
+|scheduler.extraArgs.profiling                          | `false`                                                  |                                                                                                  |
+|controllerManager.extraArgs.profiling                  | `false`                                                  |                                                                                                  |
+|controllerManager.extraArgs.terminated-pod-gc-threshold| `1000`                                                   |                                                                                                  |
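The Description column of the kubeadm defaults table makes some defaults conditional. The following is a minimal Python sketch of how the `apiServer` arguments and SANs could be composed under those rules. It rests on several assumptions. `PodSecurityPolicy` and `PodSecurity=true` are appended to comma-separated values, duplicate SANs are dropped, and `admission-control-config-file` is set only when PSS is enabled. `compose_api_server`, `append_csv`, and `parse_version` are hypothetical helpers, and only a subset of the table's defaults is modelled.

```python
from typing import Dict, List, Optional, Tuple

ADMISSION_CONFIG = "/etc/kubernetes/pki/admission.yaml"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'v1.26.11' into (1, 26, 11) so versions compare as tuples."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def append_csv(value: str, item: str) -> str:
    """Append item to a comma-separated flag value unless already present."""
    items = [part for part in value.split(",") if part]
    if item not in items:
        items.append(item)
    return ",".join(items)


def compose_api_server(
    kubernetes_version: str,
    default_sans: List[str],
    custom_sans: Optional[List[str]] = None,
    custom_args: Optional[Dict[str, str]] = None,
    psp_enabled: bool = False,
    pss_enabled: bool = False,
) -> Tuple[List[str], Dict[str, str]]:
    # Subset of the table's defaults; custom extraArgs override them.
    args = {"enable-admission-plugins": "NodeRestriction", "profiling": "false"}
    args.update(custom_args or {})

    # certSANs: custom entries are appended to, never replace, the defaults.
    sans = list(default_sans)
    for san in custom_sans or []:
        if san not in sans:
            sans.append(san)

    if psp_enabled:
        args["enable-admission-plugins"] = append_csv(args["enable-admission-plugins"], "PodSecurityPolicy")

    if pss_enabled:
        if parse_version(kubernetes_version) < (1, 28):
            args["feature-gates"] = append_csv(args.get("feature-gates", ""), "PodSecurity=true")
        # With PSS enabled, the default path overrides any custom value.
        args["admission-control-config-file"] = ADMISSION_CONFIG

    return sans, args


sans, args = compose_api_server(
    "v1.26.11",
    default_sans=["192.168.0.10", "control-plane-1"],
    custom_sans=["k8s.example.com", "control-plane-1"],
    custom_args={"admission-control-config-file": "/tmp/custom-admission.yaml"},
    pss_enabled=True,
)
print(sans)                                   # ['192.168.0.10', 'control-plane-1', 'k8s.example.com']
print(args["feature-gates"])                  # PodSecurity=true
print(args["admission-control-config-file"])  # /etc/kubernetes/pki/admission.yaml
```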

The following is an example of kubeadm defaults override:

@@ -1147,10 +1149,7 @@ services:

**Note**: Those parameters remain in manifests files after Kubernetes upgrade. That is the proper way to preserve custom settings for system services.

-**Warning**: These kubeadm parameters are configurable only during installation, currently.
-Kubemarine currently do not provide special procedure to change these parameters after installation.
-To reconfigure the parameters manually, refer to the official documentation at [https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-reconfigure](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-reconfigure).
-For more information about generic approach of the cluster maintenance, refer to [Maintenance Basics](Maintenance.md#basics).
+**Note**: These kubeadm parameters can be reconfigured after installation using [Reconfigure Procedure](Maintenance.md#reconfigure-procedure).

During init, join, upgrade procedures kubeadm runs `preflight` procedure to do some preliminary checks. In case of any error kubeadm stops working. Sometimes it is necessary to ignore some preflight errors to deploy or upgrade successfully.

@@ -1510,6 +1509,8 @@ By default, the installer uses the following parameters:

`serializeImagePulls` parameter defines whether the images will be pulled in parallel (false) or one at a time.

+**Note**: Some of the parameters can be reconfigured after installation using [Reconfigure Procedure](Maintenance.md#reconfigure-procedure).
+
**Warning**: If you want to change the values of the `podPidsLimit` and `maxPods` variables, you have to update the value of `pid_max` (this value should not be less than the result of the expression `maxPods * podPidsLimit + 2048`), which can be done using the `prepare.system.sysctl` task. For more information about `pid_max`, see the [sysctl](#sysctl) section.

The following is an example of kubeadm defaults override:
@@ -1544,6 +1545,8 @@ By default, the installer uses the following parameters:

`conntrack.min` inherits the `services.sysctl.net.netfilter.nf_conntrack_max` value from [sysctl](#sysctl).

+**Note**: These parameters can be reconfigured after installation using [Reconfigure Procedure](Maintenance.md#reconfigure-procedure).
+
#### kubeadm_patches

*Installation task*: `deploy.kubernetes`
@@ -1599,7 +1602,11 @@ services:

By default Kubemarine sets `bind-address` parameter of `kube-apiserver` to `node.internal_address` via patches at every control-plane node.

-**Note**: If a parameter of control-plane pods is defined in `kubeadm.<service>.extraArgs` or is set by default by kubeadm and then redefined in `kubeadm.paches`, the pod manifest file will contain the same flag twice and the running pod will take into account the last mentioned value (taken from `kubeadm.patches`). This behaviour persists at the moment: https://github.com/kubernetes/kubeadm/issues/1601.
+**Note**: These parameters can be reconfigured after installation using [Reconfigure Procedure](Maintenance.md#reconfigure-procedure).
+
+**Note**: If a parameter of control-plane pods is defined in `kubeadm.<service>.extraArgs` or is set by default by kubeadm and then redefined in `services.kubeadm_patches`,
+the pod manifest file will contain the same flag twice and the running pod will take into account the last mentioned value (taken from `services.kubeadm_patches`).
+This behaviour persists at the moment: https://github.com/kubernetes/kubeadm/issues/1601.

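The `kubeadm_patches` note on duplicate flags says that the last occurrence of a flag wins. The following is a minimal Python sketch that resolves a static pod `command` list by that rule. It handles only `--name=value` flags and ignores space-separated or bare boolean flags. `effective_flags` and `duplicated_flags` are hypothetical helpers, and the addresses are example values.

```python
from typing import Dict, List


def effective_flags(command: List[str]) -> Dict[str, str]:
    """Resolve '--name=value' flags; a later duplicate overwrites an earlier one."""
    flags: Dict[str, str] = {}
    for arg in command:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            flags[name] = value
    return flags


def duplicated_flags(command: List[str]) -> List[str]:
    """Names of flags that appear more than once, in first-duplicate order."""
    seen, dups = set(), []
    for arg in command:
        if arg.startswith("--") and "=" in arg:
            name = arg[2:].split("=", 1)[0]
            if name in seen and name not in dups:
                dups.append(name)
            seen.add(name)
    return dups


command = [
    "kube-apiserver",
    "--bind-address=0.0.0.0",       # e.g. from kubeadm.apiServer.extraArgs
    "--profiling=false",
    "--bind-address=192.168.0.10",  # e.g. added by services.kubeadm_patches
]
print(duplicated_flags(command))                 # ['bind-address']
print(effective_flags(command)["bind-address"])  # 192.168.0.10
```

A helper such as `duplicated_flags` could flag parameters that are set both in `extraArgs` and in `services.kubeadm_patches`.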
#### kernel_security

@@ -5650,6 +5657,13 @@ plugins:
Application of the list merge strategy is allowed in the following sections:
* `plugins.installation.procedures`
* `services.kubeadm.apiServer.extraVolumes`
+* `services.kubeadm.controllerManager.extraVolumes`
+* `services.kubeadm.scheduler.extraVolumes`
+* `services.kubeadm_patches.apiServer`
+* `services.kubeadm_patches.controllerManager`
+* `services.kubeadm_patches.etcd`
+* `services.kubeadm_patches.kubelet`
+* `services.kubeadm_patches.scheduler`
* `services.kernel_security.permissive`
* `services.modprobe`
* `services.etc_hosts`
28 changes: 24 additions and 4 deletions documentation/Kubecheck.md
@@ -41,6 +41,8 @@ This section provides information about the Kubecheck functionality.
- [201 Kubelet Status](#201-kubelet-status)
- [202 Nodes pid_max](#202-nodes-pid_max)
- [203 Kubelet Version](#203-kubelet-version)
+- [233 Kubelet Configuration](#233-kubelet-configuration)
+- [234 kube-proxy Configuration](#234-kube-proxy-configuration)
- [205 System Packages Versions](#205-system-packages-version)
- [205 CRI Versions](#205-cri-versions)
- [205 HAproxy Version](#205-haproxy-version)
@@ -388,8 +390,11 @@ The task tree is as follows:
* configuration
* kubelet
* status
-* configuration
+* pid_max
* version
+* configuration
+kube-proxy:
+* configuration
* packages
* system
* recommended_versions
@@ -462,7 +467,7 @@ This test checks the status of the Kubelet service on all hosts in the cluster w

##### 202 Nodes pid_max

-*Task*: `services.kubelet.configuration`
+*Task*: `services.kubelet.pid_max`

This test checks that kubelet `maxPods` and `podPidsLimit` are correctly aligned with kernel `pid_max`.

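The rule behind this test is stated in the kubelet warning of `Installation.md`: `pid_max` must not be less than `maxPods * podPidsLimit + 2048`. The following is a minimal Python sketch of that arithmetic. The helper names are hypothetical, and the values 110 and 4096 are examples only, not claimed Kubemarine defaults.

```python
from pathlib import Path


def required_pid_max(max_pods: int, pod_pids_limit: int) -> int:
    """Smallest kernel pid_max that satisfies maxPods * podPidsLimit + 2048."""
    return max_pods * pod_pids_limit + 2048


def pid_max_is_sufficient(pid_max: int, max_pods: int, pod_pids_limit: int) -> bool:
    return pid_max >= required_pid_max(max_pods, pod_pids_limit)


def read_pid_max(path: str = "/proc/sys/kernel/pid_max") -> int:
    """Read the current kernel limit on a Linux host."""
    return int(Path(path).read_text().strip())


print(required_pid_max(110, 4096))                # 452608
print(pid_max_is_sufficient(4194304, 110, 4096))  # True
print(pid_max_is_sufficient(32768, 110, 4096))    # False
```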
@@ -472,6 +477,19 @@ This test checks that kubelet `maxPods` and `podPidsLimit` are correctly aligned

This test checks the Kubelet version on all hosts in a cluster.

+##### 233 Kubelet Configuration
+
+*Task*: `services.kubelet.configuration`
+
+This test checks the consistency of the `/var/lib/kubelet/config.yaml` configuration
+with `kubelet-config` ConfigMap and with the inventory.
+
+##### 234 kube-proxy Configuration
+
+*Task*: `services.kube-proxy.configuration`
+
+This test checks the consistency of the `kube-proxy` ConfigMap with the inventory.
+
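Both configuration checks compare the values set in the inventory with what is actually deployed. The following is a minimal Python sketch of such a comparison for kubelet settings. It assumes the configs are already parsed into dictionaries and that only keys present in the inventory are compared. `diff_config` and `check_kubelet` are hypothetical helpers. Following the commit message, the `kubelet-config` ConfigMap is skipped for Kubernetes < v1.26. The same idea could apply to the `kube-proxy` ConfigMap.

```python
from typing import Any, Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.lstrip("v").split("."))


def diff_config(expected: Dict[str, Any], actual: Dict[str, Any], prefix: str = "") -> List[str]:
    """Dotted paths of inventory keys whose deployed value differs.
    Keys that exist only in the deployed config are ignored."""
    diffs: List[str] = []
    for key in sorted(expected):
        path = f"{prefix}{key}"
        exp, act = expected[key], actual.get(key)
        if isinstance(exp, dict) and isinstance(act, dict):
            diffs.extend(diff_config(exp, act, path + "."))
        elif exp != act:
            diffs.append(path)
    return diffs


def check_kubelet(inventory: Dict[str, Any], node_config: Dict[str, Any],
                  configmap: Dict[str, Any], kubernetes_version: str) -> List[str]:
    problems = [f"config.yaml: {p}" for p in diff_config(inventory, node_config)]
    if parse_version(kubernetes_version) >= (1, 26):
        problems += [f"kubelet-config: {p}" for p in diff_config(inventory, configmap)]
    return problems


inventory = {"maxPods": 110, "serializeImagePulls": False}
node_config = {"maxPods": 110, "serializeImagePulls": True, "cgroupDriver": "systemd"}
configmap = {"maxPods": 100, "serializeImagePulls": False}
print(check_kubelet(inventory, node_config, configmap, "v1.26.11"))
# ['config.yaml: serializeImagePulls', 'kubelet-config: maxPods']
print(check_kubelet(inventory, node_config, configmap, "v1.25.16"))
# ['config.yaml: serializeImagePulls']
```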
##### 204 Container Runtime Configuration Check

*Task*: `services.container_runtime.configuration`
@@ -647,13 +665,15 @@ This test verifies ETCD health.

*Task*: `control_plane.configuration_status`

-This test verifies the consistency of the configuration (image version, `extra_args`, `extra_volumes`) of static pods of Control Plain like `kube-apiserver`, `kube-controller-manager` and `kube-scheduler`.
+This test verifies the consistency of the configuration of static pods of Control Plane
+for `kube-apiserver`, `kube-controller-manager`, `kube-scheduler`, and `etcd`.

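The commit message states that manifests are generated in dry-run mode and compared with the stored ones. The following is a minimal Python sketch of such a comparison for one static pod. It assumes both manifests are already parsed into dictionaries, each pod has a single container, and all flags use the `--name=value` form. `compare_static_pod` and the other helpers are hypothetical, volumes are not compared, and the image name is an example value.

```python
from typing import Any, Dict, List


def container_of(manifest: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: the static pod has exactly one container.
    return manifest["spec"]["containers"][0]


def flags_of(command: List[str]) -> Dict[str, str]:
    """Map '--name=value' arguments to {name: value}; the binary name is skipped."""
    flags: Dict[str, str] = {}
    for arg in command[1:]:
        if arg.startswith("--"):
            name, _, value = arg[2:].partition("=")
            flags[name] = value
    return flags


def compare_static_pod(expected: Dict[str, Any], stored: Dict[str, Any]) -> List[str]:
    exp_c, sto_c = container_of(expected), container_of(stored)
    problems: List[str] = []
    if exp_c["image"] != sto_c["image"]:
        problems.append(f"image: expected {exp_c['image']}, found {sto_c['image']}")
    exp_f, sto_f = flags_of(exp_c["command"]), flags_of(sto_c["command"])
    for name in sorted(set(exp_f) | set(sto_f)):
        if exp_f.get(name) != sto_f.get(name):
            problems.append(f"--{name}: expected {exp_f.get(name)}, found {sto_f.get(name)}")
    return problems


def pod(image: str, *args: str) -> Dict[str, Any]:
    return {"spec": {"containers": [{"image": image, "command": ["kube-scheduler", *args]}]}}


image = "registry.k8s.io/kube-scheduler:v1.26.11"
generated = pod(image, "--bind-address=127.0.0.1", "--profiling=false")
stored = pod(image, "--bind-address=127.0.0.1")
print(compare_static_pod(generated, stored))
# ['--profiling: expected false, found None']
```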
##### 221 Control Plane Health Status

*Task*: `control_plane.health_status`

-This test verifies the health of static pods `kube-apiserver`, `kube-controller-manager` and `kube-scheduler`.
+This test verifies the health of static pods `kube-apiserver`, `kube-controller-manager`,
+`kube-scheduler`, and `etcd`.

##### 222 Default Services Configuration Status
