-
Notifications
You must be signed in to change notification settings - Fork 556
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Initial set of updates for v1.7 without detailed documentation for each topic. Signed-off-by: Andrey Smirnov <[email protected]>
- Loading branch information
Showing
13 changed files
with
896 additions
and
13 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,187 @@ | ||
--- | ||
title: "CA Rotation" | ||
description: "How to rotate Talos and Kubernetes API root certificate authorities." | ||
--- | ||
|
||
In general, you almost never need to rotate the root CA certificate and key for the Talos API and Kubernetes API. | ||
Talos sets up root certificate authorities with the lifetime of 10 years, and all Talos and Kubernetes API certificates are issued by these root CAs. | ||
So the rotation of the root CA is only needed if: | ||
|
||
- you suspect that the private key has been compromised; | ||
- you want to revoke access to the cluster for a leaked `talosconfig` or `kubeconfig`; | ||
- once in 10 years. | ||
|
||
## Overview | ||
|
||
There are some details which make Talos and Kubernetes API root CA rotation a bit different, but the general flow is the same: | ||
|
||
- generate new CA certificate and key; | ||
- add new CA certificate as 'accepted', so new certificates will be accepted as valid; | ||
- swap issuing CA to the new one, old CA as accepted; | ||
- refresh all certificates in the cluster; | ||
- remove old CA from 'accepted'. | ||
|
||
At the end of the flow, old CA is completely removed from the cluster, so all certificates issued by it will be considered invalid. | ||
|
||
Both rotation flows are described in detail below. | ||
|
||
## Talos API | ||
|
||
### Automated Talos API CA Rotation | ||
|
||
Talos API CA rotation doesn't interrupt connections within the cluster, and it doesn't require a reboot of the nodes. | ||
|
||
Run the following command in dry-run mode to see the steps which will be taken: | ||
|
||
```shell | ||
$ talosctl -n <CONTROLPLANE> rotate-ca --dry-run=true --talos=true --kubernetes=false | ||
> Starting Talos API PKI rotation, dry-run mode true... | ||
> Using config context: "talos-default" | ||
> Using Talos API endpoints: ["172.20.0.2"] | ||
> Cluster topology: | ||
- control plane nodes: ["172.20.0.2"] | ||
- worker nodes: ["172.20.0.3"] | ||
> Current Talos CA: | ||
... | ||
``` | ||
|
||
No changes will be done to the cluster in dry-run mode, so you can safely run it to see the steps. | ||
|
||
Before proceeding, make sure that you can capture the output of `talosctl` command, as it will contain the new CA certificate and key. | ||
Record a list of Talos API users to make sure they can all be updated with new `talosconfig`. | ||
|
||
Run the following command to rotate the Talos API CA: | ||
|
||
```shell | ||
$ talosctl -n <CONTROLPLANE> rotate-ca --dry-run=false --talos=true --kubernetes=false | ||
> Starting Talos API PKI rotation, dry-run mode false... | ||
> Using config context: "talos-default-268" | ||
> Using Talos API endpoints: ["172.20.0.2"] | ||
> Cluster topology: | ||
- control plane nodes: ["172.20.0.2"] | ||
- worker nodes: ["172.20.0.3"] | ||
> Current Talos CA: | ||
... | ||
> New Talos CA: | ||
... | ||
> Generating new talosconfig: | ||
context: talos-default | ||
contexts: | ||
talos-default: | ||
.... | ||
> Verifying connectivity with existing PKI: | ||
- 172.20.0.2: OK (version {{< release >}}) | ||
- 172.20.0.3: OK (version {{< release >}}) | ||
> Adding new Talos CA as accepted... | ||
- 172.20.0.2: OK | ||
- 172.20.0.3: OK | ||
> Verifying connectivity with new client cert, but old server CA: | ||
2024/04/17 21:26:07 retrying error: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: remote error: tls: unknown certificate authority" | ||
- 172.20.0.2: OK (version {{< release >}}) | ||
- 172.20.0.3: OK (version {{< release >}}) | ||
> Making new Talos CA the issuing CA, old Talos CA the accepted CA... | ||
- 172.20.0.2: OK | ||
- 172.20.0.3: OK | ||
> Verifying connectivity with new PKI: | ||
2024/04/17 21:26:08 retrying error: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"x509: Ed25519 verification failure\" while trying to verify candidate authority certificate \"talos\")" | ||
- 172.20.0.2: OK (version {{< release >}}) | ||
- 172.20.0.3: OK (version {{< release >}}) | ||
> Removing old Talos CA from the accepted CAs... | ||
- 172.20.0.2: OK | ||
- 172.20.0.3: OK | ||
> Verifying connectivity with new PKI: | ||
- 172.20.0.2: OK (version {{< release >}}) | ||
- 172.20.0.3: OK (version {{< release >}}) | ||
> Writing new talosconfig to "talosconfig" | ||
``` | ||
|
||
Once the rotation is done, stash the new Talos CA, update `secrets.yaml` (if using that for machine configuration generation) with new CA key and certificate. | ||
|
||
The new client `talosconfig` is written to the current directory as `talosconfig`. | ||
You can merge it to the default location with `talosctl config merge ./talosconfig`. | ||
|
||
If other client access `talosconfig` files needs to be generated, use `talosctl config new` with new `talosconfig`. | ||
|
||
> Note: if using [Talos API access from Kubernetes]({{< relref "./talos-api-access-from-k8s" >}}) feature, pods might need to be restarted manually to pick up new `talosconfig`. | ||
### Manual Steps for Talos API CA Rotation | ||
|
||
1. Generate new Talos CA (e.g. use `talosctl gen secrets` and use Talos CA). | ||
2. Patch machine configuration on all nodes updating `.machine.acceptedCAs` with new CA certificate. | ||
3. Generate `talosconfig` with client certificate generated with new CA, but still using old CA as server CA, verify connectivity, Talos should accept new client certificate. | ||
4. Patch machine configuration on all nodes updating `.machine.ca` with new CA certificate and key, and keeping old CA certificate in `.machine.acceptedCAs` (on worker nodes `.machine.ca` doesn't have the key). | ||
5. Generate `talosconfig` with both client certificate and server CA using new CA PKI, verify connectivity. | ||
6. Remove old CA certificate from `.machine.acceptedCAs` on all nodes. | ||
7. Verify connectivity. | ||
|
||
## Kubernetes API | ||
|
||
### Automated Kubernetes API CA Rotation | ||
|
||
The automated process only rotates Kubernetes API CA, used by the `kube-apiserver`, `kubelet`, etc. | ||
Other Kubernetes secrets might need to be rotated manually as required. | ||
Kubernetes pods might need to be restarted to handle changes, and communication within the cluster might be disrupted during the rotation process. | ||
|
||
Run the following command in dry-run mode to see the steps which will be taken: | ||
|
||
```shell | ||
$ talosctl -n <CONTROLPLANE> rotate-ca --dry-run=true --talos=false --kubernetes=true | ||
> Starting Kubernetes API PKI rotation, dry-run mode true... | ||
> Cluster topology: | ||
- control plane nodes: ["172.20.0.2"] | ||
- worker nodes: ["172.20.0.3"] | ||
> Building current Kubernetes client... | ||
> Current Kubernetes CA: | ||
... | ||
``` | ||
|
||
Before proceeding, make sure that you can capture the output of `talosctl` command, as it will contain the new CA certificate and key. | ||
As Talos API access will not be disrupted, the changes can be reverted back if needed by reverting machine configuration. | ||
|
||
Run the following command to rotate the Kubernetes API CA: | ||
|
||
```shell | ||
$ talosctl -n <CONTROLPLANE> rotate-ca --dry-run=false --talos=false --kubernetes=true | ||
> Starting Kubernetes API PKI rotation, dry-run mode false... | ||
> Cluster topology: | ||
- control plane nodes: ["172.20.0.2"] | ||
- worker nodes: ["172.20.0.3"] | ||
> Building current Kubernetes client... | ||
> Current Kubernetes CA: | ||
... | ||
> New Kubernetes CA: | ||
... | ||
> Verifying connectivity with existing PKI... | ||
- OK (2 nodes ready) | ||
> Adding new Kubernetes CA as accepted... | ||
- 172.20.0.2: OK | ||
- 172.20.0.3: OK | ||
> Making new Kubernetes CA the issuing CA, old Kubernetes CA the accepted CA... | ||
- 172.20.0.2: OK | ||
- 172.20.0.3: OK | ||
> Building new Kubernetes client... | ||
> Verifying connectivity with new PKI... | ||
2024/04/17 21:45:52 retrying error: Get "https://172.20.0.1:6443/api/v1/nodes": EOF | ||
- OK (2 nodes ready) | ||
> Removing old Kubernetes CA from the accepted CAs... | ||
- 172.20.0.2: OK | ||
- 172.20.0.3: OK | ||
> Verifying connectivity with new PKI... | ||
- OK (2 nodes ready) | ||
> Kubernetes CA rotation done, new 'kubeconfig' can be fetched with `talosctl kubeconfig`. | ||
``` | ||
|
||
At the end of the process, Kubernetes control plane components will be restarted to pick up CA certificate changes. | ||
Each node `kubelet` will re-join the cluster with new client certficiate. | ||
|
||
New `kubeconfig` can be fetched with `talosctl kubeconfig` command from the cluster. | ||
|
||
Kubernetes pods might need to be restarted manually to pick up changes to the Kubernetes API CA. | ||
|
||
### Manual Steps for Kubernetes API CA Rotation | ||
|
||
Steps are similar [to the Talos API CA rotation](#manual-steps-for-talos-api-ca-rotation), but use: | ||
|
||
- `.cluster.acceptedCAs` in place of `.machine.acceptedCAs`; | ||
- `.cluster.ca` in place of `.machine.ca`; | ||
- `kubeconfig` in place of `talosconfig`. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
--- | ||
title: "Watchdog Timers" | ||
description: "Using hardware watchdogs to workaround hardware/software lockups." | ||
--- | ||
|
||
Talos Linux now supports hardware watchdog timers configuration. | ||
Hardware watchdog timers allow to reset (reboot) the system if the software stack becomes unresponsive. | ||
Please consult your hardware/VM documentation for the availability of the hardware watchdog timers. | ||
|
||
## Configuration | ||
|
||
To discover the available watchdog devices, run: | ||
|
||
```shell | ||
$ talosctl ls /sys/class/watchdog/ | ||
NODE NAME | ||
172.20.0.2 . | ||
172.20.0.2 watchdog0 | ||
172.20.0.2 watchdog1 | ||
``` | ||
|
||
The implementation of the watchdog device can be queried with: | ||
|
||
```shell | ||
$ talosctl read /sys/class/watchdog/watchdog0/identity | ||
i6300ESB timer | ||
``` | ||
|
||
To enable the watchdog timer, patch the machine configuration with the following: | ||
|
||
```yaml | ||
# watchdog.yaml | ||
apiVersion: v1alpha1 | ||
kind: WatchdogTimerConfig | ||
device: /dev/watchdog0 | ||
timeout: 5m | ||
``` | ||
```shell | ||
talosctl patch mc -p @watchdog.yaml | ||
``` | ||
|
||
Talos Linux will set up the watchdog time with a 5-minute timeout, and it will keep resetting the timer to prevent the system from rebooting. | ||
If the software becomes unresponsive, the watchdog timer will expire, and the system will be reset by the watchdog hardware. | ||
|
||
## Inspection | ||
|
||
To inspect the watchdog timer configuration, run: | ||
|
||
```shell | ||
$ talosctl get watchdogtimerconfig | ||
NODE NAMESPACE TYPE ID VERSION DEVICE TIMEOUT | ||
172.20.0.2 runtime WatchdogTimerConfig timer 1 /dev/watchdog0 5m0s | ||
``` | ||
|
||
To inspect the watchdog timer status, run: | ||
|
||
```shell | ||
$ talosctl get watchdogtimerstatus | ||
NODE NAMESPACE TYPE ID VERSION DEVICE TIMEOUT | ||
172.20.0.2 runtime WatchdogTimerStatus timer 1 /dev/watchdog0 5m0s | ||
``` | ||
|
||
Current status of the watchdog timer can also be inspected via Linux sysfs: | ||
|
||
```shell | ||
$ talosctl read /sys/class/watchdog/watchdog0/state | ||
active | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.