diff --git a/docs/pages/setup/operations/ca-rotation.mdx b/docs/pages/setup/operations/ca-rotation.mdx
index 6be371fca1769..934b20114de66 100644
--- a/docs/pages/setup/operations/ca-rotation.mdx
+++ b/docs/pages/setup/operations/ca-rotation.mdx
@@ -11,57 +11,82 @@ description: How to rotate Teleport's certificate authority
(!docs/pages/includes/tctl.mdx!)
-For cloud, login with a teleport user with editor privileges:
-```code
-# tsh logs you in and receives short-lived certificates
-$ tsh login --proxy=myinstance.teleport.sh --user=email@example.com
-# try out the connection
-$ tctl get nodes
-```
+ For Cloud, log in with a Teleport user with editor privileges:
+ ```code
+ # tsh logs you in and receives short-lived certificates
+ $ tsh login --proxy=myinstance.teleport.sh --user=email@example.com
+ # try out the connection
+ $ tctl get nodes
+ ```
-## Certificate Authority Rotation
-
-Take a look at the [Certificates chapter](../../architecture/authentication.mdx#authentication-in-teleport) in the
-architecture document to learn how the certificate authority rotation works.
+## Certificate Authority rotation
This section will show you how to implement certificate rotation in practice.
-During manual and semi-automatic certificate authority rotation, Teleport generates a new certificate
-authority and issues certificates for auth servers, proxies, nodes and users.
+
+ If you are using [CA
+ Pinning](../admin/adding-nodes.mdx#untrusted-auth-servers) when adding new
+ nodes, the CA pin will change after the rotation. Make sure you use the *new*
+ CA pin when adding nodes after rotation.
+
+
+### Rotation phases
+
+The rotation consists of several phases:
+
+- `standby`: All operations have completed or haven't started yet.
+- `init`: All components are notified of the rotation. A new certificate
+ authority is issued, but not used. It is necessary for remote trusted clusters
+ to fetch the new certificate authority, otherwise new clients will reject it.
+- `update_clients`: Internal client certificates are updated and reloaded.
+ Servers still present their old credentials because clients do not yet know
+ about the new certificates.
+- `update_servers`: Servers reload and start serving TLS and SSH certificates
+ signed by the new certificate authority, but will still accept certificates
+ issued by the old certificate authority.
+- `rollback`: The rotation was aborted and is rolling back to the old
+ certificate authority.
-Rotation consists of several phases:
+### Rotation types
-- `standby` All operations have completed or haven't started yet.
-- `init` - All components are notified of the rotation. A new certificate authority is issued, but not used.
- It is necessary for remote trusted clusters to fetch the new certificate authority, otherwise the new clients
- will reject it.
-- `update_clients` - internal clients certs are updated and reloaded.
- Servers will use and respond with old credentials because clients have no idea about new certificates at first.
-- `update_servers` Servers will reload and would start serving
-TLS and SSH certificates signed by the new certificate authority, but will still accept certificates
-issued by old certificate authority.
-- `rollback` rotation is rolling back to the old certificate authority.
+There are two kinds of certificate rotations:
-Both in manual and semi-automatic rotation, cluster goes through the states above in sequence:
+- **Manual:** It is the cluster administrator's responsibility to transition
+ between each phase of the rotation while monitoring the state of the cluster.
+ Manual rotations provide the greatest level of control, and are performed by
+ providing the desired phase using the `--phase` flag with the
+ `tctl auth rotate` command.
+- **Semi-automatic:** Teleport automatically transitions between phases of the
+ rotation after some amount of time (known as a *grace period*) elapses.
+
+For both types of rotations, the cluster goes through the phases in the
+following order:
- `standby` -> `init` -> `update_clients` -> `update_servers` -> `standby`
-Administrators can rollback all the changes before rotation is completed by entering `standby`.
+Administrators can abort the rotation and revert all changes at any time before
+the rotation is completed by entering the `rollback` phase.
+
+```code
+$ tctl auth rotate --phase=rollback --manual
+```
-For example, if admin has detected that some nodes failed to upgrade during `update_servers`,
-they can rollback to the previous certificate authority:
+For example, if an admin has detected that some nodes failed to upgrade during
+`update_servers`, they can roll back to the previous certificate authority, and
+the phase transitions look like this:
- `update_servers` -> `rollback` -> `standby`.
-Try rotation/rollback in manual mode first to understand all the edge-cases
-and gotchas before going with semi-automatic version.
+ Try rotation/rollback in manual mode first to understand all the edge cases
+ and gotchas before going with the semi-automatic version.
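
The phase order described above can be sketched as a small lookup. This is only an illustration of the documented sequence: `next_phase` and `can_rollback` are hypothetical helper names, not part of `tctl`, and the sketch assumes a rollback is possible from any phase of an in-progress rotation.

```shell
# Sketch of the rotation phase order. These helpers are hypothetical and
# only model the sequence described in this guide; they do not call tctl.

# Print the phase that follows "$1".
next_phase() {
  case "$1" in
    standby)        echo init ;;
    init)           echo update_clients ;;
    update_clients) echo update_servers ;;
    update_servers) echo standby ;;
    rollback)       echo standby ;;
    *)              return 1 ;;   # unknown phase name
  esac
}

# Succeed if a rollback can still be requested from phase "$1", that is,
# a rotation is in progress and has not yet returned to standby.
can_rollback() {
  case "$1" in
    init|update_clients|update_servers) return 0 ;;
    *) return 1 ;;
  esac
}

next_phase update_servers
can_rollback update_servers && echo "rollback still possible"
```

Once the cluster is back in `standby`, `can_rollback` fails, which mirrors the rule that a rollback must happen before the rotation completes.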
## Manual rotation
-In manual mode, we would transition between phases while monitoring the state of the cluster.
+In manual mode, we move the cluster from one phase to the next ourselves while
+monitoring its state.
**Start the rotation**
@@ -72,7 +97,7 @@ $ tctl auth rotate --phase=init --manual --type=host
Updated rotation phase to "init". To check status use 'tctl status'
```
-Cluster status will reflect active rotation in progress:
+Use `tctl` to confirm that there is an active rotation in progress:
```code
$ tctl status
@@ -96,23 +121,24 @@ $ tctl get nodes --format=json | jq '.[] | {hostname: .spec.hostname, rotation:
}
```
-Host `terminal` has updated it status to phase `init`. It has downloaded a new CA public key and is ready
-for state transitions.
+In this example, the node named `terminal` has updated its status to phase
+`init`. This means it has downloaded a new CA public key and is ready for state
+transitions.
-
-If some nodes are offline during rotation or have failed to update the status,
-you will lose connectivity after the transition `update_servers` -> `standby`. Make sure that all
-nodes are up to date with the transitions.
+
+ If some nodes are offline during rotation or have failed to update the status,
+ you will lose connectivity after the transition `update_servers` -> `standby`.
+ Make sure that all nodes are up to date with the transitions before
+ proceeding.
**Update clients**
-Execute transition `init` -> `update_clients`:
+Execute the transition from `init` to `update_clients`:
```code
$ tctl auth rotate --phase=update_clients --manual
-# Updated rotation phase to "init". To check status use 'tctl status'
+# Updated rotation phase to "update_clients". To check status use 'tctl status'
$ tctl status
# Cluster acme.cluster
# Version (=teleport.version=)
@@ -120,7 +146,8 @@ $ tctl status
```
-Clients will temporarily lose connectivity during proxy and auth servers restarts.
+ Clients will temporarily lose connectivity during Proxy and Auth Server
+ restarts.
Verify that nodes have caught up and now see the current cluster state:
@@ -136,11 +163,12 @@ $ tctl get nodes --format=json | jq '.[] | {hostname: .spec.hostname, rotation:
**Update servers**
-All nodes have caught up. Execute the transition `update_clients` -> `update_servers`:
+Now that all nodes have caught up, execute the transition from `update_clients`
+to `update_servers`:
```code
$ tctl auth rotate --phase=update_servers --manual
-# Updated rotation phase to "init". To check status use 'tctl status'
+# Updated rotation phase to "update_servers". To check status use 'tctl status'
$ tctl status
# Cluster acme.cluster
@@ -149,8 +177,9 @@ $ tctl status
```
-Usually if things go wrong, they go wrong at this transition. If you have lost connectivity to nodes,
-[rollback](#rollback) to the old certificate authority.
+ Usually if things go wrong, they go wrong at this transition. If you have lost
+ connectivity to nodes, [roll back](#rollback) to the old certificate
+ authority.
Verify that nodes have caught up:
@@ -169,28 +198,23 @@ $ tctl get nodes --format=json | jq '.[] | {hostname: .spec.hostname, rotation:
Before wrapping up, verify that you have not lost any nodes and can connect to them, for example:
```code
-$ tsh ssh hello@terminal hostname
+$ tsh ssh hello@terminal
```
-This is the last stage when you can rollback. If you have lost connectivity to nodes,
-[rollback](#rollback) to the old certificate authority.
+ This is the last stage where you have the opportunity to roll back. If you
+ have lost connectivity to nodes, [roll back](#rollback) to the old certificate
+ authority.
```code
$ tctl auth rotate --phase=standby --manual
-# Updated rotation phase to "init". To check status use 'tctl status'
-
-$ tctl status
-# Cluster acme.cluster
-# Version (=teleport.version=)
-# Host CA rotating servers (mode: manual, started: Sep 20 01:44:36 UTC, ending: Sep 21 07:44:36 UTC)
```
-Cluster status should indicate succesffully completed rotation.
+Verify that the rotation has completed with `tctl`:
```code
-tctl status
+$ tctl status
Cluster acme.cluster
Version (=teleport.version=)
Host CA rotated Sep 20 02:11:25 UTC
@@ -210,31 +234,26 @@ $ tctl get nodes --format=json | jq '.[] | {hostname: .spec.hostname, rotation:
}
```
-
-If you are using [CA Pinning](../admin/adding-nodes.mdx#untrusted-auth-servers) when adding new nodes, the CA pin will change after the rotation.
-Make sure you use the *new* CA pin when adding nodes after rotation.
-
-
## Semi-Automatic rotation
-Semi-automatic rotation executes the same steps as the manual rotation, but with a grace period between them.
-It currently does not track the states of the nodes and you can lose connectivity if things go wrong.
+ Semi-automatic rotation executes the same steps as the manual rotation, but
+ with a grace period between them. It currently does not track the states of
+ the nodes, so you can lose connectivity if things go wrong.
-You can trigger semi-automatic rotation:
+You can trigger semi-automatic rotation by omitting the `--manual` and `--phase`
+flags.
```code
$ tctl auth rotate
```
-This will trigger a rotation process for both hosts and users with a *grace period* of 48 hours.
-During the grace period, certificates issued both by old and new certificate authority work.
+This will trigger a rotation process for both hosts and users with a default
+grace period of 48 hours. During the grace period, certificates issued by both
+the old and the new certificate authority are accepted.
-You can customize grace period:
+You can customize the grace period and CA type with additional flags:
```code
# Rotate only user certificates with a grace period of 200 hours:
@@ -248,33 +267,29 @@ The rotation takes time, especially for hosts, because each node in a cluster
needs to be notified that a rotation is taking place and request a new
certificate for itself before the grace period ends.
-
- Be careful when choosing a grace period when rotating host certificates. The grace period needs to be long enough for all nodes in a cluster to request a new certificate. If some nodes go offline during the
- rotation and come back only after the grace period has ended, they will be
- forced to leave the cluster, i.e. users will no longer be allowed to SSH
- into them.
-
+During semi-automatic rotations, Teleport will attempt to divide the grace
+period so that it spends an equal amount of time in each phase before
+transitioning to the next phase. This means that using a shorter grace period
+will result in faster state transitions.
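
As a rough worked example of that division, the sketch below splits a grace period into equal intervals. The number of intervals (three) is an assumption made for illustration and is not stated in this guide; `phase_offsets` is a hypothetical helper, not a `tctl` command.

```shell
# Sketch: estimate when each phase transition happens for a grace period.
# ASSUMPTION: the grace period is split into three equal intervals.
phase_offsets() {
  grace_hours=$1
  intervals=3   # assumed value, not taken from the Teleport documentation
  step=$((grace_hours / intervals))   # integer division, in hours
  echo "update_clients after ${step}h"
  echo "update_servers after $((step * 2))h"
  echo "standby after ${grace_hours}h"
}

phase_offsets 48
```

Under that assumption, the default 48-hour grace period would give roughly 16 hours per interval, so nodes that stay offline for longer than one interval risk missing a transition.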
+
+
+ Be careful when choosing a grace period when rotating host certificates.
+
-Check the cluster status of rotation:
+The grace period needs to be long enough for all nodes in a cluster to request a
+new certificate. If some nodes go offline during the rotation and come back only
+after the grace period has ended, they will be forced to leave the cluster, i.e.
+users will no longer be allowed to SSH into them.
+
+Check the cluster status:
```code
-tctl status
+$ tctl status
Cluster acme.cluster
Version (=teleport.version=)
Host CA initialized (mode: manual, started: Sep 20 01:44:36 UTC, ending: Sep 21 07:44:36 UTC)
```
-
- If you are using [CA Pinning](../admin/adding-nodes.mdx#untrusted-auth-servers) when adding new nodes, the CA pin will change after the rotation. Make sure you use the
- *new* CA pin when adding nodes after rotation.
-
-
Check the status of individual nodes:
```code
@@ -287,21 +302,22 @@ $ tctl get nodes --format=json | jq '.[] | {hostname: .spec.hostname, rotation:
}
```
-Host `terminal` has updated it status to phase `init`. It has downloaded a new CA public key and is ready
-for state transitions.
+The node named `terminal` has updated its status to phase `init`. This means it
+has downloaded a new CA public key and is ready for state transitions.
## Rollback
-Rollback is only possible before rotation enters `standby` state.
+Rollback must be performed before the rotation enters the `standby` state.
-First, override the rotation to the manual rollback:
+First, enter the rollback phase with a manual phase transition:
```code
$ tctl auth rotate --phase=rollback --manual
# Updated rotation phase to "rollback". To check status use 'tctl status'
```
-Make sure that nodes that have updated have caught up:
+Make sure that any nodes which have already updated have caught up and entered
+the `rollback` phase.
```code
# Check rotation status of the nodes
@@ -313,5 +329,11 @@ $ tctl get nodes --format=json | jq '.[] | {hostname: .spec.hostname, rotation:
}
```
-If any of the nodes were lost and using the old cert authority, they should reconnect
-once you switch the control plane to the old cert authority.
+If connectivity to any of the nodes was lost during the rotation, this is likely
+because they were still using the old certificate authority. Connectivity to
+these nodes should be restored when the rollback completes and the old
+certificate authority is made active.
+
+## Further reading
+
+Learn how the [Teleport Certificate Authority](../../architecture/authentication.mdx#authentication-in-teleport) works.