Autopilot enterprise docs (#15589)
raskchanky authored Jun 7, 2022
1 parent 1865d57 commit 55bc402
Showing 10 changed files with 323 additions and 52 deletions.
17 changes: 13 additions & 4 deletions website/content/api-docs/system/ha-status.mdx
@@ -35,23 +35,32 @@ $ curl \
"api_address": "http://10.0.0.2:8200",
"cluster_address": "https://10.0.0.2:8201",
"active_node": true,
"last_echo": null
"last_echo": null,
"version": "1.11.0",
"upgrade_version": "1.11.0",
"redundancy_zone": "a"
},
{
"hostname": "node2",
"api_address": "http://10.0.0.3:8200",
"cluster_address": "https://10.0.0.3:8201",
"active_node": false,
"last_echo": "2021-11-29T10:29:09.202235-05:00"
"last_echo": "2021-11-29T10:29:09.202235-05:00",
"version": "1.11.0",
"upgrade_version": "1.11.0",
"redundancy_zone": "a"
},
{
"hostname": "node3",
"api_address": "http://10.0.0.4:8200",
"cluster_address": "https://10.0.0.4:8201",
"active_node": false,
"last_echo": "2021-11-29T10:29:07.402548-05:00"
"last_echo": "2021-11-29T10:29:07.402548-05:00",
"version": "1.11.0",
"upgrade_version": "1.11.0",
"redundancy_zone": "a"
}
]
}

```
Note that in the above sample response, `upgrade_version` and `redundancy_zone` are Enterprise-only fields.
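
A minimal Python sketch of consuming the updated `sys/ha-status` response: it reports the active node, any node whose `upgrade_version` differs from its `version`, and the redundancy zones seen. It assumes the node list sits under a top-level `nodes` key (not visible in this hunk), and the helper name `summarize_ha_status` and the sample values are hypothetical. The Enterprise-only fields are read with `.get()` so the same code tolerates responses that omit them.

```python
import json

def summarize_ha_status(body: str) -> dict:
    """Summarize a sys/ha-status JSON body (node list assumed under "nodes")."""
    nodes = json.loads(body)["nodes"]
    active = [n["hostname"] for n in nodes if n.get("active_node")]
    # upgrade_version is Enterprise-only; fall back to version when it is absent.
    pending = [
        n["hostname"]
        for n in nodes
        if n.get("upgrade_version", n.get("version")) != n.get("version")
    ]
    # redundancy_zone is Enterprise-only as well.
    zones = sorted({n["redundancy_zone"] for n in nodes if n.get("redundancy_zone")})
    return {"active": active, "pending_upgrade": pending, "zones": zones}

# Hypothetical two-node response; node2 reports a newer upgrade_version.
sample = json.dumps({
    "nodes": [
        {"hostname": "node1", "active_node": True, "last_echo": None,
         "version": "1.11.0", "upgrade_version": "1.11.0", "redundancy_zone": "a"},
        {"hostname": "node2", "active_node": False,
         "last_echo": "2021-11-29T10:29:09.202235-05:00",
         "version": "1.11.0", "upgrade_version": "1.12.0", "redundancy_zone": "b"},
    ]
})
print(summarize_ha_status(sample))
```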
168 changes: 127 additions & 41 deletions website/content/api-docs/system/storage/raftautopilot.mdx
@@ -34,52 +34,129 @@ $ curl \

```json
{
"Healthy": true,
"FailureTolerance": 1,
"Servers": {
"healthy": true,
"failure_tolerance": 1,
"servers": {
"raft1": {
"ID": "raft1",
"Name": "raft1",
"Address": "127.0.0.1:8201",
"NodeStatus": "alive",
"LastContact": "0s",
"LastTerm": 3,
"LastIndex": 459,
"Healthy": true,
"StableSince": "2021-03-19T20:14:11.831678-04:00",
"Status": "leader",
"Meta": null
"id": "raft1",
"name": "raft1",
"address": "127.0.0.1:8201",
"node_status": "alive",
"last_contact": "0s",
"last_term": 3,
"last_index": 459,
"healthy": true,
"stable_since": "2021-03-19T20:14:11.831678-04:00",
"status": "leader",
"meta": null
},
"raft2": {
"ID": "raft2",
"Name": "raft2",
"Address": "127.0.0.2:8201",
"NodeStatus": "alive",
"LastContact": "516.49595ms",
"LastTerm": 3,
"LastIndex": 459,
"Healthy": true,
"StableSince": "2021-03-19T20:14:19.831931-04:00",
"Status": "voter",
"Meta": null
"id": "raft2",
"name": "raft2",
"address": "127.0.0.2:8201",
"node_status": "alive",
"last_contact": "516.49595ms",
"last_term": 3,
"last_index": 459,
"healthy": true,
"stable_since": "2021-03-19T20:14:19.831931-04:00",
"status": "voter",
"meta": null
},
"raft3": {
"ID": "raft3",
"Name": "raft3",
"Address": "127.0.0.3:8201",
"NodeStatus": "alive",
"LastContact": "196.706591ms",
"LastTerm": 3,
"LastIndex": 459,
"Healthy": true,
"StableSince": "2021-03-19T20:14:25.83565-04:00",
"Status": "voter",
"Meta": null
"id": "raft3",
"name": "raft3",
"address": "127.0.0.3:8201",
"node_status": "alive",
"last_contact": "196.706591ms",
"last_term": 3,
"last_index": 459,
"healthy": true,
"stable_since": "2021-03-19T20:14:25.83565-04:00",
"status": "voter",
"meta": null
}
},
"Leader": "raft1",
"Voters": ["raft1", "raft2", "raft3"],
"NonVoters": null
"leader": "raft1",
"voters": ["raft1", "raft2", "raft3"],
"non_voters": null
}
```
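
The sample above now uses snake_case keys (`failure_tolerance`, `non_voters`, and so on). A minimal Python sketch of reading those keys from an already-decoded autopilot state response follows; the helper names are hypothetical and the sample is trimmed from the response above, keeping only `healthy` per server.

```python
import json

def unhealthy_servers(state: dict) -> list:
    """IDs of servers whose `healthy` flag is false."""
    return sorted(sid for sid, srv in state["servers"].items() if not srv["healthy"])

def describe_state(state: dict) -> str:
    """One-line summary built from the snake_case keys shown in the sample."""
    voters = state.get("voters") or []
    non_voters = state.get("non_voters") or []  # may be null, as in the sample
    return (
        f"leader={state['leader']} healthy={state['healthy']} "
        f"failure_tolerance={state['failure_tolerance']} "
        f"voters={len(voters)} non_voters={len(non_voters)}"
    )

# Trimmed from the sample response; per-server fields other than `healthy` omitted.
sample = json.loads("""
{"healthy": true, "failure_tolerance": 1, "leader": "raft1",
 "servers": {"raft1": {"healthy": true}, "raft2": {"healthy": true},
             "raft3": {"healthy": true}},
 "voters": ["raft1", "raft2", "raft3"], "non_voters": null}
""")
print(describe_state(sample), unhealthy_servers(sample))
```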

### Enterprise Only
Vault Enterprise will include additional output in its API response to indicate the current state of redundancy zones,
automated upgrade progress (if any), and optimistic failure tolerance.

#### Sample Response (Enterprise)
```json
{
"failure_tolerance": 0,
"healthy": true,
"leader": "vault_1",
"optimistic_failure_tolerance": 3,
"redundancy_zones": {
"a": {
"servers": [
"vault_1",
"vault_2",
"vault_5"
],
"voters": [
"vault_1"
],
"failure_tolerance": 2
},
"b": {
"servers": [
"vault_3",
"vault_4"
],
"voters": [
"vault_3"
],
"failure_tolerance": 1
}
},
"upgrade_info": {
"other_version_non_voters": [
"vault_2",
"vault_4"
],
"other_version_voters": [
"vault_1",
"vault_3"
],
"redundancy_zones": {
"a": {
"target_version_non_voters": [
"vault_5"
],
"other_version_voters": [
"vault_1"
],
"other_version_non_voters": [
"vault_2"
]
},
"b": {
"other_version_voters": [
"vault_3"
],
"other_version_non_voters": [
"vault_4"
]
}
},
"status": "await-new-voters",
"target_version": "1.12.0",
"target_version_non_voters": [
"vault_5"
]
},
"voters": [
"vault_1",
"vault_3"
]
}
```
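
To track an automated upgrade from this Enterprise response, a client mostly needs `upgrade_info.status`, the target version, and which voters are still on another version. The Python sketch below extracts those fields from an already-decoded response; the helper name is hypothetical, and treating a missing `upgrade_info` as "no upgrade data" is an assumption of the sketch.

```python
def upgrade_progress(state: dict) -> dict:
    """Summarize automated-upgrade progress from an Enterprise autopilot state."""
    info = state.get("upgrade_info")
    if not info:
        # Assumption: no upgrade_info means no upgrade data to report.
        return {"status": None}
    return {
        "status": info["status"],
        "target_version": info["target_version"],
        # Voters not yet on the target version.
        "old_voters": sorted(info.get("other_version_voters", [])),
        # Nodes already on the target version but still non-voters.
        "new_non_voters": sorted(info.get("target_version_non_voters", [])),
        "zone_sizes": {
            zone: len(detail["servers"])
            for zone, detail in state.get("redundancy_zones", {}).items()
        },
    }

# Trimmed from the Enterprise sample response above.
sample = {
    "redundancy_zones": {
        "a": {"servers": ["vault_1", "vault_2", "vault_5"], "voters": ["vault_1"]},
        "b": {"servers": ["vault_3", "vault_4"], "voters": ["vault_3"]},
    },
    "upgrade_info": {
        "status": "await-new-voters",
        "target_version": "1.12.0",
        "other_version_voters": ["vault_1", "vault_3"],
        "target_version_non_voters": ["vault_5"],
    },
}
print(upgrade_progress(sample))
```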

@@ -108,10 +185,13 @@ $ curl \
"last_contact_threshold": "10s",
"max_trailing_logs": 1000,
"min_quorum": 0,
"server_stabilization_time": "10s"
"server_stabilization_time": "10s",
"disable_upgrade_migration": true
}
```

Note that in the above sample response, `disable_upgrade_migration` is an Enterprise-only field.

## Set Configuration

This endpoint is used to modify the configuration of the autopilot subsystem of Integrated Storage.
@@ -143,6 +223,9 @@ This endpoint is used to modify the configuration of the autopilot subsystem of
- `server_stabilization_time` `(string: "10s")` - Minimum amount of time a server must
be in a stable, healthy state before it can be added to the cluster.

- `disable_upgrade_migration` `(bool: false)` - Disables automatically upgrading Vault using
autopilot. (Enterprise-only)

### Sample Request

```shell-session
@@ -162,6 +245,9 @@ $ curl \
"dead_server_last_contact_threshold": "24h",
"max_trailing_logs": "1000",
"min_quorum": "3",
"server_stabilization_time": "10s"
"server_stabilization_time": "10s",
"disable_upgrade_migration": true
}
```

Note that in the above sample payload, `disable_upgrade_migration` is an Enterprise-only field.
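
Because `disable_upgrade_migration` is just another key in the JSON payload, any HTTP client can set it. The Python sketch below builds the same kind of POST as the curl sample using only the standard library. It assumes the endpoint path `/v1/sys/storage/raft/autopilot/configuration` and a token sent in the `X-Vault-Token` header; the helper name and token value are hypothetical, and the function only constructs the request without sending it.

```python
import json
import urllib.request

def build_autopilot_config_request(addr: str, token: str, **settings) -> urllib.request.Request:
    """Build (but do not send) a POST to the autopilot configuration endpoint."""
    url = f"{addr.rstrip('/')}/v1/sys/storage/raft/autopilot/configuration"
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )

req = build_autopilot_config_request(
    "http://127.0.0.1:8200",
    "hypothetical-token",
    server_stabilization_time="10s",
    disable_upgrade_migration=True,  # Enterprise-only field
)
# urllib.request.urlopen(req) would send it to a running Vault server.
print(req.get_method(), req.full_url)
```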
15 changes: 9 additions & 6 deletions website/content/docs/commands/operator/members.mdx
@@ -16,13 +16,16 @@ Get the key status:

```shell-session
$ vault operator members
Host Name API Address Cluster Address ActiveNode Last Echo
--------- ----------- --------------- ---------- ---------
node1 http://10.0.0.2:8200 https://10.0.0.2:8201 true <nil>
node2 http://10.0.0.3:8200 https://10.0.0.3:8201 false 2021-11-29 10:19:39.236409 -0500 EST
node3 http://10.0.0.4:8200 https://10.0.0.4:8201 false 2021-11-29 10:19:37.436283 -0500 EST
Host Name API Address Cluster Address Active Node Version Upgrade Version Redundancy Zone Last Echo
--------- ----------- --------------- ----------- ------- --------------- --------------- ---------
josh-C02ZT9DYMD6R http://127.0.0.1:8200 https://127.0.0.1:8201 true 1.11.0 1.11.0 a n/a
josh-C02ZT9DYMD6R http://127.0.0.2:8200 https://127.0.0.2:8201 false 1.11.0 1.11.0 a 2022-05-23T15:51:19-07:00
josh-C02ZT9DYMD6R http://127.0.0.3:8200 https://127.0.0.3:8201 false 1.11.0 1.11.0 b 2022-05-23T15:51:19-07:00
josh-C02ZT9DYMD6R http://127.0.0.4:8200 https://127.0.0.4:8201 false 1.11.0 1.11.0 b 2022-05-23T15:51:22-07:00
josh-C02ZT9DYMD6R http://127.0.0.5:8200 https://127.0.0.5:8201 false 1.11.0 1.12.0 a 2022-05-23T15:51:20-07:00
~
```
Note that in the above output, `Upgrade Version` and `Redundancy Zone` are Enterprise-only fields.

## Usage

36 changes: 36 additions & 0 deletions website/content/docs/commands/operator/raft.mdx
@@ -254,6 +254,39 @@ Servers:
Last Term: 3
Last Index: 38
```
Vault Enterprise will include additional output related to automated upgrades and redundancy zones.

#### Example Vault Enterprise Output

```text
Redundancy Zones:
a
Servers: vault_1, vault_2, vault_5
Voters: vault_1
Failure Tolerance: 2
b
Servers: vault_3, vault_4
Voters: vault_3
Failure Tolerance: 1
Upgrade Info:
Status: await-new-voters
Target Version: 1.12.0
Target Version Voters:
Target Version Non-Voters: vault_5
Other Version Voters: vault_1, vault_3
Other Version Non-Voters: vault_2, vault_4
Redundancy Zones:
a
Target Version Voters:
Target Version Non-Voters: vault_5
Other Version Voters: vault_1
Other Version Non-Voters: vault_2
b
Target Version Voters:
Target Version Non-Voters:
Other Version Voters: vault_3
Other Version Non-Voters: vault_4
```

### autopilot get-config

@@ -301,3 +334,6 @@ Flags applicable to this command are the following:
voting nodes.

- `server-stabilization-time` `(string)` - Minimum amount of time a server must be in a stable, healthy state before it can become a voter. Until that happens, it will be visible as a peer in the cluster, but as a non-voter, meaning it won't contribute to quorum.
- `disable-upgrade-migration` `(bool)` - Controls whether to disable automated
upgrade migrations, an Enterprise-only feature. The default is `false`.
24 changes: 23 additions & 1 deletion website/content/docs/concepts/integrated-storage/autopilot.mdx
@@ -8,7 +8,9 @@ description: Learn about the autopilot subsystem of integrated raft storage in V

Autopilot enables automated workflows for managing Raft clusters. The current
feature set includes 3 main features: Server Stabilization, Dead Server Cleanup
and State API. These three features are introduced in Vault 1.7.
and State API. These three features were introduced in Vault 1.7. The Enterprise
feature set includes 2 main features: Automated Upgrades and Redundancy Zones.
These two features were introduced in Vault 1.11.

## Server Stabilization

@@ -65,6 +67,26 @@ behavior. Autopilot gets initialized with the following default values.

- `server_stabilization_time` - `10s`

## Automated Upgrades

Automated Upgrades lets you automatically upgrade a cluster of Vault nodes to a new version as
updated server nodes join the cluster. Once the number of nodes on the new version is
equal to or greater than the number of nodes on the old version, Autopilot will promote
the newer versioned nodes to voters, demote the older versioned nodes to non-voters,
and initiate a leadership transfer from the older version leader to one of the newer
versioned nodes. After the leadership transfer completes, the older versioned non-voting
nodes can be removed from the cluster.
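
The promotion rule described above can be sketched as a small decision function. This is a simplified model, not Vault's implementation: it assumes the target is simply the highest version present, represents versions as tuples, and ignores node health, stabilization time, and redundancy zones. The `Node` class and `plan_upgrade` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    version: tuple  # e.g. (1, 12, 0); a real implementation would parse SemVer
    voter: bool
    leader: bool = False

def plan_upgrade(nodes: list) -> list:
    """Actions a simplified automated upgrade would take for this membership."""
    target = max(n.version for n in nodes)  # assumption: highest version is the target
    new = [n for n in nodes if n.version == target]
    old = [n for n in nodes if n.version != target]
    if not old:
        return ["done: all nodes on target version"]
    if len(new) < len(old):
        # Not enough upgraded nodes yet; keep the current voters.
        return [f"wait: {len(new)} node(s) on target, {len(old)} on other versions"]
    actions = [f"promote {n.name}" for n in new if not n.voter]
    actions += [f"demote {n.name}" for n in old if n.voter]
    if any(n.leader for n in old):
        actions.append(f"transfer leadership to {new[0].name}")
    # Afterwards, the old-version non-voters can be removed from the cluster.
    return actions

cluster = [Node("vault_1", (1, 11, 0), True, leader=True),
           Node("vault_2", (1, 11, 0), True), Node("vault_3", (1, 11, 0), True),
           Node("vault_4", (1, 12, 0), False), Node("vault_5", (1, 12, 0), False)]
print(plan_upgrade(cluster))
cluster.append(Node("vault_6", (1, 12, 0), False))
print(plan_upgrade(cluster))
```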

## Redundancy Zones

Redundancy Zones provide both scaling and resiliency benefits by deploying non-voting
nodes alongside voting nodes on a per-availability-zone basis. When using redundancy zones,
each zone will have exactly one voting node and as many additional non-voting nodes as desired.
If the voting node in a zone fails, a non-voting node will be automatically promoted to
voter. If an entire zone is lost, a non-voting node from another zone will be promoted to voter,
maintaining quorum. These non-voting nodes function not only as hot standbys, but also
increase read scalability.
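
The voter-selection behavior described above can be modeled roughly as follows. This is a hypothetical sketch, not autopilot's algorithm: it keeps one healthy voter per zone (preferring the current voter), and for each zone that has lost every node it promotes a spare non-voter from another zone. Which spare gets promoted (here, the first one found in zone-name order) is an arbitrary choice of the sketch.

```python
def pick_voters(zones: dict, voters: set, failed: set) -> set:
    """Choose one voter per zone from healthy nodes; cover lost zones with spares."""
    chosen = set()
    spares = []
    lost_zones = 0
    for zone in sorted(zones):
        healthy = [n for n in zones[zone] if n not in failed]
        if not healthy:
            lost_zones += 1
            continue
        # Keep the existing voter if it is still healthy; otherwise promote one.
        current = [n for n in healthy if n in voters]
        pick = current[0] if current else healthy[0]
        chosen.add(pick)
        spares.extend(n for n in healthy if n != pick)
    # Each fully lost zone is covered by promoting a non-voter from another zone.
    chosen.update(spares[:lost_zones])
    return chosen

# Hypothetical three-zone cluster with one voter per zone.
zones = {"a": ["vault_1", "vault_2"], "b": ["vault_3", "vault_4"], "c": ["vault_5"]}
voters = {"vault_1", "vault_3", "vault_5"}
print(sorted(pick_voters(zones, voters, failed={"vault_5"})))
```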

## Replication

Performance secondary clusters have their own Autopilot configuration, managed
18 changes: 18 additions & 0 deletions website/content/docs/configuration/storage/raft.mdx
@@ -113,6 +113,24 @@ set [`disable_mlock`](/docs/configuration#disable_mlock) to `true`, and to disab
unhealthy and needs to be shown as such in the state API, a node has been marked
as dead needing eviction from Raft configuration, etc.

- `autopilot_update_interval` `(string: "2s")` - This is the interval after which
autopilot will poll Vault for any updates to the information it cares about. This
includes things like the autopilot configuration, current autopilot state, raft
configuration, known servers, latest raft index, and stats for all the known servers.
The information that autopilot receives will be used to calculate its next state.

- `autopilot_upgrade_version` `(string: "")` - This is an optional string that, if
provided, will be reported to autopilot as Vault's version. This is then used
by autopilot when it makes decisions regarding
[automated upgrades](/docs/enterprise/automated-upgrades). If omitted, the
version of Vault currently in use will be used. Note that this string must conform
to [Semantic Versioning](https://semver.org). Use of this feature requires Vault
Enterprise.

- `autopilot_redundancy_zone` `(string: "")` - This is an optional string that specifies
Vault's [redundancy zone](/docs/enterprise/redundancy-zones). This is reported to autopilot
and is used to enhance scaling and resiliency. Use of this feature requires Vault Enterprise.
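
Since `autopilot_upgrade_version` must conform to Semantic Versioning, a deployment script might check the value before writing it into the storage stanza. The Python sketch below uses the SemVer 2.0.0 regular expression published on semver.org (with its `^`/`$` anchors replaced by `fullmatch`). It treats the empty default as "omitted"; the helper name is hypothetical, and Vault's own version parsing may differ in details, so this is a pre-flight check rather than a guarantee.

```python
import re

# SemVer 2.0.0 pattern from semver.org, applied with fullmatch below.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)

def check_upgrade_version(value: str) -> str:
    """Return value unchanged if empty or valid SemVer, else raise ValueError."""
    if value and not SEMVER.fullmatch(value):
        raise ValueError(f"autopilot_upgrade_version {value!r} is not SemVer")
    return value

print(check_upgrade_version("1.12.0"))
```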

### `retry_join` stanza

- `leader_api_addr` `(string: "")` - Address of a possible leader node.