Autopilot enterprise docs #15589

Merged: 25 commits merged on Jun 7, 2022
Changes from 2 commits
Commits (25):
146a19e first draft of autopilot enterprise docs (raskchanky, May 24, 2022)
0d91724 add configuration changes (raskchanky, May 25, 2022)
3c8a699 Update website/content/api-docs/system/storage/raftautopilot.mdx (raskchanky, May 27, 2022)
9514852 Update website/content/api-docs/system/storage/raftautopilot.mdx (raskchanky, May 27, 2022)
ddcccac Update website/content/api-docs/system/storage/raftautopilot.mdx (raskchanky, May 27, 2022)
64d2b05 Update website/content/docs/commands/operator/members.mdx (raskchanky, May 27, 2022)
98cc174 Update website/content/docs/commands/operator/raft.mdx (raskchanky, May 27, 2022)
b43b964 Update website/content/docs/concepts/integrated-storage/autopilot.mdx (raskchanky, May 27, 2022)
a33193b Update website/content/docs/concepts/integrated-storage/autopilot.mdx (raskchanky, May 27, 2022)
88cef2a Update website/content/docs/concepts/integrated-storage/autopilot.mdx (raskchanky, May 27, 2022)
40b3243 Update website/content/docs/concepts/integrated-storage/autopilot.mdx (raskchanky, May 27, 2022)
2f691f1 Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 27, 2022)
15e7624 Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
5f57e79 Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
1db0150 Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
ef6252d Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
8ac580d Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
86efb5f Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
e8f3257 Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
eb71697 Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
3fa1a5f Update website/content/docs/enterprise/automated-upgrades.mdx (raskchanky, May 31, 2022)
0520fde Update website/content/docs/enterprise/redundancy-zones.mdx (raskchanky, May 31, 2022)
78b4b57 Update website/content/docs/enterprise/redundancy-zones.mdx (raskchanky, May 31, 2022)
ddde2de feedback (raskchanky, May 31, 2022)
b9bdb17 a few more things (raskchanky, Jun 1, 2022)
17 changes: 13 additions & 4 deletions website/content/api-docs/system/ha-status.mdx
@@ -35,23 +35,32 @@ $ curl \
"api_address": "http://10.0.0.2:8200",
"cluster_address": "https://10.0.0.2:8201",
"active_node": true,
"last_echo": null
"last_echo": null,
"version": "1.11.0",
"upgrade_version": "1.11.0",
"redundancy_zone": "a"
},
{
"hostname": "node2",
"api_address": "http://10.0.0.3:8200",
"cluster_address": "https://10.0.0.3:8201",
"active_node": false,
"last_echo": "2021-11-29T10:29:09.202235-05:00"
"last_echo": "2021-11-29T10:29:09.202235-05:00",
"version": "1.11.0",
"upgrade_version": "1.11.0",
"redundancy_zone": "a"
},
{
"hostname": "node3",
"api_address": "http://10.0.0.4:8200",
"cluster_address": "https://10.0.0.4:8201",
"active_node": false,
"last_echo": "2021-11-29T10:29:07.402548-05:00"
"last_echo": "2021-11-29T10:29:07.402548-05:00",
"version": "1.11.0",
"upgrade_version": "1.11.0",
"redundancy_zone": "a"
}
]
}

```
Note that in the above sample response, `upgrade_version` and `redundancy_zone` are Enterprise-only fields.
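Here is a rough Python sketch of how to consume this response. It groups hostnames by `redundancy_zone` and flags nodes whose `upgrade_version` differs from `version`. The sketch makes a few assumptions:

- The response has already been fetched and decoded.
- The node list sits under a top-level `nodes` key. That key is not visible in the excerpt above.
- The Enterprise-only fields may simply be missing outside Vault Enterprise.

`summarize_ha_nodes` is a hypothetical helper name.

```python
from collections import defaultdict

# Abbreviated copy of the sample response above. Assumption: the node
# list lives under a top-level "nodes" key.
sample = {
    "nodes": [
        {"hostname": "node1", "version": "1.11.0",
         "upgrade_version": "1.11.0", "redundancy_zone": "a"},
        {"hostname": "node2", "version": "1.11.0",
         "upgrade_version": "1.11.0", "redundancy_zone": "a"},
        {"hostname": "node3", "version": "1.11.0",
         "upgrade_version": "1.11.0", "redundancy_zone": "a"},
    ]
}


def summarize_ha_nodes(resp):
    """Hypothetical helper: group hostnames by zone, list version mismatches."""
    zones = defaultdict(list)
    mismatched = []
    for node in resp.get("nodes", []):
        # Enterprise-only fields may be absent, so fall back to None.
        zones[node.get("redundancy_zone")].append(node["hostname"])
        upgrade = node.get("upgrade_version")
        if upgrade is not None and upgrade != node.get("version"):
            mismatched.append(node["hostname"])
    return dict(zones), mismatched


zones, mismatched = summarize_ha_nodes(sample)
print(zones)       # {'a': ['node1', 'node2', 'node3']}
print(mismatched)  # []
```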
168 changes: 127 additions & 41 deletions website/content/api-docs/system/storage/raftautopilot.mdx
@@ -34,52 +34,129 @@ $ curl \

```json
{
"Healthy": true,
"FailureTolerance": 1,
"Servers": {
"healthy": true,
"failure_tolerance": 1,
"servers": {
"raft1": {
"ID": "raft1",
"Name": "raft1",
"Address": "127.0.0.1:8201",
"NodeStatus": "alive",
"LastContact": "0s",
"LastTerm": 3,
"LastIndex": 459,
"Healthy": true,
"StableSince": "2021-03-19T20:14:11.831678-04:00",
"Status": "leader",
"Meta": null
"id": "raft1",
"name": "raft1",
"address": "127.0.0.1:8201",
"node_status": "alive",
"last_contact": "0s",
"last_term": 3,
"last_index": 459,
"healthy": true,
"stable_since": "2021-03-19T20:14:11.831678-04:00",
"status": "leader",
"meta": null
},
"raft2": {
"ID": "raft2",
"Name": "raft2",
"Address": "127.0.0.2:8201",
"NodeStatus": "alive",
"LastContact": "516.49595ms",
"LastTerm": 3,
"LastIndex": 459,
"Healthy": true,
"StableSince": "2021-03-19T20:14:19.831931-04:00",
"Status": "voter",
"Meta": null
"id": "raft2",
"name": "raft2",
"address": "127.0.0.2:8201",
"node_status": "alive",
"last_contact": "516.49595ms",
"last_term": 3,
"last_index": 459,
"healthy": true,
"stable_since": "2021-03-19T20:14:19.831931-04:00",
"status": "voter",
"meta": null
},
"raft3": {
"ID": "raft3",
"Name": "raft3",
"Address": "127.0.0.3:8201",
"NodeStatus": "alive",
"LastContact": "196.706591ms",
"LastTerm": 3,
"LastIndex": 459,
"Healthy": true,
"StableSince": "2021-03-19T20:14:25.83565-04:00",
"Status": "voter",
"Meta": null
"id": "raft3",
"name": "raft3",
"address": "127.0.0.3:8201",
"node_status": "alive",
"last_contact": "196.706591ms",
"last_term": 3,
"last_index": 459,
"healthy": true,
"stable_since": "2021-03-19T20:14:25.83565-04:00",
"status": "voter",
"meta": null
}
},
"Leader": "raft1",
"Voters": ["raft1", "raft2", "raft3"],
"NonVoters": null
"leader": "raft1",
"voters": ["raft1", "raft2", "raft3"],
"non_voters": null
}
```
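A `failure_tolerance` of 1 for three voters follows from ordinary Raft quorum arithmetic. With n voters, a majority of n // 2 + 1 is needed to commit, so the cluster can lose n minus that many voters.

The Python sketch below checks that arithmetic against the sample. The helper names are hypothetical, and the formula is the general Raft rule rather than code taken from Vault. It happens to agree with both sample responses on this page: 3 voters give 1, and 2 voters give 0.

```python
def raft_quorum(voters: int) -> int:
    """Majority needed for a Raft cluster with `voters` voting members."""
    return voters // 2 + 1


def raft_failure_tolerance(voters: int) -> int:
    """How many voters can fail while a majority remains (hypothetical helper)."""
    return max(voters - raft_quorum(voters), 0)


# Values taken from the sample response above.
sample = {"failure_tolerance": 1, "voters": ["raft1", "raft2", "raft3"]}
assert raft_failure_tolerance(len(sample["voters"])) == sample["failure_tolerance"]

for n in range(1, 6):
    print(n, raft_quorum(n), raft_failure_tolerance(n))
```

For n = 1 through 5, the loop prints tolerances 0, 0, 1, 1, 2. This is why an even voter count adds no failure tolerance over the next-smaller odd count.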

### Enterprise Only
Vault Enterprise includes additional output in its API response to indicate the current state of redundancy zones,
automated upgrade progress (if any), and optimistic failure tolerance.

#### Sample Response (Enterprise)
```json
{
"failure_tolerance": 0,
"healthy": true,
"leader": "vault_1",
"optimistic_failure_tolerance": 3,
"redundancy_zones": {
"a": {
"servers": [
"vault_1",
"vault_2",
"vault_5"
],
"voters": [
"vault_1"
],
"failure_tolerance": 2
},
"b": {
"servers": [
"vault_3",
"vault_4"
],
"voters": [
"vault_3"
],
"failure_tolerance": 1
}
},
"upgrade_info": {
"other_version_non_voters": [
"vault_2",
"vault_4"
],
"other_version_voters": [
"vault_1",
"vault_3"
],
"redundancy_zones": {
"a": {
"target_version_non_voters": [
"vault_5"
],
"other_version_voters": [
"vault_1"
],
"other_version_non_voters": [
"vault_2"
]
},
"b": {
"other_version_voters": [
"vault_3"
],
"other_version_non_voters": [
"vault_4"
]
}
},
"status": "await-new-voters",
"target_version": "1.12.0",
"target_version_non_voters": [
"vault_5"
]
},
"voters": [
"vault_1",
"vault_3"
]
}
```
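The Python sketch below reads the Enterprise-only parts of this response. It does two things:

- It checks that every redundancy zone reports exactly one voter, and that those zone voters match the top-level `voters` list.
- It counts the nodes that are on and off the `target_version`.

The sample omits empty lists (there is no `target_version_voters` key), so the code treats missing keys as empty lists. `check_zones` and `upgrade_summary` are hypothetical names, and the state dictionary is an abbreviated copy of the sample.

```python
# Abbreviated copy of the Enterprise sample response above.
state = {
    "failure_tolerance": 0,
    "voters": ["vault_1", "vault_3"],
    "redundancy_zones": {
        "a": {"servers": ["vault_1", "vault_2", "vault_5"],
              "voters": ["vault_1"], "failure_tolerance": 2},
        "b": {"servers": ["vault_3", "vault_4"],
              "voters": ["vault_3"], "failure_tolerance": 1},
    },
    "upgrade_info": {
        "status": "await-new-voters",
        "target_version": "1.12.0",
        "target_version_non_voters": ["vault_5"],
        "other_version_voters": ["vault_1", "vault_3"],
        "other_version_non_voters": ["vault_2", "vault_4"],
    },
}


def check_zones(state):
    """Hypothetical check: one voter per zone, matching the top-level voters."""
    problems = []
    zone_voters = set()
    for name, zone in state.get("redundancy_zones", {}).items():
        if len(zone["voters"]) != 1:
            problems.append(f"zone {name} has {len(zone['voters'])} voters")
        zone_voters.update(zone["voters"])
    if zone_voters != set(state.get("voters", [])):
        problems.append("zone voters differ from top-level voters")
    return problems


def upgrade_summary(state):
    """Return (status, target version, #nodes on target, #nodes on other)."""
    info = state.get("upgrade_info") or {}
    # Empty lists are omitted from the response, so default them to [].
    on_target = (len(info.get("target_version_voters", []))
                 + len(info.get("target_version_non_voters", [])))
    on_other = (len(info.get("other_version_voters", []))
                + len(info.get("other_version_non_voters", [])))
    return info.get("status"), info.get("target_version"), on_target, on_other


print(check_zones(state))      # []
print(upgrade_summary(state))  # ('await-new-voters', '1.12.0', 1, 4)
```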

@@ -108,10 +185,13 @@ $ curl \
"last_contact_threshold": "10s",
"max_trailing_logs": 1000,
"min_quorum": 0,
"server_stabilization_time": "10s"
"server_stabilization_time": "10s",
"disable_upgrade_migration": true
}
```

Note that in the above sample response, `disable_upgrade_migration` is an Enterprise-only field.

## Set Configuration

This endpoint is used to modify the configuration of the autopilot subsystem of Integrated Storage.
@@ -143,6 +223,9 @@ This endpoint is used to modify the configuration of the autopilot subsystem of
- `server_stabilization_time` `(string: "10s")` - Minimum amount of time a server must
be in a stable, healthy state before it can be added to the cluster.

- `disable_upgrade_migration` `(bool: false)` - Disables automated upgrade migrations
performed by autopilot. (Enterprise only)

### Sample Request

```shell-session
@@ -162,6 +245,9 @@ $ curl \
"dead_server_last_contact_threshold": "24h",
"max_trailing_logs": "1000",
"min_quorum": "3",
"server_stabilization_time": "10s"
"server_stabilization_time": "10s",
"disable_upgrade_migration": true
}
```

Note that in the above sample payload, `disable_upgrade_migration` is an Enterprise-only field.
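For scripting against this endpoint without `curl`, a minimal Python sketch is shown below. It builds the same kind of request with the standard library. The request is only constructed, not sent.

Several details are assumptions, because the request line is collapsed in this diff:

- the endpoint path `/v1/sys/storage/raft/autopilot/configuration`
- the address
- the token

Whether fields omitted from the payload keep their current values is not stated here. For that reason, the sketch sends every field visible in the sample payload.

```python
import json
import urllib.request

# Assumptions: Vault listens on VAULT_ADDR and the token may update the
# autopilot configuration. Both values are placeholders.
VAULT_ADDR = "http://127.0.0.1:8200"
VAULT_TOKEN = "hypothetical-token"

# Mirrors the visible part of the sample payload above.
payload = {
    "dead_server_last_contact_threshold": "24h",
    "max_trailing_logs": "1000",
    "min_quorum": "3",
    "server_stabilization_time": "10s",
    "disable_upgrade_migration": True,  # Enterprise only
}


def build_autopilot_config_request(addr, token, body):
    """Build (but do not send) a POST to the assumed autopilot config path."""
    return urllib.request.Request(
        url=f"{addr}/v1/sys/storage/raft/autopilot/configuration",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


req = build_autopilot_config_request(VAULT_ADDR, VAULT_TOKEN, payload)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running Vault server.
```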
15 changes: 9 additions & 6 deletions website/content/docs/commands/operator/members.mdx
@@ -16,13 +16,16 @@ Get the key status:

```shell-session
$ vault operator members
Host Name    API Address             Cluster Address          ActiveNode    Last Echo
---------    -----------             ---------------          ----------    ---------
node1        http://10.0.0.2:8200    https://10.0.0.2:8201    true          <nil>
node2        http://10.0.0.3:8200    https://10.0.0.3:8201    false         2021-11-29 10:19:39.236409 -0500 EST
node3        http://10.0.0.4:8200    https://10.0.0.4:8201    false         2021-11-29 10:19:37.436283 -0500 EST

Host Name            API Address              Cluster Address           Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------            -----------              ---------------           -----------    -------    ---------------    ---------------    ---------
josh-C02ZT9DYMD6R    http://127.0.0.1:8200    https://127.0.0.1:8201    true           1.11.0     1.11.0             a                  n/a
josh-C02ZT9DYMD6R    http://127.0.0.2:8200    https://127.0.0.2:8201    false          1.11.0     1.11.0             a                  2022-05-23T15:51:19-07:00
josh-C02ZT9DYMD6R    http://127.0.0.3:8200    https://127.0.0.3:8201    false          1.11.0     1.11.0             b                  2022-05-23T15:51:19-07:00
josh-C02ZT9DYMD6R    http://127.0.0.4:8200    https://127.0.0.4:8201    false          1.11.0     1.11.0             b                  2022-05-23T15:51:22-07:00
josh-C02ZT9DYMD6R    http://127.0.0.5:8200    https://127.0.0.5:8201    false          1.11.0     1.12.0             a                  2022-05-23T15:51:20-07:00
```
Note that in the above output, `Upgrade Version` and `Redundancy Zone` are Enterprise-only fields.

## Usage

36 changes: 36 additions & 0 deletions website/content/docs/commands/operator/raft.mdx
@@ -254,6 +254,39 @@ Servers:
Last Term: 3
Last Index: 38
```
Vault Enterprise will include additional output related to automated upgrades and redundancy zones.

#### Example Vault Enterprise Output

```text
Redundancy Zones:
a
Servers: vault_1, vault_2, vault_5
Voters: vault_1
Failure Tolerance: 2
b
Servers: vault_3, vault_4
Voters: vault_3
Failure Tolerance: 1
Upgrade Info:
Status: await-new-voters
Target Version: 1.12.0
Target Version Voters:
Target Version Non-Voters: vault_5
Other Version Voters: vault_1, vault_3
Other Version Non-Voters: vault_2, vault_4
Redundancy Zones:
a
Target Version Voters:
Target Version Non-Voters: vault_5
Other Version Voters: vault_1
Other Version Non-Voters: vault_2
b
Target Version Voters:
Target Version Non-Voters:
Other Version Voters: vault_3
Other Version Non-Voters: vault_4
```

### autopilot get-config

@@ -301,3 +334,6 @@ Flags applicable to this command are the following:
voting nodes.

- `server-stabilization-time` `(string)` - Minimum amount of time a server must be in a stable, healthy state before it can become a voter. Until that happens, it will be visible as a peer in the cluster, but as a non-voter, meaning it won't contribute to quorum.

- `disable-upgrade-migration` `(bool)` - Controls whether to disable automated
upgrade migrations, an Enterprise-only feature. The default is `false`.
24 changes: 23 additions & 1 deletion website/content/docs/concepts/integrated-storage/autopilot.mdx
@@ -8,7 +8,9 @@ description: Learn about the autopilot subsystem of integrated raft storage in V

Autopilot enables automated workflows for managing Raft clusters. The current
feature set includes 3 main features: Server Stabilization, Dead Server Cleanup
and State API. These three features are introduced in Vault 1.7.
and State API. These three features were introduced in Vault 1.7. The Enterprise
feature set includes 2 main features: Automated Upgrades and Redundancy Zones.
These two features were introduced in Vault 1.11.

## Server Stabilization

@@ -65,6 +67,26 @@ behavior. Autopilot gets initialized with the following default values.

- `server_stabilization_time` - `10s`

## Automated Upgrades

This capability automatically upgrades a cluster of Vault nodes to a new version as
updated server nodes join the cluster. Once the number of nodes on the new version is
equal to or greater than the number of nodes on the old version, Autopilot will promote
the new-version nodes to voters, demote the old-version nodes to non-voters,
and initiate a leadership transfer from the old-version leader to one of the new-version
nodes. After the leadership transfer finishes, the old-version non-voting
nodes can be removed from the cluster.
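The promotion rule described above can be modelled as a small decision function. This is a simplified, hypothetical sketch, not Vault's implementation; `Node` and `plan_upgrade` are invented names. It treats any node that is not on the exact target version as old. It also ignores server health, stabilization time and redundancy zones, which a real implementation would have to account for.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    voter: bool
    leader: bool = False


def plan_upgrade(nodes, target_version):
    """Hypothetical model of the rule above.

    Returns (nodes_to_promote, nodes_to_demote, transfer_leadership).
    """
    new = [n for n in nodes if n.version == target_version]
    old = [n for n in nodes if n.version != target_version]
    if not new or len(new) < len(old):
        # Too few new-version nodes so far: keep waiting for more to join.
        return [], [], False
    promote = [n.name for n in new if not n.voter]
    demote = [n.name for n in old if n.voter]
    # Leadership moves only if the current leader runs the old version.
    transfer = any(n.leader for n in old)
    return promote, demote, transfer


cluster = [
    Node("vault_1", "1.11.0", voter=True, leader=True),
    Node("vault_2", "1.11.0", voter=True),
    Node("vault_3", "1.11.0", voter=True),
    Node("vault_4", "1.12.0", voter=False),
    Node("vault_5", "1.12.0", voter=False),
]
print(plan_upgrade(cluster, "1.12.0"))  # ([], [], False): 2 new < 3 old

cluster.append(Node("vault_6", "1.12.0", voter=False))
# 3 new >= 3 old: promote vault_4..6, demote vault_1..3, move leadership.
print(plan_upgrade(cluster, "1.12.0"))
```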

## Redundancy Zones

Redundancy Zones provide both scaling and resiliency benefits by deploying non-voting
nodes alongside voting nodes on a per-availability-zone basis. When using redundancy zones,
each zone will have exactly 1 voting node and as many additional non-voting nodes as desired.
If the voting node in a zone fails, a non-voting node in that zone will be automatically promoted to
voter. If an entire zone is lost, a non-voting node from another zone will be promoted to voter,
maintaining quorum. These non-voting nodes not only function as hot standbys but also
increase read scalability.
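Here is a toy Python model of the voter placement described above:

- Each zone keeps one healthy voter.
- A failed voter is replaced by a healthy non-voter from the same zone.
- Each fully lost zone is covered by promoting a spare non-voter from another zone.

The function name, its inputs, and the tie-breaking (sorted zone order, first listed server) are assumptions made for illustration. Autopilot's actual selection logic is not described here.

```python
def choose_voters(zones, healthy, current_voters):
    """Hypothetical model of voter placement across redundancy zones.

    zones: zone name -> list of server names in that zone
    healthy: set of server names that are currently healthy
    current_voters: set of server names that are voters now
    """
    voters = set()
    lost_zones = []
    for zone, servers in sorted(zones.items()):
        alive = [s for s in servers if s in healthy]
        if not alive:
            lost_zones.append(zone)
            continue
        # Keep a healthy existing voter if there is one, otherwise promote.
        existing = [s for s in alive if s in current_voters]
        voters.add(existing[0] if existing else alive[0])
    # For each lost zone, promote a spare healthy non-voter from another zone.
    spares = [s for _, servers in sorted(zones.items())
              for s in servers if s in healthy and s not in voters]
    for _ in lost_zones:
        if spares:
            voters.add(spares.pop(0))
    return voters


zones = {"a": ["v1", "v2"], "b": ["v3", "v4"], "c": ["v5", "v6"]}
everyone = {"v1", "v2", "v3", "v4", "v5", "v6"}
start = {"v1", "v3", "v5"}

print(sorted(choose_voters(zones, everyone - {"v1"}, start)))        # ['v2', 'v3', 'v5']
print(sorted(choose_voters(zones, everyone - {"v5", "v6"}, start)))  # ['v1', 'v2', 'v3']
```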

## Replication

Performance secondary clusters have their own Autopilot configuration, managed
18 changes: 18 additions & 0 deletions website/content/docs/configuration/storage/raft.mdx
@@ -113,6 +113,24 @@ set [`disable_mlock`](/docs/configuration#disable_mlock) to `true`, and to disab
unhealthy and needs to be shown as such in the state API, a node has been marked
as dead needing eviction from Raft configuration, etc.

- `autopilot_update_interval` `(string: "2s")` - This is the interval after which
autopilot will poll Vault for any updates to the information it cares about. This
includes things like the autopilot configuration, current autopilot state, raft
configuration, known servers, latest raft index, and stats for all the known servers.
The information that autopilot receives will be used to calculate its next state.

- `autopilot_upgrade_version` `(string: "")` - This is an optional string that, if
provided, will be reported to autopilot as Vault's version. This is then used
by autopilot when it makes decisions regarding
[automated upgrades](/docs/enterprise/automated-upgrades). If omitted, the
version of Vault currently in use will be used. Note that this string must conform
to [Semantic Versioning](https://semver.org). Use of this feature requires Vault
Enterprise.

- `autopilot_redundancy_zone` `(string: "")` - This is an optional string that specifies
Vault's [redundancy zone](/docs/enterprise/redundancy-zones). This is reported to autopilot
and is used to enhance scaling and resiliency. Use of this feature requires Vault Enterprise.
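`autopilot_upgrade_version` must conform to Semantic Versioning, so a pre-flight check before rolling out configuration can catch typos. The Python sketch below uses the regular expression suggested on semver.org for SemVer 2.0.0.

Vault may parse versions with a more lenient library, for example one that accepts a leading `v`. Treat this strict check as a conservative assumption rather than a description of what Vault accepts.

```python
import re

# Regular expression suggested by semver.org for SemVer 2.0.0 (unnamed
# groups). fullmatch() is used so a trailing newline cannot slip through.
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_semver(value: str) -> bool:
    """Return True if `value` is a strict SemVer 2.0.0 version string."""
    return SEMVER_RE.fullmatch(value) is not None


for candidate in ["1.12.0", "1.12.0-rc1", "1.12", "v1.12.0"]:
    print(candidate, is_semver(candidate))
```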

### `retry_join` stanza

- `leader_api_addr` `(string: "")` - Address of a possible leader node.