Merge #96265
96265: loqrecovery,cli: add debug recover verify command r=erikgrinaker a=aliher1911

This commit adds the debug recover verify command, which reports the
application status of a loss of quorum recovery plan.
The command is run after debug recover apply-plan has staged a
recovery plan on a cluster, to check application progress.
It lets the user check which nodes still need to be restarted, the
outcome of recovery on restarted nodes, and the health of ranges
across the entire cluster.

Release note: None

Fixes #93043

Co-authored-by: Oleg Afanasyev <[email protected]>
craig[bot] and aliher1911 committed Feb 8, 2023
2 parents d4f63d7 + cbf0914 commit ec38fb4
Showing 21 changed files with 1,853 additions and 571 deletions.
33 changes: 31 additions & 2 deletions docs/generated/http/full.md
@@ -7613,6 +7613,7 @@ Support status: [reserved](#support-status)
| ----- | ---- | ----- | ----------- | -------------- |
| plan_id | [bytes](#cockroach.server.serverpb.RecoveryVerifyRequest-bytes) | | PlanID is ID of the plan to verify. | [reserved](#support-status) |
| decommissioned_node_ids | [int32](#cockroach.server.serverpb.RecoveryVerifyRequest-int32) | repeated | DecommissionedNodeIDs is a set of nodes that should be marked as decommissioned in the cluster when loss of quorum recovery successfully applies. | [reserved](#support-status) |
| max_reported_ranges | [int32](#cockroach.server.serverpb.RecoveryVerifyRequest-int32) | | MaxReportedRanges is the maximum number of failed ranges to report. If more unhealthy ranges are found, error will be returned alongside range to indicate that ranges were cut short. | [reserved](#support-status) |



@@ -7631,14 +7632,42 @@ Support status: [reserved](#support-status)
| Field | Type | Label | Description | Support status |
| ----- | ---- | ----- | ----------- | -------------- |
| statuses | [cockroach.kv.kvserver.loqrecovery.loqrecoverypb.NodeRecoveryStatus](#cockroach.server.serverpb.RecoveryVerifyResponse-cockroach.kv.kvserver.loqrecovery.loqrecoverypb.NodeRecoveryStatus) | repeated | Statuses contain a list of recovery statuses of nodes updated during recovery. It also contains nodes that were expected to be live (not decommissioned by recovery) but failed to return status response. | [reserved](#support-status) |
| unavailable_ranges | [cockroach.roachpb.RangeDescriptor](#cockroach.server.serverpb.RecoveryVerifyResponse-cockroach.roachpb.RangeDescriptor) | repeated | UnavailableRanges contains descriptors of ranges that failed health checks. | [reserved](#support-status) |
| decommissioned_node_ids | [int32](#cockroach.server.serverpb.RecoveryVerifyResponse-int32) | repeated | DecommissionedNodeIDs contains list of decommissioned node id's. Only nodes that were decommissioned by the plan would be listed here, not all historically decommissioned ones. | [reserved](#support-status) |
| unavailable_ranges | [RecoveryVerifyResponse.UnavailableRanges](#cockroach.server.serverpb.RecoveryVerifyResponse-cockroach.server.serverpb.RecoveryVerifyResponse.UnavailableRanges) | | UnavailableRanges contains information about ranges that failed health check. | [reserved](#support-status) |
| decommissioned_node_statuses | [RecoveryVerifyResponse.DecommissionedNodeStatusesEntry](#cockroach.server.serverpb.RecoveryVerifyResponse-cockroach.server.serverpb.RecoveryVerifyResponse.DecommissionedNodeStatusesEntry) | repeated | DecommissionedNodeStatuses contains a map of requested IDs with their corresponding liveness statuses. | [reserved](#support-status) |






<a name="cockroach.server.serverpb.RecoveryVerifyResponse-cockroach.server.serverpb.RecoveryVerifyResponse.UnavailableRanges"></a>
#### RecoveryVerifyResponse.UnavailableRanges



| Field | Type | Label | Description | Support status |
| ----- | ---- | ----- | ----------- | -------------- |
| ranges | [cockroach.kv.kvserver.loqrecovery.loqrecoverypb.RangeRecoveryStatus](#cockroach.server.serverpb.RecoveryVerifyResponse-cockroach.kv.kvserver.loqrecovery.loqrecoverypb.RangeRecoveryStatus) | repeated | Ranges contains descriptors of ranges that failed health check. If there are too many ranges to report, error would contain relevant message. | [reserved](#support-status) |
| error | [string](#cockroach.server.serverpb.RecoveryVerifyResponse-string) | | Error contains an optional error if ranges validation can't complete. | [reserved](#support-status) |
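
The `max_reported_ranges` request field and the `ranges`/`error` pair in this table describe a truncation contract: report at most a fixed number of failed ranges, and set an error when the list was cut short. A minimal Go sketch of that contract follows. The types `RangeStatus` and `UnavailableRanges`, the helper `collectUnavailable`, and the error text are all hypothetical stand-ins for illustration, not the code from this commit, and how the real server treats a zero limit is not shown in this diff.

```go
package main

import "fmt"

// RangeStatus is a hypothetical stand-in for
// loqrecoverypb.RangeRecoveryStatus; only the fields needed here.
type RangeStatus struct {
	RangeID int64
	Healthy bool
}

// UnavailableRanges mirrors the documented shape of
// RecoveryVerifyResponse.UnavailableRanges (ranges plus optional error).
type UnavailableRanges struct {
	Ranges []RangeStatus
	Error  string
}

// collectUnavailable gathers unhealthy ranges, reporting at most
// maxReported of them. If another unhealthy range is found after the
// limit is reached, Error is set to signal that the list was cut short.
// Assumption: the limit is applied exactly as given (no default for 0).
func collectUnavailable(all []RangeStatus, maxReported int) UnavailableRanges {
	var out UnavailableRanges
	for _, r := range all {
		if r.Healthy {
			continue
		}
		if len(out.Ranges) >= maxReported {
			out.Error = fmt.Sprintf("found more failed ranges than limit %d", maxReported)
			break
		}
		out.Ranges = append(out.Ranges, r)
	}
	return out
}

func main() {
	all := []RangeStatus{
		{RangeID: 1, Healthy: false},
		{RangeID: 2, Healthy: true},
		{RangeID: 3, Healthy: false},
		{RangeID: 4, Healthy: false},
	}
	res := collectUnavailable(all, 2)
	fmt.Println(len(res.Ranges), res.Error)
}
```

A caller under this sketch would treat a non-empty `Error` as "the range list is incomplete" rather than as a failure of the whole verification.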





<a name="cockroach.server.serverpb.RecoveryVerifyResponse-cockroach.server.serverpb.RecoveryVerifyResponse.DecommissionedNodeStatusesEntry"></a>
#### RecoveryVerifyResponse.DecommissionedNodeStatusesEntry



| Field | Type | Label | Description | Support status |
| ----- | ---- | ----- | ----------- | -------------- |
| key | [int32](#cockroach.server.serverpb.RecoveryVerifyResponse-int32) | | | |
| value | [cockroach.kv.kvserver.liveness.livenesspb.MembershipStatus](#cockroach.server.serverpb.RecoveryVerifyResponse-cockroach.kv.kvserver.liveness.livenesspb.MembershipStatus) | | | |






## ListTenants

7 changes: 7 additions & 0 deletions pkg/base/test_server_args.go
@@ -198,6 +198,13 @@ type TestClusterArgs struct {
// A copy of an entry from this map will be copied to each individual server
// and potentially adjusted according to ReplicationMode.
ServerArgsPerNode map[int]TestServerArgs

// If reusable listeners is true, then restart should keep listeners untouched
// so that servers are kept on the same ports. It is up to the test to set
// proxy listeners to TestServerArgs.Listener that would survive
// net.Listener.Close() and then allow restarted server to use them again.
// See testutils.ListenerRegistry.
ReusableListeners bool
}

var (
1 change: 1 addition & 0 deletions pkg/cli/BUILD.bazel
@@ -399,6 +399,7 @@ go_test(
"//pkg/util/stop",
"//pkg/util/timeutil",
"//pkg/util/tracing",
"//pkg/util/uuid",
"//pkg/workload/examples",
"@com_github_cockroachdb_datadriven//:datadriven",
"@com_github_cockroachdb_errors//:errors",