Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: evaluate decommission pre-checks #93758

Merged
merged 1 commit into from
Jan 20, 2023

Conversation

AlexTalks
Copy link
Contributor

@AlexTalks AlexTalks commented Dec 15, 2022

This adds support for the evaluation of the decommission readiness of a
node (or set of nodes), by simulating their liveness to have the
DECOMMISSIONING status and utilizing the allocator to ensure that we are
able to perform any actions needed to repair the range. This supports a
"strict" mode, in which case we expect all ranges to only need
replacement or removal due to the decommissioning status, or a more
permissive "non-strict" mode, which allows for other actions needed, as
long as they do not encounter errors in finding a suitable allocation
target. The non-strict mode allows us to permit situations where a range
may have more than one action needed to repair it, such as a range that
needs to reach its replication factor before the decommissioning replica
can be replaced, or a range that needs to finalize an atomic replication
change.

Depends on #94024.

Part of #91568

@AlexTalks AlexTalks requested review from a team as code owners December 15, 2022 22:49
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@AlexTalks AlexTalks requested a review from kvoli December 15, 2022 22:54
@AlexTalks AlexTalks force-pushed the dpf_evaluate branch 2 times, most recently from ff2f482 to c9f5e35 Compare December 17, 2022 00:58
@AlexTalks AlexTalks requested a review from a team as a code owner December 17, 2022 00:58
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Dec 19, 2022
This change implements the `DecommissionPreCheck` RPC on the `Admin`
service, using the support for evaluating node decommission readiness by
checking each range introduced in cockroachdb#93758. In checking node decommission
readiness, only nodes that have a valid, non-`DECOMMISSIONED` liveness
status are checked, and ranges with replicas on the checked nodes that
encounter errors in attempting to allocate replacement replicas are
reported in the response. Ranges that have replicas on multiple checked
nodes have their errors reported for each nodeID in the request list.

Depends on cockroachdb#93758, cockroachdb#90222.

Epic: CRDB-20924

Release note: None
@AlexTalks AlexTalks requested a review from a team January 7, 2023 09:27
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 7, 2023
This change implements the `DecommissionPreCheck` RPC on the `Admin`
service, using the support for evaluating node decommission readiness by
checking each range introduced in cockroachdb#93758. In checking node decommission
readiness, only nodes that have a valid, non-`DECOMMISSIONED` liveness
status are checked, and ranges with replicas on the checked nodes that
encounter errors in attempting to allocate replacement replicas are
reported in the response. Ranges that have replicas on multiple checked
nodes have their errors reported for each nodeID in the request list.

Depends on cockroachdb#93758, cockroachdb#90222.

Epic: CRDB-20924

Release note: None
@AlexTalks AlexTalks changed the title server: implement decommission pre-checks server: evaluate decommission pre-checks Jan 7, 2023
@AlexTalks AlexTalks force-pushed the dpf_evaluate branch 2 times, most recently from 65641db to 1e57036 Compare January 10, 2023 08:49
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 10, 2023
While we are correctly using the overrides set in the
`OverrideStorePool` for the purposes of the node liveness function, the
node count function did not properly incorporate the overrides
previously. This change rectifies that, using the preset overrides
specified at creation of the override store pool to calculate the number
of non-decommissioning, non-decommissioned nodes (alive or dead), as
viewed by the override store pool. This allows for correct calculation
of the number of needed voters, allowing us to correctly determine which
allocation action is needed for a range.

Depends on cockroachdb#93758.

Epic: CRDB-20924

Release note: None
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r2, 8 of 8 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks)


pkg/server/decommission.go line 51 at r3 (raw file):

type decommissionRangeCheckResult struct {
	desc         *roachpb.RangeDescriptor
	action       allocatorimpl.AllocatorAction

It would be nice if the allocator action didn't leak into the server pkg here. We discussed offline that this will only RPC the string.


pkg/server/decommission.go line 168 at r3 (raw file):

	nodeIDs []roachpb.NodeID,
	strictReadiness bool,
	collectTraces bool,

See below comment on trace overhead at high decommission replica count when this is enabled


pkg/server/decommission.go line 252 at r3 (raw file):

			}

			action, _, recording, rErr := evalStore.AllocatorCheckRange(ctx, &desc, overrideStorePool)

What is the expected mem overhead of collecting a trace recording on every range containing a replica on a decommissioning node?

I see that the check can pass in a list of Nodes that are to be decommissioned so it's foreseeable that you could be collecting a few hundred k traces (especially with multi-store).


pkg/server/decommission_test.go line 170 at r3 (raw file):

	hasAll := true
	for _, idx := range serverIdxs {
		hasAll = hasAll && desc.Replicas().HasReplicaOnNode(tc.Server(idx).NodeID())

nit: Why not just return early here?

for (...) {
    if !desc.Replicas().HasReplicaOnNode(tc.Server(idx).NodeID()) {
        return false
    }
}
return true

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/server/decommission.go line 51 at r3 (raw file):

Previously, kvoli (Austen) wrote…

It would be nice if the allocator action didn't leak into the server pkg here. We discussed offline that this will only RPC the string.

From the RPC, yes, but the action is used within the evaluation to determine a few things (see evaluateRangeCheckResult). In this return struct I can use a string of course, though it means the AllocatorAction will still be used in the server pkg.


pkg/server/decommission.go line 252 at r3 (raw file):

Previously, kvoli (Austen) wrote…

What is the expected mem overhead of collecting a trace recording on every range containing a replica on a decommissioning node?

I see that the check can pass in a list of Nodes that are to be decommissioned so it's foreseeable that you could be collecting a few hundred k traces (especially with multi-store).

This is a good point and was one of the things I most wanted input on in this review. Especially since right now, it's collecting traces always, and just returning them if collectTraces is on.

I think that we should collect traces only if requested, and we should also make sure we have a limit on the number of errors so that we can avoid huge memory overheads.


pkg/server/decommission_test.go line 170 at r3 (raw file):

Previously, kvoli (Austen) wrote…

nit: Why not just return early here?

for (...) {
    if !desc.Replicas().HasReplicaOnNode(tc.Server(idx).NodeID()) {
        return false
    }
}
return true

Can do that.

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/server/decommission.go line 51 at r3 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

From the RPC, yes, but the action is used within the evaluation to determine a few things (see evaluateRangeCheckResult). In this return struct I can use a string of course, though it means the AllocatorAction will still be used in the server pkg.

Done.


pkg/server/decommission.go line 168 at r3 (raw file):

Previously, kvoli (Austen) wrote…

See below comment on trace overhead at high decommission replica count when this is enabled

Done.


pkg/server/decommission.go line 252 at r3 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

This is a good point and was one of the things I most wanted input on in this review. Especially since right now, it's collecting traces always, and just returning them if collectTraces is on.

I think that we should collect traces only if requested, and we should also make sure we have a limit on the number of errors so that we can avoid huge memory overheads.

Done.

This adds support for the evaluation of the decommission readiness of a
node (or set of nodes), by simulating their liveness to have the
DECOMMISSIONING status and utilizing the allocator to ensure that we are
able to perform any actions needed to repair the range. This supports a
"strict" mode, in which case we expect all ranges to only need
replacement or removal due to the decommissioning status, or a more
permissive "non-strict" mode, which allows for other actions needed, as
long as they do not encounter errors in finding a suitable allocation
target. The non-strict mode allows us to permit situations where a range
may have more than one action needed to repair it, such as a range that
needs to reach its replication factor before the decommissioning replica
can be replaced, or a range that needs to finalize an atomic replication
change.

Depends on cockroachdb#94024.

Part of cockroachdb#91568

Release note: None
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 19, 2023
This change implements the `DecommissionPreCheck` RPC on the `Admin`
service, using the support for evaluating node decommission readiness by
checking each range introduced in cockroachdb#93758. In checking node decommission
readiness, only nodes that have a valid, non-`DECOMMISSIONED` liveness
status are checked, and ranges with replicas on the checked nodes that
encounter errors in attempting to allocate replacement replicas are
reported in the response. Ranges that have replicas on multiple checked
nodes have their errors reported for each nodeID in the request list.

Depends on cockroachdb#93758, cockroachdb#90222.

Epic: CRDB-20924

Release note: None
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 20, 2023
While we are correctly using the overrides set in the
`OverrideStorePool` for the purposes of the node liveness function, the
node count function did not properly incorporate the overrides
previously. This change rectifies that, using the preset overrides
specified at creation of the override store pool to calculate the number
of non-decommissioning, non-decommissioned nodes (alive or dead), as
viewed by the override store pool. This allows for correct calculation
of the number of needed voters, allowing us to correctly determine which
allocation action is needed for a range.

Depends on cockroachdb#93758.

Epic: CRDB-20924

Release note: None
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 6 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks)

@AlexTalks
Copy link
Contributor Author

bors r=kvoli

@craig
Copy link
Contributor

craig bot commented Jan 20, 2023

Build failed (retrying...):

@AlexTalks
Copy link
Contributor Author

bors r=kvoli

@craig
Copy link
Contributor

craig bot commented Jan 20, 2023

Already running a review

@craig craig bot merged commit 1b79102 into cockroachdb:master Jan 20, 2023
@craig
Copy link
Contributor

craig bot commented Jan 20, 2023

Build succeeded:

@AlexTalks AlexTalks deleted the dpf_evaluate branch January 20, 2023 23:58
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 21, 2023
While we are correctly using the overrides set in the
`OverrideStorePool` for the purposes of the node liveness function, the
node count function did not properly incorporate the overrides
previously. This change rectifies that, using the preset overrides
specified at creation of the override store pool to calculate the number
of non-decommissioning, non-decommissioned nodes (alive or dead), as
viewed by the override store pool. This allows for correct calculation
of the number of needed voters, allowing us to correctly determine which
allocation action is needed for a range.

Depends on cockroachdb#93758.

Epic: CRDB-20924

Release note: None
craig bot pushed a commit that referenced this pull request Jan 25, 2023
94983: allocator: correct node count calculations with overrides r=AlexTalks a=AlexTalks

While we are correctly using the overrides set in the
`OverrideStorePool` for the purposes of the node liveness function, the
node count function did not properly incorporate the overrides
previously. This change rectifies that, using the preset overrides
specified at creation of the override store pool to calculate the number
of non-decommissioning, non-decommissioned nodes (alive or dead), as
viewed by the override store pool. This allows for correct calculation
of the number of needed voters, allowing us to correctly determine which
allocation action is needed for a range.

Depends on #93758.

Epic: [CRDB-20924](https://cockroachlabs.atlassian.net/browse/CRDB-20924)

Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 27, 2023
This change implements the `DecommissionPreCheck` RPC on the `Admin`
service, using the support for evaluating node decommission readiness by
checking each range introduced in cockroachdb#93758. In checking node decommission
readiness, only nodes that have a valid, non-`DECOMMISSIONED` liveness
status are checked, and ranges with replicas on the checked nodes that
encounter errors in attempting to allocate replacement replicas are
reported in the response. Ranges that have replicas on multiple checked
nodes have their errors reported for each nodeID in the request list.

Depends on cockroachdb#93758, cockroachdb#90222.

Epic: CRDB-20924

Release note: None
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 28, 2023
This change implements the `DecommissionPreCheck` RPC on the `Admin`
service, using the support for evaluating node decommission readiness by
checking each range introduced in cockroachdb#93758. In checking node decommission
readiness, only nodes that have a valid, non-`DECOMMISSIONED` liveness
status are checked, and ranges with replicas on the checked nodes that
encounter errors in attempting to allocate replacement replicas are
reported in the response. Ranges that have replicas on multiple checked
nodes have their errors reported for each nodeID in the request list.

Depends on cockroachdb#93758, cockroachdb#90222.

Epic: CRDB-20924

Release note: None
craig bot pushed a commit that referenced this pull request Jan 28, 2023
93950: server: implement decommission pre-check api r=AlexTalks a=AlexTalks

This change implements the `DecommissionPreCheck` RPC on the `Admin`
service, using the support for evaluating node decommission readiness by
checking each range introduced in #93758. In checking node decommission
readiness, only nodes that have a valid, non-`DECOMMISSIONED` liveness
status are checked, and ranges with replicas on the checked nodes that
encounter errors in attempting to allocate replacement replicas are
reported in the response. Ranges that have replicas on multiple checked
nodes have their errors reported for each nodeID in the request list.

Depends on #93758, #90222.

Epic: [CRDB-20924](https://cockroachlabs.atlassian.net/browse/CRDB-20924)

Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants