server: evaluate decommission pre-checks #93758
Conversation
This change implements the `DecommissionPreCheck` RPC on the `Admin` service, using the support for evaluating node decommission readiness by checking each range introduced in cockroachdb#93758. In checking node decommission readiness, only nodes that have a valid, non-`DECOMMISSIONED` liveness status are checked, and ranges with replicas on the checked nodes that encounter errors in attempting to allocate replacement replicas are reported in the response. Ranges that have replicas on multiple checked nodes have their errors reported for each nodeID in the request list. Depends on cockroachdb#93758, cockroachdb#90222. Epic: CRDB-20924 Release note: None
While we are correctly using the overrides set in the `OverrideStorePool` for the purposes of the node liveness function, the node count function did not properly incorporate the overrides previously. This change rectifies that, using the preset overrides specified at creation of the override store pool to calculate the number of non-decommissioning, non-decommissioned nodes (alive or dead), as viewed by the override store pool. This allows for correct calculation of the number of needed voters, allowing us to correctly determine which allocation action is needed for a range. Depends on cockroachdb#93758. Epic: CRDB-20924 Release note: None
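The fix described above — applying the preset overrides before counting eligible nodes — can be sketched as follows. The types and the `nodeCountWithOverrides` helper are simplified stand-ins, not the actual `OverrideStorePool` API:

```go
package main

import "fmt"

// Simplified stand-ins for the storepool node-status types.
type NodeID int

type NodeStatus int

const (
	Live NodeStatus = iota
	Dead
	Decommissioning
	Decommissioned
)

// nodeCountWithOverrides sketches the corrected calculation: count nodes that
// are neither decommissioning nor decommissioned (alive or dead), applying the
// preset overrides before consulting the gossiped status.
func nodeCountWithOverrides(gossiped, overrides map[NodeID]NodeStatus) int {
	count := 0
	for id, status := range gossiped {
		if o, ok := overrides[id]; ok {
			status = o // the override takes precedence over the gossiped view
		}
		if status == Decommissioning || status == Decommissioned {
			continue
		}
		count++
	}
	return count
}

func main() {
	gossiped := map[NodeID]NodeStatus{1: Live, 2: Live, 3: Dead}
	overrides := map[NodeID]NodeStatus{2: Decommissioning}
	// Nodes 1 and 3 count (alive or dead); node 2 is overridden out.
	fmt.Println(nodeCountWithOverrides(gossiped, overrides)) // 2
}
```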
Reviewed 2 of 3 files at r2, 8 of 8 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks)
pkg/server/decommission.go
line 51 at r3 (raw file):
type decommissionRangeCheckResult struct {
	desc   *roachpb.RangeDescriptor
	action allocatorimpl.AllocatorAction
It would be nice if the allocator action didn't leak into the server pkg here. We discussed offline that this will only RPC the string.
pkg/server/decommission.go
line 168 at r3 (raw file):
nodeIDs []roachpb.NodeID, strictReadiness bool, collectTraces bool,
See below comment on trace overhead at high decommission replica count when this is enabled
pkg/server/decommission.go
line 252 at r3 (raw file):
}
action, _, recording, rErr := evalStore.AllocatorCheckRange(ctx, &desc, overrideStorePool)
What is the expected mem overhead of collecting a trace recording on every range containing a replica on a decommissioning node?
I see that the check can pass in a list of Nodes that are to be decommissioned so it's foreseeable that you could be collecting a few hundred k traces (especially with multi-store).
pkg/server/decommission_test.go
line 170 at r3 (raw file):
hasAll := true
for _, idx := range serverIdxs {
	hasAll = hasAll && desc.Replicas().HasReplicaOnNode(tc.Server(idx).NodeID())
nit: Why not just return early here?
for (...) {
if !desc.Replicas().HasReplicaOnNode(tc.Server(idx).NodeID()) {
return false
}
}
return true
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/server/decommission.go
line 51 at r3 (raw file):
Previously, kvoli (Austen) wrote…
It would be nice if the allocator action didn't leak into the server pkg here. We discussed offline that this will only RPC the string.
From the RPC, yes, but the action is used within the evaluation to determine a few things (see evaluateRangeCheckResult). In this return struct I can use a string of course, though it means the AllocatorAction will still be used in the server pkg.
pkg/server/decommission.go
line 252 at r3 (raw file):
Previously, kvoli (Austen) wrote…
What is the expected mem overhead of collecting a trace recording on every range containing a replica on a decommissioning node?
I see that the check can pass in a list of Nodes that are to be decommissioned so it's foreseeable that you could be collecting a few hundred k traces (especially with multi-store).
This is a good point and was one of the things I most wanted input on in this review. Especially since right now, it's collecting traces always, and just returning them if collectTraces is on.
I think that we should collect traces only if requested, and we should also make sure we have a limit on the number of errors so that we can avoid huge memory overheads.
pkg/server/decommission_test.go
line 170 at r3 (raw file):
Previously, kvoli (Austen) wrote…
nit: Why not just return early here?
for (...) {
	if !desc.Replicas().HasReplicaOnNode(tc.Server(idx).NodeID()) {
		return false
	}
}
return true
Can do that.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/server/decommission.go
line 51 at r3 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
From the RPC, yes, but the action is used within the evaluation to determine a few things (see evaluateRangeCheckResult). In this return struct I can use a string of course, though it means the AllocatorAction will still be used in the server pkg.
Done.
pkg/server/decommission.go
line 168 at r3 (raw file):
Previously, kvoli (Austen) wrote…
See below comment on trace overhead at high decommission replica count when this is enabled
Done.
pkg/server/decommission.go
line 252 at r3 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
This is a good point and was one of the things I most wanted input on in this review. Especially since right now, it's collecting traces always, and just returning them if collectTraces is on. I think that we should collect traces only if requested, and we should also make sure we have a limit on the number of errors so that we can avoid huge memory overheads.
Done.
This adds support for the evaluation of the decommission readiness of a node (or set of nodes), by simulating their liveness to have the DECOMMISSIONING status and utilizing the allocator to ensure that we are able to perform any actions needed to repair the range. This supports a "strict" mode, in which case we expect all ranges to only need replacement or removal due to the decommissioning status, or a more permissive "non-strict" mode, which allows for other actions needed, as long as they do not encounter errors in finding a suitable allocation target. The non-strict mode allows us to permit situations where a range may have more than one action needed to repair it, such as a range that needs to reach its replication factor before the decommissioning replica can be replaced, or a range that needs to finalize an atomic replication change. Depends on cockroachdb#94024. Part of cockroachdb#91568 Release note: None
Reviewed 6 of 6 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks)
bors r=kvoli
Build failed (retrying...)
bors r=kvoli
Already running a review
Build succeeded
94983: allocator: correct node count calculations with overrides r=AlexTalks a=AlexTalks

While we are correctly using the overrides set in the `OverrideStorePool` for the purposes of the node liveness function, the node count function did not properly incorporate the overrides previously. This change rectifies that, using the preset overrides specified at creation of the override store pool to calculate the number of non-decommissioning, non-decommissioned nodes (alive or dead), as viewed by the override store pool. This allows for correct calculation of the number of needed voters, allowing us to correctly determine which allocation action is needed for a range.

Depends on #93758.

Epic: [CRDB-20924](https://cockroachlabs.atlassian.net/browse/CRDB-20924)

Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>
93950: server: implement decommission pre-check api r=AlexTalks a=AlexTalks

This change implements the `DecommissionPreCheck` RPC on the `Admin` service, using the support for evaluating node decommission readiness by checking each range introduced in #93758. In checking node decommission readiness, only nodes that have a valid, non-`DECOMMISSIONED` liveness status are checked, and ranges with replicas on the checked nodes that encounter errors in attempting to allocate replacement replicas are reported in the response. Ranges that have replicas on multiple checked nodes have their errors reported for each nodeID in the request list.

Depends on #93758, #90222.

Epic: [CRDB-20924](https://cockroachlabs.atlassian.net/browse/CRDB-20924)

Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>