kv: method to check decommission by valid replica replacement availability #91571

Closed
AlexTalks opened this issue Nov 9, 2022 · 0 comments
Labels: A-kv-decom-rolling-restart (Decommission and Rolling Restarts), A-kv-distribution (Relating to rebalancing and leasing), C-enhancement (Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)), T-kv (KV Team)


AlexTalks commented Nov 9, 2022

As part of #90752, we will need to determine whether a node decommission is viable by evaluating whether each replica on the decommissioning node can be replaced by a new replica on a valid, available store. While evaluating each replica on the node(s), we can also gather errors and potential remediation steps that would allow the decommission to become viable.
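A minimal Go sketch of the per-replica check, assuming a simplified cluster model: `Store`, `Range`, `ReplacementIssue`, `isValidTarget`, and `checkDecommission` are hypothetical names, not CockroachDB APIs. Here a store counts as a valid target if it is live, is not on a decommissioning node, and does not already hold a replica of the range; real placement would also have to honor zone constraints, capacity, and locality, which the sketch ignores.

```go
package main

import "fmt"

// Store is a hypothetical, simplified view of a store in the cluster.
type Store struct {
	ID     int
	NodeID int
	Live   bool
}

// Range is a hypothetical, simplified range descriptor listing the IDs of
// the stores that currently hold one of its replicas.
type Range struct {
	ID       int
	StoreIDs []int
}

// ReplacementIssue records a replica with no valid replacement target,
// together with a suggested (hypothetical) remediation step.
type ReplacementIssue struct {
	RangeID     int
	StoreID     int
	Remediation string
}

// isValidTarget reports whether s could receive a replacement replica for r:
// it must be live, must not be on a decommissioning node, and must not
// already hold a replica of r.
func isValidTarget(s Store, r Range, decommissioning map[int]bool) bool {
	if !s.Live || decommissioning[s.NodeID] {
		return false
	}
	for _, id := range r.StoreIDs {
		if id == s.ID {
			return false
		}
	}
	return true
}

// checkDecommission returns one issue per replica on a decommissioning node
// that has no valid replacement target. Replicas are checked independently,
// so two replicas of one range competing for a single target both pass.
func checkDecommission(decommissioning map[int]bool, stores []Store, ranges []Range) []ReplacementIssue {
	nodeOf := make(map[int]int, len(stores))
	for _, s := range stores {
		nodeOf[s.ID] = s.NodeID
	}
	var issues []ReplacementIssue
	for _, r := range ranges {
		for _, sid := range r.StoreIDs {
			if !decommissioning[nodeOf[sid]] {
				continue // this replica is not being decommissioned
			}
			found := false
			for _, s := range stores {
				if isValidTarget(s, r, decommissioning) {
					found = true
					break
				}
			}
			if !found {
				issues = append(issues, ReplacementIssue{
					RangeID:     r.ID,
					StoreID:     sid,
					Remediation: "add a live, non-decommissioning store that does not already hold this range",
				})
			}
		}
	}
	return issues
}

func main() {
	stores := []Store{
		{ID: 1, NodeID: 1, Live: true},
		{ID: 2, NodeID: 2, Live: true},
		{ID: 3, NodeID: 3, Live: true},
		{ID: 4, NodeID: 4, Live: false},
	}
	ranges := []Range{{ID: 10, StoreIDs: []int{1, 2, 3}}}
	// Decommissioning node 3: the only store outside the range (s4) is dead.
	for _, issue := range checkDecommission(map[int]bool{3: true}, stores, ranges) {
		fmt.Printf("r%d on s%d: %s\n", issue.RangeID, issue.StoreID, issue.Remediation)
	}
}
```

Collecting issues instead of stopping at the first failure matches the goal of reporting every blocking replica, along with a hint, in a single pass.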

Jira issue: CRDB-21322

AlexTalks added the C-enhancement, A-kv-distribution, and A-kv-decom-rolling-restart labels Nov 9, 2022
AlexTalks self-assigned this Nov 9, 2022
blathers-crl bot added the T-kv label Nov 9, 2022
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Dec 19, 2022
This adds support for the evaluation of the decommission readiness of a
node (or set of nodes), by simulating their liveness to have the
DECOMMISSIONING status and utilizing the allocator to ensure that we are
able to perform any actions needed to repair the range. This supports a
"strict" mode, in which case we expect all ranges to only need
replacement or removal due to the decommissioning status, or a more
permissive "non-strict" mode, which allows for other actions needed, as
long as they do not encounter errors in finding a suitable allocation
target. The non-strict mode allows us to permit situations where a range
may have more than one action needed to repair it, such as a range that
needs to reach its replication factor before the decommissioning replica
can be replaced, or a range that needs to finalize an atomic replication
change.

Depends on cockroachdb#92367.

Part of cockroachdb#91571

Release note: None
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Dec 22, 2022
This change refactors parts of the replicate queue's `PlanOneChange(..)`
and `addOrRemove{Non}Voters(..)` functions to reusable helper functions
that simplify usage of the allocator and deduplicate repeated code
paths. The change also adds convenience methods to the `AllocatorAction`
enum, to move certain determinations (such as if a computed allocator
action is a remove or a replace) closer to the allocator type it is
based on. These changes move more of the logic needed to use the
allocator into the `allocatorimpl` package itself, enabling usage of the
allocator outside of the replicate queue.

Part of cockroachdb#91571.

Release note: None
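As a hypothetical illustration of moving such determinations onto the enum, the Go sketch below defines a trimmed-down `AllocatorAction` with `Replace()` and `Remove()` methods. The method names, the reduced set of values, and the `planKind` caller are assumptions for illustration; the actual `allocatorimpl` enum has more actions, and its method set may differ.

```go
package main

import "fmt"

// AllocatorAction is a hypothetical, trimmed-down mirror of an allocator
// action enum; the real type has more values and may differ in detail.
type AllocatorAction int

const (
	AllocatorNoop AllocatorAction = iota
	AllocatorAddVoter
	AllocatorRemoveVoter
	AllocatorReplaceDeadVoter
	AllocatorReplaceDecommissioningVoter
	AllocatorRemoveDeadVoter
	AllocatorRemoveDecommissioningVoter
)

// Replace reports whether the action swaps an existing replica for a new one.
func (a AllocatorAction) Replace() bool {
	switch a {
	case AllocatorReplaceDeadVoter, AllocatorReplaceDecommissioningVoter:
		return true
	}
	return false
}

// Remove reports whether the action only removes a replica.
func (a AllocatorAction) Remove() bool {
	switch a {
	case AllocatorRemoveVoter, AllocatorRemoveDeadVoter, AllocatorRemoveDecommissioningVoter:
		return true
	}
	return false
}

// planKind is a hypothetical caller that branches on the helper methods
// instead of enumerating every enum value itself.
func planKind(a AllocatorAction) string {
	switch {
	case a.Replace():
		return "replace"
	case a.Remove():
		return "remove"
	case a == AllocatorAddVoter:
		return "add"
	default:
		return "noop"
	}
}

func main() {
	for _, a := range []AllocatorAction{
		AllocatorReplaceDecommissioningVoter,
		AllocatorRemoveDecommissioningVoter,
		AllocatorAddVoter,
		AllocatorNoop,
	} {
		fmt.Println(planKind(a)) // prints replace, remove, add, noop (one per line)
	}
}
```

Keeping the classification next to the enum means a new replace or remove action only has to be added in one place, rather than in every caller (such as the replicate queue or a decommission pre-check) that switches on the action.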
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Jan 5, 2023
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Jan 7, 2023
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Jan 7, 2023
This adds support for the evaluation of the decommission readiness of a
node (or set of nodes), by simulating their liveness to have the
DECOMMISSIONING status and utilizing the allocator to ensure that we are
able to perform any actions needed to repair the range. This supports a
"strict" mode, in which case we expect all ranges to only need
replacement or removal due to the decommissioning status, or a more
permissive "non-strict" mode, which allows for other actions needed, as
long as they do not encounter errors in finding a suitable allocation
target. The non-strict mode allows us to permit situations where a range
may have more than one action needed to repair it, such as a range that
needs to reach its replication factor before the decommissioning replica
can be replaced, or a range that needs to finalize an atomic replication
change.

Depends on cockroachdb#94024.

Part of cockroachdb#91571

Release note: None