server: make visible the ranges that fail to move during a decommission #76249

Closed
cameronnunez opened this issue Feb 8, 2022 · 0 comments · Fixed by #76516
Labels
A-cli-server CLI commands that pertain to CockroachDB server processes A-kv-decom-rolling-restart Decommission and Rolling Restarts A-kv-observability C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-server-and-security DB Server & Security

Comments

@cameronnunez
Contributor

cameronnunez commented Feb 8, 2022

When a replica transfer stalls during a decommission, we currently have no visibility into which replicas are “stuck.” The only progress indicator is a count of the replicas remaining on the node being decommissioned, with no indication of which ranges those replicas belong to. Today, discovering these ranges requires a manual search.

We need visibility into the ranges of the stalled replicas in order to improve our ability to diagnose stuck decommissioning processes.

Needed for #74158.

Epic CRDB-11843

Jira issue: CRDB-13046

@cameronnunez cameronnunez added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-decom-rolling-restart Decommission and Rolling Restarts A-kv-observability A-cli-server CLI commands that pertain to CockroachDB server processes labels Feb 8, 2022
@cameronnunez cameronnunez self-assigned this Feb 8, 2022
@blathers-crl blathers-crl bot added the T-server-and-security DB Server & Security label Feb 8, 2022
cameronnunez added a commit to cameronnunez/cockroach that referenced this issue Feb 28, 2022
…ssioning

Fixes cockroachdb#76249. Informs cockroachdb#74158.

This patch makes it so that when a decommission stalls the descriptions of
the "stuck" replicas are printed to the operator.

Release note (cli change): If decommissioning stalls, the replicas that are
failing to move are printed to the operator.

Release justification: low risk, high benefit changes to existing functionality
cameronnunez added a commit to cameronnunez/cockroach that referenced this issue Mar 2, 2022
@craig craig bot closed this as completed in f8e03e2 Mar 28, 2022