
c2c: cutover is not resilient to node shutdown #103534

Closed
msbutler opened this issue May 17, 2023 · 1 comment · Fixed by #103835
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery

Comments

msbutler (Collaborator) commented May 17, 2023

If the coordinator node issuing revert range requests during cutover fails, another node is not able to resume the work. This happens because the current cutover implementation records progress in the job progress's FractionCompleted field, which is part of a oneOf. Sadly, regular ingestion uses the high_water field of that same oneOf, so when cutover begins, it overwrites the job progress's high-water mark.

When another node then tries to resume cutover after the original coordinator dies, it can't, because it attempts to read the high-water mark, which cutover has already overwritten and which is therefore unset.
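The failure mode can be sketched as a small Go model. This is a minimal sketch, assuming only what is described above: high_water and FractionCompleted share a single oneOf in the job progress, so setting one clears the other. The types `Progress`, `HighWater`, `FractionCompleted` and the function `resumeCutover` are hypothetical stand-ins, not the real jobspb API or the actual resumer code.

```go
package main

import (
	"errors"
	"fmt"
)

// isProgressDetail models a protobuf oneOf: exactly one variant can be set.
// Hypothetical stand-in for the job progress message, not the real jobspb types.
type isProgressDetail interface{ isProgressDetail() }

// HighWater is a hypothetical variant holding the replication high-water mark.
type HighWater struct{ WallTime int64 }

// FractionCompleted is a hypothetical variant holding cutover progress.
type FractionCompleted struct{ Fraction float32 }

func (HighWater) isProgressDetail()         {}
func (FractionCompleted) isProgressDetail() {}

// Progress holds a single oneOf-style field.
type Progress struct {
	Detail isProgressDetail
}

// GetHighWater mimics a generated oneOf getter: it returns nil unless the
// field currently holds the high-water variant.
func (p *Progress) GetHighWater() *HighWater {
	if hw, ok := p.Detail.(HighWater); ok {
		return &hw
	}
	return nil
}

// resumeCutover is a hypothetical resume step that needs the high-water mark.
func resumeCutover(p *Progress) (int64, error) {
	hw := p.GetHighWater()
	if hw == nil {
		return 0, errors.New("cannot resume cutover: high-water mark is unset")
	}
	return hw.WallTime, nil
}

func main() {
	p := &Progress{}

	// Regular ingestion records progress in the high-water variant.
	p.Detail = HighWater{WallTime: 1684300000}
	ts, err := resumeCutover(p)
	fmt.Println(ts, err)

	// Cutover reports progress via the fraction-completed variant, which
	// replaces the high-water value because both live in the same oneOf.
	p.Detail = FractionCompleted{Fraction: 0.25}
	ts, err = resumeCutover(p)
	fmt.Println(ts, err)
}
```

Running this prints the high-water timestamp with a nil error for the first call, then an error for the second call, which mirrors a new coordinator failing to resume once cutover has started writing fraction-completed progress.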

This was seen in #103008 (comment).

Jira issue: CRDB-28065

Epic CRDB-25146

@msbutler msbutler added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery labels May 17, 2023

blathers-crl bot commented May 17, 2023

cc @cockroachdb/disaster-recovery
