c2c: cutover is not resilient to node shutdown #103534

msbutler · 2023-05-17T17:50:10Z

If the coordinator node issuing revert range requests during cutover fails, another node will not be able to resume the work. This occurs because the current implementation of cutover uses the job progress FractionCompleted oneof field. Unfortunately, during regular ingestion we use the progress high_water field, which belongs to the same oneof, so when cutover begins, we overwrite the job progress's high water mark.

When another node then tries to resume cutover after the original coordinator dies, it can't, because it attempts to read the high water mark, which is no longer readable.
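To make the failure mode concrete, here is a minimal Go sketch of the oneof overwrite. Every type and function name in it (`Progress`, `HighWater`, `FractionCompleted`, `Details`, `highWaterFromProgress`, `replicatedTimeFromDetails`) is hypothetical and only mimics the shape of a protobuf oneof; it is not the actual jobs code. The `Details.ReplicatedTime` field is an assumption based on the title of the linked fix ("store replicated time in details"), meant to show why keeping the timestamp outside the oneof could survive cutover.

```go
package main

import (
	"errors"
	"fmt"
)

// isProgress mimics a protobuf oneof: a Progress holds exactly one variant.
type isProgress interface{ isProgress() }

// HighWater and FractionCompleted are the two hypothetical oneof variants.
type HighWater struct{ Nanos int64 }
type FractionCompleted struct{ Fraction float32 }

func (HighWater) isProgress()         {}
func (FractionCompleted) isProgress() {}

// Progress stores a single variant; setting one replaces the other.
type Progress struct{ Value isProgress }

// Details is a hypothetical place to keep the replicated time outside the oneof.
type Details struct{ ReplicatedTime int64 }

type Job struct {
	Progress Progress
	Details  Details
}

var errNoHighWater = errors.New("high water mark not set in progress")

// highWaterFromProgress is what a resuming node would do if it relied on
// the progress oneof: it fails once FractionCompleted has been written.
func highWaterFromProgress(j *Job) (int64, error) {
	hw, ok := j.Progress.Value.(HighWater)
	if !ok {
		return 0, errNoHighWater
	}
	return hw.Nanos, nil
}

// replicatedTimeFromDetails reads the timestamp from a field that cutover
// progress updates never touch.
func replicatedTimeFromDetails(j *Job) int64 {
	return j.Details.ReplicatedTime
}

func main() {
	// During regular ingestion the high water mark lives in the progress oneof.
	job := &Job{
		Progress: Progress{Value: HighWater{Nanos: 100}},
		Details:  Details{ReplicatedTime: 100},
	}

	// Cutover begins and reports fraction completed, replacing the high water mark.
	job.Progress.Value = FractionCompleted{Fraction: 0.25}

	// A new coordinator resuming cutover can no longer read the high water mark.
	if _, err := highWaterFromProgress(job); err != nil {
		fmt.Println("resume via progress failed:", err)
	}

	// The copy kept in details is still readable.
	fmt.Println("replicated time from details:", replicatedTimeFromDetails(job))
}
```

In this sketch the resume path fails only because both values share one slot; any storage location that cutover does not write to would avoid the problem.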

This was seen in #103008 (comment)

Jira issue: CRDB-28065

Epic CRDB-25146

blathers-crl · 2023-05-17T17:50:12Z

cc @cockroachdb/disaster-recovery

Until we fix cockroachdb#103534, a node shutdown during cutover will fail. Skip them for now. Fixes cockroachdb#103825 Fixes cockroachdb#103575 Fixes cockroachdb#103701 Release note: none

msbutler added the C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) and T-disaster-recovery labels May 17, 2023

blathers-crl bot added the A-disaster-recovery label May 17, 2023

msbutler assigned stevendanna May 17, 2023

msbutler mentioned this issue May 19, 2023

roachtest: c2c/shutdown/src/coordinator failed #103655

Closed

exalate-issue-sync bot assigned livlobo and stevendanna and unassigned stevendanna and livlobo May 24, 2023

msbutler mentioned this issue May 25, 2023

streamingccl: store replicated time in details #103835

Merged

craig bot closed this as completed in 0ab47ef May 25, 2023

stevendanna mentioned this issue May 26, 2023

release-23.1: streamingccl: store replicated time in details #103941

Merged

github-project-automation bot added this to Disaster Recovery Backlog Aug 28, 2024

github-project-automation bot moved this to Done in Disaster Recovery Backlog Aug 28, 2024
