ALTER SCHEDULE fails when creating an incremental backup from a pre-23.2 full backup schedule #131127

Closed
kev-cao opened this issue Sep 20, 2024 · 2 comments · Fixed by #131231
Labels
A-disaster-recovery branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. P-3 Issues/test failures with no fix SLA T-disaster-recovery

Comments

kev-cao commented Sep 20, 2024

Describe the problem

When running ALTER SCHEDULE on a pre-23.2 full backup schedule (specifically, one created before this patch and therefore without clusterID set) that has no corresponding incremental schedule, the ALTER fails with the error “scheduled job created without a cluster ID (SQLSTATE XXUUU)”.

To Reproduce

If possible, provide steps to reproduce the behavior:

  1. Set up a pre-23.2 CockroachDB cluster
  2. Create a full-only backup schedule
  3. Upgrade the cluster to 23.2+
  4. Run an ALTER SCHEDULE query on the created schedule that causes an incremental schedule to be created

Expected behavior
The ALTER SCHEDULE command should succeed and the corresponding incremental should be created.

Jira issue: CRDB-42390

kev-cao added the C-bug, A-disaster-recovery, and T-disaster-recovery labels Sep 20, 2024
kev-cao self-assigned this Sep 20, 2024

blathers-crl bot commented Sep 20, 2024

Hi @kev-cao, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Sep 20, 2024

cc @cockroachdb/disaster-recovery

kev-cao added the branch-release-24.1, branch-release-23.2, branch-release-24.2, and branch-release-24.3 labels Sep 20, 2024
exalate-issue-sync bot added the P-2 (Issues/test failures with a fix SLA of 3 months) and P-3 (Issues/test failures with no fix SLA) labels and removed the P-2 label Sep 23, 2024
craig bot pushed a commit that referenced this issue Sep 24, 2024
131106: clusterversion: introduce rac2 cluster version gates r=sumeerbhola a=kvoli

Introduce two new cluster version gates:

```
V24_3_UseRACV2WithV1EntryEncoding
V24_3_UseRACV2Full
```

Upon a range leader first encountering
`V24_3_UseRACV2WithV1EntryEncoding` via `handleRaftReadyRaftMuLocked`,
it will begin a new term using the replication flow control v2 protocol,
creating a `RangeController` while continuing to use the v1 entry
encoding, with raft still operating in push mode.

Upon a range leader first encountering `V24_3_UseRACV2Full`, it will continue
using the replication flow control v2 protocol, but will now switch to
using the V2 entry encoding.

Note that the necessary protocol migration at the leader, (base) =>
`V24_3_UseRACV2WithV1EntryEncoding` occurs before any other calls in
`handleRaftReadyRaftMuLocked`.

The two version gates are necessary to ensure there are never v2 encoded
entries in the raft log while there is a possibility of a leader running
v1.

---

Move `EnabledWhenLeaderLevel` from `replica_rac2` to the parent package
`kvflowcontrol` and rename it to `V2EnabledWhenLeaderLevel` to reflect the
move to a shared v1/v2 package.

Also move the corresponding function `racV2EnabledWhenLeaderLevel` to
`kvflowcontrol`. `GetV2EnabledWhenLeaderLevel` will check if there are
testing knob overrides for the enabled level, and if not continue
returning `V2NotEnabledWhenLeader`. Some commentary and todos are also
left around this function, for when we enable the protocol and
separately, pull mode.

Resolves: #131102
Release note: None

131231: backupccl: fill incremental cluster id on alter schedule r=msbutler a=kev-cao

When altering the recurrence of a singleton full backup schedule created before v23.2, the corresponding new incremental schedule creation will fail due to a missing cluster ID. This patch ensures that the cluster ID is set when creating the incremental.

Fixes: #131127

Release note (bug fix): Fixed a bug introduced in v23.2.0 where creating a new incremental schedule via `ALTER SCHEDULE` on a full backup schedule created on an older version would fail.

Co-authored-by: Austen McClernon
Co-authored-by: Kevin Cao
craig bot closed this as completed in f943f81 Sep 24, 2024
github-project-automation bot moved this from Backlog to Done in Disaster Recovery Backlog Sep 24, 2024