backupccl: schedules with include_all_virtual_clusters fail to unmarshal on older nodes #111458

Closed
adityamaru opened this issue Sep 28, 2023 · 2 comments
Labels: A-disaster-recovery, branch-release-23.1, C-bug, db-cy-23, T-disaster-recovery

adityamaru (Contributor) commented Sep 28, 2023

In 23.1.9+, we backported a change (#106376) that added include_all_virtual_clusters as an alias for include_all_secondary_tenants. When a node executes CREATE SCHEDULE, it bakes the provided backup statement into the scheduled job row so that any node in the cluster can unmarshal the row and kick off the backup job. If a cluster runs a mix of pre-23.1.9 and 23.1.9+ binaries and a schedule is created on a 23.1.9+ node, the pre-23.1.9 nodes will not know how to unmarshal a schedule with include_all_virtual_clusters baked into it: their parser does not recognize the option and fails with a syntax error.

Reproduction:

```
roachprod create local -n 3
roachprod stage local:1,2 release v23.1.9
roachprod stage local:3 release v23.1.8

roachprod sql local:1
root@localhost:29002/system/defaultdb> CREATE SCHEDULE schedule_label
                                    ->   FOR BACKUP INTO 'nodelocal://1/test/backups/schedule_test'
                                    ->     WITH revision_history, include_all_virtual_clusters
                                    ->     RECURRING '@daily';

roachprod sql local:3
root@localhost:29004/system/defaultdb> show schedules;
          id         |        label         | schedule_status |           next_run            |  state  | recurrence | jobsrunning | owner |            created            | command
---------------------+----------------------+-----------------+-------------------------------+---------+------------+-------------+-------+-------------------------------+---------
  903957822271160321 | sql-stats-compaction | ACTIVE          | 2023-09-28 22:00:00+00        | pending | @hourly    |           0 | node  | 2023-09-28 21:27:14.619996+00 | {}
  903958340602331137 | schedule_label       | ACTIVE          | 2023-09-29 00:00:00+00        | NULL    | @daily     |           0 | root  | 2023-09-28 21:29:52.771303+00 | {"backup_statement": "BACKUP INTO LATEST IN 'nodelocal://1/test/backups/schedule_test' WITH OPTIONS (revision_history = true, detached)", "backup_type": 1, "chain_protected_timestamp_records": true, "dependent_schedule_id": 903958340607049729, "protected_timestamp_record": "111fa672-da9c-4b07-aa87-4ec15eb4b392"}
  903959370988322817 | schedule_label       | ACTIVE          | 2023-09-28 21:35:07.259294+00 | NULL    | @weekly    |           0 | root  | 2023-09-28 21:35:07.223886+00 | {"backup_statement": "BACKUP INTO 'nodelocal://1/test/backups/schedule_test2' WITH OPTIONS (revision_history = true, detached, include_all_virtual_clusters = true)", "chain_protected_timestamp_records": true, "dependent_schedule_id": 903959370984849409, "unpause_on_success": 903959370984849409}
(3 rows)
(error encountered after some results were delivered)
ERROR: crdb_internal.pb_to_json(): marshaling cockroach.jobs.jobspb.ExecutionArguments; msg=args:<type_url:"type.googleapis.com/cockroach.ccl.backupccl.ScheduledBackupExecutionArgs" value:"\010\001\022\227\001BACKUP INTO LATEST IN 'nodelocal://1/test/backups/schedule_test2' WITH OPTIONS (revision_history = true, detached, include_all_virtual_clusters = true)0\201\200\224\371\242\310\340\305\0148\001" >: at or near "include_all_virtual_clusters": syntax error
SQLSTATE: 42601
DETAIL: source SQL:
BACKUP INTO LATEST IN 'nodelocal://1/test/backups/schedule_test2' WITH OPTIONS (revision_history = true, detached, include_all_virtual_clusters = true)
                                                                                                                   ^
HINT: try \h BACKUP
```

Notes:

  • This only affects schedules created with the include_all_virtual_clusters option. No one except internal CockroachCloud is expected to set this option.

  • SHOW SCHEDULES on an older patch release is not the only thing that breaks: the schedule adoption loop on these older nodes will also repeatedly fail to adopt this scheduled job.
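The second note follows from the same mechanism. The Go sketch below models an older node's adoption loop under loud assumptions: scheduleRow, adoptionTick, oldBinaryParse and failuresAfter are hypothetical names, the parser is reduced to a substring check, and the schedule IDs are made up. Because the stored statement never changes, the older node fails to parse it on every pass rather than failing once.

```go
package main

import (
	"fmt"
	"strings"
)

// scheduleRow is a hypothetical, simplified stand-in for a scheduled job row:
// the backup statement is persisted as SQL text and re-parsed at adoption time.
type scheduleRow struct {
	id              int64
	backupStatement string
}

// Hypothetical rows (IDs invented); the statements are copied from the
// SHOW SCHEDULES output in this issue.
var rows = []scheduleRow{
	{id: 1, backupStatement: "BACKUP INTO LATEST IN 'nodelocal://1/test/backups/schedule_test' WITH OPTIONS (revision_history = true, detached)"},
	{id: 2, backupStatement: "BACKUP INTO 'nodelocal://1/test/backups/schedule_test2' WITH OPTIONS (revision_history = true, detached, include_all_virtual_clusters = true)"},
}

// oldBinaryParse imitates a pre-23.1.9 parser that does not know the
// include_all_virtual_clusters alias. Toy logic, not CockroachDB's parser.
func oldBinaryParse(stmt string) error {
	if strings.Contains(stmt, "include_all_virtual_clusters") {
		return fmt.Errorf("at or near %q: syntax error", "include_all_virtual_clusters")
	}
	return nil
}

// adoptionTick is one pass of a hypothetical adoption loop: it re-parses each
// due schedule's stored statement and returns the IDs it could not adopt.
func adoptionTick(due []scheduleRow, parse func(string) error) []int64 {
	var failed []int64
	for _, r := range due {
		if err := parse(r.backupStatement); err != nil {
			failed = append(failed, r.id)
			continue
		}
		// A real loop would start the backup job here.
	}
	return failed
}

// failuresAfter runs the loop for the given number of ticks and counts
// adoption failures per schedule ID.
func failuresAfter(ticks int) map[int64]int {
	failures := map[int64]int{}
	for t := 0; t < ticks; t++ {
		for _, id := range adoptionTick(rows, oldBinaryParse) {
			failures[id]++
		}
	}
	return failures
}

func main() {
	// The stored text never changes, so schedule 2 fails on every tick.
	fmt.Println(failuresAfter(3)) // prints: map[2:3]
}
```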

Jira issue: CRDB-31907

adityamaru added the C-bug, release-blocker, T-disaster-recovery, T-multitenant, and branch-release-23.1 labels Sep 28, 2023
blathers-crl bot commented Sep 28, 2023

cc @cockroachdb/disaster-recovery

adityamaru removed the release-blocker label Sep 28, 2023
adityamaru (Contributor, Author) commented

Closing this issue, since the blast radius has been identified as internal to Cockroach only.

exalate-issue-sync bot removed the T-multitenant label Sep 28, 2023
knz added the db-cy-23 label Oct 5, 2023
Projects
No open projects
Archived in project
Development

No branches or pull requests

2 participants