spanconfig: AUTO SPAN CONFIG job does not handle duplicates after being restored #70173
Comments
I think we want to simply exclude backing up certain kinds of jobs, notably these automatic ones. Do we want to do the same for #68434?
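As a rough sketch of that idea (not CockroachDB's actual backup code; the row shape and job-type strings below are hypothetical), excluding automatic job kinds when writing the jobs table into a cluster backup could look something like this:

```go
package main

import "fmt"

// backedUpJob is a simplified stand-in for a row of the jobs system table as
// captured by a cluster backup; the real schema has many more columns.
type backedUpJob struct {
	ID   int64
	Type string
}

// automaticJobTypes lists the job kinds we would choose not to back up. The
// type strings here are illustrative, not CockroachDB's canonical names.
var automaticJobTypes = map[string]bool{
	"AUTO SPAN CONFIG": true,
}

// filterJobsForBackup drops automatic jobs so that a cluster restore does not
// re-create singletons the restoring cluster already starts on its own.
func filterJobsForBackup(jobs []backedUpJob) []backedUpJob {
	var kept []backedUpJob
	for _, j := range jobs {
		if automaticJobTypes[j.Type] {
			continue
		}
		kept = append(kept, j)
	}
	return kept
}

func main() {
	jobs := []backedUpJob{
		{ID: 1, Type: "BACKUP"},
		{ID: 2, Type: "AUTO SPAN CONFIG"},
	}
	fmt.Println(filterJobsForBackup(jobs)) // [{1 BACKUP}]
}
```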
Yeah, still trying to get a repro on 68434, but for some reason I don't see duplicate schedules on restore. Either I'm doing something wrong or we already have something in place; will check it out tomorrow.
I'm not very familiar with the span config job, but is there checkpointed state that needs to be resumed from on restore into a fresh cluster? If we don't back it up, and simply rely on the new one created on the restoring cluster, will they reconcile to the same state as on the cluster that ran the backup, post restore?
There's no checkpointing, not yet -- right now it's just a scaffold of a job. When restoring, it'd be fine to discard the checkpointed state if any.
My vote is we modify the job to, on Resume(), check if there is a duplicate and, if so, choose to either exit or cancel the other one. Right now there's no persisted state, so it's maybe fine to just say that RESTORE always cancels the restored one. But in the future, if that changed and there were some state in the job, it isn't clear that the restored job is always the one we want to discard, so I'd rather the job itself make the choice of which one it keeps.
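A minimal sketch of that decision logic, assuming a hypothetical in-memory view of the jobs table (this is not the actual jobs-registry API, and the keep/cancel policy shown is just one possible choice):

```go
package main

import (
	"errors"
	"fmt"
)

// runningJob is a simplified view of a job row; the real registry exposes
// much more (payload, progress, checkpointed state, etc.).
type runningJob struct {
	ID       int64
	Type     string
	HasState bool // whether this instance has persisted checkpoint state
}

var errDuplicateExists = errors.New("another instance of this singleton job is already running")

// onResume sketches the proposal above: when the job resumes, it checks for
// another running instance of the same type and decides which copy survives.
// Policy here: prefer the instance that carries persisted state; otherwise
// step aside. This is illustrative, not a settled design.
func onResume(self runningJob, all []runningJob) error {
	for _, other := range all {
		if other.ID == self.ID || other.Type != self.Type {
			continue
		}
		if self.HasState && !other.HasState {
			// We carry state the duplicate lacks: keep running; the caller
			// would then cancel the other instance.
			fmt.Printf("keeping job %d, cancelling duplicate %d\n", self.ID, other.ID)
			return nil
		}
		// Otherwise exit and let the other instance run.
		return errDuplicateExists
	}
	return nil // no duplicate; proceed with normal reconciliation
}

func main() {
	jobs := []runningJob{
		{ID: 1, Type: "AUTO SPAN CONFIG", HasState: false}, // started by the restoring cluster
		{ID: 2, Type: "AUTO SPAN CONFIG", HasState: false}, // restored from the backup
	}
	fmt.Println(onResume(jobs[1], jobs)) // the restored copy steps aside
}
```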
Hi @irfansharif, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Capturing a conversation elsewhere, @adityamaru mentioned that #75060 might be addressed with this issue as well. As such, we should be able to run
Fixes #70173. When restoring a cluster, we don't want to end up with two instances of the singleton reconciliation job. Release note: None
Describe the problem
A full cluster backup will back up all jobs (including automatic jobs) in the cluster. A cluster restore into a fresh cluster will restore the jobs from the backup into the jobs system table of the cluster being restored into.
The `AUTO SPAN CONFIG` job is automatically created on cluster startup. Performing a full cluster restore will result in two such entries for the `AUTO SPAN CONFIG` job: one started by the restoring cluster, and one from the backup.

To Reproduce

1. Run `BACKUP INTO 'nodelocal://0/foo'` on a cluster.
2. Copy the backup files (under the `cockroach-data/extern` directory) over to the fresh cluster that will be restored into.
3. Run `RESTORE FROM LATEST IN 'nodelocal://0/foo'` on the fresh cluster.
4. Run `SHOW AUTOMATIC JOBS` and observe two `AUTO SPAN CONFIG` job entries.
Expected behavior
Semantics need to be ironed out for cluster backup and restore performed in a dedicated environment, and also as a secondary tenant.
`BACKUP TENANT` and `RESTORE TENANT` performed by the system tenant should be okay and only result in a single entry for the span config job. This is because a tenant restore runs in the host tenant's registry and simply writes all keys from `[TenantPrefix, TenantPrefix.End]` into a newly created, empty tenant. The creation of a tenant does not lead to an `AUTO SPAN CONFIG` job being triggered, therefore the restored job entry should be the only one of that kind.

Environment:
Epic CRDB-8816
Jira issue: CRDB-9970