sql/schemachanger,backupccl: plot a path for adoption of the declarative schema changer #73071
Comments
cc @cockroachdb/bulk-io
I've added an epic link in the description, because we'll have to do something about this if we're to use the declarative schema changer in production in 22.1.
Spoke with @dt about this. More or less we seemed to agree on the suggested proposal. The feeling was that we will be better off duplicating even the metadata of the statements to each descriptor in play. In practice, this looks like:
This approach has the nice property that we can, given a set of descriptors in a restore, retain the relevant parts of the schema change state machine without needing to involve jobs everywhere. Until we can get out of the game of dealing with descriptors in backups altogether (🤞, the only way we're ever going to make Eventually, when we go to remove the old schema changer, we'll need to do something about upgrading descriptor representations and mapping the old schema changer state into the new one. That, or we retain the code but disallow its use for at least a version. I hear there are upcoming new limitations on the lifetime of support for backups. Similarly, we'll need to gain some invariants around how old a running job is allowed to be. I'll file a separate issue about that.
This all sounds reasonable to me. I guess this implies we should also replace the Beyond backups, this does feel conceptually nicer somehow. Effectively persisting the schema changer state in the descriptors will have the nice side-benefit of forcing us to be more careful about backward compatibility and helps us avoid doing dumb things. Having nothing in the new-schema-change job details (not including the descriptor IDs at the job top level) makes that more future-proof too, which honestly is a relief because I'm not sure that that's well tested in mixed-version clusters ("stmt gets executed on new node, job gets adopted by old node" scenarios or vice-versa). This also leverages the possibilities offered by descriptor validation and post-deserialization changes, etc.
Nice, glad we've found a happy place to land on this. I'm going to assign myself this and take it for December. It'd be cool to have a backup and restore test of running schema changes using the declarative framework before the new year.
Hi @ajwerner, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
76116: schemachanger,scpb,catalog: move schema changer state into descriptors r=ajwerner a=ajwerner This work is in support of #73071. The first two commits break dependencies so descpb can import scpb. Then we add the state to the descriptor protos. Finally we adopt it for use. This does not yet test or synthesize the job. Co-authored-by: Andrew Werner <[email protected]>
Overview
Restores currently jump through some hoops to leverage the existing, imperative schema changer when restoring descriptors which are currently mid-schema change. These same mechanisms won't perfectly work for the declarative schema changer. This issue encompasses both
Background
In the old world, schema changes operated on a per-descriptor basis and all of their state was implied by the descriptor itself (modulo some descriptive metadata kept in the job). This was something of a boon to BACKUP/RESTORE: from a descriptor, the job could be recreated relatively easily because it didn't carry much state (though not trivially; there's still a bunch of complexity related to filtering out things that may not exist).
The above scheme has several known problems which we're trying to fix:
The declarative schema changer seeks to move away from the descriptor mutation world. Changes will be decomposed into sets of schema elements to be added or removed. These elements will traverse statuses. The planner will formulate a plan based solely on the current set of (element, direction, status) tuples. This is nice because it lets us decompose a bunch of logic in order to build more robust primitives.
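To make the "(element, direction, status) tuples" idea concrete, here is a minimal sketch of a planner that derives its remaining stages from the current tuples alone. Everything in it is an assumption for illustration: the `Target` type, the `plan` function, and the four-step status ladder are hypothetical and far simpler than the real schema changer, which has more statuses and dependency rules between elements.

```python
from dataclasses import dataclass

# Assumed, simplified status ladder; the real set of statuses is richer.
LADDER = ["ABSENT", "DELETE_ONLY", "WRITE_ONLY", "PUBLIC"]


@dataclass(frozen=True)
class Target:
    element: str    # hypothetical identifier such as "column:c"
    direction: str  # "ADD" or "DROP"
    status: str     # current position on LADDER


def next_status(t):
    """Return the next status for a target, or None if it has arrived."""
    i = LADDER.index(t.status)
    if t.direction == "ADD":
        return LADDER[i + 1] if i + 1 < len(LADDER) else None
    return LADDER[i - 1] if i > 0 else None


def plan(targets):
    """Build stages of (element, from_status, to_status) transitions.

    The plan depends only on the tuples passed in, so it can be recomputed
    from any intermediate state without extra bookkeeping.
    """
    stages = []
    current = list(targets)
    while True:
        stage, advanced = [], []
        for t in current:
            nxt = next_status(t)
            if nxt is None:
                advanced.append(t)
            else:
                stage.append((t.element, t.status, nxt))
                advanced.append(Target(t.element, t.direction, nxt))
        if not stage:
            return stages
        stages.append(stage)
        current = advanced
```

Calling `plan` on a target that is already at `WRITE_ONLY` yields only the one remaining transition, which is the property that makes resuming from persisted state attractive.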
A downside is that, from the descriptor alone, we can’t really reconstruct the job state at full fidelity. For a little while, we probably can reconstruct it sufficiently. At the end of the day, we’re not eager to break compatibility with existing descriptors (way too much headache for their users) and we’re trying to not stray too far from the concepts of descriptors which already exist.
Problem statement
Describe the solution you'd like
One family of solutions involves moving more of the schema change state back onto the descriptors. One could imagine that we retain an in-sync set of elements and their statuses on the descriptors and in the job. One could go further and just have the job reference the descriptors. This actually wouldn't be that big of a change. There'd still be some work to figure out which elements to trash when restoring. This would be new work, but it wouldn't be too crazy, I don't think. That would give us a path forward in terms of restoring these descriptors in the absence of jobs.
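A rough sketch of what "the job just references the descriptors" could look like on restore follows. The types and the `synthesize_job` helper are hypothetical, and the rule used to decide which elements to trash (drop any element that references a descriptor missing from the restore) is only one possible assumption, not a settled design.

```python
from dataclasses import dataclass, field


@dataclass
class ElementState:
    element: str            # hypothetical identifier such as "column:c"
    direction: str          # "ADD" or "DROP"
    status: str
    references: tuple = ()  # IDs of other descriptors this element needs


@dataclass
class Descriptor:
    id: int
    name: str
    statement: str = ""     # statement metadata duplicated onto the descriptor
    elements: list = field(default_factory=list)


@dataclass
class Job:
    descriptor_ids: list    # the job carries references only, no element state


def synthesize_job(restored):
    """Rebuild a job purely from the descriptors present in a restore.

    Assumed rule: an element whose references are not all being restored
    is discarded; descriptors with remaining elements go into the job.
    """
    restored_ids = {d.id for d in restored}
    for d in restored:
        d.elements = [
            e for e in d.elements
            if all(ref in restored_ids for ref in e.references)
        ]
    in_flight = sorted(d.id for d in restored if d.elements)
    return Job(descriptor_ids=in_flight) if in_flight else None
```

Because the element state lives on the descriptors, the restore never needs the original job record; it only needs to decide what to keep.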
Separately, we could then write migrations which synthesize elements and their statuses from mutations. This does not need to be done until we want to remove the old schema changer.
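Such a migration might be sketched as a pure function from old-style mutations to element tuples. The dictionary shape of a mutation and the state mapping below are illustrative assumptions; a real migration would have to cover every mutation type and state the old schema changer can leave behind.

```python
# Assumed mapping from old mutation states to declarative statuses.
MUTATION_STATE_TO_STATUS = {
    "DELETE_ONLY": "DELETE_ONLY",
    "DELETE_AND_WRITE_ONLY": "WRITE_ONLY",
}


def synthesize_elements(mutations):
    """Translate old-style mutations into (element, direction, status) tuples.

    Each mutation is a hypothetical dict with a "column" or "index" name,
    a "direction" ("ADD" or "DROP"), and a "state".
    """
    elements = []
    for m in mutations:
        status = MUTATION_STATE_TO_STATUS.get(m["state"])
        if status is None:
            # Refuse to guess: an unmapped state needs an explicit decision.
            raise ValueError(f"no status mapping for mutation state {m['state']!r}")
        kind = "column" if "column" in m else "index"
        elements.append((f"{kind}:{m[kind]}", m["direction"], status))
    return elements
```

Failing loudly on an unmapped state is a deliberate choice in this sketch, since silently dropping a mutation would lose schema change state.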
Additional context
By having all of the state in the descriptors instead of in the jobs, we may run afoul of other versioning considerations for full cluster restore.
In the transactional schema change RFC, there’s a notion of a descriptor which is in a state that needs to be rolled back. This is a bit hand-wavy in the RFC.
Epic CRDB-2356