Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: declarative, state-based schema changer #59288

Merged
merged 1 commit into from
Mar 4, 2021

Conversation

thoszhang
Copy link
Contributor

@thoszhang thoszhang commented Jan 22, 2021

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I think it's worth soliciting some more eyes even without the reference level description, even even it ends up just being an FYI / skim for them.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang)


docs/RFCS/20210115_new_schema_changer.md, line 55 at r1 (raw file):

- Phase: Context in which the execution of an op occurs. For today's schema changes, the possible phases are:
    - statement execution
    - pre-commit (of the user transaction)

we use "user transaction" casually in conversation but it might not be obvious to a reader. suggest adding a bullet defining "user transaction" in this terminology section.


docs/RFCS/20210115_new_schema_changer.md, line 135 at r1 (raw file):

Link to Figure 2: [Output of `EXPLAIN (DDL)`](https://cockroachdb.github.io/scplan/viz.html#H4sIAAAAAAAA/+xYUW/iOBB+hl9R5fHUbokDFPYgUkqoFB1Hqy3b1ak6IZO4xWrq5GJn93qn/e8nh4SY4oDTBB2V+sjMZPx58s3MFzz8GMFweXLyb7NB48Xqh+vHlKFoTnVultj7ib3hwwXyhxpl8BGdtLRfuY3orfvU3vpsjyfj2Xh+PZ38of2ZuvXMrUvdIHODzzdfLyfOKPU0G42fUiy6IQGjZ2DaW2CsqT3/9sV5fW5nC1ZBYLc0wAsJQJAB7KkC7OcAL63Rb1fOZDK2My9olUUFdAkqI0UFgCIqYOSo7qyJY1szAVS7NKiOBFQ7A9XNQW2kI+AiR/HK08sh7LpIMSIZ1TspIqNVhMjQixAZYAuRCg5DxvJuhqNdiKNTiEPgsXV5O57O9pZiAwGD0SNiNENw/xAQRvE/aKj1tdNVzGDA4MJHJ64PKR1qQcxQpJkDFpkD5mXWJYIeN/9C3XDxaRT48TMZnDPPHJwngUmwuZkp+bHOZLrCQ/tisbeOA9vHEPiM1v6FxB/7Ps+3jmFRjLbD2EuI1OA8wGfsv6xjHcKuBIuYMxCRt7b9P7DHluuIbluIOE8ONXdaNvBMxTKEEX6GkQRRksPJUXXArhOamalpntIlDNFQIwFBGR3bNRHoZoXWIR76uySNcP6MGuNsHCGX4YDQgqe+Qz9/xrodqbyBVWrHU8qpq2fk71QpJ1TJWaKNCPoxTzk0f0ISHsUE/xXva6nvKKI4INJTC1EGbImibUa8KlwaTFkQoVGZ6ithENIqv4KFSuJa26/z0X7vtP0kTCZK4/v/aDvJrKiTxql0ETXSSqAQQTqKYocIwnHD3pMr1ybP32wQ48zMU1q2rZ1S9uKjoeZBukReEkzaZ2aevyioc2bmh9lfrm+kUd0zk39XVe1R6gbhp9/hE7I8D3mroWQj6kY4ZEFkIx8xdE1k6mNnO45K6C/H3kWG6R79NVXTX7N3q7+uSuivWdI5dvnOSSh1wSml10yppNvfzCNHfQOMDrcBUglg17cBSm5/pQ0wJm7gYfK4QXXJLijRb3uF2q2gkNTKo7Qubg+lkL6qbLi7t2y4aq3X460H7rM5HzCWzXm9xV3Vv4VUBr1FvG8RrjDwd1OrWol0nReiuiotHE8Vbp9kOejlAb98V0oQLir0ntTFpYTer6dkyR0vofv0gH3/6OrD5RBoSYvAewvIe6vPXdX/qErqsxotyU3voI89yNYzpM46iSJaiJTM+WoVBXzugLasbICTEXTr68TX0+gmXvjYPboJBLjGBxc1TyDxhb7p4h8aSV0j1al9jlVLAL4QQK8+mtpREIabRK1nY36Q9lDCXuJV+4I8WlLzBW9IFzzg362GLnXx3W+A2luhho/aQ+5ug+9uQ7q7Da6ijY7Uxde6UeNaT+5pLSgi7Mgq1PzZ/C8AAP//g35tdeYgAAA=)

Op edges are again represented by arrows with solid black lines. Each set of op edges in a stage correspond exactly to the set of ops to be executed in that stage. Targets which do not undergo any change within a stage, because they are "waiting" for dependencies to be satisfied in a later stage, have arrows with dotted lines within that stage. Note that the generated stages are compatible with each dependency edge indicated in the output of `EXPLAIN (DDL, DEPS)` (Figure 1).

I'm a little confused even after reading this - is explain(ddl) a subset of explain(ddl,deps)? or a different thing entirely?

@postamar
Copy link
Contributor

postamar commented Feb 2, 2021

I ended up reading this RFC draft after having read the code in #58979 and what is explained here corresponds neatly and clearly to my understanding of the code 👍

Copy link
Contributor Author

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some more detailed notes I have.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


docs/RFCS/20210115_new_schema_changer.md, line 55 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

we use "user transaction" casually in conversation but it might not be obvious to a reader. suggest adding a bullet defining "user transaction" in this terminology section.

Done.


docs/RFCS/20210115_new_schema_changer.md, line 135 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'm a little confused even after reading this - is explain(ddl) a subset of explain(ddl,deps)? or a different thing entirely?

I added a few words. Does that help? Neither is a subset of the other.

@thoszhang thoszhang marked this pull request as ready for review February 2, 2021 20:54
@thoszhang thoszhang requested a review from a team as a code owner February 2, 2021 20:54
@thoszhang
Copy link
Contributor Author

I just tagged a bunch of SQL and SQL-adjacent people. This RFC is not quite finished but we thought it'd be good to have some eyes on it.

There is an accompanying PR: #58979

craig bot pushed a commit that referenced this pull request Feb 2, 2021
58979: schemachanger,*: introduce new schema changer r=lucy-zhang a=ajwerner

This PR introduces the new schema changer, for which a partially written RFC (including a "guide-level explanation") can be found in #59288. It contains a somewhat complete implementation for a subset of `ALTER TABLE ... ADD COLUMN`. Like the existing implementation, these schema changes are planned and partially executed in the user transaction, but most of the work happens in an asynchronous job that runs after the transaction has committed. We use the index backfiller to build new columns, as described in #47989.

See the RFC and individual commits for details.

Missing things that will be in later PRs:

- Making execution changes to support concurrent writes for columns added using the index backfiller. We currently don't have a way to write column values which are not part of the primary index. See #59149. (#58990 takes care of the backfill part).
- Ensuring compatible coexistence with the old schema changer; in particular, polling for queued mutations to finish before any schema changes can start, using child transactions within the user transaction (#56588). The new schema changer assumes it has exclusive rights to perform schema changes on the table, which anticipates the transactional schema changes of the future.
- Reverting schema changes when failed or canceled.
- Properly checkpointing and reading job progress for the backfill. Currently there's a placeholder implementation that doesn't interact with the job at all.

Co-authored-by: Andrew Werner <[email protected]>
@RaduBerinde
Copy link
Member

This is really cool! Very nice work!

I suggest adding a testing mode where a mock "executor" simply records information about each step and then dumps the list out as text. Then you can have datadriven tests for the "planning" part.

@petermattis
Copy link
Collaborator

Tangential: every time I see "declarative schema changer" I imagine being able to declare the state of my schema (i.e. here is the way I want my table to look) and have the system figure out the series of table/index/column/constraint adds/drops rather than having to specify them imperatively.

@ajwerner
Copy link
Contributor

ajwerner commented Feb 3, 2021

Tangential: every time I see "declarative schema changer" I imagine being able to declare the state of my schema (i.e. here is the way I want my table to look) and have the system figure out the series of table/index/column/constraint adds/drops rather than having to specify them imperatively.

Not quite, however, the builder concepts below the builder are quite amenable to the idea. Something like:

CREATE TABLE foo (i INT PRIMARY KEY);
ALTER TABLE foo ADD COLUMN j INT NOT NULL DEFAULT 42;

as equivalent to:

CREATE TABLE foo (i INT PRIMARY KEY);
CREATE OR ALTER TABLE foo (i INT PRIMARY KEY, j INT NOT NULL DEFAULT 42);

Fun idea. Don't know what getting this sort of idea even out there for standardization would look like but changing sql syntax is far from our goal.

@RaduBerinde
Copy link
Member

I suggest adding a testing mode where a mock "executor" simply records information about each step and then dumps the list out as text. Then you can have datadriven tests for the "planning" part.

Never mind, I looked at the PR and you already do this (scbld tests).

@petermattis
Copy link
Collaborator

Fun idea. Don't know what getting this sort of idea even out there for standardization would look like but changing sql syntax is far from our goal.

Standardization would be way off. Perhaps a tool that CRDB provides: cockroach schema-changes <old-schema> <new-schema> which outputs a list of the ALTER/CREATE/DROP statements.

@bdarnell
Copy link
Contributor

bdarnell commented Feb 3, 2021

Perhaps a tool that CRDB provides: cockroach schema-changes which outputs a list of the ALTER/CREATE/DROP statements.

I've recently stumbled across a tool that diffs two postgres schemas (on a live PG instance instead of two .sql files) and outputs the alter statements: https://databaseci.com/docs/migra

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a testing mode where a mock "executor" simply records information about each step and then dumps the list out as text. Then you can have datadriven tests for the "planning" part.

Hmm, I feel like I'm missing something but about how this is distinctly different from just dumping out the ops in a datadriven test? Like why get the "executor" involved at all? It's possible that we're saying the same thing.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @dt, @jordanlewis, @otan, @postamar, @RaduBerinde, @rafiss, and @yuzefovich)

@otan
Copy link
Contributor

otan commented Feb 4, 2021

Took a long-ish skim -- this work looks very exciting and much nicer to work with, looking forward to it! The state transition are pretty awesome and seem useful for debugging too!

@otan otan removed their request for review February 12, 2021 03:13
@thoszhang thoszhang force-pushed the schema-changer-rfc branch 2 times, most recently from 83c333b to 7f3f5fc Compare March 3, 2021 02:53
@thoszhang thoszhang force-pushed the schema-changer-rfc branch 3 times, most recently from 0d3301d to 9a06e30 Compare March 3, 2021 05:45
@thoszhang
Copy link
Contributor Author

I filled out the "Reference-level explanation." This is finished as a first draft, and is ready for another look.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @dt, @jordanlewis, @lucy-zhang, @postamar, @RaduBerinde, @rafiss, and @yuzefovich)


docs/RFCS/20210115_new_schema_changer.md, line 20 at r3 (raw file):

Schema changes in CRDB have always been guided by the state-based abstraction, but their current implementation has serious deficiencies, some of which make the implementation unsuitable for the impending introduction of [transactional schema changes](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20200909_transactional_schema_changes.md). The current schema changer has hard-coded states and transitions that best accomodate a single column or index change, and is fundamentally limited in its ability to support other schema changes and their combinations. This has become untenable as the number of supported DDL statements has grown.

Problems with the schema changer which threaten the correctness and stability of schema changes, both current and future, include:

I could see a point about coupling and modularity. Most code today has to live in the SQL package make it hard to leverage in testing.


docs/RFCS/20210115_new_schema_changer.md, line 90 at r3 (raw file):

```go
scpb.Node{
	Target: &scpb.Column{

I think this is missing a direction.


docs/RFCS/20210115_new_schema_changer.md, line 303 at r3 (raw file):

- More clarity is needed on the semantics of locking vs. non-locking updates in the context of concurrent schema changes.
  - A related question is [how to deal with `DROP TABLE`](https://github.com/cockroachdb/cockroach/issues/61256), and its relative `TRUNCATE TABLE`, in the presence of ongoing schema changes.

one particular challenge here is that the schema change state is only present on the job which makes applying it especially awkward.

@thoszhang thoszhang force-pushed the schema-changer-rfc branch 2 times, most recently from 9565d67 to fc39ffc Compare March 3, 2021 18:34
Copy link
Contributor Author

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, @dt, @jordanlewis, @postamar, @RaduBerinde, @rafiss, and @yuzefovich)


docs/RFCS/20210115_new_schema_changer.md, line 20 at r3 (raw file):

Previously, ajwerner wrote…

I could see a point about coupling and modularity. Most code today has to live in the SQL package make it hard to leverage in testing.

Done. I folded this into the "difficult of testing" point.


docs/RFCS/20210115_new_schema_changer.md, line 90 at r3 (raw file):

Previously, ajwerner wrote…

I think this is missing a direction.

I kind of screwed this up overall, but it should be fixed.


docs/RFCS/20210115_new_schema_changer.md, line 303 at r3 (raw file):

Previously, ajwerner wrote…

one particular challenge here is that the schema change state is only present on the job which makes applying it especially awkward.

Done. I think that's actually an important larger point, that the source of truth for the schema change state is now in the job, so I added a sentence elsewhere.

@thoszhang thoszhang force-pushed the schema-changer-rfc branch from fc39ffc to ee8a5f4 Compare March 3, 2021 19:15
@ajwerner
Copy link
Contributor

ajwerner commented Mar 3, 2021

Another point that I don't really see represented here because it's not really represented in code is this idea that there are a number of optimizations we may want to perform to the set of operations we need to traverse. One challenge there is that some of these optimizations may need to know about elements in the schema that are not being changed. Some of these optimizations seem rather important IIRC.

Release justification: Non-production code change

Release note: None
@thoszhang thoszhang force-pushed the schema-changer-rfc branch from ee8a5f4 to 32f3e6c Compare March 3, 2021 22:36
@thoszhang
Copy link
Contributor Author

Another point that I don't really see represented here because it's not really represented in code is this idea that there are a number of optimizations we may want to perform to the set of operations we need to traverse. One challenge there is that some of these optimizations may need to know about elements in the schema that are not being changed. Some of these optimizations seem rather important IIRC.

Good catch. I added a subsection about this.

@thoszhang
Copy link
Contributor Author

Would anyone like to take more time to read this? Feel free to say yes, because feedback is always appreciated. But if not, I'll plan to merge this sometime tomorrow (pending final sign-off from @jordanlewis).

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: thanks for pushing this over the finish line. This will be useful as a guide, reference and artifact of our decisionmaking for years to come.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, @dt, @jordanlewis, @postamar, @RaduBerinde, @rafiss, and @yuzefovich)

@thoszhang
Copy link
Contributor Author

TFTRs

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 4, 2021

Build succeeded:

@craig craig bot merged commit 7ddc401 into cockroachdb:master Mar 4, 2021
@thoszhang thoszhang deleted the schema-changer-rfc branch March 4, 2021 17:10
@nickjackson
Copy link

Fun idea. Don't know what getting this sort of idea even out there for standardization would look like but changing sql syntax is far from our goal.

Standardization would be way off. Perhaps a tool that CRDB provides: cockroach schema-changes <old-schema> <new-schema> which outputs a list of the ALTER/CREATE/DROP statements.

I would really appreciate this feature. I've requested similar before. https://forum.cockroachlabs.com/t/declarative-schema-migrations/3334

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants