sql: implement ADD/DROP schema changes for row-level TTL #76216

Merged 8 commits on Feb 16, 2022

otan
Contributor

@otan otan commented Feb 8, 2022

These commits add the "scarier" part of the row-level TTL job, in particular:

  • DROP TABLE/DATABASE/SCHEMA (this required adding it to the declarative schema changer as well, see the last two commits)
  • SET (ttl) for the first time, which adds the automatic column
  • RESET (ttl), which drops the automatic column

This is handled in both the old and new schema changer.

@otan otan requested a review from a team February 8, 2022 04:13
@cockroach-teamcity
Member

This change is Reviewable

@otan otan mentioned this pull request Feb 8, 2022
30 tasks
@otan
Contributor Author

otan commented Feb 8, 2022

my reading of the new schema changer code is that I "only" need to support DROP TABLE/DROP DATABASE/DROP SCHEMA. i'll look into that tomorrow.

the way add/drop column is being used here is via ALTER TABLE ... SET/RESET, which means it would use the original schema changer anyway. but i'm not sure how "mixing" supported and unsupported statements works.

Contributor

@postamar postamar left a comment

Thanks for putting this up for review so quickly. I think it's a good, solid change overall, as far as I can tell (I don't know much about scheduled jobs). The drop column is dodgy of course (not your doing, it always was dodgy) and that's exactly the kind of thing the declarative schema changer will be able to help with.

I do have a couple of high level comments and questions:

  1. This is missing table descriptor validation checks for the TTL descriptor. I don't think they can wait until later. Every time you add a field in the TTL proto, you should have a corresponding validation check and unit tests in tabledesc. This will strengthen all of your tests so, so much.

  2. Wouldn't it be possible to do without the new mutation types? It feels like the TTL descriptor could simply have some kind of status enum indicating whether the schedule:
    a. needs to be updated,
    b. needs to be deleted,
    c. no-op.

ccing @ajwerner for his 👀 on this last one, he might want to weigh in on whether this is a good idea or not.
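To make the status-enum alternative in item 2 concrete, here is a minimal sketch in Go. Everything in it is hypothetical: `ScheduleAction`, `rowLevelTTL`, the `Pending` field, and `reconcile` are illustrative names, not identifiers from the PR or from the real descriptor proto. It only shows the shape of the idea, where the descriptor records what should happen to the schedule and a later step acts on it.

```go
package main

import "fmt"

// ScheduleAction is a hypothetical status enum; it is not part of the PR.
type ScheduleAction int

const (
	ScheduleNoOp        ScheduleAction = iota // (c) nothing to do
	ScheduleNeedsUpdate                       // (a) schedule must be updated
	ScheduleNeedsDelete                       // (b) schedule must be deleted
)

// rowLevelTTL is a simplified stand-in for the TTL descriptor.
type rowLevelTTL struct {
	ScheduleID int64
	Pending    ScheduleAction
}

// reconcile acts on the pending status, resets it to no-op, and reports
// which action it took.
func reconcile(ttl *rowLevelTTL) string {
	action := ttl.Pending
	ttl.Pending = ScheduleNoOp
	switch action {
	case ScheduleNeedsUpdate:
		return "update"
	case ScheduleNeedsDelete:
		ttl.ScheduleID = 0 // the schedule no longer exists
		return "delete"
	default:
		return "no-op"
	}
}

func main() {
	ttl := &rowLevelTTL{ScheduleID: 7, Pending: ScheduleNeedsDelete}
	fmt.Println(reconcile(ttl), ttl.ScheduleID)
	fmt.Println(reconcile(ttl))
}
```

The sketch deliberately ignores rollback, which is the complication raised later in the thread.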

Reviewed 3 of 3 files at r1, 14 of 14 files at r2, 9 of 9 files at r3, 6 of 6 files at r4, 5 of 5 files at r5, 2 of 2 files at r6, 3 of 3 files at r7, 3 of 3 files at r8, 11 of 11 files at r9, 2 of 2 files at r10, 5 of 5 files at r11, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan)


pkg/sql/alter_table.go, line 1765 at r7 (raw file):

		// Do not have to do anything here.
	case before != nil && after != nil:
		if before.DeletionCron != after.DeletionCron {

What if before you have "" and after you have "@hourly"? Do we care about those kinds of no-ops?
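One way to treat that case as a no-op would be to compare the effective cron values rather than the raw fields. The sketch below assumes the default cron is `@hourly`, which is what the question implies but the thread never states outright; `defaultCron`, `effectiveCron`, and `cronChanged` are hypothetical helper names.

```go
package main

import "fmt"

// defaultCron is assumed to be "@hourly" for illustration only.
const defaultCron = "@hourly"

// effectiveCron maps an unset cron to the default.
func effectiveCron(cron string) string {
	if cron == "" {
		return defaultCron
	}
	return cron
}

// cronChanged reports whether the schedule would actually behave differently.
func cronChanged(before, after string) bool {
	return effectiveCron(before) != effectiveCron(after)
}

func main() {
	fmt.Println(cronChanged("", "@hourly"))       // unset vs explicit default
	fmt.Println(cronChanged("@hourly", "@daily")) // a real change
}
```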


pkg/sql/create_table.go, line 2342 at r5 (raw file):

// defaultTTLScheduleCron is the default cron duration for row-level TTL.
// defaultTTLScheduleCron cannot be a cluster setting as this would involve
// changing all existing schedules to match the new setting.

Would it make sense to enforce the deletion_cron field in the descriptor to be non-empty instead? I won't insist on it, but it feels like that would be more robust.


pkg/sql/schema_changer.go, line 388 at r8 (raw file):

	if tableDesc.Dropped() && tableDesc.GetRowLevelTTL() != nil {
		if err := sc.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
			scheduleID := tableDesc.GetRowLevelTTL().ScheduleID

Is this guaranteed to be non-zero at this point?


pkg/sql/schema_changer.go, line 1222 at r11 (raw file):

						}
						scTable.RowLevelTTL.ScheduleID = j.ScheduleID()
					}

nit: shouldn't this have been in an earlier commit?


pkg/sql/show_create.go, line 195 at r5 (raw file):

		}
		if cron := ttl.DeletionCron; cron != "" {
			storageParams = append(storageParams, fmt.Sprintf(`ttl_delete_batch_size = '%s'`, cron))

Wrong format string. I expect this will show up in tests also.


pkg/sql/catalog/descpb/structured.proto, line 612 at r9 (raw file):

  optional TableDescriptor.RowLevelTTL row_level_ttl = 1 [(gogoproto.customname) = "RowLevelTTL"];
}

Is this message likely to have more fields added to it? It doesn't seem necessary, to be honest.


pkg/sql/schema_changer_test.go, line 7516 at r9 (raw file):

			setup: `
			CREATE DATABASE t;
			USE t;

FYI, these USE t don't seem to be necessary. I don't care much either way.

@otan
Contributor Author

otan commented Feb 8, 2022

Wouldn't it be possible to do without the new mutation types? It feels like the TTL descriptor could simply have some kind of status enum indicating whether the schedule:
a. needs to be updated,
b. needs to be deleted,
c. no-op.

i can see this getting complicated because if something fails in between, the reverse mutations will also need to dive deeper into the table descriptor proto and zero things out, but only on add column / drop column if it matches a certain column name.

for me, the separate mutation feels cleaner, but yeah more input would be good. i'll hold off on making that change (as it is not trivial :) ) for now

@postamar
Contributor

postamar commented Feb 9, 2022

The way I understand it, mutations represent a future change to the table descriptor itself. It seemed to me that the TTL descriptor was effectively the source of truth for something not in the table descriptor, so to me it didn't fit into that mold. Of course, if there's subtle interplay with adding/removing the ttl column, that would invalidate my argument. In any case having a new type of mutation isn't the worst thing in the world but if it's avoidable it's better.

Contributor Author

@otan otan left a comment

This is missing table descriptor validation checks for the TTL descriptor

done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @hourly and @postamar)


pkg/sql/alter_table.go, line 1765 at r7 (raw file):

Previously, postamar (Marius Posta) wrote…

What if before you have "" and after you have "@hourly"? Do we care about those kinds of no-ops?

the net effect doesn't change much, rather not keep it too complicated for now.


pkg/sql/create_table.go, line 2342 at r5 (raw file):

Previously, postamar (Marius Posta) wrote…

Would it make sense to enforce the deletion_cron field in the descriptor to be non-empty instead? I won't insist on it, but it feels like that would be more robust.

i can see it both ways. i preferred this way as this paves the way for a "default" in future, and it makes the logic for displaying the cron on SHOW CREATE TABLE less gnarly if the user is using the default.


pkg/sql/schema_changer.go, line 388 at r8 (raw file):

Previously, postamar (Marius Posta) wrote…

Is this guaranteed to be non-zero at this point?

In theory no, but no harm in adding a check in.
Worth noting that the deleteSchedule no-ops even if it was 0 anyway
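A guard of the kind discussed here might look like the following sketch. The `rowLevelTTL` struct and `dropTTLSchedule` are hypothetical stand-ins, and the delete function is passed in as a callback so the example stays self-contained; the real code goes through the schema changer's transaction instead.

```go
package main

import "fmt"

// rowLevelTTL is a simplified stand-in for the TTL descriptor.
type rowLevelTTL struct{ ScheduleID int64 }

// dropTTLSchedule deletes the schedule only when a non-zero ID is recorded.
// It reports whether a delete was actually issued.
func dropTTLSchedule(ttl *rowLevelTTL, deleteSchedule func(id int64) error) (bool, error) {
	if ttl == nil || ttl.ScheduleID == 0 {
		return false, nil // nothing was ever scheduled
	}
	if err := deleteSchedule(ttl.ScheduleID); err != nil {
		return false, err
	}
	ttl.ScheduleID = 0
	return true, nil
}

func main() {
	var deleted []int64
	del := func(id int64) error { deleted = append(deleted, id); return nil }

	ok, _ := dropTTLSchedule(&rowLevelTTL{}, del)
	fmt.Println(ok, deleted)
	ok, _ = dropTTLSchedule(&rowLevelTTL{ScheduleID: 42}, del)
	fmt.Println(ok, deleted)
}
```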


pkg/sql/show_create.go, line 195 at r5 (raw file):

Previously, postamar (Marius Posta) wrote…

Wrong format string. I expect this will show up in tests also.

nice catch, fixed.
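For illustration, a corrected version of that branch could look roughly like the sketch below. The thread does not show the fix, so the key name `ttl_job_cron` is an assumption (it is the name CockroachDB documents for the TTL cron storage parameter), and the `rowLevelTTL` struct with its `DeleteBatchSize` field and the `ttlStorageParams` helper are hypothetical simplifications.

```go
package main

import "fmt"

// rowLevelTTL is a simplified stand-in for the TTL descriptor.
type rowLevelTTL struct {
	DeleteBatchSize int64
	DeletionCron    string
}

// ttlStorageParams renders the non-default TTL settings as storage
// parameters, pairing each field with its own key.
func ttlStorageParams(ttl rowLevelTTL) []string {
	var params []string
	if bs := ttl.DeleteBatchSize; bs != 0 {
		params = append(params, fmt.Sprintf(`ttl_delete_batch_size = %d`, bs))
	}
	if cron := ttl.DeletionCron; cron != "" {
		// Assumed key name; the original snippet reused the batch-size key here.
		params = append(params, fmt.Sprintf(`ttl_job_cron = '%s'`, cron))
	}
	return params
}

func main() {
	for _, p := range ttlStorageParams(rowLevelTTL{DeleteBatchSize: 100, DeletionCron: "@daily"}) {
		fmt.Println(p)
	}
}
```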


pkg/sql/catalog/descpb/structured.proto, line 612 at r9 (raw file):

Previously, postamar (Marius Posta) wrote…

Is this message likely to have more fields added to it? It doesn't seem necessary, to be honest.

how strongly do you feel about this? i guess, ideally, i would've added this into the ColumnDescriptor mutation with an extra field symbolizing whether to add or drop a TTL. it's easy to extend this way, and in practice i'm not sure there's too big of a difference.

@otan
Contributor Author

otan commented Feb 9, 2022

i've tacked on two commits at the end, which add the dropping of schedules on DROP TABLE on the new schema changer. it works in the logic test, but my go generate output generated a bit more than i was expecting and i'm sure i screwed up an abstraction somewhere. eyes on that bit greatly appreciated.

i'm also happy to make those last two commits a separate PR to review, and disable the local declarative schema change tests for DROP TABLE clearing schedule IDs for now. let me know what your preference is, it's a pretty juicy PR as is.

@otan otan requested a review from a team February 9, 2022 04:48
@otan otan force-pushed the add_or_drop_col branch 3 times, most recently from 0c28f5a to 5b564b9 Compare February 9, 2022 09:11
Contributor

@ajwerner ajwerner left a comment

Can you add some testing around adding a TTL expression that references a column added in the current transaction, or added in the current statement? Then, to drive home my point, make adding that column fail by having an error in its default expression or check constraint.

I think when we talk about the use of a mutation, we need to unpack the expected behavior in the face of other mutations and failure. The case where you currently don't use a mutation, which I think is troubling, is when you update an existing set of parameters. If another mutation in the same transaction fails, what should happen? I'm inclined to say that you actually do need a mutation and you need to make it richer.

pkg/sql/alter_table.go
Comment on lines +704 to +764
if err := sc.maybeUpdateScheduledJobsForRowLevelTTL(ctx, tableDesc); err != nil {
return err
}
Contributor

what's the principle around doing this here?

Contributor Author

this is accounting for DROP TABLE (one of the commits here). is there a better place for it? i can put it in drop_table.go if you prefer, but figured it should be as close to the DROP TABLE actually executing as possible.

pkg/sql/alter_table.go
@ajwerner
Contributor

ajwerner commented Feb 9, 2022

I propose that we add a limitation to these TTL changes, much like the one for the alter primary key changes, whereby we don't allow any of these changes in a transaction and we don't allow them concurrent with any other schema changes. That will help us alleviate many (but not all) of my concerns.

@otan
Contributor Author

otan commented Feb 10, 2022

i've tacked on 2 commits at the end which block schema changes / ttl config changes whilst a ttl mutation is in progress

@otan otan force-pushed the add_or_drop_col branch 3 times, most recently from f335ed2 to 29d61b7 Compare February 10, 2022 10:19
@otan otan requested review from postamar and ajwerner February 11, 2022 03:50
@otan otan marked this pull request as ready for review February 13, 2022 22:11
@otan otan requested a review from a team as a code owner February 13, 2022 22:11
@otan otan requested review from a team and stevendanna and removed request for a team February 13, 2022 22:11
@shermanCRL shermanCRL requested a review from miretskiy February 13, 2022 22:19
@shermanCRL shermanCRL removed the request for review from a team February 13, 2022 22:19
@otan
Contributor Author

otan commented Feb 13, 2022

this is rebased & ready for review!

the changefeed stuff is all gone now, sorry for the false pings for cdc / bulk.

"delete-schedule",
mu.txn,
sessiondata.InternalExecutorOverride{User: security.RootUserName()},
"DELETE FROM system.scheduled_jobs WHERE schedule_id = $1",

Contributor

drop schedule perhaps?

Contributor Author

i avoided using DROP SCHEDULE as it would allow users to DROP schedules as well. currently we've explicitly banned that. (also this was mostly from following the deleteSchedule code in the sql package; we have no hooks for this ... yet ...)

Contributor

@ajwerner ajwerner left a comment

👍 LGTM mod minor things

pkg/sql/logictest/testdata/logic_test/row_level_ttl (outdated)
FAMILY fam_0_id_text_crdb_internal_expiration (id, text, crdb_internal_expiration)
) WITH (ttl = 'on', ttl_automatic_column = 'on', ttl_expire_after = '10 days':::INTERVAL)

statement error cannot modify TTL whilst another schema change is in progress

Contributor

should we make this error clearer? It might be confusing because we're in an implicit transaction, also because it feels like it's all ttl related.

Contributor Author

i've gone with cannot modify TTL settings while another schema change on the table is being processed
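As a rough illustration of the check behind that message, the sketch below rejects a TTL change whenever the table has any pending mutation. The error text is the one quoted in this thread; `tableDesc`, `PendingMutations`, `errTTLBlocked`, and `checkCanModifyTTL` are hypothetical names and not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// errTTLBlocked carries the error text settled on in the review thread.
var errTTLBlocked = errors.New(
	"cannot modify TTL settings while another schema change on the table is being processed")

// tableDesc is a toy descriptor that only tracks how many mutations are queued.
type tableDesc struct {
	Name             string
	PendingMutations int
}

// checkCanModifyTTL refuses TTL changes while any other schema change is pending.
func checkCanModifyTTL(desc tableDesc) error {
	if desc.PendingMutations > 0 {
		return errTTLBlocked
	}
	return nil
}

func main() {
	fmt.Println(checkCanModifyTTL(tableDesc{Name: "tbl"}))
	fmt.Println(checkCanModifyTTL(tableDesc{Name: "tbl", PendingMutations: 1}))
}
```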

@@ -906,6 +906,12 @@ func (s *TestState) SwapDescriptorSubComment(
return nil
}

// DeleteSchedule implements scexec.DescriptorMetadataUpdater
func (s *TestState) DeleteSchedule(ctx context.Context, id int64) error {
s.LogSideEffectf("delete schedule ID %d", id)

Contributor

now that you've added this, can we get a nice data-driven test in schemachanger/testdata? You use setup statements to set up the world, then test the drop of the table and let the test rewrite the expected output.

Contributor Author

done. couldn't print out the schedule ID here though as it is non deterministic.

Contributor Author

@otan otan left a comment

thanks for the review!

Collaborator

@rafiss rafiss left a comment

reviewed what i can, but deferring to sql-schema on all the schema change logic.

one thought: do we need to make sure the TTL column is not referenced in a computed col or partial index def?

pkg/sql/alter_table.go (outdated)
@otan otan force-pushed the add_or_drop_col branch 2 times, most recently from 90a24e7 to 9e191c2 Compare February 15, 2022 20:57
@otan
Contributor Author

otan commented Feb 15, 2022

thanks!

one thought: do we need to make sure the TTL column is not referenced in a computed col or partial index def?

answered in the other PR!

@otan
Contributor Author

otan commented Feb 15, 2022

bors r=rafiss,ajwerner

need to run, tests are looking green, sorry if i interrupt your bors run

otan added 8 commits February 16, 2022 16:37
This commit drops all schedules tied to the table's TTL job when the
table is dropped through DROP TABLE/SCHEMA/DATABASE.

Release note: None
This commit handles adding a TTL to a table using
`ALTER TABLE ... SET ...`.

To accomplish this, we needed to make add column finalization and adding
the TTL struct / scheduled job atomic. We add a new `ModifyRowLevelTTL`
mutation which shares the same mutation ID, with appropriate rollback handling
if, e.g. ADD COLUMN succeeds but setting the row-level TTL fails.

Release note: None
This commit implements changing the DEFAULT and ON UPDATE expressions
automatically whenever ttl_expire_after is changed.

Release note: None
This commit makes `ALTER TABLE ... RESET (ttl)` drop the automatic TTL
column. To do this, the `DROP COLUMN` logic is extracted into a separate
function, then used along with a `ModifyRowLevelTTL` mutation.

An extra check is added to the schema changer to ensure we aren't
re-adding a schedule that doesn't yet exist.

Release note: None
This commit drops the schedule ID on the new schema changer during a
DROP TABLE.

Release note: None
This commit prohibits dropping the TTL automatic column if TTL has been
defined on the table.

Release note: None
This commit blocks other schema changes whilst TTL is running, analogous
to ALTER COLUMN TYPE / ALTER PRIMARY KEY. This reduces the state space we
need to think about in conjunction with other schema changes.

We also block changes to TTL whilst another schema change is running for
similar reasons.

Release note: None
@craig
Contributor

craig bot commented Feb 16, 2022

Canceled.

@otan
Contributor Author

otan commented Feb 16, 2022

i love bors silently failing due to a rebase conflict

bors r=rafiss,ajwerner

@craig
Contributor

craig bot commented Feb 16, 2022

Build succeeded:

@craig craig bot merged commit 512320f into cockroachdb:master Feb 16, 2022
6 participants