-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: implement ADD/DROP schema changes for row-level TTL #76216
Conversation
my reading of the new schema changer code is that I "only" need to support DROP TABLE/DROP DATABASE/DROP SCHEMA. i'll look into that tomorrow. the way add/drop column is being used here is via ALTER TABLE ... SET/RESET, which means it would use the original schema changer anyway. but i'm not sure how "mixing" supported and unsupported statements works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this up for review so quickly. I think it's a good, solid change overall, as far as I can tell (I don't know much about scheduled jobs). The drop column is dodgy of course (not your doing, it always was dodgy) and that's exactly the kind of thing the declarative schema changer will be able to help with.
I do have a couple of high level comments and questions:
-
This is missing table descriptor validation checks for the TTL descriptor. I don't think they can wait until later. Every time you add a field in the TTL proto, you should have a corresponding validation check and unit tests in
tabledesc
. This will strengthen all of your tests so, so much. -
Wouldn't it be possible to do without the new mutation types? It feels like the TTL descriptor could simply have some kind of status enum indicating whether the schedule:
a. needs to be updated,
b. needs to be deleted,
c. no-op.
ccing @ajwerner for his 👀 on this last one, he might want to weigh in on whether this is a good idea or not.
Reviewed 3 of 3 files at r1, 14 of 14 files at r2, 9 of 9 files at r3, 6 of 6 files at r4, 5 of 5 files at r5, 2 of 2 files at r6, 3 of 3 files at r7, 3 of 3 files at r8, 11 of 11 files at r9, 2 of 2 files at r10, 5 of 5 files at r11, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan)
pkg/sql/alter_table.go, line 1765 at r7 (raw file):
// Do not have to do anything here. case before != nil && after != nil: if before.DeletionCron != after.DeletionCron {
What if before you have "" and after you have "@hourly"? Do we care about those kinds of no-ops?
pkg/sql/create_table.go, line 2342 at r5 (raw file):
// defaultTTLScheduleCron is the default cron duration for row-level TTL. // defaultTTLScheduleCron cannot be a cluster setting as this would involve // changing all existing schedules to match the new setting.
Would it make sense to enforce the deletion_cron
field in the descriptor to be non-empty instead? I won't insist on it, but it feels like that would be more robust.
pkg/sql/schema_changer.go, line 388 at r8 (raw file):
if tableDesc.Dropped() && tableDesc.GetRowLevelTTL() != nil { if err := sc.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error { scheduleID := tableDesc.GetRowLevelTTL().ScheduleID
Is this guaranteed to be non-zero at this point?
pkg/sql/schema_changer.go, line 1222 at r11 (raw file):
} scTable.RowLevelTTL.ScheduleID = j.ScheduleID() }
nit: shouldn't this have been in an earlier commit?
pkg/sql/show_create.go, line 195 at r5 (raw file):
} if cron := ttl.DeletionCron; cron != "" { storageParams = append(storageParams, fmt.Sprintf(`ttl_delete_batch_size = '%s'`, cron))
Wrong format string. I expect this will show up in tests also.
pkg/sql/catalog/descpb/structured.proto, line 612 at r9 (raw file):
optional TableDescriptor.RowLevelTTL row_level_ttl = 1 [(gogoproto.customname) = "RowLevelTTL"]; }
Is this message likely to have more fields added to it? It doesn't seem necessary, to be honest.
pkg/sql/schema_changer_test.go, line 7516 at r9 (raw file):
setup: ` CREATE DATABASE t; USE t;
FYI, these USE t
don't seem to be necessary. I don't care much either way.
i can see this getting complicated because if something fails in between the reverse mutations will also need to dive deeper into the tabledescriptor proto and zero things out, but only on add column / drop column if it matches a certain column name. for me, the separate mutation feels cleaner, but yeah more input would be good. i'll hold off on making that change (as it is not trivial :) ) for now |
The way I understand it, mutations represent a future change to the table descriptor itself. It seemed to me that the TTL descriptor was effectively the source of truth for something not in the table descriptor, so to me it didn't fit into that mold. Of course, if there's subtle interplay with adding/removing the ttl column, that would invalidate my argument. In any case having a new type of mutation isn't the worst thing in the world but if it's avoidable it's better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing table descriptor validation checks for the TTL descriptor
done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @hourly and @postamar)
pkg/sql/alter_table.go, line 1765 at r7 (raw file):
Previously, postamar (Marius Posta) wrote…
What if before you have "" and after you have "@hourly"? Do we care about those kinds of no-ops?
the net effect doesn't change much, rather not keep it too complicated for now.
pkg/sql/create_table.go, line 2342 at r5 (raw file):
Previously, postamar (Marius Posta) wrote…
Would it make sense to enforce the
deletion_cron
field in the descriptor to be non-empty instead? I won't insist on it, but it feels like that would be more robust.
i can see it both ways. i preferred this way as this paves the way for a "default" in future, and it makes the logic for displaying the cron on SHOW CREATE TABLE less gnarly if the user is using the default.
pkg/sql/schema_changer.go, line 388 at r8 (raw file):
Previously, postamar (Marius Posta) wrote…
Is this guaranteed to be non-zero at this point?
In theory no, but no harm in adding a check in.
Worth noting that the deleteSchedule no-ops even if it was 0 anyway
pkg/sql/show_create.go, line 195 at r5 (raw file):
Previously, postamar (Marius Posta) wrote…
Wrong format string. I expect this will show up in tests also.
nice catch, fixed.
pkg/sql/catalog/descpb/structured.proto, line 612 at r9 (raw file):
Previously, postamar (Marius Posta) wrote…
Is this message likely to have more fields added to it? It doesn't seem necessary, to be honest.
how strongly do you feel about this? i guess, ideally, i would've added this into the ColumnDescriptor
mutation with an extra field symbolizing whether to add or drop a TTL. it's easy to extend this way, and in practice i'm not sure there's too big of a difference.
i've tacked on two commits at the end, which adds the dropping of schedules on DROP TABLE on the new schema changer. it works in the logic test, but my go generate output generated a bit more than i was expecting and i'm sure i screwed up an abstraction somewhere. eyes on that bit greatly appreciated. i'm also happy to make those last two commits a separate PR to review, and disable the local declarative schema change tests for DROP TABLE clearing schedule IDs for now. let me know what your preference is, it's a pretty juicy PR as is. |
0c28f5a
to
5b564b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some testing around add a ttl expression that references a column added in the current transaction, added in the current statement. Then, to drive home my point, make adding that column fail by having an error in its default expression or check constraint.
I think when we talk about the use of a mutation, we need to unpack the expected behavior in the face of other mutations and failure. The case where you currently don't use a mutation, which I think is troubling, is when you update an existing set of parameters. If another mutation in the same transaction fails, what should happen? I'm inclined to say that you actually do need a mutation and you need to make it richer.
if err := sc.maybeUpdateScheduledJobsForRowLevelTTL(ctx, tableDesc); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the principle around doing this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is accounting for DROP TABLE (one of the commits here). is there a better place for it? i can put it in drop_table.go
if you prefer, but figured it should be as close to the DROP TABLE actually executing as possible.
I propose that we add the limitation to these TTL changes that look a lot like the alter primary key changes whereby we don't allow any of these changes in a transaction and we don't allow them concurrent with any other schema changes. That will help us alleviate many (but not all) of my concerns. |
5b564b9
to
610306f
Compare
i've tacked on 2 commits at the end which blocks schema changes / ttl config changes whilst a ttl mutation is in progress |
f335ed2
to
29d61b7
Compare
29d61b7
to
214362e
Compare
this is rebased & ready for review! the changefeed stuff is all gone now, sorry for the false pings for cdc / bulk. |
214362e
to
c60be86
Compare
"delete-schedule", | ||
mu.txn, | ||
sessiondata.InternalExecutorOverride{User: security.RootUserName()}, | ||
"DELETE FROM system.scheduled_jobs WHERE schedule_id = $1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop schedule perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i avoided using DROP SCHEDULE as it would allow users to DROP schedules as well. currently we've explicitly banned that. (also this was mostly form following the deleteSchedule
code in the sql
package; we have no hooks for this ... yet ...
c60be86
to
c80b878
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM mod minor things
FAMILY fam_0_id_text_crdb_internal_expiration (id, text, crdb_internal_expiration) | ||
) WITH (ttl = 'on', ttl_automatic_column = 'on', ttl_expire_after = '10 days':::INTERVAL) | ||
|
||
statement error cannot modify TTL whilst another schema change is in progress |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make this error clearer? It might be confusing because we're in an implicit transaction, also because it feels like it's all ttl
related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've gone with cannot modify TTL settings while another schema change on the table is being processed
pkg/sql/schemachanger/scplan/internal/opgen/opgen_relation_depended_on_by.go
Outdated
Show resolved
Hide resolved
@@ -906,6 +906,12 @@ func (s *TestState) SwapDescriptorSubComment( | |||
return nil | |||
} | |||
|
|||
// DeleteSchedule implements scexec.DescriptorMetadataUpdater | |||
func (s *TestState) DeleteSchedule(ctx context.Context, id int64) error { | |||
s.LogSideEffectf("delete schedule ID %d", id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that you've added this, can we get a nice data-driven test in schemachanger/testdata
. You do, setup
and statements to set up the world and then test
for the drop of the table and let it rewrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. couldn't print out the schedule ID here though as it is non deterministic.
c80b878
to
5fbc816
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the review!
@@ -906,6 +906,12 @@ func (s *TestState) SwapDescriptorSubComment( | |||
return nil | |||
} | |||
|
|||
// DeleteSchedule implements scexec.DescriptorMetadataUpdater | |||
func (s *TestState) DeleteSchedule(ctx context.Context, id int64) error { | |||
s.LogSideEffectf("delete schedule ID %d", id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. couldn't print out the schedule ID here though as it is non deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed what i can, but deferring to sql-schema on all the schema change logic.
one thought: do we need to make sure the TTL column is not referenced in a computed col or partial index def?
pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_drop_column.go
Outdated
Show resolved
Hide resolved
90a24e7
to
9e191c2
Compare
thanks!
answered in the other PR! |
bors r=rafiss,ajwerner need to run, tests are looking green, sorry if i interrupt your bors run |
This commit drops all schedules tied to the table's TTL job when the table is dropped through DROP TABLE/SCHEMA/DATABASE. Release note: None
This commit handles adding a TTL to a table using `ALTER TABLE ... SET ...`. To accomplish this, we needed to make add column finalization and adding the TTL struct / scheduled job atomic. We add a new `ModifyRowLevelTTL` mutation which the same mutation ID, with approprate rollback handling if, e.g. ADD COLUMN succeeds but setting the row-level TTL fails. Release note: None
This commit implements changing the DEFAULT and ON UPDATE expressions automatically whenever ttl_expire_after is changed. Release note: None
This commit makes `ALTER TABLE ... RESET (ttl)` drop the automatic TTL column. To do this, the `DROP COLUMN` logic is extracted into a separate function, then used along with a `ModifyRowLevelTTL` mutation. An extra check is added to the schema changer to ensure we aren't re-adding a schedule that doesn't yet exist. Release note: None
Release note: None
This commit drops the schedule ID on the new schema changer during a DROP TABLE. Release note: None
This commit prohibits dropping the TTL automatic column if TTL has been defined on the table. Release note: None
This commit blocks other schema changes whilst TTL is running, analagous to ALTER COLUMN TYPE / ALTER PRIMARY KEY. This saves the state space to think about in conjunction with other schema changes. We also block changes to TTL whilst another schema change is running for similar reasons. Release note: None
9e191c2
to
6934649
Compare
Canceled. |
i love bors silently failing due to a rebase conflict bors r=rafiss,ajwerner |
Build succeeded: |
These commits adds the "scarier" part of the row level TTL job, in particular:
This is handled in both the old and new schema changer.