Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ttl: show table name for jobs in SHOW SCHEDULES #107294

Merged

Conversation

chrisseto
Copy link
Contributor

@chrisseto chrisseto commented Jul 20, 2023

Previously, SHOW SCHEDULES would only show a table's ID for row level TTL jobs. For the convenience of debugging, this commit upgrades the label column to contain the table's name instead of its ID.

Rather than performing table name resolution at query time, the job name is set at creation time and updated by the schema change during table renames. This minimizes the likelihood of SHOW SCHEDULES being rendered unusable when a cluster is in an unstable state.

Epic: CRDB-18322
Fixes: #93774

@chrisseto chrisseto requested review from a team as code owners July 20, 2023 18:54
@chrisseto chrisseto requested review from rhu713 and removed request for a team July 20, 2023 18:54
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@chrisseto
Copy link
Contributor Author

I was a bit uncertain about how to best handle certain cases. These are marked as "TODO" and I intend to resolve them before merging. I would appreciate any input!

if tableDesc.HasRowLevelTTL() {
const (
opname = `update ttl job names`
query = `update system.scheduled_jobs set schedule_name = 'row-level-ttl-' || $1 WHERE schedule_id = $2`
Copy link
Contributor

@Xiang-Gu Xiang-Gu Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it better to use capitals and some parenthesis here like

UPDATE system.scheduled_jobs SET schedule_name = ('row-level-ttl-' || $1) WHERE schedule_id = $2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

sj.SetOwner(owner)
sj.SetScheduleDetails(jobspb.ScheduleDetails{
Wait: jobspb.ScheduleDetails_WAIT,
// If a job fails, try again at the allocated cron time.
OnError: jobspb.ScheduleDetails_RETRY_SCHED,
})

if err := sj.SetSchedule(ttl.DeletionCronOrDefault()); err != nil {
// TODO: Is it better to check that RowLevelTTL is not nil and return an error?
Copy link
Member

@dt dt Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: TODO(username(s)): here/below.

(If they're to be discussed in review and resolved before merging, can do TODO(BEFORE MERGING) or something to indicate as such)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to dt's point

Also, If we're setting schedule for a nil TTL, it's a critical bug that we should let it fail hard. but it's not nice to panic. We could return a assertion error with errors.AssertionFailedf to feel safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. This was intended to be a discussion at review time. I'll use TODO(BEFORE MERGING) in the future!

Also, If we're setting schedule for a nil TTL, it's a critical bug that we should let it fail hard. but it's not nice to panic. We could return a assertion error with errors.AssertionFailedf to feel safe.

Agreed that panicking isn't great. More generally, is the style in CRDB is to assume that programmers are doing the right thing or is it to be paranoid? Because of how this function got refactored and how go's type system works, callers could accidentally cause nil dereferences. I find myself running into these types of cases frequently. In general I prefer to write types in such a way that this can't happen but that's not always on option (See: this PR). My personal rule is to be paranoid at boundaries and expect well shaped data inside of said boundaries. I don't really have a sense of what is or is not considered to be a boundary yet. sql.CreateRowLevelTTLScheduledJob passes tblDesc right through, so we could do a check at the package level but this metaphorical boundary might exist somewhere else instead. What would either of you do for this specific situation and does that follow a more generalizable rule?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhm, I think you already made good points here, we can do the check in CreateRowLevelTTLScheduledJob given it's a public function at the package level. I personally often makes assumptions in non-public functions, of course sometimes do such kind of validations in a helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added the check to CreateRowLevelTTLScheduleJob.

Copy link
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! only minor things

----
0

subtest end

# Ensure that SHOW SCHEDULES's label column is updated to reflect the new table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test case where the schedule name uses id (mock with a manual update in test) and it's changed to use name after renaming the table? Just want to make sure your update query works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -232,6 +233,29 @@ func (n *renameTableNode) startExec(params runParams) error {
return err
}

// If this table has row level ttl enabled, update the schedule_name of all
// row-level-ttl jobs with the name table name.
if tableDesc.HasRowLevelTTL() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe extract this logic into a separate helper function. I'm ok with what it's like now though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a super clear place to put it right now. Could we save the refactor until rename table is implemented in the declarative schema changer? I think having two consumers will make the eventual "home" of this logic much more clear.

sj.SetOwner(owner)
sj.SetScheduleDetails(jobspb.ScheduleDetails{
Wait: jobspb.ScheduleDetails_WAIT,
// If a job fails, try again at the allocated cron time.
OnError: jobspb.ScheduleDetails_RETRY_SCHED,
})

if err := sj.SetSchedule(ttl.DeletionCronOrDefault()); err != nil {
// TODO: Is it better to check that RowLevelTTL is not nil and return an error?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to dt's point

Also, If we're setting schedule for a nil TTL, it's a critical bug that we should let it fail hard. but it's not nice to panic. We could return a assertion error with errors.AssertionFailedf to feel safe.

opname = `update ttl job names`
query = `update system.scheduled_jobs set schedule_name = 'row-level-ttl-' || $1 WHERE schedule_id = $2`
)
// TODO Should the number of modified rows be inspected here? Could log a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think renaming a table should handle this. It's part of what the TTL infra itself should check. cc @ecwall to see if there is a strong guarantee here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! I've removed this comment. Happy to revisit if Evan has opinions.

@chrisseto chrisseto force-pushed the show-schedules-row-ttl-table-names branch 2 times, most recently from d664445 to 8fc3b5f Compare July 21, 2023 20:21
Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @chrisseto, @dt, @ecwall, and @rhu713)


pkg/sql/rename_table.go line 238 at r1 (raw file):

Previously, chrisseto (Chris Seto) wrote…

I don't think there's a super clear place to put it right now. Could we save the refactor until rename table is implemented in the declarative schema changer? I think having two consumers will make the eventual "home" of this logic much more clear.

Lets move this over to MetaDataUpdater then the declarative side can easily consume it

Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @chrisseto, @dt, @ecwall, and @rhu713)


pkg/sql/rename_table.go line 238 at r1 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

Lets move this over to MetaDataUpdater then the declarative side can easily consume it

*MetadataUpdater

@chengxiong-ruan
Copy link
Contributor

pkg/sql/rename_table.go line 238 at r1 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

*MetadataUpdater

Good idea. I like that.

@chrisseto chrisseto force-pushed the show-schedules-row-ttl-table-names branch 2 times, most recently from fac4a98 to 933cf8f Compare July 24, 2023 17:29
Copy link
Contributor Author

@chrisseto chrisseto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @dt, @ecwall, @fqazi, and @rhu713)


pkg/sql/rename_table.go line 238 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Good idea. I like that.

Oh nice! That's the perfect spot. Moved.


pkg/sql/rename_table.go line 243 at r1 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Okay! I've removed this comment. Happy to revisit if Evan has opinions.

Done.

@chrisseto chrisseto force-pushed the show-schedules-row-ttl-table-names branch from 933cf8f to 2b26b9f Compare July 24, 2023 18:48
Copy link
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @ecwall, @fqazi, and @rhu713)

@chrisseto chrisseto force-pushed the show-schedules-row-ttl-table-names branch from 2b26b9f to 608e989 Compare July 25, 2023 15:58
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @chrisseto, @dt, @ecwall, @fqazi, and @rhu713)


pkg/sql/create_table.go line 2434 at r5 (raw file):

) (*jobs.ScheduledJob, error) {
	sj := jobs.NewScheduledJob(env)
	sj.SetScheduleLabel(fmt.Sprintf("row-level-ttl-%s", tblDesc.GetName()))

nit: i think we may actually want both the name and ID.

the primary reason is that table names are not unique - the fully qualified name of database_name.schema_name.table_name is, but it doesn't seem like we have those other names easily available. (and it just seems like a headache to make all those RENAME operations get reflected in this schedule label too.)

Copy link
Contributor Author

@chrisseto chrisseto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @dt, @ecwall, @fqazi, @rafiss, and @rhu713)


pkg/sql/create_table.go line 2434 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: i think we may actually want both the name and ID.

the primary reason is that table names are not unique - the fully qualified name of database_name.schema_name.table_name is, but it doesn't seem like we have those other names easily available. (and it just seems like a headache to make all those RENAME operations get reflected in this schedule label too.)

Is there a need to make the label unique? The table ID is available in the "args" column, if disambiguation is needed. I do think that fully qualified name would be much more helpful, I just didn't know how to get it. 😅

root@localhost:36257/defaultdb2> SHOW SCHEDULES;

          id         |        label         | schedule_status |        next_run        |   state   | recurrence | jobsrunning | owner |            created            | on_wait |  on_error   |     command
---------------------+----------------------+-----------------+------------------------+-----------+------------+-------------+-------+-------------------------------+---------+-------------+-------------------
  883280186681032705 | sql-schema-telemetry | ACTIVE          | 2023-07-24 09:37:00+00 | pending   | 37 9 * * 1 |           0 | node  | 2023-07-17 20:35:20.996534+00 | SKIP    | RETRY_SCHED | {}
  883280186749747201 | sql-stats-compaction | ACTIVE          | 2023-07-21 21:00:00+00 | succeeded | @hourly    |           0 | node  | 2023-07-17 20:35:21.038388+00 | SKIP    | RETRY_SCHED | {}
  883280439706157057 | row-level-ttl-104    | ACTIVE          | 2023-07-21 21:00:00+00 | NULL      | @hourly    |           0 | root  | 2023-07-17 20:36:38.237751+00 | WAIT    | RETRY_SCHED | {"tableId": 104}
  885528023598104577 | row-level-ttl-tbl    | ACTIVE          | 2023-07-25 20:00:00+00 | NULL      | @hourly    |           0 | root  | 2023-07-25 19:08:26.403993+00 | WAIT    | RETRY_SCHED | {"tableId": 107}
  885528044534595585 | row-level-ttl-tbl    | ACTIVE          | 2023-07-25 20:00:00+00 | NULL      | @hourly    |           0 | root  | 2023-07-25 19:08:32.797117+00 | WAIT    | RETRY_SCHED | {"tableId": 108}

Hmmmmm, you know, I added this example to showcase my point but I think it's made me agree with you. The fully qualified name would be much nicer but we'd have to hook into database and schema renames as well and it would make the labels MUCH longer. Isrow-level-ttl-107-tbl a good middle ground? If there's no limitation on the format of label, row-level-ttl: 107 (tbl) or something up that alley might be a bit more pleasant to read.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @chrisseto, @dt, @ecwall, @fqazi, and @rhu713)


pkg/sql/create_table.go line 2434 at r5 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Is there a need to make the label unique? The table ID is available in the "args" column, if disambiguation is needed. I do think that fully qualified name would be much more helpful, I just didn't know how to get it. 😅

root@localhost:36257/defaultdb2> SHOW SCHEDULES;

          id         |        label         | schedule_status |        next_run        |   state   | recurrence | jobsrunning | owner |            created            | on_wait |  on_error   |     command
---------------------+----------------------+-----------------+------------------------+-----------+------------+-------------+-------+-------------------------------+---------+-------------+-------------------
  883280186681032705 | sql-schema-telemetry | ACTIVE          | 2023-07-24 09:37:00+00 | pending   | 37 9 * * 1 |           0 | node  | 2023-07-17 20:35:20.996534+00 | SKIP    | RETRY_SCHED | {}
  883280186749747201 | sql-stats-compaction | ACTIVE          | 2023-07-21 21:00:00+00 | succeeded | @hourly    |           0 | node  | 2023-07-17 20:35:21.038388+00 | SKIP    | RETRY_SCHED | {}
  883280439706157057 | row-level-ttl-104    | ACTIVE          | 2023-07-21 21:00:00+00 | NULL      | @hourly    |           0 | root  | 2023-07-17 20:36:38.237751+00 | WAIT    | RETRY_SCHED | {"tableId": 104}
  885528023598104577 | row-level-ttl-tbl    | ACTIVE          | 2023-07-25 20:00:00+00 | NULL      | @hourly    |           0 | root  | 2023-07-25 19:08:26.403993+00 | WAIT    | RETRY_SCHED | {"tableId": 107}
  885528044534595585 | row-level-ttl-tbl    | ACTIVE          | 2023-07-25 20:00:00+00 | NULL      | @hourly    |           0 | root  | 2023-07-25 19:08:32.797117+00 | WAIT    | RETRY_SCHED | {"tableId": 108}

Hmmmmm, you know, I added this example to showcase my point but I think it's made me agree with you. The fully qualified name would be much nicer but we'd have to hook into database and schema renames as well and it would make the labels MUCH longer. Isrow-level-ttl-107-tbl a good middle ground? If there's no limitation on the format of label, row-level-ttl: 107 (tbl) or something up that alley might be a bit more pleasant to read.

right, your example is what i had in mind. i think both ID and table name is good to be both helpful enough and provide full disambiguation for the users who do have the same table names in different places. a convention we use sometimes is to put the ID in square brackets. something like row-level-ttl: table_name [107]

Copy link
Contributor Author

@chrisseto chrisseto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @dt, @ecwall, @fqazi, @rafiss, and @rhu713)


pkg/sql/create_table.go line 2434 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

right, your example is what i had in mind. i think both ID and table name is good to be both helpful enough and provide full disambiguation for the users who do have the same table names in different places. a convention we use sometimes is to put the ID in square brackets. something like row-level-ttl: table_name [107]

I like that! I'll go ahead and pull this all into a function in the TTL package so it's easier to refactor in the future.

@chrisseto chrisseto force-pushed the show-schedules-row-ttl-table-names branch from 608e989 to d3ed6c1 Compare July 25, 2023 21:10
@chrisseto chrisseto requested a review from rafiss July 25, 2023 21:11
@chrisseto chrisseto force-pushed the show-schedules-row-ttl-table-names branch from d3ed6c1 to 8f50475 Compare July 26, 2023 18:55
Previously, `SHOW SCHEDULES` would only show a table's ID for row level
TTL jobs. For the convenience of debugging, this commit upgrades the
`label` column to contain the table's name instead of its ID.

Rather than performing table name resolution at query time, the job name
is set at creation time and updated by the schema change during table
renames. This minimizes the likelihood of `SHOW SCHEDULES` being
rendered unusable when a cluster is in an unstable state.

Epic: CRDB-18322
Fixes: 93774
@chrisseto chrisseto force-pushed the show-schedules-row-ttl-table-names branch from 8f50475 to fc59204 Compare July 26, 2023 20:59
@chrisseto
Copy link
Contributor Author

Friendly ping @rafiss :)

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good! thanks for the ping

Reviewed 2 of 7 files at r1, 1 of 4 files at r2, 16 of 16 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @dt, @ecwall, @fqazi, and @rhu713)

@chrisseto
Copy link
Contributor Author

TFTR!

bors r+

@chrisseto
Copy link
Contributor Author

Hmm, I just realized that there likely isn't going to be a test that will fail once rename table is implemented in the declarative schema changer. I'll find or find an issue to make sure this doesn't get forgotten.

@craig
Copy link
Contributor

craig bot commented Aug 1, 2023

Build succeeded:

@craig craig bot merged commit 509f8c8 into cockroachdb:master Aug 1, 2023
@chrisseto chrisseto deleted the show-schedules-row-ttl-table-names branch August 1, 2023 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ttljob: SHOW SCHEDULES should display table name alongside table ID where appropriate
7 participants