
sql: create general logging job to capture index usage stats #76752

Conversation


@THardy98 THardy98 commented Feb 17, 2022

Resolves: #72486

Previously, we did not capture index usage metrics to our telemetry logging
channel. The intention of this change is to introduce a scheduler for jobs that
emit logging events.

Currently, we want to capture index usage statistics to the telemetry logging
channel on a time interval, but being able to emit scheduled logs for different
statistics/events would be useful telemetry going forward, which motivates the
creation of a scheduler that supports different logging jobs.

This change introduces the `scheduledloggingjobs` package.
`CreateScheduledLoggingJob` in `scheduledloggingjobs` allows you to create a
scheduler with parameters for:

  • label
  • schedule interval
  • executor type
  • schedule details
  • execution arguments

The Resumer and Executor logic for the scheduled jobs resides in
`sql/scheduled_logging_jobs.go`. The Resumer receives a callback function that
returns the logging events it wants to emit. The Executor receives the job
details/progress (used to create the job record when the job is executed) and
job type (to check that a job of the same type isn't already running when we
try to create the job).
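
Roughly, the shape is as follows (illustrative sketch only: the type name
`loggingEventsFn`, the helper `emitEvents`, and the exact signatures are
stand-ins for the real API in this change; `log.StructuredEvent` is the
existing structured-event entry point in `util/log`):

```go
// Illustrative only: the callback and helper below approximate the shape
// described in this PR; the real signatures may differ.
package sql

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/log/eventpb"
)

// loggingEventsFn is the hypothetical shape of the resume callback: it
// collects whatever statistics the job is responsible for and returns them
// as structured events.
type loggingEventsFn func(ctx context.Context) ([]eventpb.EventPayload, error)

// emitEvents shows roughly what the Resumer does with the callback: invoke
// it, then write each returned event to its logging channel (TELEMETRY for
// the index usage stats events).
func emitEvents(ctx context.Context, fn loggingEventsFn) error {
	events, err := fn(ctx)
	if err != nil {
		return err
	}
	for _, ev := range events {
		log.StructuredEvent(ctx, ev)
	}
	return nil
}
```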

The `scheduledloggingjobs` package also contains a `Controller` whose purpose
is to `Start` all the scheduled logging jobs in its interface. The
`Controller`'s `Start` method is hooked into the startup of the `sql.Server`.

Assuming its `jobs.proto` definitions and `wrap.go` cases are covered, a new
logging job can be added by:

  • defining its resume callback function
  • registering its log resumer and executor with
    `RegisterLogResumerAndScheduleExecutor`
  • defining its arguments (label, schedule interval, executor type, schedule
    details, execution arguments) and calling `CreateScheduledLoggingJob`
  • adding it to the `Controller` interface and `Start` method (if you want to
    create the scheduled job on server startup)

This change was made with internal use in mind; I didn't consider a SQL
interface for creating these schedules.

Release note (sql change): Initial implementation of a general logging job,
used to create a job that captures index usage statistics to the telemetry
logging channel.

@THardy98 THardy98 requested review from a team as code owners February 17, 2022 20:00
@THardy98 THardy98 requested review from ajwerner and removed request for a team February 17, 2022 20:00

@THardy98 THardy98 force-pushed the event_wrapper_index_usage_telemetry branch from 42b6818 to b6fdb20 Compare February 17, 2022 20:15
@THardy98 THardy98 marked this pull request as draft February 17, 2022 20:15
@THardy98 THardy98 requested a review from Azhng February 17, 2022 20:15
@miretskiy miretskiy left a comment


Reviewed 3 of 20 files at r1, all commit messages.


-- commits, line 11 at r2:

Is there an epic/doc/issue you can point me to that describes what this is supposed to accomplish?
Is there an SQL interface for creating such schedules? What's the intended usage and use case? Basically, any information
that can help me review the code better.

@miretskiy miretskiy left a comment


Reviewed 1 of 20 files at r1.


pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 31 at r2 (raw file):

//////////////////////////////////////
//////// SCHEDULED JOB LOGIC	////////
//////////////////////////////////////

nit: we don't really have this comment style in our codebase...


pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 150 at r2 (raw file):

// If an existing logging job of the same type is found, an error is returned,
// preventing a duplicate job from being created.
func checkExistingLoggingJob(

normally, this shouldn't be needed -- the scheduled job system, by default, ensures only one instance of a job runs at a time...

@THardy98 THardy98 force-pushed the event_wrapper_index_usage_telemetry branch 3 times, most recently from b57dade to 6e87ad3 Compare February 17, 2022 21:37
@THardy98 THardy98 left a comment




-- commits, line 11 at r2:

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Is there an epic/doc/issue you can point me to that describes what this is supposed to accomplish?
Is there an SQL interface for creating such schedules? What's the intended usage and use case? Basically, any information
that can help me review the code better.

Linked the issue this is intended to resolve, and added a quick note about the idea behind this change.

@THardy98 THardy98 left a comment




pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 31 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

nit: we don't really have this comment style in our codebase...

Removed.


pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 150 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

normally, this shouldn't be needed -- the scheduled job system, by default, ensures only one instance of a job runs at a time...

Oh I didn't know that. How does it check whether an instance of a job is already running?

@THardy98 THardy98 force-pushed the event_wrapper_index_usage_telemetry branch from 6e87ad3 to b5db485 Compare February 17, 2022 22:06
@Azhng Azhng left a comment




pkg/sql/conn_executor.go, line 488 at r4 (raw file):

	s.txnIDCache.Start(ctx, stopper)

	s.scheduledLoggingJobsController.Start(ctx)

The .Start(ctx) function here usually implies that there are one or more long-running background goroutines that need to be started using the stopper. In CRDB we don't directly start a goroutine using the go keyword; using the stopper allows us to stop all the background goroutines when gracefully shutting down a server.

Looking at the scheduledLoggingJobsController, it seems like the Start() method simply starts a transaction and creates the scheduled job record. This logic can live within the constructor of the object. Alternatively, if you want to follow the automatic job pattern, you can kick off a background goroutine here that periodically wakes up to check whether a schedule record exists for your scheduled job.
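
Something along these lines, as a minimal sketch (assuming the controller can
be handed the server's `*stop.Stopper`; the task name and the single-argument
call to `createCaptureIndexUsageStatsScheduledJob` are illustrative, the real
method takes more parameters):

```go
// Sketch only: run the schedule creation as a stopper-managed async task so
// it neither blocks server startup nor outlives a graceful shutdown.
func (c *LoggingJobsController) Start(ctx context.Context, stopper *stop.Stopper) {
	_ = stopper.RunAsyncTask(ctx, "create-scheduled-logging-jobs", func(ctx context.Context) {
		// If the schedule already exists (or creation fails for any other
		// reason), log a warning instead of failing startup.
		if err := c.createCaptureIndexUsageStatsScheduledJob(ctx); err != nil {
			log.Warningf(ctx, "unable to create scheduled logging job: %v", err)
		}
	})
}
```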


pkg/sql/scheduled_logging_job.go, line 32 at r4 (raw file):

func init() {
	RegisterLogResumerAndScheduleExecutor(

Since you can only register jobs within the sql package, perhaps there's no need to export this function?


pkg/sql/scheduledloggingjobs/captured_index_usage_stats.go, line 39 at r4 (raw file):

	"sql.capture_index_usage_stats.telemetry.interval",
	"cron-tab interval to capture index usage statistics to the telemetry logging channel",
	"0 */8 * * *", /* defaultValue */

Hmm, what does this cronexpr mean in english 🤔 ?
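
For reference, `0 */8 * * *` fires at minute 0 of every 8th hour, i.e. at
00:00, 08:00, and 16:00 each day. A quick way to sanity-check a default like
this is to expand it with `github.com/gorhill/cronexpr` (the cron parser the
jobs scheduler uses), as in this standalone sketch:

```go
// Standalone sketch: expand the default cron expression to see when it fires.
package main

import (
	"fmt"
	"time"

	"github.com/gorhill/cronexpr"
)

func main() {
	expr := cronexpr.MustParse("0 */8 * * *")
	// Starting just after midnight UTC, the next firings land at 08:00,
	// 16:00, and then 00:00 the following day, i.e. every 8 hours.
	from := time.Date(2022, 2, 17, 0, 0, 1, 0, time.UTC)
	for _, t := range expr.NextN(from, 3) {
		fmt.Println(t)
	}
}
```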


pkg/sql/scheduledloggingjobs/captured_index_usage_stats.go, line 92 at r4 (raw file):

	expectedNumDatums = 10
	var allEvents []eventpb.EventPayload
	for _, databaseName := range allDatabaseNames {

What happens if the job is paused or cancelled while this loop is still running?


pkg/sql/scheduledloggingjobs/scheduled_logging_job_controller.go, line 64 at r4 (raw file):

// createCaptureIndexUsageStatsScheduledJob creates all scheduled logging jobs.
// createCaptureIndexUsageStatsScheduledJob implements the sql.Controller interface.
func (c *LoggingJobsController) createCaptureIndexUsageStatsScheduledJob(

During cluster startup, every single node will go and attempt to create this scheduled job (and since this schedule already exists, all will fail). This code is synchronous, which means it will block cluster startup due to contention.

@THardy98 THardy98 requested a review from miretskiy February 18, 2022 17:01
@miretskiy miretskiy left a comment


Reviewed 1 of 2 files at r3.


pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 150 at r2 (raw file):

Previously, THardy98 (Thomas Hardy) wrote…

Oh I didn't know that. How does it check whether an instance of a job is already running?

When you specify the CreatedBy field when creating a new job, that links the job with the schedule, and the scheduler will (by default) wait for the previous job to complete
before starting the new one.
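
Roughly, with the `jobs` package API of this era (the function name and the
`details`/`progress`/`scheduleID` parameters here are placeholders; the field
names come from `pkg/jobs`):

```go
// Sketch only: a job record created on behalf of a schedule points back at
// that schedule via CreatedBy, which is what lets the scheduler apply its
// default "wait for the previous run to finish" policy before starting
// another instance.
func makeScheduledJobRecord(
	details jobspb.Details, progress jobspb.ProgressDetails, scheduleID int64,
) jobs.Record {
	return jobs.Record{
		Description: "capture index usage stats",
		Username:    security.NodeUserName(),
		Details:     details,
		Progress:    progress,
		CreatedBy: &jobs.CreatedByInfo{
			Name: jobs.CreatedByScheduledJobs, // marks the job as schedule-created
			ID:   scheduleID,                  // the owning schedule's ID
		},
	}
}
```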

@miretskiy miretskiy self-requested a review February 18, 2022 19:58
@miretskiy

I'm curious: why is this whole thing implemented as a job? Jobs are long-running processes that must survive
restarts. Critically, the jobs infrastructure provides the ability for a long-running job to checkpoint its progress.
As far as I can tell -- nothing here says job... Just that you want to have a task that runs once every 8 hours and does something. Sounds like a daemon goroutine that sleeps for 8 hours and periodically wakes up to do its thing?
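
Concretely, that alternative could be a stopper-managed loop along these lines
(sketch only; the function name, the fixed 8-hour interval, and the
`captureIndexUsageStats` helper are placeholders):

```go
// Sketch only: periodically collect and emit index usage stats without
// involving the jobs or scheduled-jobs machinery.
func startIndexUsageStatsLogger(ctx context.Context, stopper *stop.Stopper) error {
	return stopper.RunAsyncTask(ctx, "capture-index-usage-stats", func(ctx context.Context) {
		ticker := time.NewTicker(8 * time.Hour)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// Collect the stats and emit each one as a structured event.
				captureIndexUsageStats(ctx)
			case <-stopper.ShouldQuiesce():
				return
			case <-ctx.Done():
				return
			}
		}
	})
}
```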

@THardy98 THardy98 left a comment


There were two reasons:

  • the job system offered an existing way to create a long running scheduled background process
  • with the idea to support different log-emitting jobs in the future, some of those jobs may make use of other features of the jobs system, like checkpointing (although this job doesn't)

That being said, maybe it's overkill, and it's unlikely that background logging would make use of the jobs system effectively.


@miretskiy miretskiy left a comment


I think we should err on the side of simplicity. If this particular use case doesn't need jobs, then let's not add another job type and schedule. If future uses prove the need for that, then (as can be seen from this PR) unifying the code to use a new schedule/job type is fairly straightforward.


@miretskiy miretskiy self-requested a review February 18, 2022 22:59
@THardy98 THardy98 left a comment


Sounds good, I'll close this PR. I've created PR #76886 in its place. Feel free to take a look at that if you like, although it doesn't use the jobs system.

TYFR @miretskiy and @Azhng 👍


@THardy98 THardy98 closed this Feb 22, 2022