
sql: create general logging job to capture index usage stats #76752

Conversation


@THardy98 THardy98 commented Feb 17, 2022

Resolves: #72486

Previously, we did not capture index usage metrics to our telemetry logging
channel. The intention of this change is to introduce a scheduler for jobs that
emit logging events.

Currently, we want to capture index usage statistics to the telemetry logging
channel on a time interval, but being able to emit scheduled logs for different
statistics/events would be useful telemetry going forward, which motivates the
creation of a scheduler that supports different logging jobs.

This change introduces the `scheduledloggingjobs` package.
`CreateScheduledLoggingJob` in `scheduledloggingjobs` allows you to create a
scheduler with parameters for:

  • label
  • schedule interval
  • executor type
  • schedule details
  • execution arguments

The Resumer and Executor logic for the scheduled jobs resides in
`sql/scheduled_logging_jobs.go`. The Resumer receives a callback function that
returns the logging events it wants to emit. The Executor receives the job
details/progress (used to create the job record when the job is executed) and
job type (to check that a job of the same type isn't already running when we
try to create the job).
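
Roughly, the shape is as follows (illustrative sketch only: the type name
`loggingEventsFn`, the helper `emitEvents`, and the exact signatures are
stand-ins for the real API in this change; `log.StructuredEvent` is the
existing structured-event entry point in `util/log`):

```go
// Illustrative only: the callback and helper below approximate the shape
// described in this PR; the real signatures may differ.
package sql

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/log/eventpb"
)

// loggingEventsFn is the hypothetical shape of the resume callback: it
// collects whatever statistics the job is responsible for and returns them
// as structured events.
type loggingEventsFn func(ctx context.Context) ([]eventpb.EventPayload, error)

// emitEvents shows roughly what the Resumer does with the callback: invoke
// it, then write each returned event to its logging channel (TELEMETRY for
// the index usage stats events).
func emitEvents(ctx context.Context, fn loggingEventsFn) error {
	events, err := fn(ctx)
	if err != nil {
		return err
	}
	for _, ev := range events {
		log.StructuredEvent(ctx, ev)
	}
	return nil
}
```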

The `scheduledloggingjobs` package also contains a `Controller` whose purpose
is to `Start` all the scheduled logging jobs in its interface. The
`Controller`'s `Start` method is hooked into the startup of the `sql.Server`.

Assuming its `jobs.proto` definitions and `wrap.go` cases are covered, a new
logging job can be added by:

  • defining its resume callback function
  • registering its log resumer and executor with
    `RegisterLogResumerAndScheduleExecutor`
  • defining its arguments (label, schedule interval, executor type, schedule
    details, execution arguments) and calling `CreateScheduledLoggingJob`
  • adding it to the `Controller` interface and `Start` method (if you want to
    create the scheduled job on server startup)

This change was made with internal use in mind; I didn't consider a SQL
interface for creating these schedules.

Release note (sql change): Initial implementation of a general logging job,
used to create a job that captures index usage statistics to the telemetry
logging channel.

@THardy98 THardy98 requested review from a team as code owners February 17, 2022 20:00
@THardy98 THardy98 requested review from ajwerner and removed request for a team February 17, 2022 20:00

@THardy98 THardy98 force-pushed the event_wrapper_index_usage_telemetry branch from 42b6818 to b6fdb20 Compare February 17, 2022 20:15
@THardy98 THardy98 marked this pull request as draft February 17, 2022 20:15
@THardy98 THardy98 requested a review from Azhng February 17, 2022 20:15
@miretskiy miretskiy left a comment


Reviewed 3 of 20 files at r1, all commit messages.


-- commits, line 11 at r2:

Is there an epic/doc/issue you can point me to that describes what this is supposed to accomplish?
Is there an SQL interface for creating such schedules? What's the intended usage and use case? Basically, any information
that can help me review the code better.

@miretskiy miretskiy left a comment


Reviewed 1 of 20 files at r1.


pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 31 at r2 (raw file):

//////////////////////////////////////
//////// SCHEDULED JOB LOGIC	////////
//////////////////////////////////////

nit: we don't really have this comment style in our codebase...


pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 150 at r2 (raw file):

// If an existing logging job of the same type is found, an error is returned,
// preventing a duplicate job from being created.
func checkExistingLoggingJob(

normally, this shouldn't be needed -- the scheduled job system, by default, ensures only one instance of a job runs at a time...

@THardy98 THardy98 force-pushed the event_wrapper_index_usage_telemetry branch 3 times, most recently from b57dade to 6e87ad3 Compare February 17, 2022 21:37
@THardy98 THardy98 left a comment




-- commits, line 11 at r2:

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Is there an epic/doc/issue you can point me to that describes what this is supposed to accomplish?
Is there an SQL interface for creating such schedules? What's the intended usage and use case? Basically, any information
that can help me review the code better.

Linked the issue this is intended to resolve, and added a quick note about the idea behind this change.

@THardy98 THardy98 left a comment




pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 31 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

nit: we don't really have this comment style in our codebase...

Removed.


pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 150 at r2 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

normally, this shouldn't be needed -- the scheduled job system, by default, ensures only one instance of a job runs at a time...

Oh I didn't know that. How does it check whether an instance of a job is already running?

@THardy98 THardy98 force-pushed the event_wrapper_index_usage_telemetry branch from 6e87ad3 to b5db485 Compare February 17, 2022 22:06
@Azhng Azhng left a comment




pkg/sql/conn_executor.go, line 488 at r4 (raw file):

	s.txnIDCache.Start(ctx, stopper)

	s.scheduledLoggingJobsController.Start(ctx)

The .Start(ctx) function here usually implies that there are one or more long-running background goroutines that need to be started using the stopper. In CRDB we don't directly start a goroutine using the go keyword; using the stopper allows us to stop all the background goroutines when gracefully shutting down a server.

Looking at the scheduledLoggingJobsController, it seems like the Start() method simply starts a transaction and creates the scheduled job record. This logic can live within the constructor of the object. Alternatively, if you want to follow the automatic job pattern, you can kick off a background goroutine here that periodically wakes up to check whether a schedule record exists for your scheduled job.
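
Something along these lines, as a minimal sketch (assuming the controller can
be handed the server's `*stop.Stopper`; the task name and the single-argument
call to `createCaptureIndexUsageStatsScheduledJob` are illustrative, the real
method takes more parameters):

```go
// Sketch only: run the schedule creation as a stopper-managed async task so
// it neither blocks server startup nor outlives a graceful shutdown.
func (c *LoggingJobsController) Start(ctx context.Context, stopper *stop.Stopper) {
	_ = stopper.RunAsyncTask(ctx, "create-scheduled-logging-jobs", func(ctx context.Context) {
		// If the schedule already exists (or creation fails for any other
		// reason), log a warning instead of failing startup.
		if err := c.createCaptureIndexUsageStatsScheduledJob(ctx); err != nil {
			log.Warningf(ctx, "unable to create scheduled logging job: %v", err)
		}
	})
}
```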


pkg/sql/scheduled_logging_job.go, line 32 at r4 (raw file):

func init() {
	RegisterLogResumerAndScheduleExecutor(

Since you can only register jobs within the sql package, perhaps there's no need to export this function?


pkg/sql/scheduledloggingjobs/captured_index_usage_stats.go, line 39 at r4 (raw file):

	"sql.capture_index_usage_stats.telemetry.interval",
	"cron-tab interval to capture index usage statistics to the telemetry logging channel",
	"0 */8 * * *", /* defaultValue */

Hmm, what does this cronexpr mean in english 🤔 ?
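
For reference, `0 */8 * * *` fires at minute 0 of every 8th hour, i.e. at
00:00, 08:00, and 16:00 each day. A quick way to sanity-check a default like
this is to expand it with `github.com/gorhill/cronexpr` (the cron parser the
jobs scheduler uses), as in this standalone sketch:

```go
// Standalone sketch: expand the default cron expression to see when it fires.
package main

import (
	"fmt"
	"time"

	"github.com/gorhill/cronexpr"
)

func main() {
	expr := cronexpr.MustParse("0 */8 * * *")
	// Starting just after midnight UTC, the next firings land at 08:00,
	// 16:00, and then 00:00 the following day, i.e. every 8 hours.
	from := time.Date(2022, 2, 17, 0, 0, 1, 0, time.UTC)
	for _, t := range expr.NextN(from, 3) {
		fmt.Println(t)
	}
}
```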


pkg/sql/scheduledloggingjobs/captured_index_usage_stats.go, line 92 at r4 (raw file):

	expectedNumDatums = 10
	var allEvents []eventpb.EventPayload
	for _, databaseName := range allDatabaseNames {

What happens if the job is paused or cancelled while this loop is still running?


pkg/sql/scheduledloggingjobs/scheduled_logging_job_controller.go, line 64 at r4 (raw file):

// createCaptureIndexUsageStatsScheduledJob creates all scheduled logging jobs.
// createCaptureIndexUsageStatsScheduledJob implements the sql.Controller interface.
func (c *LoggingJobsController) createCaptureIndexUsageStatsScheduledJob(

During cluster startup, every single node will go and attempt to create this scheduled job (and since this schedule already exists, all will fail). This code is synchronous, which means it will block cluster startup due to contention.

@THardy98 THardy98 requested a review from miretskiy February 18, 2022 17:01
@miretskiy miretskiy left a comment


Reviewed 1 of 2 files at r3.


pkg/sql/scheduledloggingjobs/scheduled_logging_job.go, line 150 at r2 (raw file):

Previously, THardy98 (Thomas Hardy) wrote…

Oh I didn't know that. How does it check whether an instance of a job is already running?

When you specify the CreatedBy field when creating a new job, that links the job with the schedule, and the scheduler will (by default) wait for the previous job to complete
before starting the new one.
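
Roughly, with the `jobs` package API of this era (the function name and the
`details`/`progress`/`scheduleID` parameters here are placeholders; the field
names come from `pkg/jobs`):

```go
// Sketch only: a job record created on behalf of a schedule points back at
// that schedule via CreatedBy, which is what lets the scheduler apply its
// default "wait for the previous run to finish" policy before starting
// another instance.
func makeScheduledJobRecord(
	details jobspb.Details, progress jobspb.ProgressDetails, scheduleID int64,
) jobs.Record {
	return jobs.Record{
		Description: "capture index usage stats",
		Username:    security.NodeUserName(),
		Details:     details,
		Progress:    progress,
		CreatedBy: &jobs.CreatedByInfo{
			Name: jobs.CreatedByScheduledJobs, // marks the job as schedule-created
			ID:   scheduleID,                  // the owning schedule's ID
		},
	}
}
```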

@miretskiy miretskiy self-requested a review February 18, 2022 19:58
@miretskiy

I'm curious: why is this whole thing implemented as a job? Jobs are long-running processes that must survive
restarts. Critically, the jobs infrastructure provides the ability for a long-running job to checkpoint its progress.
As far as I can tell -- nothing here says job... Just that you want to have a task that runs once every 8 hours and does something. Sounds like a daemon goroutine that sleeps for 8 hours and periodically wakes up to do its thing?
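
Concretely, that alternative could be a stopper-managed loop along these lines
(sketch only; the function name, the fixed 8-hour interval, and the
`captureIndexUsageStats` helper are placeholders):

```go
// Sketch only: periodically collect and emit index usage stats without
// involving the jobs or scheduled-jobs machinery.
func startIndexUsageStatsLogger(ctx context.Context, stopper *stop.Stopper) error {
	return stopper.RunAsyncTask(ctx, "capture-index-usage-stats", func(ctx context.Context) {
		ticker := time.NewTicker(8 * time.Hour)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// Collect the stats and emit each one as a structured event.
				captureIndexUsageStats(ctx)
			case <-stopper.ShouldQuiesce():
				return
			case <-ctx.Done():
				return
			}
		}
	})
}
```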

@THardy98 THardy98 left a comment


There were two reasons:

  • the job system offered an existing way to create a long running scheduled background process
  • with the idea to support different log-emitting jobs in the future, some of those jobs may make use of other features of the jobs system, like checkpointing (although this job doesn't)

That being said, maybe it's overkill, and it's unlikely that background logging would make use of the jobs system effectively.


@miretskiy miretskiy left a comment


I think we should err on the side of simplicity. If this particular use case doesn't need jobs, then let's not add another job type and schedule. If future uses prove the need for that, then (as can be seen from this PR) unifying the code to use a new schedule/job type is fairly straightforward.


@miretskiy miretskiy self-requested a review February 18, 2022 22:59
@THardy98 THardy98 left a comment


Sounds good, I'll close this PR. I've created PR #76886 in its place. Feel free to take a look at that if you like, although it doesn't use the jobs system.

TYFR @miretskiy and @Azhng 👍


@THardy98 THardy98 closed this Feb 22, 2022