release-21.2: jobs, sql: Avoid jobs/scheduled jobs lock up #78583
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
The scheduled jobs system, by default, ensures that only one instance of a schedule's job is executing at any time. As such, it is not necessary to verify that a compaction job does not already exist when starting the stats compaction job from a schedule. Furthermore, because of how this check interacts with the scheduling system, it results in a wide system.jobs table scan, which causes scheduled job execution to be restarted if any other job modifies the system.jobs table.

Fixes cockroachdb#78465

Release Notes (sql): The stats compaction scheduled job no longer causes intent buildup.

Release Justification: important stability fix to ensure jobs and scheduled jobs do not lock up when running the stats compaction job.
Remove the `schedules.round.schedules-ready-to-run` and `schedules.round.num-jobs-running` metrics from the job scheduler. These metrics are very expensive to compute, as they involve running wide table scans against both `system.jobs` and `system.scheduled_jobs`. They are also unnecessary, since the underlying queries can be executed directly if needed, and confusing, since the metrics are reported per node while the number of running jobs and schedules is cluster wide. More importantly, they can make the job scheduler query more expensive: they increase the read set of the scheduler transaction, which makes txn restarts more expensive.

Fixes cockroachdb#78447

Release Notes (enterprise): Remove the expensive, unnecessary, and never used `schedules.round.schedules-ready-to-run` and `schedules.round.num-jobs-running` metrics from job schedulers.

Release Justification: Stability fix for scheduled job system.
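Both commit messages rest on the same idea: the more rows a transaction reads, the more likely an unrelated write forces it to restart. The Go sketch below is only a toy model of that effect, not CockroachDB's actual concurrency control. The `txn` type, its methods, and the `job/N` keys are all hypothetical names made up for illustration.

```go
package main

import "fmt"

// txn is a toy optimistic transaction that tracks the keys it has read.
// Hypothetical type for illustration; not a CockroachDB API.
type txn struct {
	readSet map[string]bool
}

func newTxn() *txn {
	return &txn{readSet: map[string]bool{}}
}

// read records each key as part of the transaction's read set.
func (t *txn) read(keys ...string) {
	for _, k := range keys {
		t.readSet[k] = true
	}
}

// mustRestart reports whether any concurrently written key overlaps
// the read set, which in this toy model forces a restart.
func (t *txn) mustRestart(writes []string) bool {
	for _, k := range writes {
		if t.readSet[k] {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical row keys standing in for rows of a jobs table.
	allJobs := []string{"job/1", "job/2", "job/3", "job/4"}

	// A narrow transaction reads only the row it needs.
	narrow := newTxn()
	narrow.read("job/2")

	// A wide transaction scans every row, as an existence check
	// or a metrics query over the whole table would.
	wide := newTxn()
	wide.read(allJobs...)

	// Another job updates an unrelated row concurrently.
	concurrentWrites := []string{"job/4"}

	fmt.Println("narrow restarts:", narrow.mustRestart(concurrentWrites))
	fmt.Println("wide restarts:", wide.mustRestart(concurrentWrites))
}
```

In this model only the wide transaction is invalidated by the write to `job/4`, which mirrors why dropping the table-wide existence check and the metrics queries should reduce scheduler restarts.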
010b489 to fd07e3e
Reviewed 6 of 6 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)
Backport 2/2 commits from #78467.
/cc @cockroachdb/release
Improve scheduled jobs system stability by removing expensive metrics calculations and a redundant existence check in stats compaction jobs. Each of these can result in a table scan, possibly repeatedly.
See commits for details.
Release Justification: scheduled job system stability improvements.