jobs: Eliminate job scheduler table scan for stats #78447

Closed
miretskiy opened this issue Mar 24, 2022 · 0 comments
Labels
A-jobs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-jobs

Comments


miretskiy commented Mar 24, 2022

The scheduler computes loop stats on every iteration of its scheduling loop:
https://github.com/cockroachdb/cockroach/blob/master/pkg/jobs/job_scheduler.go#L201
This is bad because we are combining 2 table scans (jobs and schedules) in the same txn that is then used
to update the scheduled job.

Such access patterns are very hard to serialize and can cause excessive intent buildup.
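
To illustrate the concern, here is a toy sketch of why a wider read set makes a txn more likely to restart. It is not CockroachDB's concurrency control: the `txn` type, `mustRestart`, `allKeys`, and the key names are all hypothetical, and the model simply treats any concurrent write to a key the txn has read as forcing a restart.

```go
package main

import "fmt"

// txn is a toy model of a transaction that records every key it has read.
// (Hypothetical type for illustration only.)
type txn struct {
	readSet map[string]struct{}
}

func newTxn() *txn { return &txn{readSet: map[string]struct{}{}} }

// read adds the given keys to the transaction's read set.
func (t *txn) read(keys ...string) {
	for _, k := range keys {
		t.readSet[k] = struct{}{}
	}
}

// mustRestart reports whether any concurrent write touched a key in the read set.
func (t *txn) mustRestart(concurrentWrites []string) bool {
	for _, k := range concurrentWrites {
		if _, ok := t.readSet[k]; ok {
			return true
		}
	}
	return false
}

// allKeys simulates a full table scan by returning n keys of the form "table/i".
func allKeys(table string, n int) []string {
	keys := make([]string, 0, n)
	for i := 0; i < n; i++ {
		keys = append(keys, fmt.Sprintf("%s/%d", table, i))
	}
	return keys
}

func main() {
	// Some other node updates an unrelated job while the scheduler txn is open.
	writes := []string{"jobs/7"}

	// Scheduler txn that also scans both tables to compute stats.
	withStats := newTxn()
	withStats.read(allKeys("jobs", 100)...)
	withStats.read(allKeys("schedules", 100)...)
	withStats.read("schedules/3") // the schedule actually being updated

	// Scheduler txn that only reads the schedule it updates.
	narrow := newTxn()
	narrow.read("schedules/3")

	fmt.Println(withStats.mustRestart(writes), narrow.mustRestart(writes)) // prints "true false"
}
```

In this model, the txn that scans both tables conflicts with a write to any job or schedule, while the narrow txn conflicts only with writes to the one schedule it touches.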

Jira issue: CRDB-14135

@miretskiy miretskiy added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-jobs T-jobs labels Mar 24, 2022
@miretskiy miretskiy self-assigned this Mar 24, 2022
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Mar 25, 2022
Remove `schedules.round.schedules-ready-to-run` and `schedules.round.num-jobs-running`
metrics from job scheduler.  These metrics are very expensive to compute because they
require wide table scans against both `system.jobs` and `system.scheduled_job`.

Besides being expensive to compute, these metrics are not needed, since the underlying
query can be run directly when required. They are also confusing: the metrics are
reported per node, while the number of running jobs/schedules is cluster wide.
More importantly, they make the job scheduler query more expensive: they enlarge
the read set of the scheduler transaction, which makes txn restarts more costly.

Fixes cockroachdb#78447

Release Notes (enterprise): Remove expensive, unnecessary, and never used
`schedules.round.schedules-ready-to-run` and `schedules.round.num-jobs-running` metrics from
job schedulers.

Release Justification: Stability fix for scheduled job system.
@craig craig bot closed this as completed in 2a5ff76 Mar 27, 2022
blathers-crl bot pushed a commit that referenced this issue Mar 27, 2022
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Mar 28, 2022