Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: optimize query for find-running-jobs-of-type #107405

Open
mgartner opened this issue Jul 21, 2023 · 6 comments
Open

sql: optimize query for find-running-jobs-of-type #107405

mgartner opened this issue Jul 21, 2023 · 6 comments
Labels
branch-master Failures and bugs on the master branch. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Jul 21, 2023

The RunningJobExists function issues this query to check for running auto-stats jobs:

SELECT id FROM public.jobs
WHERE job_type IN ('CREATE STATS', 'AUTO CREATE STATS')
  AND status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')
ORDER BY created
LIMIT 1;

The query plan includes an inefficient index join:

  planning time: 750µs
  execution time: 617ms
  distribution: local
  vectorized: true
  rows read from KV: 101,520 (9.9 MiB, 25 gRPC calls)
  cumulative time spent in KV: 614ms
  maximum memory usage: 60 MiB
  network usage: 0 B (0 messages)
  sql cpu time: 84ms
  regions: aws-us-east-1

  • top-k
  │ nodes: n6
  │ regions: aws-us-east-1
  │ actual row count: 0
  │ estimated max memory allocated: 0 B
  │ estimated max sql temp disk usage: 0 B
  │ sql cpu time: 18µs
  │ order: +created
  │ k: 1
  │
  └── • filter
      │ nodes: n6
      │ regions: aws-us-east-1
      │ actual row count: 0
      │ sql cpu time: 2ms
      │ filter: status IN ('cancel-requested', 'pause-requested', 'paused', 'pending', 'reverting', 'running')
      │
      └── • index join
          │ nodes: n6
          │ regions: aws-us-east-1
          │ actual row count: 50,760
          │ KV time: 413ms
          │ KV contention time: 0µs
          │ KV rows read: 50,760
          │ KV bytes read: 7.3 MiB
          │ KV gRPC calls: 24
          │ estimated max memory allocated: 57 MiB
          │ estimated max sql temp disk usage: 0 B
          │ sql cpu time: 72ms
          │ table: jobs@primary
          │
          └── • scan
                nodes: n6
                regions: aws-us-east-1
                actual row count: 50,760
                KV time: 201ms
                KV contention time: 0µs
                KV rows read: 50,760
                KV bytes read: 2.7 MiB
                KV gRPC calls: 1
                estimated max memory allocated: 2.7 MiB
                sql cpu time: 11ms
                missing stats
                table: jobs@jobs_job_type_idx
                spans: [/'AUTO CREATE STATS' - /'AUTO CREATE STATS'] [/'CREATE STATS' - /'CREATE STATS']

This can cause excessive resource utilization when there are many jobs in the jobs table and the query is run frequently. There's a few potential ways we can make this more efficient:

  1. Omit ORDER BY created from the query plan. This should make the optimizer prefer a full table scan, which, as long as the table isn't HUGE, should be faster than the plan with the index join.
  2. Add an index on (job_type, status) and omit ORDER BY created from the query.
  3. Add an index on (job_type, status) STORING (created) to avoid the index join.

Jira issue: CRDB-30024

@mgartner mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 21, 2023
@mgartner
Copy link
Collaborator Author

mgartner commented Jul 21, 2023

Removing the ORDER BY created may not be possible due to this: https://github.com/cockroachdb/cockroach/blob/92b05ab40ecf023b9809593e65f0f20302d1a82e/pkg/jobs/utils.go#L86C1-L90.

But it might be ok to do so when ignoreJobID is jobspb.InvalidJobID.

@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jul 21, 2023
@mgartner mgartner moved this to Triage in SQL Queries Jul 24, 2023
mgartner added a commit to mgartner/cockroach that referenced this issue Jul 26, 2023
This commit improves the query plan for the `find-running-jobs-of-type`
query by removing an `ORDER BY` clause if a job ID to ignore was not
given. This can eliminate an index join from the query plan in some
cases, making the query play more efficient.

Informs cockroachdb#107405

Release note: None
@DrewKimball DrewKimball moved this from Triage to Active in SQL Queries Aug 1, 2023
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 3, 2023
This commit fixes an oversight in cockroachdb#102637 which intended to enable
stats collection on the jobs table, but was not successful.

I've manually confirmed that stats are now collected on the jobs table
in a local cluster:
```
  888074673664065537 | AUTO CREATE STATS               | Table statistics refresh for system.public.jobs                         | CREATE STATISTICS __auto__ FROM [15] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'  | root      | succeeded | NULL           | 2023-08-03 19:01:22.343
```

Informs cockroachdb#107405

Release note (performance improvement): We now automatically collect
table statistics on the system.jobs table, which will enable the optimizer
to produce better query plans for internal queries that access the
system.jobs table. This may result in better performance of the system.
Note: a previous attempt to enable stats on system.jobs starting in 23.1.0
was unsuccessful, but we have now fixed the oversight.
rytaft added a commit to rytaft/cockroach that referenced this issue Aug 4, 2023
This commit fixes an oversight in cockroachdb#102637 which intended to enable
stats collection on the jobs table, but was not successful.

I've manually confirmed that stats are now collected on the jobs table
in a local cluster:
```
  888074673664065537 | AUTO CREATE STATS               | Table statistics refresh for system.public.jobs                         | CREATE STATISTICS __auto__ FROM [15] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'  | root      | succeeded | NULL           | 2023-08-03 19:01:22.343
```

Informs cockroachdb#107405

Release note (performance improvement): We now automatically collect
table statistics on the system.jobs table, which will enable the optimizer
to produce better query plans for internal queries that access the
system.jobs table. This may result in better performance of the system.
Note: a previous attempt to enable stats on system.jobs starting in 23.1.0
was unsuccessful, but we have now fixed the oversight.
craig bot pushed a commit that referenced this issue Aug 5, 2023
105477: sql: add plan gist matching to stmt diagnostics feature r=yuzefovich a=yuzefovich

This commit extends the stmt diagnostics feature to add optional
plan-gist-based matching. Previously, we filtered statements based only
on the fingerprint but now we can optionally ask for a particular plan
(by specifying the target plan gist). All other aspects of the feature
(minimum execution latency, sampling probability) are unaffected.

The caveat to the implementation is that the plan gist of the running
statement is available after the optimizer has done its part, so
whenever plan-gist-based matching is desired, the trace will not
include the optimizer part as well as the plan string won't be
available.

This commit also made a minor change to always store the memo and the
opt planning catalog in `planTop`. Previously, this was stored only when
the bundle collection is enabled, but we now can enable it after the
optimizer, at which point the memo and the catalog might be lost. The
optimizer now stores it unconditionally, but then if we choose to not
collect the bundle once the plan gist is available, we release these
things. This allows us to still get `opt` files in the bundle.

Epic: None
Addresses: #96765.
Addresses: #103018.

Release note (sql change): Statement diagnostics feature has been
extended to support collecting a bundle for a particular plan. Namely,
the existing fingerprint-based matching has been extended to also
include plan-gist-based matching. Such bundle will miss a couple of
things: `plan.txt` file as well as the tracing of the optimizer. At
the moment, the feature is only exposed via an overload to
`crdb_internal.request_statement_bundle` builtin function. We now also
support "anti-match" - i.e. collecting a bundle for any plan other than
the provided plan gist.

108139: sql: fix logic to collect stats on system.jobs r=rytaft a=rytaft

This commit fixes an oversight in #102637 which intended to enable stats collection on the jobs table, but was not successful.

I've manually confirmed that stats are now collected on the jobs table in a local cluster:
```
  888074673664065537 | AUTO CREATE STATS               | Table statistics refresh for system.public.jobs                         | CREATE STATISTICS __auto__ FROM [15] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'  | root      | succeeded | NULL           | 2023-08-03 19:01:22.343
```

Informs #107405

Release note (performance improvement): We now automatically collect table statistics on the `system.jobs` table, which will enable the optimizer to produce better query plans for internal queries that access the `system.jobs` table. This may result in better performance of the system. Note: a previous attempt to enable stats on `system.jobs` starting in 23.1.0 was unsuccessful, but we have now fixed the oversight.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
msirek pushed a commit to msirek/cockroach that referenced this issue Aug 14, 2023
This commit fixes an oversight in cockroachdb#102637 which intended to enable
stats collection on the jobs table, but was not successful.

I've manually confirmed that stats are now collected on the jobs table
in a local cluster:
```
  888074673664065537 | AUTO CREATE STATS               | Table statistics refresh for system.public.jobs                         | CREATE STATISTICS __auto__ FROM [15] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'  | root      | succeeded | NULL           | 2023-08-03 19:01:22.343
```

Informs cockroachdb#107405

Release note (performance improvement): We now automatically collect
table statistics on the system.jobs table, which will enable the optimizer
to produce better query plans for internal queries that access the
system.jobs table. This may result in better performance of the system.
Note: a previous attempt to enable stats on system.jobs starting in 23.1.0
was unsuccessful, but we have now fixed the oversight.
@mgartner
Copy link
Collaborator Author

I'm applying the release-blocker label so this won't get lost. I really think we need to do something here before the release.

@mgartner mgartner added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 18, 2023
@blathers-crl
Copy link

blathers-crl bot commented Sep 18, 2023

Hi @mgartner, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@mgartner mgartner added the branch-master Failures and bugs on the master branch. label Sep 18, 2023
@michae2
Copy link
Collaborator

michae2 commented Oct 5, 2023

[triage] we discussed during our team meeting today, plan is to merge this PR with an additional query hint, and then make the index change in 24.1.

mgartner added a commit to mgartner/cockroach that referenced this issue Oct 9, 2023
This commit improves the query plan for the `find-running-jobs-of-type`
query by adding a hint to use the `jobs_status_created_idx` and by
removing an `ORDER BY` clause if a job ID to ignore was not given. This
can eliminate an index join from the query plan in some cases, making
the query play more efficient.

Informs cockroachdb#107405

Release note: None
craig bot pushed a commit that referenced this issue Oct 9, 2023
107589: sql: improve query plan for find-running-job-of-type query r=mgartner a=mgartner

This commit improves the query plan for the `find-running-jobs-of-type`
query by adding a hint to use the `jobs_status_created_idx` and by
removing an `ORDER BY` clause if a job ID to ignore was not given. This
can eliminate an index join from the query plan in some cases, making
the query plan more efficient.

Informs #107405

Release note: None


111905: streamingccl: physical replication stream from given timestamp  r=adityamaru a=stevendanna

This adds a RESUME TIMESTAMP option to CREATE VIRTUAL CLUSTER FROM
REPLICATION.  When provided, we allow the user to start a replication
stream into an _existing_ virtual cluster.

When the resume timestamp is provided, the replication stream will be
started from that timestamp, with no initial scan. To facilitate this,
we add a new argument to crdb_internal.start_replication_stream that
allows us to pass a destination-choosen start timestamp.

To avoid various catastrophic mistakes, we only allow this when:

- The destination tenant must be in service mode None

- The provided resume timestamp equals the last recorded "revert
  timestamp" of the destination tenant. The revert timestamp is set when
  the tenant has been forcibly reverted to a particular timestamp and is
  cleared when the tenant is modified in a way that may invalidate a
  resumption from that timestamp.

- If the source tenant has a PreviousSourceTenant set, the new
  destination must match that previous source tenant field.

WARNING: Using this correctly requires that the stream is resumed
before garbage collection has progressed past the given resume
timestamp. During normal operation, the replication stream maintains a
protected timestamp to ensure this is the case. However, when
resuming using this new feature, we have no such guarantee.

Epic: none

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
@mgartner mgartner removed their assignment Oct 10, 2023
@mgartner mgartner moved this from Active to 24.1 Release in SQL Queries Oct 10, 2023
@mgartner
Copy link
Collaborator Author

#107589 has been merged so I'm removing the release-blocker label and moving this to the 24.1 bucket.

@mgartner mgartner removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Oct 19, 2023
@mgartner mgartner moved this from 24.1 Release to 24.2 Release in SQL Queries Nov 28, 2023
@mgartner mgartner moved this from 24.2 Release to Backlog in SQL Queries Jun 20, 2024
@mgartner
Copy link
Collaborator Author

We think that the improvements in #107589 could be sufficient. Moving to the backlog until we find evidence that more work is needed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

2 participants