Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs: avoid CTEs in crdb_internal.system_jobs query #123848

Merged
merged 2 commits into from
Aug 20, 2024

Conversation

stevendanna
Copy link
Collaborator

The CTE in the query used for crdb_internal.system_jobs can prevent a
number of useful query optimizations.

Informs #122687

Release note (performance improvement): Further improves the
performance of job-system related queries.

@stevendanna stevendanna requested a review from a team as a code owner May 8, 2024 21:37
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna stevendanna added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. labels May 8, 2024
@stevendanna stevendanna requested a review from dt May 8, 2024 22:01
@rafiss
Copy link
Collaborator

rafiss commented May 8, 2024

it looks like the roundtrips went down, but the latency went up. perhaps the query plan is actually less computationally efficient, even though it performs less I/O? i'd guess there's some threshold at which latency is actually better the new way, and maybe in a multiregion cluster it could matter even more.

goos: darwin
goarch: arm64
                               │ old-bench.txt │            new-bench.txt             │
                               │    sec/op     │    sec/op     vs base                │
Jobs/crdb_internal.system_jobs     167.0m ± 6%   292.7m ± 16%  +75.27% (p=0.000 n=10)

                               │ old-bench.txt │           new-bench.txt            │
                               │  roundtrips   │ roundtrips  vs base                │
Jobs/crdb_internal.system_jobs      8.000 ± 0%   3.000 ± 0%  -62.50% (p=0.000 n=10)

                               │ old-bench.txt │             new-bench.txt             │
                               │     B/op      │     B/op       vs base                │
Jobs/crdb_internal.system_jobs    58.55Mi ± 2%   105.99Mi ± 4%  +81.03% (p=0.000 n=10)

                               │ old-bench.txt │            new-bench.txt             │
                               │   allocs/op   │  allocs/op    vs base                │
Jobs/crdb_internal.system_jobs     905.9k ± 2%   1595.0k ± 3%  +76.06% (p=0.000 n=10)

@stevendanna
Copy link
Collaborator Author

Yeah, I think where we are hoping this helps substantially is in the case where we have a job_type or job_status filter that filters out a high percentage of rows. I'll add a test that captures whether it does or not. It seems to in some local testing.

@stevendanna
Copy link
Collaborator Author

stevendanna commented May 15, 2024

Part of what is going on here is that the current tests are using query plans that are created in the absence of stats. If I add an ANALYZE system.jobs and ANALYZE system.job_info into the setup statements, many of the existing queries choose plans that also have a smaller number of round-trips.

I'll add some commits to update the existing tests to better reflect reality so we can compare against that.

My belief here is that we should really only see much of a benefit here for queries that have a filter.

@stevendanna
Copy link
Collaborator Author

I pushed some commits that looks at the queries in the presence of stats and now the diff matches my expectations a lot better. Namely, queries that still have to scan most of the job_info rows aren't much better, but queries with high selectivity in there where clause get a good deal better:

Jobs/crdb_internal.system_jobs:

|--------------------------------------------+------------------------------------------+-----------------------|
| f69856e6ae0080fd7dc94cc174d011cfa60c5188^1 | f69856e6ae0080fd7dc94cc174d011cfa60c5188 |                       |
|--------------------------------------------+------------------------------------------+-----------------------|
| sec/op                                     | sec/op                                   | vs base               |
| 72.36m ± 14%                               | 73.75m ± 7%                              | ~ (p=0.796 n=10)      |
|--------------------------------------------+------------------------------------------+-----------------------|
| roundtrips                                 | roundtrips                               | vs base               |
| 3.000 ± 0%                                 | 3.000 ± 0%                               | ~ (p=1.000 n=10)      |
|--------------------------------------------+------------------------------------------+-----------------------|
| B/op                                       | B/op                                     | vs base               |
| 26.52Mi ± 2%                               | 26.03Mi ± 9%                             | -1.87% (p=0.015 n=10) |
|--------------------------------------------+------------------------------------------+-----------------------|
| allocs/op                                  | allocs/op                                | vs base               |
|                                            |                                          |                       |
| 316.0k ± 1%                                | 317.2k ± 1%                              | ~ (p=0.912 n=10)      |


                               
Jobs/jobs_page_type_filtered_no_matches

|--------------------------------------------+------------------------------------------+------------------------|
| f69856e6ae0080fd7dc94cc174d011cfa60c5188^1 | f69856e6ae0080fd7dc94cc174d011cfa60c5188 |                        |
|--------------------------------------------+------------------------------------------+------------------------|
| sec/op                                     | sec/op                                   | vs base                |
|                                            |                                          |                        |
| 70.06m ± 7%                                | 14.65m ± 7%                              | -79.09% (p=0.000 n=10) |
|--------------------------------------------+------------------------------------------+------------------------|
| roundtrips                                 | roundtrips                               | vs base                |
| 3.000 ± 0%                                 | 1.000 ± 0%                               | -66.67% (p=0.000 n=10) |
|--------------------------------------------+------------------------------------------+------------------------|
| B/op                                       | B/op                                     | vs base                |
| 8.215Mi ± 8%                               | 3.633Mi ± 19%                            | -55.77% (p=0.000 n=10) |
|--------------------------------------------+------------------------------------------+------------------------|
| allocs/op                                  | allocs/op                                | vs base                |
| 90.05k ± 14%                               | 49.63k ± 22%                             | -44.88% (p=0.000 n=10) |
|--------------------------------------------+------------------------------------------+------------------------|


The CTE in the query used for crdb_internal.system_jobs can prevent a
number of useful query optimizations.

Informs cockroachdb#122687

Release note (performance improvement): Further improves the
performance of job-system related queries.
@stevendanna stevendanna removed backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels May 29, 2024
@stevendanna
Copy link
Collaborator Author

@dt I've removed the deeper backports tags from this given our uncertainty about how well this query fares in the absence of query statistics in older releases.

Note that I think this probably accounts for most of the variance I was seeing when trying to convert this to a virtual view. So, once this is in I think it makes sense to rebase that PR on top of this and see if the virtual view no longer produces a regression.

@rafiss rafiss removed the request for review from a team June 3, 2024 20:13
@stevendanna stevendanna removed the backport-24.1.x Flags PRs that need to be backported to 24.1. label Jul 22, 2024
@stevendanna
Copy link
Collaborator Author

bors r=dt

@craig craig bot merged commit d5329dc into cockroachdb:master Aug 20, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants