Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

75% increase in Django test suite run time between CockroachDB 22.2 and 23.1 #102851

Closed
timgraham opened this issue May 7, 2023 · 12 comments · Fixed by #103399 or #104763
Closed

75% increase in Django test suite run time between CockroachDB 22.2 and 23.1 #102851

timgraham opened this issue May 7, 2023 · 12 comments · Fixed by #103399 or #104763
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-blathers-triaged blathers was able to find an owner

Comments

@timgraham
Copy link
Contributor

timgraham commented May 7, 2023

Describe the problem

The full Django test suite as run by TeamCity for https://github.com/cockroachdb/django-cockroachdb/ takes around 40 minutes for CockroachDB 22.2.x.

The same run as of CockroachDB 23.1 rc2 takes around 70 minutes, down from around 90 minutes at 23.1 beta 3, before the fix for #100871.

You can see the various builds at cockroachdb/django-cockroachdb#279.

(I couldn't identify the cause of the regression during the 23.1 development cycle as there were months when the Django test suite wouldn't run without error due to #94337.)

I generated a slow query log with threshold 1000ms for CockraochDB 22.2.8 and CockroachDB nightly (v23.1.0-alpha.8-dev-3f8cb919339a7f19bd3d20c86488d6a46af9fb7f @ 2023/05/05 23:26:11) (the nightly build gives a similar total run time as 23.1 rc2.)

22.2.8: 423 queries with total time of 8.7 minutes
Nightly: 460 queries with total time of 10.1 minutes

This result suggests to me that there may be some other cause besides simply slow queries (though it's possible there are a lot of queries with sub-1-second run time that are accounting for the difference). I'm happy to continue investigating with a pointer in the right direction, or perhaps it would be more efficient for the CockroachDB developers to take it from here.

Jira issue: CRDB-27689

@timgraham timgraham added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 7, 2023
@blathers-crl
Copy link

blathers-crl bot commented May 7, 2023

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels May 7, 2023
@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 8, 2023
@fqazi fqazi self-assigned this May 8, 2023
@fqazi
Copy link
Collaborator

fqazi commented May 8, 2023

@timgraham Does falling back get_table_list to the 22.2 method help? The old method doesn't fetch comments, and I wonder if that makes a big dent. We have changed how comments are scanned, and as a side effect, we are reading all descriptors now. I'm trying out optimization so we don't scan everything when obj_description is invoked.

@timgraham
Copy link
Contributor Author

Yes, disabling supports for comments reduces the run time on 23.1/nightly to 55 minutes, down from 70 minutes, but still up from 40 minutes for 22.2.

@fqazi
Copy link
Collaborator

fqazi commented May 10, 2023

@timgraham I'm going to address the comment fetching aspect, which got it down to around ~47 minutes for me locally. I'll see if there are other things to attack here. Thanks for testing it out

@fqazi
Copy link
Collaborator

fqazi commented May 10, 2023

I'll adjust the PR I have to keep this one open as well, since it won't fully eliminate the timing I think

fqazi added a commit to fqazi/cockroach that referenced this issue May 10, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
@fqazi
Copy link
Collaborator

fqazi commented May 10, 2023

I wonder if the remaining delta is linked to: #102259, so let me attack that next

fqazi added a commit to fqazi/cockroach that referenced this issue May 10, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
fqazi added a commit to fqazi/cockroach that referenced this issue May 11, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
fqazi added a commit to fqazi/cockroach that referenced this issue May 11, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
fqazi added a commit to fqazi/cockroach that referenced this issue May 11, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
craig bot pushed a commit that referenced this issue May 13, 2023
103106: sql/catalog: avoid fetching descriptors when fetching comments r=fqazi a=fqazi

Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. 

Additionally, we added many builtins into both functions, which meant our existing virtual index look-up could end up falling back too frequently to full scans. Also, a virtual index is added on the kv_catalog_comments table for fast point look-ups when referencing descriptors.


Informs: #102851
Fixes: #102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2

Co-authored-by: Faizan Qazi <[email protected]>
blathers-crl bot pushed a commit that referenced this issue May 15, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: #102851
Fixes: #102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
craig bot pushed a commit that referenced this issue May 16, 2023
103399: sql: make hashed OIDs identifiable for optimal virtual indexes r=fqazi a=fqazi

Due to the recent increase in table sizes for builtins, certain tables like pg_class are now joining on hundreds of rows. While these tables contain virtual indexes, these virtual indexes will return to full table scan if an OID is not found. That's because they have no way of knowing if the OID is a hashed index or a descriptor ID. To address this, these changes are going to alter OID hashing, so that OID hashes are generated at values greater than CockroachPredefinedOIDMax. This then allows virtual indexes to avoid full table scans if the value matched is not found (i.e. if its a builtin function reference).

Fixes: #102851

Release note (sql change): OID generation for pg_catalog has changed. For example column, index and constraint OID's will have different values. Relation, type and function OID's are unchanged.

Co-authored-by: Faizan Qazi <[email protected]>
@craig craig bot closed this as completed in a5ad914 May 16, 2023
timgraham added a commit to timgraham/django-cockroachdb that referenced this issue May 17, 2023
timgraham added a commit to timgraham/django-cockroachdb that referenced this issue May 17, 2023
rafiss pushed a commit to rafiss/cockroach that referenced this issue May 17, 2023
Due to the recent increase in table sizes for builtins,
certain tables like pg_class are now joining on hundreds
of rows. While these tables contain virtual indexes, these
virtual indexes will return to full table scan if an OID is
not found. That's because they have no way of knowing if
the OID is a hashed index or a descriptor ID. To address this,
these changes are going to alter OID hashing, so that OID hashes
are generated at values greater than CockroachPredefinedOIDMax.
This then allows virtual indexes to avoid full table scans
if the value matched is not found (i.e. if its a builtin
function reference).

Fixes: cockroachdb#102851

Release note (sql change): OID generation for pg_catalog
has changed. For example column, index and constraint
OID's will have different values. Relation, type and function
OID's are unchanged.
@timgraham
Copy link
Contributor Author

Did you have comments enabled in django-cockroachdb when you tested locally (as in cockroachdb/django-cockroachdb#279)? I'm not seeing much improvement with CockroachDB nightly (70 minutes) vs. 23.1.1 [which I believe doesn't have your fix here] (75 minutes) (see builds at cockroachdb/django-cockroachdb#282).

@fqazi fqazi reopened this May 25, 2023
@fqazi
Copy link
Collaborator

fqazi commented May 25, 2023

Let me go and analyze this again. I used your pull request but it might have been disabled there temporarily

@timgraham
Copy link
Contributor Author

I'm seeing a speed-up of perhaps a couple of minutes after Rafi's last change in CRDB nightly and 23.1.5, but it's rather insignificant compared to the total regression in 23.1.x of around 30 minutes. As noted earlier in #102851 (comment), even with the comments queries disabled, the runtime of Django's test suite with CockroachDB 23.1.5 is around 60 minutes compared to 40 minutes with 22.2.x. The builds were run today at cockroachdb/django-cockroachdb#279.

@rafiss
Copy link
Collaborator

rafiss commented Jul 6, 2023

Thanks for checking again. To move forward on this, could you measure which steps of the test suite are slow, capture any slow queries, and report these in a new issue?

@timgraham
Copy link
Contributor Author

As I wrote in this issue's description, I previously didn't have much luck identifying the cause of the slowness and those last efforts didn't feel very efficient, particularly if CockroachDB developers have better tools and more experience with profiling. I can try again if you like but some more specific advice might be helpful.

@rafiss
Copy link
Collaborator

rafiss commented Jul 7, 2023

My suggestion is to measure where the time is going in the Django test suite. Is it test setup? Is it specific tests that take longer?

After we have that narrowed down, then we can capture the queries that are running in that part of the test by using the SQL Exec log: https://www.cockroachlabs.com/docs/v23.1/logging#sql_exec

Also, I see shorter run times in our TeamCity nightly job:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-blathers-triaged blathers was able to find an owner
Projects
None yet
3 participants