-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
75% increase in Django test suite run time between CockroachDB 22.2 and 23.1 #102851
Comments
Hello, I am Blathers. I am here to help you get the issue triaged. Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here. I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
@timgraham Does falling back get_table_list to the 22.2 method help? The old method doesn't fetch comments, and I wonder if that makes a big dent. We have changed how comments are scanned, and as a side effect, we are reading all descriptors now. I'm trying out optimization so we don't scan everything when obj_description is invoked. |
Yes, disabling supports for comments reduces the run time on 23.1/nightly to 55 minutes, down from 70 minutes, but still up from 40 minutes for 22.2. |
@timgraham I'm going to address the comment fetching aspect, which got it down to around ~47 minutes for me locally. I'll see if there are other things to attack here. Thanks for testing it out |
I'll adjust the PR I have to keep this one open as well, since it won't fully eliminate the timing I think |
Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Informs: cockroachdb#102851 Fixes: cockroachdb#102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2
I wonder if the remaining delta is linked to: #102259, so let me attack that next |
Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Informs: cockroachdb#102851 Fixes: cockroachdb#102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2
Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Informs: cockroachdb#102851 Fixes: cockroachdb#102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2
Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Informs: cockroachdb#102851 Fixes: cockroachdb#102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2
Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Informs: cockroachdb#102851 Fixes: cockroachdb#102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2
103106: sql/catalog: avoid fetching descriptors when fetching comments r=fqazi a=fqazi Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Additionally, we added many builtins into both functions, which meant our existing virtual index look-up could end up falling back too frequently to full scans. Also, a virtual index is added on the kv_catalog_comments table for fast point look-ups when referencing descriptors. Informs: #102851 Fixes: #102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2 Co-authored-by: Faizan Qazi <[email protected]>
Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Informs: #102851 Fixes: #102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2
103399: sql: make hashed OIDs identifiable for optimal virtual indexes r=fqazi a=fqazi Due to the recent increase in table sizes for builtins, certain tables like pg_class are now joining on hundreds of rows. While these tables contain virtual indexes, these virtual indexes will return to full table scan if an OID is not found. That's because they have no way of knowing if the OID is a hashed index or a descriptor ID. To address this, these changes are going to alter OID hashing, so that OID hashes are generated at values greater than CockroachPredefinedOIDMax. This then allows virtual indexes to avoid full table scans if the value matched is not found (i.e. if its a builtin function reference). Fixes: #102851 Release note (sql change): OID generation for pg_catalog has changed. For example column, index and constraint OID's will have different values. Relation, type and function OID's are unchanged. Co-authored-by: Faizan Qazi <[email protected]>
Due to the recent increase in table sizes for builtins, certain tables like pg_class are now joining on hundreds of rows. While these tables contain virtual indexes, these virtual indexes will return to full table scan if an OID is not found. That's because they have no way of knowing if the OID is a hashed index or a descriptor ID. To address this, these changes are going to alter OID hashing, so that OID hashes are generated at values greater than CockroachPredefinedOIDMax. This then allows virtual indexes to avoid full table scans if the value matched is not found (i.e. if its a builtin function reference). Fixes: cockroachdb#102851 Release note (sql change): OID generation for pg_catalog has changed. For example column, index and constraint OID's will have different values. Relation, type and function OID's are unchanged.
Did you have comments enabled in django-cockroachdb when you tested locally (as in cockroachdb/django-cockroachdb#279)? I'm not seeing much improvement with CockroachDB nightly (70 minutes) vs. 23.1.1 [which I believe doesn't have your fix here] (75 minutes) (see builds at cockroachdb/django-cockroachdb#282). |
Let me go and analyze this again. I used your pull request but it might have been disabled there temporarily |
I'm seeing a speed-up of perhaps a couple of minutes after Rafi's last change in CRDB nightly and 23.1.5, but it's rather insignificant compared to the total regression in 23.1.x of around 30 minutes. As noted earlier in #102851 (comment), even with the comments queries disabled, the runtime of Django's test suite with CockroachDB 23.1.5 is around 60 minutes compared to 40 minutes with 22.2.x. The builds were run today at cockroachdb/django-cockroachdb#279. |
Thanks for checking again. To move forward on this, could you measure which steps of the test suite are slow, capture any slow queries, and report these in a new issue? |
As I wrote in this issue's description, I previously didn't have much luck identifying the cause of the slowness and those last efforts didn't feel very efficient, particularly if CockroachDB developers have better tools and more experience with profiling. I can try again if you like but some more specific advice might be helpful. |
My suggestion is to measure where the time is going in the Django test suite. Is it test setup? Is it specific tests that take longer? After we have that narrowed down, then we can capture the queries that are running in that part of the test by using the SQL Exec log: https://www.cockroachlabs.com/docs/v23.1/logging#sql_exec Also, I see shorter run times in our TeamCity nightly job:
|
Describe the problem
The full Django test suite as run by TeamCity for https://github.com/cockroachdb/django-cockroachdb/ takes around 40 minutes for CockroachDB 22.2.x.
The same run as of CockroachDB 23.1 rc2 takes around 70 minutes, down from around 90 minutes at 23.1 beta 3, before the fix for #100871.
You can see the various builds at cockroachdb/django-cockroachdb#279.
(I couldn't identify the cause of the regression during the 23.1 development cycle as there were months when the Django test suite wouldn't run without error due to #94337.)
I generated a slow query log with threshold 1000ms for CockraochDB 22.2.8 and CockroachDB nightly (v23.1.0-alpha.8-dev-3f8cb919339a7f19bd3d20c86488d6a46af9fb7f @ 2023/05/05 23:26:11) (the nightly build gives a similar total run time as 23.1 rc2.)
22.2.8: 423 queries with total time of 8.7 minutes
Nightly: 460 queries with total time of 10.1 minutes
This result suggests to me that there may be some other cause besides simply slow queries (though it's possible there are a lot of queries with sub-1-second run time that are accounting for the difference). I'm happy to continue investigating with a pointer in the right direction, or perhaps it would be more efficient for the CockroachDB developers to take it from here.
Jira issue: CRDB-27689
The text was updated successfully, but these errors were encountered: