Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(sql): prevent sqlglot from extensive deepcopying every time we create a sqlglot object #8592

Merged
merged 3 commits into from
Mar 13, 2024

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Mar 8, 2024

Still profiling and adding more copy=False options, apparently it greatly improves the performance.

According to profiling the recursive generation of sqlglot is still a bottleneck for big queries which cannot be fixed on the ibis side. There could be one option though to compile the fragments in a greedy fashion which are going to be cached by Node.map() and inject those as arbitrary strings to other sqlglot expressions.

The benchmark shows a pretty solid improvement:

------------------------------------------------------------------------------ benchmark 'test_compile[large-duckdb]': 2 tests -------------------------------------------------------------------------------
Name (time in ms)                                 Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_compile[large-duckdb] (0399_182b6a5)     15.0069 (1.63)     43.1253 (1.38)     17.0702 (1.76)     5.8277 (2.57)     15.7781 (1.69)     0.3220 (1.28)         3;11   58.5818 (0.57)         59           1
test_compile[large-duckdb] (NOW)               9.2348 (1.0)      31.1592 (1.0)       9.6997 (1.0)      2.2679 (1.0)       9.3424 (1.0)      0.2524 (1.0)           1;7  103.0959 (1.0)          94           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Also measured the heap usage right after translation:

BEFORE
Partition of a set of 3349536 objects. Total size = 418566016 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 994115  30 180331768  43 180331768  43 dict (no owner)
     1 435432  13 48768384  12 229100152  55 sqlglot.expressions.Literal
     2 435432  13 31351104   7 260451256  62 dict of sqlglot.expressions.Literal
     3  94643   3 14095526   3 274546782  66 str
     4 105890   3 11859680   3 286406462  68 sqlglot.expressions.Identifier
     5  23325   1  8770520   2 295176982  71 types.CodeType
     6 105890   3  7624080   2 302801062  72 dict of sqlglot.expressions.Identifier
     7  46724   1  7169992   2 309971054  74 list
     8  85244   3  6257008   1 316228062  76 tuple
     9  52938   2  5929056   1 322157118  77 sqlglot.expressions.Column

AFTER copy=False
Partition of a set of 1083787 objects. Total size = 142273068 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 249758  23 46102480  32  46102480  32 dict (no owner)
     1  94632   9 14094707  10  60197187  42 str
     2 104392  10 11691904   8  71889091  51 sqlglot.expressions.Literal
     3  23325   2  8770976   6  80660067  57 types.CodeType
     4 104392  10  7516224   5  88176291  62 dict of sqlglot.expressions.Literal
     5  85257   8  6258104   4  94434395  66 tuple
     6  46189   4  4157973   3  98592368  69 bytes
     7  22557   2  3428664   2 102021032  72 function
     8  25386   2  2843232   2 104864264  74 sqlglot.expressions.Identifier
     9   2327   0  2578312   2 107442576  76 type

Now it uses 30% of the memory it used before. Apparently there are still a lot of sqlglot Literal objects around, looking into that.

fixes #8484

@kszucs kszucs force-pushed the sqlglot-copy branch 3 times, most recently from 556d73b to cba0d80 Compare March 8, 2024 15:21
@kszucs kszucs requested a review from cpcloud March 8, 2024 15:21
@cpcloud
Copy link
Member

cpcloud commented Mar 8, 2024

Hm, the failing tests seem to be exposing a bug in sqlglot.

@kszucs kszucs force-pushed the sqlglot-copy branch 3 times, most recently from ec7f86f to 005d609 Compare March 11, 2024 10:38
@kszucs kszucs force-pushed the sqlglot-copy branch 6 times, most recently from d92e71a to cab3f87 Compare March 12, 2024 12:54
@kszucs
Copy link
Member Author

kszucs commented Mar 12, 2024

After main bumped the sqlglot version the current benchmark comparison looks like:

------------------------------------------------------------------------------ benchmark 'test_compile[large-duckdb]': 2 tests -------------------------------------------------------------------------------
Name (time in ms)                                 Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_compile[large-duckdb] (0408_bc9d757)     11.5417 (1.24)     13.2755 (1.0)      12.4163 (1.25)     0.4392 (1.0)      12.3671 (1.29)     0.6514 (1.38)         11;0   80.5394 (0.80)         28           1
test_compile[large-duckdb] (NOW)               9.2860 (1.0)      31.8314 (2.40)      9.9179 (1.0)      2.3738 (5.40)      9.5708 (1.0)      0.4714 (1.0)           1;2  100.8283 (1.0)          89           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

ibis/backends/sql/compiler.py Outdated Show resolved Hide resolved
@cpcloud cpcloud added the performance Issues related to ibis's performance label Mar 13, 2024
@cpcloud
Copy link
Member

cpcloud commented Mar 13, 2024

@kszucs Feel free to merge whenever you're ready.

@kszucs
Copy link
Member Author

kszucs commented Mar 13, 2024

Thanks for the mssql workaround! Merging.

@kszucs kszucs merged commit 461293b into ibis-project:main Mar 13, 2024
80 checks passed
@kszucs kszucs deleted the sqlglot-copy branch March 13, 2024 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues related to ibis's performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: regression: slow to generate SQL, generated SQL is much slower
2 participants