sql: Extra string allocations in EXTRACT handling #19965
Comments
In general, I think we could fix this and other similar situations by introducing a way to normalize scalar builtin functions and their arguments. We could have a way of defining a rule that looks at a scalar builtin and lets you normalize its arguments. But I'm not exactly sure whether this is going to be worth doing - how many circumstances of this nature really are there? It turns out that TPCH query 7 actually uses |
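To make the "rule per scalar builtin" idea concrete, here is a minimal sketch in Go. Everything in it is hypothetical: `ArgNormalizer`, the `normalizers` registry, and `NormalizeCall` are illustrative names rather than CockroachDB APIs, and arguments are modeled as plain strings where the real planner works on typed expressions.

```go
package main

import (
	"fmt"
	"strings"
)

// ArgNormalizer rewrites the constant arguments of a builtin call once,
// before the call is evaluated for any row. Hypothetical type.
type ArgNormalizer func(args []string) []string

// lowerFirstArg returns a copy of args with the first argument lowercased.
func lowerFirstArg(args []string) []string {
	if len(args) == 0 {
		return args
	}
	out := append([]string(nil), args...)
	out[0] = strings.ToLower(out[0])
	return out
}

// normalizers maps a builtin name to its normalization rule.
// The set of entries here is an assumption made for illustration.
var normalizers = map[string]ArgNormalizer{
	"extract":    lowerFirstArg,
	"date_trunc": lowerFirstArg,
}

// NormalizeCall applies the rule registered for name, if there is one,
// and otherwise returns args unchanged.
func NormalizeCall(name string, args []string) []string {
	if rule, ok := normalizers[name]; ok {
		return rule(args)
	}
	return args
}

func main() {
	fmt.Println(NormalizeCall("extract", []string{"YEAR", "ts"}))
	fmt.Println(NormalizeCall("length", []string{"ABC"}))
}
```

A rule of this kind would run once per query, so its cost does not grow with the number of rows. Whether enough builtins would benefit to justify the machinery is the open question raised above.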
Using the simpler query
we see about 30% of the program's allocations in strings.ToLower :) Though we also see 30% just returning the new float that represents the year, so it's not totally clear that this is such a big deal. Still, it really would be nice to have a way to normalize the input arguments to builtins before running them once for every row... it feels like we should be able to somehow define that this function always
@cockroachdb/sql-optimizer any thoughts on this? It's not urgent but just curious.
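The per-row cost can be reproduced outside the database with a small Go program. This is only a sketch: `allocsPerCall` and `sink` are names invented for the example, and it measures `strings.ToLower` in isolation rather than the actual `extract` code path.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the call.
var sink string

// allocsPerCall reports the average number of heap allocations made by
// one strings.ToLower call on field.
func allocsPerCall(field string) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = strings.ToLower(field)
	})
}

func main() {
	fmt.Println("mixed case:", allocsPerCall("YEAR"))
	fmt.Println("lowercase: ", allocsPerCall("year"))
}
```

`strings.ToLower` returns its input unchanged when the input contains no uppercase letters, so an argument that was lowercased once up front does not cost a new string on every row.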
Since |
We could have a |
59598: parser: lowercase timespans in extract r=jordanlewis a=jordanlewis
Touches #19965.
Confirmed that this (relatively silly) PR gets rid of the allocations in the experiment listed in #19965 (comment)
The `extract` builtin is kind of weird - it is actually supported by the parser and not an ordinary builtin function that people can call without parser support. As such, we can normalize its inputs right in the parser. The benefit of this is that, later, we are going to unconditionally ToLower() the string arguments to extract - which costs an allocation. So we can do this normalization up front to save a bunch of allocations in queries that run `extract` over a lot of data rows.
Release note (performance improvement): improve the allocation performance of workloads that use the `EXTRACT` builtin.
Co-authored-by: Jordan Lewis <[email protected]>
This is useful for
More general idea: I'm also exploring the possibility of making this a general improvement for all built-in functions. The idea is to fold constants during optimization. However, it seems complicated. The issue:
Issues with the function
By the way, I found that there are many test cases with unquoted arguments:
So what are the rules for argument resolution? What if there is a column named "hour"?
See discussion at https://reviewable.io/reviews/cockroachdb/cockroach/19923#-KyS_gHiEzAfKPa0fg4X:-Ky_Yc13IQxe498kphDh:b708j2.
In our handling for the `extract` built-in, we do string normalization on the precision argument. @justinj and @knz pointed out that there are some unnecessary string allocations here which could be reduced.
Also applies to `extract_duration` and `date_trunc`.
Jira issue: CRDB-5946