sql: Extra string allocations in EXTRACT handling #19965

solongordon · 2017-11-10T16:05:10Z

See discussion at https://reviewable.io/reviews/cockroachdb/cockroach/19923#-KyS_gHiEzAfKPa0fg4X:-Ky_Yc13IQxe498kphDh:b708j2.

In our handling for the extract built-in, we do string normalization on the precision argument. @justinj and @knz pointed out that there are some unnecessary string allocations here which could be reduced.

Also applies to extract_duration and date_trunc.

Jira issue: CRDB-5946

The text was updated successfully, but these errors were encountered:

jordanlewis · 2021-01-29T20:57:10Z

In general, I think we could fix this and other similar situations by introducing a way to normalize scalar builtin functions and their arguments.

We could have a way of defining a rule that looks at a scalar builtin and lets you normalize its arguments. But I'm not exactly sure whether this is going to be worth doing - how many circumstances of this nature really are there?

It turns out that TPCH query 7 actually uses extract, so I'm going to run a test with that query and see if the extra allocations in this issue actually make a difference or not.

jordanlewis · 2021-01-29T21:25:31Z

cc @otan @knz - why were we planning to remove extract_duration in 20.2? I see a TODO to that effect that I don't understand.

jordanlewis · 2021-01-29T21:29:14Z

Using the simpler query

select extract('YeAr' from l_commitdate) as f from lineitem group by f;

we see about 30% of the program's allocations in strings.ToLower :) Though we also see 30% just returning the new float that represents the year, so it's not totally clear that this is such a big deal still.

Still, it really would be nice to have a way to normalize the input arguments to builtins before running them once per every row... it feels like we should be able to somehow define that this function always lower()'s its first argument via the optimizer. Then, since lower() is an immutable function, it'll always get constant folded via the pre-existing constant folding rules 🤔

@cockroachdb/sql-optimizer any thoughts on this? It's not urgent but just curious.

jordanlewis · 2021-01-29T21:36:50Z

Since extract is super weird, and completely handled by the parser, we could theoretically just tell the parser to add a lower() to the first argument unconditionally, but that seems inflexible, because there are other functions like date_trunc that have this same property that don't get special handling via the parser.

RaduBerinde · 2021-01-29T21:46:27Z

We could have a crdb_internal_extract_from_lower variant and a normalization rule that converts extract(x) to crdb_internal_extract_from_lower(lower(x)).

59598: parser: lowercase timespans in extract r=jordanlewis a=jordanlewis Touches #19965. Confirmed that this (relatively silly) PR gets rid of the allocations in the experiment listed in #19965 (comment) The `extract` builtin is kind of weird - it is actually supported by the parser and not an ordinary builtin function that people can call without parser support. As such, we can normalize its inputs right in the parser. The benefit of this is that, later, we are going to unconditionally ToLower() the string arguments to extract - which costs an allocation. So we can do this normalization up front to save a bunch of allocations in queries that run `extract` over a lot of data rows. Release note (performance improvement): improve the allocation performance of workloads that use the `EXTRACT` builtin. Co-authored-by: Jordan Lewis <[email protected]>

github-actions · 2023-09-26T11:09:49Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

XiaochenCui · 2024-07-23T16:44:03Z

This is useful for extract since there are some test cases with uppercase arguments. I'll implement this later and provide a performance report.

More general idea:

I'm also exploring the possibilities of making this a general improvement for all built-in functions. The idea is to fold constants during optimization. However, it seems complicated the issue:

If we translate extract('year', ...) to extract(lower('year'), ...), it will result in significantly more CPU instructions than the original function, which is not good.
If we mark the first argument of extract as lower, it increases the code complexity a lot with little performance improvement.

Issues with the function extract:

By the way, I found that there are many test cases with unquoted arguments:

SELECT extract(hour from timetz '12:01:02.345678-07'), where "hour" is a constant.
SELECT extract(element, input::timestamp), where "element" is a column.

So what are the rules for argument resolution? What if there is a column named "hour"?

solongordon mentioned this issue Nov 10, 2017

Add TIME column type and datum. #19923

Merged

knz added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Apr 27, 2018

jordanlewis added the A-sql-builtins SQL built-in functions and semantics thereof. label Apr 30, 2018

jordanlewis added E-starter Might be suitable for a starter project for new employees or team members. good first issue labels Aug 21, 2018

jordanlewis mentioned this issue Jan 29, 2021

parser: lowercase timespans in extract #59598

Merged

github-actions bot added the no-issue-activity label Sep 26, 2023

mgartner removed the no-issue-activity label Sep 26, 2023

mgartner added this to SQL Queries Sep 26, 2023

github-project-automation bot moved this to Triage in SQL Queries Sep 26, 2023

mgartner moved this from Triage to New Backlog in SQL Queries Sep 26, 2023

yuzefovich added the T-sql-queries SQL Queries Team label May 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: Extra string allocations in EXTRACT handling #19965

sql: Extra string allocations in EXTRACT handling #19965

solongordon commented Nov 10, 2017 •

edited by cockroach-jira-scripts

Loading

jordanlewis commented Jan 29, 2021

jordanlewis commented Jan 29, 2021

jordanlewis commented Jan 29, 2021

jordanlewis commented Jan 29, 2021

RaduBerinde commented Jan 29, 2021

github-actions bot commented Sep 26, 2023

XiaochenCui commented Jul 23, 2024 •

edited

Loading

sql: Extra string allocations in EXTRACT handling #19965

sql: Extra string allocations in EXTRACT handling #19965

Comments

solongordon commented Nov 10, 2017 • edited by cockroach-jira-scripts Loading

jordanlewis commented Jan 29, 2021

jordanlewis commented Jan 29, 2021

jordanlewis commented Jan 29, 2021

jordanlewis commented Jan 29, 2021

RaduBerinde commented Jan 29, 2021

github-actions bot commented Sep 26, 2023

XiaochenCui commented Jul 23, 2024 • edited Loading

solongordon commented Nov 10, 2017 •

edited by cockroach-jira-scripts

Loading

XiaochenCui commented Jul 23, 2024 •

edited

Loading