Hashagg with filter support #1065

zhouyuan · 2022-08-15T06:11:29Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The plan in question is generated by Spark when a query uses "count distinct" more than once:

select
        count(distinct l_linestatus) as dist_l_linestatus,
        count(distinct l_returnflag) as dist_l_returnflag
from
        lineitem
where
        l_shipdate <= date '1998-12-01'

For each count distinct aggregation, Spark appends a filter (WHERE gid = 1).
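
To make the role of the gid filter concrete, here is a minimal plain-Scala sketch of the semantics, with no Spark dependency. It assumes the usual shape of Spark's multi-distinct rewrite: each input row is expanded into one copy per distinct column, each copy is tagged with a group id, the copies are deduplicated, and each count then keeps only its own gid. The names Item, Expanded and DistinctWithGid are hypothetical and exist only for this illustration. This is not Gazelle's implementation.

```scala
// Hypothetical row type standing in for the two lineitem columns used in the query.
final case class Item(linestatus: String, returnflag: String)

object DistinctWithGid {
  // One expanded row: only the column that belongs to group `gid` is populated.
  final case class Expanded(linestatus: Option[String], returnflag: Option[String], gid: Int)

  // Expand step: emit one copy of the row per distinct aggregation.
  def expand(rows: Seq[Item]): Seq[Expanded] =
    rows.flatMap { r =>
      Seq(
        Expanded(Some(r.linestatus), None, 1),
        Expanded(None, Some(r.returnflag), 2)
      )
    }

  def countDistinct(rows: Seq[Item]): (Long, Long) = {
    // First aggregation: grouping by (columns, gid) deduplicates within each group.
    val deduped = expand(rows).distinct
    // Second aggregation: each count carries its own filter on gid.
    val distLinestatus = deduped.count(e => e.gid == 1 && e.linestatus.isDefined).toLong
    val distReturnflag = deduped.count(e => e.gid == 2 && e.returnflag.isDefined).toLong
    (distLinestatus, distReturnflag)
  }
}
```

The point for a native hash aggregate is that every aggregate function in the final stage needs its own row-level predicate, rather than one predicate applied to the whole input.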

The second use case is an aggregation with a FILTER clause, as shown below:

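// TestData2 is assumed to match the case class in Spark's SQL test data: two Int columns, a and b.
case class TestData2(a: Int, b: Int)
import spark.implicits._  // required for .toDF()

val df = spark.sparkContext.parallelize(
  TestData2(1, 1) ::
  TestData2(1, 2) ::
  TestData2(2, 1) ::
  TestData2(2, 2) ::
  TestData2(3, 1) ::
  TestData2(3, 2) :: Nil, 2).toDF()
df.createOrReplaceTempView("testData2")
// Count only the rows where b > 1.
spark.sql("SELECT COUNT(a) FILTER (WHERE b > 1) FROM testData2").show()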
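
As a rough illustration of what the FILTER clause means for the aggregate, the sketch below evaluates the same count over plain Scala collections. Rec and FilteredCount are hypothetical names used only here. The second function shows the standard equivalent formulation COUNT(CASE WHEN b > 1 THEN a END), which relies on COUNT skipping NULLs.

```scala
object FilteredCount {
  // Hypothetical stand-in for the (a, b) rows of testData2.
  final case class Rec(a: Int, b: Int)

  val data: Seq[Rec] =
    Seq(Rec(1, 1), Rec(1, 2), Rec(2, 1), Rec(2, 2), Rec(3, 1), Rec(3, 2))

  // COUNT(a) FILTER (WHERE b > 1): the predicate decides which rows reach the aggregate.
  // Column a is never null here, so counting the matching rows is sufficient.
  def countWithFilter(rows: Seq[Rec]): Long =
    rows.count(_.b > 1).toLong

  // COUNT(CASE WHEN b > 1 THEN a END): non-matching rows become NULL (None) and are skipped.
  def countWithCaseWhen(rows: Seq[Rec]): Long =
    rows.map(r => if (r.b > 1) Some(r.a) else None).count(_.isDefined).toLong
}
```

Both functions return 3 for the six rows above, since three of them have b = 2.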

Describe the solution you'd like
Currently Gazelle falls back to vanilla Spark to execute such queries. The fallback requires columnar-to-row and row-to-columnar (C2R/R2C) conversions, which add significant overhead. It would be better to support these cases natively.

Describe alternatives you've considered
N/A

Additional context
N/A


zhouyuan added the enhancement New feature or request label Aug 15, 2022

github-actions bot mentioned this issue Aug 15, 2022

[NSE-1065] Adding hashagg w/ filter support #1066

Merged

github-actions bot mentioned this issue Aug 30, 2022

[NSE-1065] fix on count distinct w/ keys #1090

Merged
