Skip to content
This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

Hashagg with filter support #1065

Open
zhouyuan opened this issue Aug 15, 2022 · 0 comments
Open

Hashagg with filter support #1065

zhouyuan opened this issue Aug 15, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

@zhouyuan
Copy link
Collaborator

zhouyuan commented Aug 15, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The plan is generated from Spark when doing "count distinct" with more than once in a query

select
        count(distinct l_linestatus) as dist_l_linestatus,
        count(distinct l_returnflag) as dist_l_returnflag
from
        lineitem
where
        l_shipdate <= date '1998-12-01'

image

in each Count distinct aggregation Spark will append a filter (WHERE gid = 1).

the second use case is like below (Aggregation with FILTER)

val df = spark.sparkContext.parallelize(
  TestData2(1, 1) ::
  TestData2(1, 2) ::
  TestData2(2, 1) ::
  TestData2(2, 2) ::
  TestData2(3, 1) ::
  TestData2(3, 2) :: Nil, 2).toDF()
df.createOrReplaceTempView("testData2")
sql("SELECT COUNT(a) FILTER (WHERE b > 1) FROM testData2").show

Describe the solution you'd like
Currently Gazelle will fallback to Vanilla Spark to execute such quereis. The overhead is big when doing C2R/R2C. Should better to support these cases natively.

Describe alternatives you've considered
N/A

Additional context
N/a

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant