Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure to pushdown the aggregation for count(const) #4473

Merged
merged 1 commit into from
Jul 23, 2020

Conversation

Lewuathe
Copy link
Member

@Lewuathe Lewuathe commented Jul 16, 2020

Purpose

Aggregate pushdown should be enabled even with count(const) query. Since SimplifyCountOverConstant is applied after PushAggregationIntoTableScan, the aggregation pushdown does not treat count(*) and count(const) uniformally.

The logical plan of the following query with PostgreSQL connector is fixed.

create table test_table (c1 integer, c2 varchar);

Before

> explain (type logical) select c2, count(1) from test_table group by 1;
                                                            Query Plan
-----------------------------------------------------------------------------------------------------------------------------------
 Output[c2, _col1]
 │   Layout: [c2:varchar, count:bigint]
 │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
 │   _col1 := count
 └─ RemoteExchange[GATHER]
    │   Layout: [c2:varchar, count:bigint]
    │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
    └─ Project[]
       │   Layout: [c2:varchar, count:bigint]
       │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
       └─ Aggregate(FINAL)[c2][$hashvalue]
          │   Layout: [c2:varchar, $hashvalue:bigint, count:bigint]
          │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
          │   count := count("count_0")
          └─ LocalExchange[HASH][$hashvalue] ("c2")
             │   Layout: [c2:varchar, count_0:bigint, $hashvalue:bigint]
             │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
             └─ RemoteExchange[REPARTITION][$hashvalue_1]
                │   Layout: [c2:varchar, count_0:bigint, $hashvalue_1:bigint]
                │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
                └─ Aggregate(PARTIAL)[c2][$hashvalue_2]
                   │   Layout: [c2:varchar, $hashvalue_2:bigint, count_0:bigint]
                   │   count_0 := count(*)
                   └─ ScanProject[table = postgresql:public.test_table public.test_table columns=[c2:varchar:varchar]]
                          Layout: [c2:varchar, $hashvalue_2:bigint]
                          Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
                          $hashvalue_2 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("c2"), 0))
                          c2 := c2:varchar:varchar

After

The fix eliminates the aggregation node from the Presto side properly.

> explain (type logical) select c2, count(1) from test_table group by 1;
                                                                                  Query Plan
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Output[c2, _col1]
 │   Layout: [c2_0:varchar, _presto_generated_1:bigint]
 │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
 │   c2 := c2_0
 │   _col1 := _presto_generated_1
 └─ RemoteExchange[GATHER]
    │   Layout: [c2_0:varchar, _presto_generated_1:bigint]
    │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
    └─ TableScan[postgresql:public.test_table public.test_table columns=[c2:varchar:varchar, count(*):_presto_generated_1:bigint:bigint] groupingSets=[[c2:varchar:varchar]]]
           Layout: [c2_0:varchar, _presto_generated_1:bigint]
           Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}
           _presto_generated_1 := count(*):_presto_generated_1:bigint:bigint
           c2_0 := c2:varchar:varchar

It fixes #4362

@cla-bot cla-bot bot added the cla-signed label Jul 16, 2020
@sajjoseph
Copy link
Contributor

Thanks for fixing this.

@Lewuathe Lewuathe force-pushed the count-constant-pushdown branch 3 times, most recently from b49104e to 4f4732f Compare July 21, 2020 23:37
@kasiafi
Copy link
Member

kasiafi commented Jul 22, 2020

Shall we remove the other (late) occurrence of the SimplifyCountOverConstant rule? I can't think of any case it could fire at that point.

@kokosing
Copy link
Member

Shall we remove the other (late) occurrence of the SimplifyCountOverConstant rule? I can't think of any case it could fire at that point.

👍

Aggregate pushdown should be enabled even with count(const) query. Since
SimplifyCountOverConstant is applied after PushAggregationIntoTableScan,
the aggregation pushdown does not treat count(*) and count(const)
uniformally.
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My approval may not mean much :) but LGTM

@findepi findepi merged commit 84d3b06 into trinodb:master Jul 23, 2020
@findepi
Copy link
Member

findepi commented Jul 23, 2020

Merged, thanks!

@findepi findepi mentioned this pull request Jul 23, 2020
8 tasks
@findepi findepi added this to the 340 milestone Jul 23, 2020
@Lewuathe Lewuathe deleted the count-constant-pushdown branch July 24, 2020 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Aggregate pushdown for count(const) or count(non null) in JDBC connectors
6 participants