-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support pushdown of count
aggregation with distinct
#8562
Conversation
i am afraid there isn't, since we don't have a mock connector for exercising aggregation pushdown SPI calls. |
Taking these words back. Since you're making |
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/expression/ImplementCount.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/expression/ImplementCount.java
Outdated
Show resolved
Hide resolved
@alexjo2144 can you please review build output? |
Yeah, I need to think about this a little more. There are some pushdowns that I'm still not fully understanding. For example, with the original code |
5086bb1
to
cca1e07
Compare
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Show resolved
Hide resolved
plugin/trino-phoenix/src/test/java/io/trino/plugin/phoenix/TestPhoenixConnectorTest.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
75311f9
to
3831de0
Compare
3ef0814
to
950f732
Compare
59c5585
to
02581d8
Compare
ad90d51
to
32b3974
Compare
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
1c6d6f2
to
94e67a3
Compare
94e67a3
to
cf2e758
Compare
@findepi should be good now (I hope). Please give it another look. |
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestingH2JdbcClient.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,90 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows pushdown of multiple distinct aggregations as long as
only one non-count aggregation is distinct.
i may have asked about that but don't remember answer -- why "only one non-count" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what exactly @alexjo2144 had in mind here.
There are many constraints here. And definitely not all queries which satisfy the condition from commit message will be be fully pushed down
E.g. this one will not:
select max(distinct regionkey), count(distinct regionkey), count(distinct nationkey) from nation;
I suggest to just leave title line in commit message and drop the rest.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix/src/test/java/io/trino/plugin/phoenix/TestPhoenixConnectorTest.java
Show resolved
Hide resolved
plugin/trino-phoenix/src/test/java/io/trino/plugin/phoenix/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix5/src/test/java/io/trino/plugin/phoenix5/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
@@ -282,8 +283,9 @@ public PostgreSqlClient( | |||
this::quoted, | |||
ImmutableSet.<AggregateFunctionRule<JdbcExpression>>builder() | |||
.add(new ImplementCountAll(bigintTypeHandle)) | |||
.add(new ImplementCount(bigintTypeHandle)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redudantn move
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
@@ -118,18 +117,6 @@ protected TestTable createTableWithUnsupportedColumn() | |||
return Optional.of(dataMappingTestSetup); | |||
} | |||
|
|||
@Override | |||
protected Optional<DataMappingTestSetup> filterCaseSensitiveDataMappingTestData(DataMappingTestSetup dataMappingTestSetup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related?
6d29af0
to
86fa04e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java"
@@ -627,19 +626,36 @@ public PlanOptimizers( | |||
.addAll(projectionPushdownRules) | |||
.add(new PushProjectionIntoTableScan(metadata, typeAnalyzer, scalarStatsCalculator)) | |||
.add(new RemoveRedundantIdentityProjections()) | |||
.add(new PushLimitThroughMarkDistinct()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment why this is inside pushIntoTableScanRulesExceptJoins set, otherwise looks like unintentional
maybe it shouldn't be here? instead, PushLimitThroughMarkDistinct
could come together with MultipleDistinctAggregationToMarkDistinct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it shouldn't be here? instead, PushLimitThroughMarkDistinct could come together with MultipleDistinctAggregationToMarkDistinct?
per @kasiafi
After removing MultipleDistinctAggregationToMarkDistinct from an IterativeOptimizer, there's no way that a MarkDistinctNode could be present at that stage, so the rule PushLimitThroughMarkDistinct can be also removed. However, I'd keep it there next to other PushLimit rules in case it's needed again in the future. Actually, the PushLimit rules could be extracted like projectionPushdownRules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind. Reworking after chat with @kasiafi
@@ -855,6 +871,7 @@ public PlanOptimizers( | |||
// DetermineTableScanNodePartitioning is needed to needs to ensure all table handles have proper partitioning determined | |||
// Must run before AddExchanges | |||
.add(new DetermineTableScanNodePartitioning(metadata, nodePartitioningManager, taskCountEstimator)) | |||
.add(new MultipleDistinctAggregationToMarkDistinct()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had MultipleDistinctAggregationToMarkDistinct
once and now we have twice. What's the reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not make sense.
@kasiafi done (hopefully) as we discussed. Much simpler. Thanks. Let's see what CI says. |
.add(new PushAggregationThroughOuterJoin()) | ||
.add(new ReplaceRedundantJoinWithSource()) // Run this after PredicatePushDown optimizer as it inlines filter constants | ||
.add(new MultipleDistinctAggregationToMarkDistinct()) // Run this after aggregation pushdown | ||
.addAll(limitPushdownRules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.addAll(limitPushdownRules) | |
.addAll(limitPushdownRules) // including through MarkDistinct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually having this here results in test failures. @kasiafi looked deeply into that and the problem comes from extra call of PushLimitThroughOuterJoin
and the fact that applyJoin
will not consume same limit twice.
So without this optimization went
... Limit -> OuterJoin -> Scan(no limit)
(PushLimitThroughOuterJoin)
... Limit -> OuterJoin -> Limit -> Scan(no limit)
(PushLimitIntoTableScan)
... Limit -> OuterJoin -> Scan(with limit)
now join is pushable to TS
With this extra line the optimization continues from when it completed previously
... Limit -> OuterJoin -> Scan(with limit)
(PushLimitThroughOuterJoin)
... Limit -> OuterJoin -> Limit -> Scan(with limit)
(PushLimitIntoTableScan; applyJoin return Optional.empty() because of https://github.com/trinodb/trino/blob/408476e5d4067db61324f773510c27731373b91b/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java#L459-L461)
... Limit -> OuterJoin -> Limit -> Scan(with limit)
!!!!! join no longer pushable to table scan
This could be addressed with #9056 but I am not super convinced this is the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in PlanOptimizers
looks OK.
.add(new RemoveRedundantIdentityProjections()) | ||
.add(new PushAggregationThroughOuterJoin()) | ||
.add(new ReplaceRedundantJoinWithSource()) // Run this after PredicatePushDown optimizer as it inlines filter constants | ||
.add(new MultipleDistinctAggregationToMarkDistinct()) // Run this after aggregation pushdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... so that multiple distinct aggregations can be pushed into a connector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder whether we actually considered solving the problem differently. instead of changing rule order, we could have a rule that matches to MarkDistinct and invokes aggregation pushdown.
trying this out could be a follow up. if successful, we could simply this class, presumably even rolling the changes back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MarkDistinct
is not always a result of the MultipleDistinctAggregationToMarkDistinct
rule. And when it is, it could be hard to identify the original aggregations because of intervening rules.
Also, applying rule MultipleDistinctAggregationToMarkDistinct
later might be possibly beneficial from the viewpoint of decorrelation.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix/src/test/java/io/trino/plugin/phoenix/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix5/src/test/java/io/trino/plugin/phoenix5/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
e13aba1
to
5d919d9
Compare
5d919d9
to
1d9830c
Compare
1d9830c
to
18444e5
Compare
Moves the distinct aggregation rewriting optimizers to after PushAggregationIntoTableScan.
TODO: I'm not as familiar with testing around PlanOptimizers, is there a good way to test this?
#4323