Introduce projection pushdown framework for JDBC connectors #22203

Merged

Member

@Praveen2112 Praveen2112 commented May 30, 2024

Description

This framework allows us to push down specific functions closer to the JDBC data source. We use this framework to push down the reverse function in the PostgreSQL connector. To push down a specific function to the underlying JDBC data source, two parts are needed:

  • Client should implement convertProjection
  • Write a ProjectFunctionRule specific to the function to be pushed down.

We also support partial pushdown of expressions, i.e. LOWER(REVERSE(varchar_col)) is translated into LOWER(synthetic_col_from_query).
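The partial pushdown described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code from this PR: `Expr`, `PUSHABLE`, and `PartialPushdownSketch` are hypothetical stand-ins for Trino's expression and rule types, and the `_pfgnrtd_` prefix is borrowed from the query plan shown later in this description. When a parent call cannot be translated, the sketch recurses into its arguments and replaces each translatable subtree with a synthetic column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: these types are hypothetical and are not Trino's ConnectorExpression API.
final class PartialPushdownSketch
{
    // A function call such as LOWER(x), or a plain column reference when args is empty.
    record Expr(String name, List<Expr> args)
    {
        static Expr column(String name)
        {
            return new Expr(name, List.of());
        }

        static Expr call(String name, Expr... args)
        {
            return new Expr(name, List.of(args));
        }

        boolean isColumn()
        {
            return args.isEmpty();
        }

        String render()
        {
            if (isColumn()) {
                return name;
            }
            List<String> rendered = new ArrayList<>();
            for (Expr arg : args) {
                rendered.add(arg.render());
            }
            return name + "(" + String.join(", ", rendered) + ")";
        }
    }

    // Functions the remote database is assumed to support (illustrative only).
    private static final Set<String> PUSHABLE = Set.of("REVERSE");

    // Maps synthetic column name -> remote SQL fragment, in creation order.
    final Map<String, String> syntheticColumns = new LinkedHashMap<>();

    Expr rewrite(Expr expr)
    {
        if (expr.isColumn()) {
            return expr;
        }
        if (canPushDown(expr)) {
            // The whole subtree is translatable: replace it with a synthetic column.
            String synthetic = "_pfgnrtd_" + syntheticColumns.size();
            syntheticColumns.put(synthetic, expr.render());
            return Expr.column(synthetic);
        }
        // The parent cannot be translated: try translating its arguments instead.
        List<Expr> rewrittenArgs = new ArrayList<>();
        for (Expr arg : expr.args()) {
            rewrittenArgs.add(rewrite(arg));
        }
        return new Expr(expr.name(), rewrittenArgs);
    }

    private static boolean canPushDown(Expr expr)
    {
        if (expr.isColumn()) {
            return true;
        }
        if (!PUSHABLE.contains(expr.name())) {
            return false;
        }
        for (Expr arg : expr.args()) {
            if (!canPushDown(arg)) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, rewriting `LOWER(REVERSE(varchar_col))` leaves `LOWER` to be evaluated by the engine and records `REVERSE(varchar_col)` as the expression behind the synthetic column.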

Special thanks to @SemionPar for inspiring me to use the reverse function for this framework.

Additional context and related issues

Without this change, when we run a query like this, the projection is evaluated in Trino:

explain select reverse(name) from nation

Current master

                                                               Query Plan
-----------------------------------------------------------------------------------------------------------------------------------------
 Trino version: testversion
 Fragment 0 [SOURCE]
     Output layout: [expr]
     Output partitioning: SINGLE []
     Output[columnNames = [_col0]]
     │   Layout: [expr:varchar(25)]
     │   Estimates: {rows: 25 (1.34kB), cpu: 0, memory: 0B, network: 0B}
     │   _col0 := expr
     └─ ScanProject[table = postgresql:tpch.nation tpch.nation columns=[name:varchar(25):varchar]]
            Layout: [expr:varchar(25)]
            Estimates: {rows: 25 (1.34kB), cpu: 1.34k, memory: 0B, network: 0B}/{rows: 25 (1.34kB), cpu: 1.34k, memory: 0B, network: 0B}
            expr := reverse(name)
            name := name:varchar(25):varchar

With this optimization

trino:tpch> explain select reverse(name) from nation;
                                                                    Query Plan
--------------------------------------------------------------------------------------------------------------------------------------------------
 Trino version: testversion
 Fragment 0 [SOURCE]
     Output layout: [pfgnrtd]
     Output partitioning: SINGLE []
     Output[columnNames = [_col0]]
     │   Layout: [pfgnrtd:varchar(25)]
     │   Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B}
     │   _col0 := pfgnrtd
     └─ TableScan[table = postgresql:Query[SELECT REVERSE("name") AS "_pfgnrtd_0" FROM "tpch"."nation"] columns=[_pfgnrtd_0:varchar(25):varchar]]
            Layout: [pfgnrtd:varchar(25)]
            Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}
            pfgnrtd := _pfgnrtd_0:varchar(25):varchar

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# PostgreSQL
* Push down the `reverse` function to the data source when it is applied as a projection expression.

Comment on lines 468 to 491
ImmutableSet.Builder<JdbcColumnHandle> columnsBuilder = ImmutableSet.builder();
ImmutableList.Builder<Assignment> assignmentBuilder = ImmutableList.builder();
ImmutableMap.Builder<ConnectorExpression, Variable> syntheticVariablesBuilder = ImmutableMap.builder();
ImmutableMap.Builder<String, ParameterizedExpression> columnExpressionsBuilder = ImmutableMap.builder();
for (ConnectorExpression child : projection.getChildren()) {
    RewrittenExpression rewrittenExpression = rewriteExpression(
            session,
            nextSyntheticColumnId,
            child,
            assignments,
            translatedExpression);
    nextSyntheticColumnId = rewrittenExpression.nextSyntheticColumnId;
    columnsBuilder.addAll(rewrittenExpression.columnHandles);
    assignmentBuilder.addAll(rewrittenExpression.assignments);
    syntheticVariablesBuilder.putAll(rewrittenExpression.syntheticVariables);
    columnExpressionsBuilder.putAll(rewrittenExpression.columnExpressions);
Member Author

This may create too many objects. I'm not sure whether we could pass the builders as arguments here instead.
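To illustrate the trade-off raised here, the following sketch contrasts two ways of collecting results from a recursive walk. It is not taken from the PR: `Node`, `collectReturning`, and `collectInto` are hypothetical names. The first variant allocates a fresh collection per call and merges it in the caller, which resembles the per-call builders in the snippet above. The second threads one shared accumulator through the recursion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from the PR.
final class SharedAccumulatorSketch
{
    record Node(String name, List<Node> children) {}

    // Variant A: each call allocates and returns its own list, which the caller merges.
    static List<String> collectReturning(Node node)
    {
        List<String> names = new ArrayList<>();
        names.add(node.name());
        for (Node child : node.children()) {
            names.addAll(collectReturning(child));
        }
        return names;
    }

    // Variant B: a single accumulator is passed down, so no intermediate lists are created.
    static void collectInto(Node node, List<String> accumulator)
    {
        accumulator.add(node.name());
        for (Node child : node.children()) {
            collectInto(child, accumulator);
        }
    }
}
```

Both variants produce the same result. Variant B avoids the intermediate allocations but makes the method depend on a mutable argument, which is part of what a reviewer would weigh.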

Member

huw0 commented May 30, 2024

This PR is quite timely as I was looking at the possibility to do something similar today. Thanks @Praveen2112.

My primary use case is pushdown of JSON field extraction which causes a significant speedup to the overall query.

Ideally, though, it would be good to enable pushdown of all functions that the connector source supports, where the expression matches Trino's own.

It would be really helpful if function pushdown could be easily reused between convertPredicate and convertProjection.
This adds some complexity, so it may be better as a separate PR, but I wonder if it is worth early consideration.

ImmutableMap.Builder<String, ParameterizedExpression> columnExpressionsBuilder = ImmutableMap.builder();
Set<ConnectorExpression> translatedExpression = new HashSet<>();

if (isComplexExpressionPushdown(session)) {
Contributor

This check could be the first statement in the method: if it is false, return Optional.empty();

Member Author

I don't think we should return Optional#empty here, as we still need to make additional changes for normal expressions.

Contributor

@ssheikin ssheikin Jun 26, 2024

We `return Optional.empty();` when `if (!newVariables.isEmpty()) {` is false.
`if (!newVariables.isEmpty()) {` is true only if `newVariables` is populated.
`newVariables` is populated only if `newVariablesBuilder` is populated.
`newVariablesBuilder` is populated only when `if (isComplexExpressionPushdown(session)) {` is true.
What am I missing?
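The chain of reasoning above suggests a guard clause at the top of the method. A minimal sketch, assuming a boolean flag in place of `isComplexExpressionPushdown(session)` and strings in place of the real expression types (both are simplifications, and `EarlyReturnSketch` is a hypothetical name):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the early-return shape; not the PR's actual method.
final class EarlyReturnSketch
{
    static Optional<Map<String, String>> convertProjections(boolean complexExpressionPushdown, List<String> projections)
    {
        // Guard clause: nothing below can populate newVariables when the toggle is off.
        if (!complexExpressionPushdown) {
            return Optional.empty();
        }

        Map<String, String> newVariables = new LinkedHashMap<>();
        for (String projection : projections) {
            // Assumption for the sketch: only reverse(...) calls are translatable.
            if (projection.startsWith("reverse(")) {
                newVariables.put(projection, "_pfgnrtd_" + newVariables.size());
            }
        }

        if (newVariables.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(newVariables);
    }
}
```

With the guard first, the disabled case returns the same value as before, and the loop body no longer needs to be nested inside the toggle check.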

Member Author

Okay, I get it now. I was confused earlier because this code snippet was part of applyProjection. Thanks for pointing it out.

Comment on lines 382 to 386
nextSyntheticColumnId = rewrittenExpression.nextSyntheticColumnId;
newColumnsBuilder.addAll(rewrittenExpression.columnHandles);
assignmentBuilder.addAll(rewrittenExpression.assignments);
newVariablesBuilder.putAll(rewrittenExpression.syntheticVariables);
columnExpressionsBuilder.putAll(rewrittenExpression.columnExpressions);
Contributor

There is no need for these assignments and builders. Just use rewrittenExpression directly.

if (isComplexExpressionPushdown(session)) {
    for (ConnectorExpression projection : projections) {
        RewrittenExpression rewrittenExpression = rewriteExpression(session, nextSyntheticColumnId, projection, assignments, translatedExpression);
        nextSyntheticColumnId = rewrittenExpression.nextSyntheticColumnId;
Contributor

nextSyntheticColumnId -> nextSyntheticColumnId()

ImmutableList.Builder<Assignment> assignmentBuilder = ImmutableList.builder();
ImmutableMap.Builder<ConnectorExpression, Variable> newVariablesBuilder = ImmutableMap.builder();
ImmutableMap.Builder<String, ParameterizedExpression> columnExpressionsBuilder = ImmutableMap.builder();
Set<ConnectorExpression> translatedExpression = new HashSet<>();
Contributor

@ssheikin ssheikin Jun 7, 2024

I haven't fully grasped the PR yet, but for now it looks like this variable could use a slightly uglier name that draws more attention, e.g. translatedExpressionAccumulator.

Comment on lines 467 to 500
    // If the parent expression cannot be translated try translating its argument
    ImmutableSet.Builder<JdbcColumnHandle> columnsBuilder = ImmutableSet.builder();
    ImmutableList.Builder<Assignment> assignmentBuilder = ImmutableList.builder();
    ImmutableMap.Builder<ConnectorExpression, Variable> syntheticVariablesBuilder = ImmutableMap.builder();
    ImmutableMap.Builder<String, ParameterizedExpression> columnExpressionsBuilder = ImmutableMap.builder();
    for (ConnectorExpression child : projection.getChildren()) {
        RewrittenExpression rewrittenExpression = rewriteExpression(
                session,
                nextSyntheticColumnId,
                child,
                assignments,
                translatedExpression);
        nextSyntheticColumnId = rewrittenExpression.nextSyntheticColumnId;
        columnsBuilder.addAll(rewrittenExpression.columnHandles);
        assignmentBuilder.addAll(rewrittenExpression.assignments);
        syntheticVariablesBuilder.putAll(rewrittenExpression.syntheticVariables);
        columnExpressionsBuilder.putAll(rewrittenExpression.columnExpressions);
    }

    return new RewrittenExpression(
            nextSyntheticColumnId,
            syntheticVariablesBuilder.buildOrThrow(),
            columnExpressionsBuilder.buildOrThrow(),
            columnsBuilder.build(),
            assignmentBuilder.build());
}
Contributor

Looks like this can be extracted into a separate commit.

assignmentBuilder.build());
}

public record RewrittenExpression(
Contributor

Should we move it to the very bottom of the class?

Contributor

ssheikin commented Jun 7, 2024

skimmed

@Praveen2112 Praveen2112 force-pushed the praveen/jdbc_projection_pushdown branch 3 times, most recently from 31d24f6 to 45a08c7 Compare June 26, 2024 11:16
@Praveen2112
Member Author

@ssheikin Thanks a lot for the review. AC

@Praveen2112 Praveen2112 force-pushed the praveen/jdbc_projection_pushdown branch from 45a08c7 to 31c973c Compare June 27, 2024 12:26
Member

ebyhr commented Jul 1, 2024

This change may cause a regression when the reverse function is pushed down on non-varchar types (e.g. the money type) in PostgreSQL.

@Praveen2112
Member Author

That can be restricted via rules, right? The reverse pushdown rule can be applied only to a JdbcTypeHandle that corresponds to VARCHAR. Would that solve the issue?
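The restriction proposed here could look roughly like the following. This is only a sketch under assumptions: `Column` is a hypothetical stand-in for a column handle carrying its remote type name (it is not Trino's `JdbcTypeHandle`), and the real rule in the PR may match types differently. The rule declines to rewrite anything that is not a varchar column, so a `money` column falls back to evaluation in the engine.

```java
import java.util.Optional;

// Hypothetical illustration; Column is not a Trino type.
final class TypeGuardedRuleSketch
{
    record Column(String name, String remoteTypeName) {}

    // Returns the remote SQL for REVERSE, or empty when the column type is not eligible.
    static Optional<String> rewriteReverse(Column column)
    {
        if (!column.remoteTypeName().equalsIgnoreCase("varchar")) {
            // Not a varchar column: do not push down.
            return Optional.empty();
        }
        return Optional.of("REVERSE(\"" + column.name() + "\")");
    }
}
```

Returning an empty Optional rather than throwing keeps the fallback path explicit: the caller simply leaves the expression untouched.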

@Praveen2112 Praveen2112 force-pushed the praveen/jdbc_projection_pushdown branch from 31c973c to 7735be0 Compare July 1, 2024 06:56
@Praveen2112
Member Author

@ebyhr Thank you for pointing out the regression. I have fixed it and added a test case to verify that it doesn't break for columns with special data types.

Contributor

@ssheikin ssheikin left a comment

lgtm

ImmutableMap.Builder<ConnectorExpression, Variable> newVariablesBuilder = ImmutableMap.builder();
ImmutableMap.Builder<String, ParameterizedExpression> columnExpressionsBuilder = ImmutableMap.builder();

for (ConnectorExpression projection : ImmutableSet.copyOf(projections)) {
Contributor

nit: It may be cheaper to copy to an ImmutableList instead of a Set: projections is a list and may already be an ImmutableList, in which case no actual copy is made.

Member Author

Using a set lets us skip processing redundant expressions, if any are propagated.
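The effect of copying into a set can be sketched with the standard library. `LinkedHashSet` is used here as a stand-in for Guava's `ImmutableSet.copyOf`, which likewise keeps first-occurrence order while dropping duplicates. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration using JDK collections in place of Guava.
final class DeduplicateProjectionsSketch
{
    // Drops repeated projections while preserving the order of first occurrence.
    static List<String> distinctInOrder(List<String> projections)
    {
        return new ArrayList<>(new LinkedHashSet<>(projections));
    }
}
```

If the same projection appears twice in the input, the rewrite loop then visits it only once, at the cost of building the set.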

@findinpath
Contributor

@Praveen2112 would you mind adding more hands-on details to the PR description about the problem that you're solving?

It would also be helpful to share the EXPLAIN plan of a query with and without your changes.

@Praveen2112 Praveen2112 force-pushed the praveen/jdbc_projection_pushdown branch from 7735be0 to 054aa3e Compare July 1, 2024 13:48
@Praveen2112
Member Author

@findinpath Yeah, sure! I will add the EXPLAIN output showing the changes.


List<Assignment> outputAssignments = assignmentBuilder.build();

if (newVariables.isEmpty()) {
Contributor

This could be moved even further up.

@Praveen2112 Praveen2112 force-pushed the praveen/jdbc_projection_pushdown branch from 054aa3e to 6662a68 Compare July 1, 2024 14:13
@Praveen2112
Member Author

@ssheikin Thanks for the review. AC

We use this framework to push down the `REVERSE` function in the PostgreSQL connector. This framework doesn't support
partial pushdown of expressions.
@Praveen2112 Praveen2112 force-pushed the praveen/jdbc_projection_pushdown branch from 6662a68 to 696532c Compare July 2, 2024 04:50
@Praveen2112
Member Author

@ebyhr Thanks a lot for the review - AC

@Praveen2112
Member Author

@findinpath I have updated the description with the changes in the plan as well

@Praveen2112 Praveen2112 merged commit 3acbb30 into trinodb:master Jul 2, 2024
95 checks passed
@github-actions github-actions bot added this to the 452 milestone Jul 2, 2024
5 participants