Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support range predicate pushdown for string columns with collation in PostgreSQL connector #9746

Merged
merged 1 commit into from
Dec 3, 2021

Conversation

takezoe
Copy link
Member

@takezoe takezoe commented Oct 23, 2021

No description provided.

@cla-bot cla-bot bot added the cla-signed label Oct 23, 2021
@@ -37,7 +37,6 @@
import io.trino.plugin.jdbc.LongWriteFunction;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update io.trino.plugin.postgresql.TestPostgreSqlConnectorTest#hasBehavior to support SUPPORTS_PREDICATE_PUSHDOWN_WITH_VARCHAR_INEQUALITY

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The impl looks good. Please adjust the declaration in tests to make sure the tests pass with the new assumption.

@takezoe takezoe force-pushed the postgresql_pushdown_collate branch 2 times, most recently from 69077fa to 968b6cd Compare October 25, 2021 14:09
@@ -81,13 +76,16 @@ public void setExtensions()
protected boolean hasBehavior(TestingConnectorBehavior connectorBehavior)
{
switch (connectorBehavior) {
case SUPPORTS_PREDICATE_PUSHDOWN_WITH_VARCHAR_INEQUALITY:
case SUPPORTS_JOIN_PUSHDOWN_WITH_VARCHAR_INEQUALITY:
return false;
Copy link
Member Author

@takezoe takezoe Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional updates are necessary to support aggregation pushdown for varchar columns. Since I would focus on support range predicates pushdown in this pull request, I just introduced a new behavior that indicates whether aggregation pushdown is supported for varchar columns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a GitHub issue and add a TODO so that we can remove the additional behaviour once it's no longer needed (and other people know that this is something they can work on).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense (as long as this fix can be merged) and I would work on it once this pull request is completed. By the way, supporting type sensitive aggregation pushdown in JDBC plugins doesn't seem easy. Fundamental interface changes may be required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I ran into some similar issues in #7320.

There were some ideas floated in #7320 (comment) (which we didn't end up doing since it looked like a one-off need at that time).

If you already have some direction in your mind it might be helpful to discuss it on #dev on Slack too (if you think the changes will be large and touch the SPI).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, will take a look at #7320 first. Thanks.

@takezoe
Copy link
Member Author

takezoe commented Oct 25, 2021

TestPostgreSqlConnectorTest#testSystemTable() is still failing with the following error:

io.trino.spi.TrinoException: ERROR: collations are not supported by type name

I'm trying to find a way to fix it.

@takezoe
Copy link
Member Author

takezoe commented Oct 26, 2021

Looks like collations have been supported by name type since PostgreSQL 12. I would allow pushdown only for column types that isCollatable() returns true for now.

@takezoe
Copy link
Member Author

takezoe commented Oct 29, 2021

@kokosing @hashhar Could you take a look again?

tableHandle -> ((JdbcTableHandle) tableHandle).getConstraint().isAll(),
TupleDomain.all(),
ImmutableMap.of())));
PlanMatchPattern.node(TableScanNode.class)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why still isNotFullyPushedDown?

Copy link
Member Author

@takezoe takezoe Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the domain compaction, the original predicate seems to remain on Trino side although the compacted predicate is pushed down to PostgreSQL:

SQL on PostgreSQL:

SELECT "nationkey", "name", "regionkey" FROM "tpch"."nation" WHERE ("name" >= ? COLLATE "C" AND "name" <= ? COLLATE "C")

Plan on Trino:

Output[regionkey, nationkey, name]
│   Layout: [regionkey:bigint, nationkey:bigint, name:varchar(25)]
└─ RemoteExchange[GATHER]
   │   Layout: [nationkey:bigint, name:varchar(25), regionkey:bigint]
   └─ ScanFilter[table = postgresql:tpch.nation tpch.nation constraint on [name] columns=[nationkey:bigint:int8, name:varchar(25):varchar, regionkey:bigint:int8], filterPredicate = ("name" IN (CAST('POLAND' AS varchar(25)), CAST('ROMANIA' AS varchar(25)), CAST('VIETNAM' AS varchar(25))))]
          Layout: [nationkey:bigint, name:varchar(25), regionkey:bigint]
          nationkey := nationkey:bigint:int8
          regionkey := regionkey:bigint:int8
          name := name:varchar(25):varchar

@@ -81,13 +76,16 @@ public void setExtensions()
protected boolean hasBehavior(TestingConnectorBehavior connectorBehavior)
{
switch (connectorBehavior) {
case SUPPORTS_PREDICATE_PUSHDOWN_WITH_VARCHAR_INEQUALITY:
case SUPPORTS_JOIN_PUSHDOWN_WITH_VARCHAR_INEQUALITY:
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a GitHub issue and add a TODO so that we can remove the additional behaviour once it's no longer needed (and other people know that this is something they can work on).

@takezoe
Copy link
Member Author

takezoe commented Nov 8, 2021

@findepi Finished updating for now. Could you take a look again?

@findepi findepi requested a review from hashhar November 9, 2021 10:06
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments.

Can we detect (or apply some heuristics) to determine during runtime whether a given predicate needs the COLLATION applied or not? For example for equality predicates adding the collation will lead to a definite performance regression because no indexes can be used anymore.

If it's not possible to do dynamically during runtime then let's add an experimental. prefixed config property to enable this behaviour and keep it opt in for now?

WDYT @wendigo ?

@takezoe
Copy link
Member Author

takezoe commented Nov 11, 2021

For example for equality predicates adding the collation will lead to a definite performance regression because no indexes can be used anymore.

I think it's possible in some cases but 100% is impossible. But anyway it should be better than full-scan?

If it's not possible to do dynamically during runtime then let's add an experimental. prefixed config property to enable this behaviour and keep it opt in for now?

I'm fine to make this optional as long as we can enable this by configuration. Is adding experimental.postgresql.enable-string-pushdown-with-collate ok?

@hashhar
Copy link
Member

hashhar commented Nov 15, 2021

For example for equality predicates adding the collation will lead to a definite performance regression because no indexes can be used anymore.

I think it's possible in some cases but 100% is impossible. But anyway it should be better than full-scan?

If it's not possible to do dynamically during runtime then let's add an experimental. prefixed config property to enable this behaviour and keep it opt in for now?

I'm fine to make this optional as long as we can enable this by configuration. Is adding experimental.postgresql.enable-string-pushdown-with-collate ok?

@takezoe You can add a session property with matching config property in PostgreSqlConfig. I think experimental.postgresql.string-pushdown-with-collate is fine.

cc: @findepi @kokosing @wendigo @ebyhr Any opinions? I'd prefer to have this behind a configuration toggle (at-least for now) until we can verify that there isn't a performance concern in actual practical usage?

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good % confirmation about config from other maintainers.

@takezoe
Copy link
Member Author

takezoe commented Nov 16, 2021

Looks like postgresql.experimental.string-pushdown-with-collate is more consistent because PostgreSqlConfig used to have another experimental setting postgresql.experimental.array-mapping:

@LegacyConfig("postgresql.experimental.array-mapping")

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments.

Mostly about possible simplifications.

@takezoe takezoe force-pushed the postgresql_pushdown_collate branch 2 times, most recently from 587edbe to dbddd0a Compare November 20, 2021 03:46
@takezoe takezoe force-pushed the postgresql_pushdown_collate branch 2 times, most recently from 28313bd to c59edbe Compare November 27, 2021 08:54
@takezoe
Copy link
Member Author

takezoe commented Nov 27, 2021

@hashhar Finished updating the pull request. Could you take a look again?

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me % comment. Thanks for working on this.

Also it looks the first two commits should be squashed together.

@takezoe takezoe force-pushed the postgresql_pushdown_collate branch 2 times, most recently from 005f947 to c5646cc Compare November 29, 2021 16:18
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % minor testing change.

Thanks for working on this @takezoe. You might also be interested in #10059 which we plan to use as the base for allowing join pushdown too using COLLATE.

@takezoe
Copy link
Member Author

takezoe commented Nov 30, 2021

Looks like hasBehavior() has to behave differently depending on whether enable_string_pushdown_with_collate is true or false. And it seems annoying to cover by a single test class.

For example, having a flag as a field and changing its value inside testStringPushdownWithCollate() is simple but I don't think it's an ideal way. Accepting Session at hasBehavior() might be smarter but this will require many changes even in other JDBC connectors' test cases.

Does splitting TestPostgreSqlConnectorTest to create another test class just for testing string pushdown with collate make sense?

@takezoe
Copy link
Member Author

takezoe commented Nov 30, 2021

having a flag as a field and changing its value inside testStringPushdownWithCollate()

I adopted this way for now and added a case for join on VARCHAR columns.

@takezoe takezoe force-pushed the postgresql_pushdown_collate branch 2 times, most recently from e5e96e6 to 7e5ae28 Compare November 30, 2021 08:51
@hashhar
Copy link
Member

hashhar commented Dec 1, 2021

I adopted this way for now and added a case for join on VARCHAR columns.

Since only PostgreSQL has this behaviour today (and not enabled by default) I think it's fine to not add a new behaviour to TestingConnectorBehaviour just yet. That way you don't need the flag field (especially because we aren't using hasBehaviour in the new test at all).

In terms of future evolution I think once we prove it out with PostgreSQL we can lift the config to BaseJdbcConfig and then add a test in BaseJdbcConnectorTest which sets the session property - at that time we can revisit the best way to structure the TestingConnectorBehaviour (or maybe by that time we'll have more formal ways of handling collations).

@findepi @ebyhr @wendigo @takezoe Any thoughts?

@findepi
Copy link
Member

findepi commented Dec 1, 2021

@hashhar sounds good. except i am not convinced we need a config just yet, except for trying things out (a temporary kill switch)

@takezoe
Copy link
Member Author

takezoe commented Dec 2, 2021

Ah, I see. That makes sense. I updated the test case.

@takezoe takezoe force-pushed the postgresql_pushdown_collate branch 3 times, most recently from 6fd6890 to 16b3c0c Compare December 3, 2021 16:12
@hashhar
Copy link
Member

hashhar commented Dec 3, 2021

Closing and re-opening to get ci to run (some glitch on GitHub end).

@hashhar hashhar closed this Dec 3, 2021
@hashhar hashhar reopened this Dec 3, 2021
@hashhar
Copy link
Member

hashhar commented Dec 3, 2021

Unrelated failure. Thanks @takezoe for the feature. Merging it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants