Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to enable case insenstive predicate pushdown in JDBC connectors #7022

Closed

Conversation

kokosing
Copy link
Member

Allow to enable case insenstive predicate pushdown in JDBC connectors

return caseInsensitivePredicatePushdownEnabled;
}

@Config("case-insensitive-predicate-pushdown.enabled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the case where someone sets their MySQL instance to use case sensitive collation by default and they want to leverage the full pushdown capabilities. In such a situation, the "insensitive" in the flag name is not reasonable, as this is exactly what they are not doing.

Maybe unsafe-varchar-predicate-pushdown.enabled ?

cc @losipiuk @electrum

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... in case of PostgreSQL or other case sensitive database one case change collation to use case insensitive comparisons. Then we might need to do something reverse to match the remote database collation.

How about varchar-predicate-pushdown.mode with values FULL, PUSHDOWN_AND_KEEP and CASE_INSENSITIVE? Then depending on connector we could have different defaults. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an option, but IMO exposes too much internal detail, while being also pretty technical and I am not yet convinced it's the right way.
Also, why would we have the toggle for a connector which doesn't need it, because remote has just one collation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I did it your way.

PredicatePushdownController CASE_INSENSITIVE_CHARACTER_PUSHDOWN = (session, domain) -> {
checkArgument(
domain.getType() instanceof VarcharType || domain.getType() instanceof CharType,
"CASE_SENSITIVE_PUSHDOWN can be used only for chars and varchars");

if (isCaseInsensitivePredicatePushdownEnabled(session)) {
return PUSHDOWN_AND_KEEP.apply(session, domain);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or FULL_PUSHDOWN. I see one could argue both ways.

maybe add a comment and we may need to revisit this later, to unlock e.g. agg or join pushdown on top of varchar columns

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is easier for understand for user if we FULL_PUSHDOWN here, otherwise he might say that pushdown does not work

@@ -46,6 +47,11 @@ public JdbcMetadataSessionProperties(JdbcMetadataConfig jdbcMetadataConfig, @Max
"Enable aggregation pushdown",
jdbcMetadataConfig.isAggregationPushdownEnabled(),
false))
.add(booleanProperty(
CASE_INSENSTIVE_PREDICATE_PUSHDOWN_ENABLED,
"Enable case insensitive predicate pushdown",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is a good description for this option. As I understand it, pushdown is disabled on character columns (for all operators?), on databases which are doing case insensitive string comparison by default, right? Maybe something "Always do potentially unsafe (var)char predicate pushdown on databases which do case insensitive string comparison"?

Also, this is a tricky setting, or perhaps I should say "esoteric", so can it be documented somewhere is better detail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, pushdown is disabled on character columns (for all operators?), on databases which are doing case insensitive string comparison by default, right?

It is more complicated than that. By default we pushdown a = 'value' predicate when remote data source is doing case insensitive string comparison by default and we also keep the predicate on Trino side.

I updated the message, please take a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK now, though one could argue that "potentially unsafe" already implies "may cause incorrect results"

@kokosing kokosing force-pushed the origin/master/284_unsafe_pushdown branch from fcd75e1 to 7614d5a Compare March 1, 2021 12:07
Copy link
Member Author

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments, addressed.

return caseInsensitivePredicatePushdownEnabled;
}

@Config("case-insensitive-predicate-pushdown.enabled")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I did it your way.

@@ -46,6 +47,11 @@ public JdbcMetadataSessionProperties(JdbcMetadataConfig jdbcMetadataConfig, @Max
"Enable aggregation pushdown",
jdbcMetadataConfig.isAggregationPushdownEnabled(),
false))
.add(booleanProperty(
CASE_INSENSTIVE_PREDICATE_PUSHDOWN_ENABLED,
"Enable case insensitive predicate pushdown",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, pushdown is disabled on character columns (for all operators?), on databases which are doing case insensitive string comparison by default, right?

It is more complicated than that. By default we pushdown a = 'value' predicate when remote data source is doing case insensitive string comparison by default and we also keep the predicate on Trino side.

I updated the message, please take a look.

PredicatePushdownController CASE_INSENSITIVE_CHARACTER_PUSHDOWN = (session, domain) -> {
checkArgument(
domain.getType() instanceof VarcharType || domain.getType() instanceof CharType,
"CASE_SENSITIVE_PUSHDOWN can be used only for chars and varchars");

if (isCaseInsensitivePredicatePushdownEnabled(session)) {
return PUSHDOWN_AND_KEEP.apply(session, domain);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is easier for understand for user if we FULL_PUSHDOWN here, otherwise he might say that pushdown does not work

@findepi
Copy link
Member

findepi commented Mar 1, 2021

Since PostgreSQL collation is just different (although case sensitive by default), i am not sure there is a point in treating PostgreSQL predicate pushdown from MySQL's or SQL Server's.

That would allow us to remove io.trino.plugin.postgresql.TestPostgreSqlConnectorTest#testCaseSensitiveDataMapping override.

@@ -735,6 +735,7 @@ public void testCaseSensitiveDataMapping(DataMappingTestSetup dataMappingTestSet
throw new SkipException("PostgreSQL has different collation than Trino");
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev

@@ -599,6 +600,40 @@ private String getLongInClause(int start, int length)
return "orderkey IN (" + longValues + ")";
}

@Test(dataProvider = "testCaseInsensitivePredicatePushdownDataProvider")
public void testCaseInsensitivePredicatePushdown(String type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BCT candidate?

Or we need BaseJdbcConnectorTest for cases like this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BCT candidate?

Sure these tests should be ported, but it is out of the scope of this PR.

Or we need BaseJdbcConnectorTest for cases like this?

Not yet. This feature only exists in few JDBC connectors. So I think it is too specific to have generic tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure these tests should be ported, but it is out of the scope of this PR.

sure, let's have an issue

we also need something generic for TopN pushdows, see #7031

@kokosing
Copy link
Member Author

kokosing commented Mar 2, 2021

Since PostgreSQL collation is just different (although case sensitive by default), i am not sure there is a point in treating PostgreSQL predicate pushdown from MySQL's or SQL Server's.

That is in line with my thinking. I just wanted to keep the scope of this PR limited and PostgreSQL is next in line in my queue. I am wondering if it is possible to do anything better for PostgreSQL, for example != should work with PUSHDOWN_AND_KEEP there while it does not for MySQL and co.

@findepi
Copy link
Member

findepi commented Mar 2, 2021

I am wondering if it is possible to do anything better for PostgreSQL, for example != should work with PUSHDOWN_AND_KEEP there while it does not for MySQL and co.

Good point, i didn't think about this.
I am not sure it's worthwhile to keep separate impls just for != case.

@findepi
Copy link
Member

findepi commented Mar 3, 2021

The PostgreSQL case is changing in #7145.
We need to somehow make sure that place gets updated too.

@findepi
Copy link
Member

findepi commented Mar 5, 2021

The PostgreSQL case is changing in #7145.
We need to somehow make sure that place gets updated too.

Let's automate this.

Can you please add a test in BaseJdbcConnectorTest that verifies that:

  • either varchar = and < predicate is pushed down into the connector (with correct results)
  • OR session toggle CASE_INSENSTIVE_PREDICATE_PUSHDOWN_ENABLED is supported and results in filter being fully pushed down (ignoring result correctness)?

If we test this on a table where cases doesn't matter (eg nation.name), we can express this easily.

@kokosing kokosing force-pushed the origin/master/284_unsafe_pushdown branch from 7614d5a to 87ea72e Compare March 8, 2021 10:00
@kokosing
Copy link
Member Author

kokosing commented Mar 8, 2021

PTAL

Adding a test to BaseJdbcConnectorTest requires me to run tests for all JDBC connectors. I run just few, so there might be some issues still.

@findepi
Copy link
Member

findepi commented Mar 8, 2021

Can you please update PostgreSqlClient as well?

@kokosing kokosing force-pushed the origin/master/284_unsafe_pushdown branch from 87ea72e to 364eac3 Compare March 8, 2021 11:43
@kokosing
Copy link
Member Author

kokosing commented Mar 8, 2021

Can you please update PostgreSqlClient as well?

I did. This one is tricky, it compares values differently. So I no longer validate incorrect results, as they are different for PostgreSQL than for MySQL.

@findepi
Copy link
Member

findepi commented Mar 8, 2021

You still can validate incorrect results in a case like b vs A, if you want to.

@kokosing
Copy link
Member Author

kokosing commented Mar 8, 2021

You still can validate incorrect results in a case like b vs A, if you want to.

I think such assertion is overly complicated. Let's leave it as it is.

.isNotFullyPushedDown(FilterNode.class)
.skippingTypesCheck()
.matches("VALUES 'A', 'b', 'B'");
assertThat(query(format("SELECT value FROM %s WHERE value > 'A'", tableName)))
Copy link
Member

@hashhar hashhar Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if you tested that ordering is correct too.

Some databases/collations can sort A, a, B, b or a, b, A, B or A, B, a, b.
Your test would pass for collations that sort as A, B, a, b too when I think it should fail.

Or am I broadening the scope by introducing ordering with predicate pushdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or am I broadening the scope by introducing ordering with predicate pushdown?

You are. However it sounds like a good idea to have a single toggle to control both pushdown at once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW same goes to aggregation push down. Notice that pushing down DISTINCT or GROUP BY on varchar column may produce different results. Is it handled today anyhow? It sounds to me that it could be also controlled with the same switch.

@findepi ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should, but we also need a reusable test for aggregations as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusable tests for order by and aggregations are out of my scope. Will create tickets for that, possibly as bugs as I would expect now wrong results. Also I am going to rename toggle from unsafe-varchar-predicate-pushdown.enabled to unsafe-varchar-pushdown.enabled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#7320, order by is handled by io.trino.plugin.jdbc.BaseJdbcConnectorTest#testCaseSensitiveTopNPushdown

@kokosing kokosing force-pushed the origin/master/284_unsafe_pushdown branch from 364eac3 to 3bb8248 Compare March 17, 2021 12:44
Copy link
Member Author

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

.isNotFullyPushedDown(FilterNode.class)
.skippingTypesCheck()
.matches("VALUES 'A', 'b', 'B'");
assertThat(query(format("SELECT value FROM %s WHERE value > 'A'", tableName)))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#7320, order by is handled by io.trino.plugin.jdbc.BaseJdbcConnectorTest#testCaseSensitiveTopNPushdown

@kokosing
Copy link
Member Author

@findepi ping

Comment on lines +577 to +578
// possibly incorrect results with unexpected 'A'
.isFullyPushedDown();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. isFullyPushedDown verifies results correctness, this shouldn't pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is because allow_pushdown_into_connectors does not control predicate pushdown.

Comment on lines 587 to 590
assertThat(query(format("SELECT value FROM %s WHERE value != 'a'", tableName)))
.skippingTypesCheck()
.matches("VALUES 'A', 'b', 'B'")
.isNotFullyPushedDown(FilterNode.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For postgres we can pushdown != but cannot pushdown >.

Let's use order-related (<, >, <=, etc.) here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is added below.

@kokosing kokosing force-pushed the origin/master/284_unsafe_pushdown branch from 7b63c5f to acc1cbe Compare April 3, 2021 07:02
@hashhar
Copy link
Member

hashhar commented Apr 4, 2021

FYI Postgres has a solution available which I've started implementing in #7494.

@kokosing
Copy link
Member Author

kokosing commented Apr 4, 2021

FYI Postgres has a solution available which I've started implementing in #7494.

If it is a case then I think I am going to close this PR. It is better to have correct predicate pushdown instead work arounds. Same option exists for MySQL (https://dev.mysql.com/doc/refman/8.0/en/case-sensitivity.html) and in SqlServer.

@kokosing kokosing closed this Apr 4, 2021
@kokosing kokosing deleted the origin/master/284_unsafe_pushdown branch April 4, 2021 18:21
@hashhar
Copy link
Member

hashhar commented Apr 4, 2021

@kokosing It would be useful to keep this branch around because from my quick attempt today it looks like the query generation for predicate pushdown is done within the engine itself (QueryBuilder) and connectors don't have any control over the syntax/clauses being used. I'll take a deeper look again though.

Thanks for the info about MySQL and SQLServer - I'll incorporate them in my PR too.

@findepi
Copy link
Member

findepi commented Apr 7, 2021

FYI Postgres has a solution available

That's not a news to me

which I've started implementing in #7494.

that's definitely new.

it will take time to deliver this though, and may or may not be possible for all connectors where we had to disable varchar pushdown by default.

therefore i do think this PR is still valuable

@kokosing kokosing restored the origin/master/284_unsafe_pushdown branch April 7, 2021 10:31
@kokosing kokosing reopened this Apr 7, 2021
@kokosing kokosing closed this Jul 26, 2021
@kokosing kokosing deleted the origin/master/284_unsafe_pushdown branch July 26, 2021 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants