-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to enable case insenstive predicate pushdown in JDBC connectors #7022
Allow to enable case insenstive predicate pushdown in JDBC connectors #7022
Conversation
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataConfig.java
Outdated
Show resolved
Hide resolved
return caseInsensitivePredicatePushdownEnabled; | ||
} | ||
|
||
@Config("case-insensitive-predicate-pushdown.enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the case where someone sets their MySQL instance to use case sensitive collation by default and they want to leverage the full pushdown capabilities. In such a situation, the "insensitive" in the flag name is not reasonable, as this is exactly what they are not doing.
Maybe unsafe-varchar-predicate-pushdown.enabled
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... in case of PostgreSQL or other case sensitive database one case change collation to use case insensitive comparisons. Then we might need to do something reverse to match the remote database collation.
How about varchar-predicate-pushdown.mode
with values FULL
, PUSHDOWN_AND_KEEP
and CASE_INSENSITIVE
? Then depending on connector we could have different defaults. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an option, but IMO exposes too much internal detail, while being also pretty technical and I am not yet convinced it's the right way.
Also, why would we have the toggle for a connector which doesn't need it, because remote has just one collation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I did it your way.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataSessionProperties.java
Outdated
Show resolved
Hide resolved
PredicatePushdownController CASE_INSENSITIVE_CHARACTER_PUSHDOWN = (session, domain) -> { | ||
checkArgument( | ||
domain.getType() instanceof VarcharType || domain.getType() instanceof CharType, | ||
"CASE_SENSITIVE_PUSHDOWN can be used only for chars and varchars"); | ||
|
||
if (isCaseInsensitivePredicatePushdownEnabled(session)) { | ||
return PUSHDOWN_AND_KEEP.apply(session, domain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or FULL_PUSHDOWN. I see one could argue both ways.
maybe add a comment and we may need to revisit this later, to unlock e.g. agg or join pushdown on top of varchar columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is easier for understand for user if we FULL_PUSHDOWN
here, otherwise he might say that pushdown does not work
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataSessionProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/PredicatePushdownController.java
Show resolved
Hide resolved
@@ -46,6 +47,11 @@ public JdbcMetadataSessionProperties(JdbcMetadataConfig jdbcMetadataConfig, @Max | |||
"Enable aggregation pushdown", | |||
jdbcMetadataConfig.isAggregationPushdownEnabled(), | |||
false)) | |||
.add(booleanProperty( | |||
CASE_INSENSTIVE_PREDICATE_PUSHDOWN_ENABLED, | |||
"Enable case insensitive predicate pushdown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this is a good description for this option. As I understand it, pushdown is disabled on character columns (for all operators?), on databases which are doing case insensitive string comparison by default, right? Maybe something "Always do potentially unsafe (var)char predicate pushdown on databases which do case insensitive string comparison"?
Also, this is a tricky setting, or perhaps I should say "esoteric", so can it be documented somewhere is better detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, pushdown is disabled on character columns (for all operators?), on databases which are doing case insensitive string comparison by default, right?
It is more complicated than that. By default we pushdown a = 'value'
predicate when remote data source is doing case insensitive string comparison by default and we also keep the predicate on Trino side.
I updated the message, please take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's OK now, though one could argue that "potentially unsafe" already implies "may cause incorrect results"
fcd75e1
to
7614d5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments, addressed.
return caseInsensitivePredicatePushdownEnabled; | ||
} | ||
|
||
@Config("case-insensitive-predicate-pushdown.enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I did it your way.
@@ -46,6 +47,11 @@ public JdbcMetadataSessionProperties(JdbcMetadataConfig jdbcMetadataConfig, @Max | |||
"Enable aggregation pushdown", | |||
jdbcMetadataConfig.isAggregationPushdownEnabled(), | |||
false)) | |||
.add(booleanProperty( | |||
CASE_INSENSTIVE_PREDICATE_PUSHDOWN_ENABLED, | |||
"Enable case insensitive predicate pushdown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, pushdown is disabled on character columns (for all operators?), on databases which are doing case insensitive string comparison by default, right?
It is more complicated than that. By default we pushdown a = 'value'
predicate when remote data source is doing case insensitive string comparison by default and we also keep the predicate on Trino side.
I updated the message, please take a look.
PredicatePushdownController CASE_INSENSITIVE_CHARACTER_PUSHDOWN = (session, domain) -> { | ||
checkArgument( | ||
domain.getType() instanceof VarcharType || domain.getType() instanceof CharType, | ||
"CASE_SENSITIVE_PUSHDOWN can be used only for chars and varchars"); | ||
|
||
if (isCaseInsensitivePredicatePushdownEnabled(session)) { | ||
return PUSHDOWN_AND_KEEP.apply(session, domain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is easier for understand for user if we FULL_PUSHDOWN
here, otherwise he might say that pushdown does not work
Since PostgreSQL collation is just different (although case sensitive by default), i am not sure there is a point in treating PostgreSQL predicate pushdown from MySQL's or SQL Server's. That would allow us to remove |
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/VarcharPredicatePushdownConfig.java
Outdated
Show resolved
Hide resolved
...-base-jdbc/src/main/java/io/trino/plugin/jdbc/VarcharPredicatePushdownSessionProperties.java
Outdated
Show resolved
Hide resolved
@@ -735,6 +735,7 @@ public void testCaseSensitiveDataMapping(DataMappingTestSetup dataMappingTestSet | |||
throw new SkipException("PostgreSQL has different collation than Trino"); | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rev
@@ -599,6 +600,40 @@ private String getLongInClause(int start, int length) | |||
return "orderkey IN (" + longValues + ")"; | |||
} | |||
|
|||
@Test(dataProvider = "testCaseInsensitivePredicatePushdownDataProvider") | |||
public void testCaseInsensitivePredicatePushdown(String type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BCT candidate?
Or we need BaseJdbcConnectorTest
for cases like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BCT candidate?
Sure these tests should be ported, but it is out of the scope of this PR.
Or we need BaseJdbcConnectorTest for cases like this?
Not yet. This feature only exists in few JDBC connectors. So I think it is too specific to have generic tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure these tests should be ported, but it is out of the scope of this PR.
sure, let's have an issue
we also need something generic for TopN pushdows, see #7031
That is in line with my thinking. I just wanted to keep the scope of this PR limited and PostgreSQL is next in line in my queue. I am wondering if it is possible to do anything better for PostgreSQL, for example |
Good point, i didn't think about this. |
The PostgreSQL case is changing in #7145. |
Let's automate this. Can you please add a test in
If we test this on a table where cases doesn't matter (eg |
7614d5a
to
87ea72e
Compare
PTAL Adding a test to |
Can you please update PostgreSqlClient as well? |
87ea72e
to
364eac3
Compare
I did. This one is tricky, it compares values differently. So I no longer validate incorrect results, as they are different for PostgreSQL than for MySQL. |
You still can validate incorrect results in a case like |
I think such assertion is overly complicated. Let's leave it as it is. |
.isNotFullyPushedDown(FilterNode.class) | ||
.skippingTypesCheck() | ||
.matches("VALUES 'A', 'b', 'B'"); | ||
assertThat(query(format("SELECT value FROM %s WHERE value > 'A'", tableName))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if you tested that ordering is correct too.
Some databases/collations can sort A, a, B, b
or a, b, A, B
or A, B, a, b
.
Your test would pass for collations that sort as A, B, a, b
too when I think it should fail.
Or am I broadening the scope by introducing ordering
with predicate pushdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or am I broadening the scope by introducing ordering with predicate pushdown?
You are. However it sounds like a good idea to have a single toggle to control both pushdown at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW same goes to aggregation push down. Notice that pushing down DISTINCT or GROUP BY on varchar column may produce different results. Is it handled today anyhow? It sounds to me that it could be also controlled with the same switch.
@findepi ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should, but we also need a reusable test for aggregations as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reusable tests for order by and aggregations are out of my scope. Will create tickets for that, possibly as bugs as I would expect now wrong results. Also I am going to rename toggle from unsafe-varchar-predicate-pushdown.enabled
to unsafe-varchar-pushdown.enabled
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#7320, order by is handled by io.trino.plugin.jdbc.BaseJdbcConnectorTest#testCaseSensitiveTopNPushdown
364eac3
to
3bb8248
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL
.isNotFullyPushedDown(FilterNode.class) | ||
.skippingTypesCheck() | ||
.matches("VALUES 'A', 'b', 'B'"); | ||
assertThat(query(format("SELECT value FROM %s WHERE value > 'A'", tableName))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#7320, order by is handled by io.trino.plugin.jdbc.BaseJdbcConnectorTest#testCaseSensitiveTopNPushdown
3bb8248
to
355c292
Compare
355c292
to
7b63c5f
Compare
@findepi ping |
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
// possibly incorrect results with unexpected 'A' | ||
.isFullyPushedDown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. isFullyPushedDown
verifies results correctness, this shouldn't pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is because allow_pushdown_into_connectors
does not control predicate pushdown.
assertThat(query(format("SELECT value FROM %s WHERE value != 'a'", tableName))) | ||
.skippingTypesCheck() | ||
.matches("VALUES 'A', 'b', 'B'") | ||
.isNotFullyPushedDown(FilterNode.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For postgres we can pushdown !=
but cannot pushdown >
.
Let's use order-related (<
, >
, <=
, etc.) here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is added below.
7b63c5f
to
acc1cbe
Compare
FYI Postgres has a solution available which I've started implementing in #7494. |
If it is a case then I think I am going to close this PR. It is better to have correct predicate pushdown instead work arounds. Same option exists for MySQL (https://dev.mysql.com/doc/refman/8.0/en/case-sensitivity.html) and in SqlServer. |
@kokosing It would be useful to keep this branch around because from my quick attempt today it looks like the query generation for predicate pushdown is done within the engine itself (QueryBuilder) and connectors don't have any control over the syntax/clauses being used. I'll take a deeper look again though. Thanks for the info about MySQL and SQLServer - I'll incorporate them in my PR too. |
That's not a news to me
that's definitely new. it will take time to deliver this though, and may or may not be possible for all connectors where we had to disable varchar pushdown by default. therefore i do think this PR is still valuable |
Allow to enable case insenstive predicate pushdown in JDBC connectors