Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect aggregation pushdown for case insensitive columns in JDBC some connectors #7320

Closed
kokosing opened this issue Mar 17, 2021 · 4 comments · Fixed by #8551
Closed
Labels
bug Something isn't working correctness

Comments

@kokosing
Copy link
Member

In MySql:

trino:tpch> select c from x;
 c
---
 a
 A
 b
 B
(4 rows)

trino:tpch> select distinct c from x;
 c
---
 a
 b
(2 rows)

Possibly affects SQL Server and MemSql too.

@kokosing kokosing added bug Something isn't working correctness labels Mar 17, 2021
@kokosing
Copy link
Member Author

Another example

trino:tpch> select count(*) from x group by c;
 _col0
-------
     2
     2
(2 rows)

trino:tpch> select count(*) from x where rand() != 52 group by c;
 _col0
-------
     1
     1
     1
     1
(4 rows)

@hashhar
Copy link
Member

hashhar commented Jul 9, 2021

@kokosing @wendigo This is a bit challenging to do correctly.

The easiest case is if the grouping sets contain a text type then the connector can decide to prevent pushdown. This takes care of things like group by or distinct.

For cases with a global aggregation (e.g. max(c) or count(c)) we might need to add something to AggregateFunction to identify if the function is case-sensitive or not. e.g. max is case-sensitive but count is not.

I've implemented a fix for the grouping sets case but for global aggregation I'm not sure about how we want to tackle this. It feels a bit "dirty" to add the concept of case-sensitivity vs not to engine representation of functions.

Note that I can blanket disallow aggregation pushdown if the input to an aggregation function is a text type but that looks more heavy handed than necessary? Or should we tackle it as a follow-up?

Another alternative is to add this logic to the aggregation function rewrite rules individually (e.g. ImplementMinMax)

@kokosing
Copy link
Member Author

kokosing commented Jul 9, 2021

if the function is case-sensitive or not.

I was under impression it is about argument type. Can make a condition on information is type is case sensitive or not? Is io.trino.plugin.jdbc.JdbcTypeHandle#caseSensitivity hold an information here?

@hashhar
Copy link
Member

hashhar commented Jul 9, 2021

Argument type alone doesn't mean pushdown will result in incorrect results.

e.g. SELECT count(varchar) is safe to pushdown while SELECT max(varchar) is not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctness
Development

Successfully merging a pull request may close this issue.

2 participants