Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support cast projection pushdown in oracle #22728

Merged
merged 1 commit into from
Sep 26, 2024

Conversation

krvikash
Copy link
Contributor

@krvikash krvikash commented Jul 18, 2024

Description

This PR adds support for cast projection pushdown for the Oracle connector. This uses the framework introduced in #22203 for cast projection pushdown.
Currently, this PR only adds support for the CHAR and VARCHAR cast pushdown, which has the following limitations:

  1. Pushdown is not supported with Char having length > 500
  2. Pushdown is not supported with Varchar having length > 1000
  3. Pushdown is not supported with unbounded varchar
  4. Pushdown is not supported with Char cast to Varchar

Release notes

(X) Release notes are required, with the following suggested text:

# Oracle
* Add support for cast project pushdown with Char and Varchar data type. ({issue}`22728`

Copy link
Contributor

@ssheikin ssheikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

new CastTestCase("char(10)", "varchar(1001)"),
new CastTestCase("char(10)", "varchar"),
new CastTestCase("varchar(10)", "varchar(1001)"),
new CastTestCase("varchar(10)", "varchar"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pity. This is very common usecase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we do something for varchars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use TO_NCLOB to cast in varchar or varchar(... >1000 CHAR). https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NCLOB.html

Trying it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from CAST pushdown, comparison with clob type was not supported for pushdown. So I have made the change to support clob type comparison pushdown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic!

assertJoinFullyPushedDown(session, leftTable, rightTable, "LEFT JOIN", "l.varchar_50 = r.varchar_100", Arrays.asList(1, 2));
assertJoinFullyPushedDown(session, leftTable, rightTable, "RIGHT JOIN", "l.varchar_50 = r.varchar_100", Arrays.asList(1, null));
assertJoinFullyPushedDown(session, leftTable, rightTable, "INNER JOIN", "l.varchar_50 = r.varchar_100", List.of(1));
assertJoinFullyPushedDown(session, leftTable, rightTable, "INNER JOIN", "r.varchar_100 = l.varchar_50", List.of(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why join criteria in this assertion differs from others?

targetTypeHandle.get()));
}

private static Optional<JdbcTypeHandle> toJdbcTypeHandle(Type type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is the meat of this PR.
And this is a main opportunity to improve for function pushdown.


assertJoinFullyPushedDown(session, leftTable, rightTable, "LEFT JOIN", "l.varchar_50 = r.varchar_100", Arrays.asList(1, 2));
assertJoinFullyPushedDown(session, leftTable, rightTable, "RIGHT JOIN", "l.varchar_50 = r.varchar_100", Arrays.asList(1, null));
assertJoinFullyPushedDown(session, leftTable, rightTable, "INNER JOIN", "l.varchar_50 = r.varchar_100", List.of(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woah

@krvikash krvikash force-pushed the krvikash/oracle-cast-pushdown branch from 1acef9d to f4204f3 Compare July 19, 2024 08:33
Comment on lines 623 to 626
new CastTestCase("integer", "decimal(15)"),
new CastTestCase("integer", "decimal(10, 2)"),
new CastTestCase("integer", "decimal(30, 2)"),
new CastTestCase("smallint", "integer"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the complexity of the implementation
f4204f3#r1683242993
I guess these return types can be handled too. Could you please try?


private static String toNClob(String expression)
{
return "TO_NCLOB(%s)".formatted(expression);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if during trino reads the remote DB the unsupported by trino type (some remote-specific type) was force-mapped to text,
later here it was capturedAs CharType or VarcharType ?
Will TO_NCLOB work correctly?
Maybe it's the case here too https://github.com/trinodb/trino/pull/22203/files#diff-510ce28e8ade87630227de284298df1cb44186f73bc9f2219e1f6ad6505d410aR66

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this change. nclob cast pushdown is out of scope of this PR.

if (!clobInfo[1]) {
secondArgument = toNClob(secondExpression);
}
return "DBMS_LOB.compare(%s, %s) %s 0".formatted(firstArgument, secondArgument, operator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the DBMS_LOB is something which is provided by default on Oracle or is it some sort of a package which needs to be installed ? How would it work with collation ? What if the underlying bytes in CLOB has some invalid string pattern - in this case won't Trino and Oracle differ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this change. nclob comparison is out of scope of this PR.

}
}

private static List<CastTestCase> supportedCastTypePushdown()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is same as parameterized tests ? Can we have an assertion which assert all types of join and we could pass the types as an argument to it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is same as parameterized tests ?

Yes

Can we have an assertion which assert all types of join and we could pass the types as an argument to it ?

I have a test testAllJoinPushdownWithCast which covers all join but only with one data type. For other all data types I am verifying only with INNER JOIN. Should we test all data types with all join? Or one should be fine?

@krvikash krvikash force-pushed the krvikash/oracle-cast-pushdown branch 5 times, most recently from fa267be to 77cf16e Compare July 23, 2024 16:09
@krvikash
Copy link
Contributor Author

Thanks @ssheikin @Praveen2112 @marcinsbd @SemionPar for the review. Addressed comments. Removing pushdown support for nclob data type. This requires some more work.

Here is summary:
This PR only adds support for the VARCHAR and CHAR cast pushdown, which has the following limitations:

  1. Pushdown is not supported with Char having length > 500
  2. Pushdown is not supported with varchar having length > 1000
  3. Pushdown is not supported with unbounded varchar

@krvikash krvikash force-pushed the krvikash/oracle-cast-pushdown branch from 77cf16e to e341765 Compare July 25, 2024 07:31
@krvikash
Copy link
Contributor Author

Added AbstractRewriteCast in base-jdbc pacakge, so that other jdbc connector can support cast pushdown by implementing toJdbcTypeHandle and buildCast. Also I have the base test case in base-jdbc package.

There is test case enhancement required to cover this comment #22728 (comment) . I will create table using remote database instead of trino.

@krvikash krvikash force-pushed the krvikash/oracle-cast-pushdown branch 2 times, most recently from b44eb71 to 424f81d Compare July 25, 2024 21:16
@mosabua
Copy link
Member

mosabua commented Sep 4, 2024

Adding stale-ignore label since this looks like its very active but just stalled a bit.

@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Sep 4, 2024
@krvikash krvikash force-pushed the krvikash/oracle-cast-pushdown branch from 9fb7a98 to 26e9272 Compare September 5, 2024 09:32
@krvikash
Copy link
Contributor Author

krvikash commented Sep 5, 2024

(Do not cast unnecessary with extra space padding when target char type has more length than source char type)

@krvikash krvikash force-pushed the krvikash/oracle-cast-pushdown branch from 26e9272 to 831320b Compare September 5, 2024 09:49
@krvikash
Copy link
Contributor Author

krvikash commented Sep 5, 2024

(rebased with master)

@krvikash krvikash force-pushed the krvikash/oracle-cast-pushdown branch from 831320b to 4151d4c Compare September 5, 2024 09:56
@krvikash
Copy link
Contributor Author

krvikash commented Sep 5, 2024

(CI fix)

@krvikash krvikash force-pushed the krvikash/oracle-cast-pushdown branch from 4151d4c to 93b238f Compare September 5, 2024 10:05
import static java.util.Arrays.asList;
import static org.assertj.core.api.Assertions.assertThat;

public abstract class BaseOracleCastPushdownTest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this abstraction ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstraction will be useful in the scenario where we might want to verify cast pushdown for different query runner.

Comment on lines +75 to +76
assertThatThrownBy(() -> getQueryRunner().execute("SELECT CAST(%s AS %s) FROM %s".formatted(testCase.sourceColumn(), testCase.castType(), leftTable())))
.hasMessageMatching("(.*)Cannot cast (.*) to (.*)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it thrown by Trino or by Oracle or by both ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By Trino.

@krvikash
Copy link
Contributor Author

Pushdown was giving incorrect result wrt Unicode character values and char data type column. So, For now in this PR I am removing char cast pushdown.

@krvikash
Copy link
Contributor Author

(rebased with master)

@krvikash
Copy link
Contributor Author

Pushdown was giving incorrect result wrt Unicode character values and char data type column. So, For now in this PR I am removing char cast pushdown.

The issue seems to be happening in this case: When casting char unicode value to varchar type.

SELECT CAST(c_char_unicode AS varchar(50)) FROM table1;
SELECT CAST(c_nchar_unicode AS varchar(50)) FROM table1;

Copy link
Contributor

@mayankvadariya mayankvadariya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please evaluate a possibility of case-insensitive matching in Oracle when pushed down.

public void testCastFails()
{
for (CastTestCase testCase : failCast()) {
assertThatThrownBy(() -> getQueryRunner().execute("SELECT CAST(%s AS %s) FROM %s".formatted(testCase.sourceColumn(), testCase.castType(), leftTable())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this test fails, it doesn't say which condition has failed exactly. Do you think we could improve on that?
Took a quick look - as() or withFailMessage() does not seem to work, maybe custom satifies() would.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SemionPar, I will take this as a follow up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krvikash Sure thing!

One more thought that crossed my mind: if cast fails at runtime with different message when pushed down, say:

      Trino: Cannot cast DECIMAL(19, 2) '99999999999999999.99' to DECIMAL(10, 2)
      Teradata: [Error 2616] [SQLState 22003] Numeric overflow occurred during computation.

It would be good to be able to use testCastFails to test such cases, don't you think? We might want to add message customization to CastTestCase (or introduce FailCastTestCase) - consider this another improvement idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SemionPar Here is the refactoring PR #23737

@Praveen2112 Praveen2112 merged commit 1c3b9f8 into trinodb:master Sep 26, 2024
61 checks passed
@krvikash krvikash deleted the krvikash/oracle-cast-pushdown branch September 26, 2024 09:41
@github-actions github-actions bot added this to the 460 milestone Sep 26, 2024
@ssheikin
Copy link
Contributor

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

8 participants