Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation - SqlCallTransformer #442

Conversation

findinpath
Copy link
Contributor

What changes are proposed in this pull request, and why are they necessary?

In case of dealing with Hive views which make use of set operation (e.g. UNION) ensure that the char fields from the inner SELECT statements have the same type as the output field types of the set operation.

Due to wrong coercion between varchar and char in Trino, as described in trinodb/trino#9031
, a work-around needs to be applied in case of translating Hive views which contain a UNION dealing with char and varchar types. The work-around consists in the explicit cast of the field having char type towards varchar type corresponding of the set operation output type.

This PR is a spin-off from #438 using a slighly different functionality to achieve a similar functionality.

From a technical design point of view, the present PR extends SqlCallTransformer to modify on the fly the select branch of the set operation which needs explicit casting for its char columns (which need to be changed to varchar type due to trinodb/trino#9031 Trino limitation.

How was this patch tested?

The patch has been tested with automatic unit tests.

@findinpath findinpath changed the title [Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation [Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation - SqlCallTransformer Aug 11, 2023
@findinpath findinpath force-pushed the findinpath/cast-fields-in-union-sqlcalltransformer branch 4 times, most recently from 3a4dda3 to 37f39d3 Compare September 19, 2023 19:30

Optional<RelDataType> selectNodeDataType;
try {
selectNodeDataType = Optional.of(deriveRelDatatype(sqlNode));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aastha25 in some cases while running the HiveToTrinoConverterTest#testViews I stumbled on RuntimeException issued from deriveRelDataType(SqlNode).

java.lang.RuntimeException: Failed to derive the RelDataType for SqlNode: CAST(row("b"."b1") as row("b1" varchar)) AS `b` with topSqlNode: SELECT `tablef`.`a`, `generic_project`(`tablef`.`b`, 'b', 'struct<b1:string>') AS `b`
FROM `test`.`tablef` AS `tablef`
	at com.linkedin.coral.common.utils.TypeDerivationUtil.getRelDataType(TypeDerivationUtil.java:87)
	at com.linkedin.coral.common.transformers.SqlCallTransformer.deriveRelDatatype(SqlCallTransformer.java:68)

Is this expected?
I currently workaround this limitation by just working with column types for which the type can actually be derived.

Also side note - Why throwing the very generic RuntimeException in TypeDerivationUtil instead of specialized exceptions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findinpath were you able to isolate the test case which is leading to the failure?

Copy link
Contributor Author

@findinpath findinpath Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When not catching the exception

          try {
            selectNodeDataType = Optional.of(deriveRelDatatype(sqlNode));
          } catch (RuntimeException e) {
            // The type derivation may fail for complex expressions
            selectNodeDataType = Optional.empty();
          }

, the following tests will be failing:

  • fuzzy_union_view_double_branch_evolved_different
  • fuzzy_union_view_more_than_two_branches_evolved
  • fuzzy_union_view_map_with_struct_value_evolved
  • fuzzy_union_view_array_with_struct_value_evolved
  • fuzzy_union_view_deeply_nested_struct_evolved
  • fuzzy_union_view_deeply_nested_complex_struct_evolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ljfgem Can we fix the generic type derivation method and also throw the underlying error?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will take a look.

}

List<SqlNode> updatedOperands = new ArrayList<>(sqlCall.getOperandList().size());
for (SqlNode operand : operandsList) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we iterate on the operandsList twice? Can we merge this loop and the above? I see below we cast to SqlSelect in the first loop we check if it is SqlSelect. In both we iterate on the selectNodes, etc. I think there is room for consolidation? PS. In any case let us reuse the naming conventions suggested above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to find the least restrictive type for each select item index from the operands - this is one looping.

Once we have the least restrictive type, we need another looping to verify whether the operands need to be updated.

I didn't find a better alternative to do this.

@wmoustafa
Copy link
Contributor

wmoustafa commented Sep 28, 2023

@KevinGe00 Can we run integration tests on this PR? Trino tests are enough.

@findinpath findinpath force-pushed the findinpath/cast-fields-in-union-sqlcalltransformer branch 2 times, most recently from be01343 to a2bf267 Compare September 28, 2023 20:35
…set operation

In case of dealing with Hive views which make use of set operation (e.g. UNION)
ensure that the `char` fields from the inner SELECT statements have the same type
as the output field types of the set operation.

Due to wrong coercion between `varchar` and `char` in Trino, as described in
trinodb/trino#9031
, a work-around needs to be applied in case of translating Hive views which
contain a UNION dealing with char and varchar types.
The work-around consists in the explicit cast of the field
having char type towards varchar type corresponding of the set operation
output type.
@findinpath findinpath force-pushed the findinpath/cast-fields-in-union-sqlcalltransformer branch from a2bf267 to 61aa89d Compare September 28, 2023 20:39
@findinpath
Copy link
Contributor Author

findinpath commented Sep 28, 2023

@wmoustafa addressed comments and rebased on master.

Copy link
Contributor

@wmoustafa wmoustafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with @KevinGe00 that integration tests passed. Thanks @findinpath for working on this. Appreciate the spin-off from #438.

@wmoustafa wmoustafa merged commit 4bb38a1 into linkedin:master Oct 5, 2023
1 check passed
aastha25 added a commit to aastha25/coral that referenced this pull request Mar 5, 2024
aastha25 added a commit that referenced this pull request Mar 5, 2024
* Revert "[Coral-Trino] Cast char fields, if necessary, to varchar type in the set operation (#442)"

This reverts commit 4bb38a1.

* spotlessfix
KevinGe00 added a commit to KevinGe00/coral that referenced this pull request May 15, 2024
… in the set operation (linkedin#442)"

This reverts commit 4bb38a1.

# Conflicts:
#	coral-trino/src/main/java/com/linkedin/coral/trino/rel2trino/DataTypeDerivedSqlCallConverter.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants