Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coral-Schema] Fix incorrect type derivation for repeated field reference on UDF calls #510

Merged
merged 3 commits into from
Jul 31, 2024

Conversation

KevinGe00
Copy link
Contributor

@KevinGe00 KevinGe00 commented Jun 11, 2024

What changes are proposed in this pull request, and why are they necessary?

This is a continuation of #507. #507 reverts some of the effects of #409 (namely, around single union type handling), and this PR reverts more effects (see below). Note this PR includes those same reverts.

Incorrect behavior after reverting #409 changes:
In Hive,

Table t1 that has the columns:
struct_col           	struct<union_field:uniontype<string>>

View v1: select extract_union(struct_col).union_field.tag_0 as b from t1
b: string
Schema for col b produced by coral-schema represents:
struct<tag_0:string>

In the above scenario, we expect that produced schema (by coral-schema) for col b to be simply a string, however, what we get back is of type struct<tag_0:string>.

Although #409 mitigates this issue by detecting and unwrapping structs with field tag_0, it doesn't address the root cause. Which is a bug in SchemaRexShuttle.visitFieldAccess that doesn't handle nested field references on UDF calls introduced in #203, and is fixed in this PR here.

How was this patch tested?

Unit tests
Regression tests

@KevinGe00 KevinGe00 changed the title fix field reference bug in coral schema [Coral-Schema] Fix incorrect type derivation for repeated field reference on UDF calls Jun 11, 2024
Copy link
Collaborator

@1fanwang 1fanwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, minor comments

Copy link
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.
Please ensure the regression tests pass for coral-trino, coral-spark and coral-schema.

@KevinGe00 KevinGe00 merged commit 8786af3 into linkedin:master Jul 31, 2024
1 check passed
KevinGe00 added a commit to KevinGe00/coral that referenced this pull request Jul 31, 2024
…ence on UDF calls (linkedin#510)

* fix field reference bug in coral schema

* bring back override tag

* address nits
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants