[Coral-Schema] Fix incorrect type derivation for repeated field reference on UDF calls #510
What changes are proposed in this pull request, and why are they necessary?
This is a continuation of #507. #507 reverts some of the effects of #409 (namely, around single union type handling), and this PR reverts more effects (see below). Note this PR includes those same reverts.
Incorrect behavior after reverting #409 changes:
In Hive,
In the above scenario, we expect the schema produced by coral-schema for `col b` to be simply a string; however, what we get back is of type `struct<tag_0:string>`. Although #409 mitigates this issue by detecting and unwrapping structs with field `tag_0`, it doesn't address the root cause, which is a bug in `SchemaRexShuttle.visitFieldAccess` that doesn't handle nested field references on UDF calls introduced in #203, and is fixed in this PR.
How was this patch tested?
Unit tests
Regression tests