Register "passthrough" UDFs with correct ordinal return type #541
@KevinGe00 thanks for the PR. Technically, this looks like a hacky solution to me: consider a func(arg1, arg2) whose return type is nullable only when both arg1 and arg2 are nullable. My high-level point is the previous usage of ... I can approve this for now as a workaround, but I think we do need to recognize that coral-schema lacks a systematic way to handle nullability in its current implementation.
The reason
I agree Coral deriving nullabilities correctly for
In cases where the return type inference has more complex rules, we have written classes that conditionally determine the return type given the input. See COALESCE_STRUCT_FUNCTION_RETURN_STRATEGY for an example.
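As a rough illustration of the two kinds of rule discussed here, the sketch below contrasts an ordinal ("passthrough") strategy with a conditional one. It is not Coral or Calcite code: `ReturnTypeSketch`, `ColType`, `ordinal`, and `nullableOnlyIfAllNullable` are hypothetical names made up for this example, and the conditional rule simply encodes the func(arg1, arg2) case raised above.

```java
import java.util.Arrays;
import java.util.List;

public class ReturnTypeSketch {

    /** Hypothetical stand-in for a column type: a type name plus a nullability flag. */
    public static final class ColType {
        public final String name;
        public final boolean nullable;

        public ColType(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    /** Ordinal ("passthrough") rule: return the chosen operand's type unchanged, nullability included. */
    public static ColType ordinal(List<ColType> operands, int ordinal) {
        return operands.get(ordinal); // 0-based operand index
    }

    /** Conditional rule: the result is nullable only when every operand is nullable. */
    public static ColType nullableOnlyIfAllNullable(String resultName, List<ColType> operands) {
        boolean allNullable = true;
        for (ColType operand : operands) {
            allNullable &= operand.nullable;
        }
        return new ColType(resultName, allNullable);
    }

    public static void main(String[] args) {
        List<ColType> operands = Arrays.asList(
            new ColType("STRING", true),
            new ColType("STRUCT", false));

        ColType passthrough = ordinal(operands, 1);
        ColType conditional = nullableOnlyIfAllNullable("STRUCT", operands);

        System.out.println(passthrough.name + " nullable=" + passthrough.nullable);
        System.out.println(conditional.name + " nullable=" + conditional.nullable);
    }
}
```

The ordinal rule needs no knowledge of the function beyond an operand index, which is why it suits passthrough UDFs. A rule that depends on several operands at once has to be written as its own strategy.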
coral-schema does have the ability to infer nullabilities; currently, all the inner fields created by operator are set as nullable in alignment with Spark's expectations (#369). However, as this comment suggests, Coral needs a way to determine the engine the UDF was implemented in (i.e., Hive UDF or Spark UDF) after the introduction of Spark native UDF support in Coral. This may necessitate a wider discussion.
@KevinGe00 I see you mentioned #369. However, doesn't the case presented by this PR serve as a counterexample to that statement?
What changes are proposed in this pull request, and why are they necessary?
"Passthrough" UDFs are functions that take a column as input, perform certain edits on the input fields (such as setting a field to NULL), but ultimately return a column with the same schema and nullabilities as the input column. Prior to this PR, Coral was not preserving the input column nullabilities of these (new and Spark native) passthrough UDFs when inferring the schema. This was because these UDFs were registered with the return type ARG1 instead of OrdinalReturnTypeInferenceV2(1). While these two types are semantically the same, both representing a type inference strategy whereby the result type of a call is the type of operand 1 (0-based), coral-schema is only set up to handle OrdinalReturnTypeInferenceV2 and not ARG1, see here.

How was this patch tested?