-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow for decimals to be used as input types for UDFs #3217
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @agavra , I think I'm missing some understanding of our UDF code to fully understand these changes so I defer to Rohan's approval. Some questions inline.
@@ -316,7 +318,6 @@ boolean accepts(final Schema argument, final Map<Schema, Schema> reservedGeneric | |||
return Objects.equals(type, argument.type()) | |||
&& CUSTOM_SCHEMA_EQ.getOrDefault(type, (a, b) -> true).test(schema, argument) | |||
&& Objects.equals(schema.version(), argument.version()) | |||
&& Objects.equals(schema.parameters(), argument.parameters()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it only the decimal type that may contain parameters in their schemas? (Trying to understand whether removing this portion of the check is safe.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the moment yes - it's the only one that we use that has parameters. I think in general it makes sense to defer the parameter to check to the CUSTOM_SCHEMA_EQ
instead of checking that they are the same.
@@ -126,7 +126,7 @@ protected String visitBytes(final Schema schema) { | |||
+ DecimalUtil.scale(schema) + ")"; | |||
} | |||
|
|||
throw new KsqlException("Cannot format bytes type: " + schema); | |||
return "BYTES"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still the case that the only bytes type we currently support is Decimals, right? I'm having trouble understanding why this change is necessary (besides the pretty error message in UdfIndexTest#shouldNotFindArbitraryBytesTypes`).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just the pretty error message - if you can't find a UDF we shouldn't throw "Cannot format bytes type". The other BYTES type that we support is also the way we represent Generics.
@@ -93,7 +99,9 @@ static Schema getSchemaFromType(final Type type, final String name, final String | |||
schema = GenericsUtil.generic(((TypeVariable) type).getName()); | |||
} else { | |||
schema = typeToSchema.getOrDefault(type, () -> handleParametrizedType(type)).get(); | |||
schema.name(name); | |||
if (schema.name() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change necessary? (Why would the schema already have a name at this point?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is something that I feel like we should change in the long run, but decimals are defined by their schema name (they all have org.apache.kafka.connect.data.Decimal
as their schema name). we should not be using the schema name to define the parameter name, but that's out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my bad. I even followed the Decimal schema builder code through but somehow missed it was setting the name. Thanks for the explanations!
Description
This change makes it so that UDFs accept decimal types, but they still cannot return decimal types. To make this change, we treat all decimals as the same schema within the UDF context (which is OK because they all map to
BigDecimal
)Testing done
floor
and movedfloor
to be a@Udf
instead ofKudf
)Reviewer checklist