refactor: Remove unnecessary arg conversion for UDFs #3595
Do we have any tests covering the end-to-end coercion mechanism that you can point me to? I'd be much more comfortable approving this with those tests in mind than just trying to scan the code. (And if we don't, then we should add tests for that flow.)
End-to-end arg coercion tests might be tricky, as it seems there is no arg coercion going on.
Hummm... love PRs that remove unnecessary code! I'm trying to think how we might end up invoking a UDF with something other than what it expects. KSQL is, in general, strongly typed. UDFs are matched on their signature, and looking at the way that matching works, it doesn't seem to do any implicit conversion of types. (I'm looking at
So, assuming exact matching against the types, a UDF would only be passed the wrong type if something was happening upstream that wasn't checking types were as expected, e.g. a UDF that says it's returning a
In the future, I know we want/plan to support
So... the conclusion I would draw is that this coercion is indeed unnecessary and can be removed. However, it might just be worth trying to write a bad-actor UDF to see how this is handled/detected/reported with the old and new code. As I said, you may find there is already code to handle this, and tests that test it, or maybe there isn't.
I tried this and the return type matching the schema type is checked in the code so it's not possible to craft such a udf:
Caused by: io.confluent.ksql.util.KsqlException: Return type MAP<STRING, BIGINT> of UDF TEST_UDF does not match the declared return type MAP<STRING, STRING>.
at io.confluent.ksql.function.KsqlFunction.checkMatchingReturnTypes(KsqlFunction.java:181)
at io.confluent.ksql.function.KsqlFunction.getReturnType(KsqlFunction.java:155)
I don't think that error is thrown when invoking the UDF; it's thrown when building the topology, which suggests your test isn't actually testing what we want.
I've knocked up a quick test on master with:
@UdfDescription(name = "badudf", description = "UDF that returns a type that does not match the advertised schema")
public class BadUdf {
  @Udf(schema = "ARRAY<VARCHAR>")
  public List<Integer> apply(final String value) {
    return ImmutableList.of(1);
  }
}
Note how it says it's returning ARRAY<VARCHAR> but actually returns ARRAY<INT>.
And:
@UdfDescription(name = "GoodUdf", description = "Returns first element")
public class GoodUdf {
  @Udf
  public String apply(final List<String> value) {
    return value.isEmpty() ? "" : value.get(0);
  }
}
Then added a QTT test:
{
  "name": "Create a struct from a string",
  "statements": [
    "CREATE STREAM test (value STRING) WITH (kafka_topic='test_topic', value_format='JSON');",
    "CREATE STREAM OUTPUT AS SELECT GoodUdf(BadUdf(value)) AS value FROM test;"
  ],
  "inputs": [
    {"topic": "test_topic", "key": 1, "value": {"value": "a"}, "timestamp": 0}
  ],
  "outputs": [
    {"topic": "OUTPUT", "key": 1, "value": {"VALUE": "a"}, "timestamp": 0}
  ]
}
When this is run on master, GoodUdf is invoked with a list containing the integer 1, which results in a class cast exception.
If we remove GoodUdf from the test case, the test fails when trying to serialize the row, as it doesn't match the expected schema.
Conclusion... a bad-actor UDF would not have worked previously, so there's no change in behaviour with this PR.
FYI, while investigating I found this issue: #3620
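The class cast failure described above can be reproduced without KSQL at all. The following is a minimal sketch in plain Java, not the QTT test itself: `ErasureDemo`, `badUdf`, and `goodUdf` are hypothetical stand-ins for the two UDFs, and the engine is replaced by a direct method call. It shows why the mismatch only surfaces when the consumer reads an element: generic type parameters are erased, so a list of integers passes as a `List<String>` until a `String` is actually required.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {

  // Hypothetical stand-in for a UDF that advertises a list of strings
  // but actually produces integers. The raw type hides the mismatch.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static List<String> badUdf() {
    final List raw = new ArrayList();
    raw.add(1);
    return raw; // unchecked conversion: compiles, and succeeds at runtime
  }

  // Hypothetical stand-in for a well-behaved UDF that trusts its parameter type.
  static String goodUdf(final List<String> value) {
    // The compiler inserts a cast to String here; that cast is what fails.
    return value.isEmpty() ? "" : value.get(0);
  }

  // Returns true if chaining the two calls fails with a ClassCastException.
  static boolean failsWithClassCast() {
    try {
      goodUdf(badUdf());
      return false;
    } catch (final ClassCastException e) {
      return true;
    }
  }

  public static void main(final String[] args) {
    System.out.println("ClassCastException raised: " + failsWithClassCast());
  }
}
```

Nothing at the call boundary can reject the bad list, because at runtime both values are simply `List`. Detecting it would require inspecting each element, which argument coercion on the declared parameter type alone does not do.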
Yep, that was the point. I tried to write a test with a UDF where the return type didn't match the schema provider return but it didn't let me do that.
Confused... see my example BadUdf.
tbh, people are going to be able to write bad UDFs that do naughty things anyway, e.g. it's easy to cast a
Moreover, generics are erased at compile time, so at invocation time the engine can't distinguish between a
Description
Previously, arg coercion for UDFs attempted to coerce the arguments passed to a UDF into the types expected by the method signature, including casting one numeric type to another and converting strings to other types.
In reality, however, the caller of the UDF (the expression evaluation) will never pass in invalid types: they are verified by the type parser, and the results of any expression evaluation are already coerced to the right type before the UDF is called.
The only useful thing the arg coercion was doing was converting the args for a varargs method into an array of the appropriate type. This code has been moved to the DynamicUdfInvoker.
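To illustrate the one conversion that remains, here is a minimal sketch of packing trailing arguments into a typed array before a reflective call. It is not the code in DynamicUdfInvoker: `VarArgsDemo`, `packArgs`, `join`, and `demo` are hypothetical names, and the sketch assumes the caller always passes the variable-arity values individually and supplies at least the fixed parameters.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Method;

class VarArgsDemo {

  // Example target with a variable-arity parameter (hypothetical).
  public static String join(final String separator, final String... parts) {
    return String.join(separator, parts);
  }

  // Packs the trailing args into an array of the varargs component type.
  // Assumes args.length is at least the number of fixed parameters.
  static Object[] packArgs(final Method method, final Object... args) {
    if (!method.isVarArgs()) {
      return args;
    }
    final Class<?>[] types = method.getParameterTypes();
    final int fixed = types.length - 1;
    final Class<?> component = types[fixed].getComponentType();
    // Array.newInstance yields e.g. a String[] rather than an Object[],
    // which is what Method.invoke needs for the varargs slot.
    final Object varArgs = Array.newInstance(component, args.length - fixed);
    for (int i = fixed; i < args.length; i++) {
      Array.set(varArgs, i - fixed, args[i]);
    }
    final Object[] packed = new Object[types.length];
    System.arraycopy(args, 0, packed, 0, fixed);
    packed[fixed] = varArgs;
    return packed;
  }

  // Invokes join reflectively with individually supplied values.
  static String demo() {
    try {
      final Method m =
          VarArgsDemo.class.getDeclaredMethod("join", String.class, String[].class);
      return (String) m.invoke(null, packArgs(m, "-", "a", "b", "c"));
    } catch (final ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(final String[] args) {
    System.out.println(demo());
  }
}
```

No values are converted here; the arguments are only regrouped, which fits the description's point that the types are already correct by the time the UDF is invoked.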
Testing done
Amended unit tests as appropriate.