
feat: allow for decimals to be used as input types for UDFs #3217

Merged
1 commit merged on Aug 15, 2019

Conversation

@agavra agavra commented Aug 15, 2019

Description

This change makes UDFs accept decimal types as inputs, though they still cannot return decimal types. To make this work, we treat all decimals as the same schema within the UDF context (which is OK because they all map to BigDecimal).
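Since every SQL decimal maps to `java.math.BigDecimal`, a single UDF method signature can cover all precisions and scales. A minimal, hypothetical sketch of a decimal-accepting floor function (the ksqlDB `@Udf` annotation wiring is omitted so the example stays self-contained; `FloorUdf` is an illustrative name, not the PR's actual class):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class FloorUdf {
  // All SQL DECIMAL inputs arrive as BigDecimal regardless of their declared
  // precision/scale, which is why one method signature suffices.
  public static BigDecimal floor(final BigDecimal value) {
    return value == null ? null : value.setScale(0, RoundingMode.FLOOR);
  }

  public static void main(String[] args) {
    System.out.println(floor(new BigDecimal("2.75")));  // prints 2
    System.out.println(floor(new BigDecimal("-2.75"))); // prints -3
  }
}
```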

Testing done

  • Unit Tests
  • QTT tests for floor (also moved floor to be a @Udf instead of a Kudf)

Reviewer checklist

  • Ensure docs are updated if necessary (e.g. if a user-visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@agavra agavra requested a review from vpapavas August 15, 2019 00:44
@agavra agavra requested a review from a team as a code owner August 15, 2019 00:44
@rodesai rodesai left a comment


LGTM

@vcrfxia vcrfxia left a comment


Thanks @agavra, I think I'm missing some understanding of our UDF code to fully understand these changes, so I defer to Rohan's approval. Some questions inline.

@@ -316,7 +318,6 @@ boolean accepts(final Schema argument, final Map<Schema, Schema> reservedGeneric
return Objects.equals(type, argument.type())
&& CUSTOM_SCHEMA_EQ.getOrDefault(type, (a, b) -> true).test(schema, argument)
&& Objects.equals(schema.version(), argument.version())
&& Objects.equals(schema.parameters(), argument.parameters())
Reviewer comment:

Is it only the decimal type that may contain parameters in their schemas? (Trying to understand whether removing this portion of the check is safe.)

@agavra (author) replied:

At the moment, yes - it's the only one we use that has parameters. I think in general it makes sense to defer the parameter check to the CUSTOM_SCHEMA_EQ instead of checking that they are the same.
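The idea of deferring the parameter comparison to the per-type equality predicates can be sketched in plain Java. `MiniSchema`, `CUSTOM_EQ`, and this `accepts` are hypothetical stand-ins for Connect's `Schema` and the PR's `CUSTOM_SCHEMA_EQ`/`accepts`, not the actual code; the point is that the decimal predicate ignores precision/scale parameters entirely, so any two decimal schemas match:

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;

public class SchemaEq {
  // Hypothetical stand-in for Connect's Schema: just a name plus parameters.
  record MiniSchema(String name, Map<String, String> parameters) {}

  // Per-type equality overrides, analogous to CUSTOM_SCHEMA_EQ. The decimal
  // predicate accepts any parameters, so e.g. DECIMAL(2,1) and DECIMAL(10,5)
  // both match the same UDF parameter.
  static final Map<String, BiPredicate<MiniSchema, MiniSchema>> CUSTOM_EQ =
      Map.of("org.apache.kafka.connect.data.Decimal", (a, b) -> true);

  static boolean accepts(final MiniSchema declared, final MiniSchema argument) {
    return Objects.equals(declared.name(), argument.name())
        && CUSTOM_EQ
            .getOrDefault(
                declared.name(),
                // Default for types with no override: parameters must match.
                (a, b) -> Objects.equals(a.parameters(), b.parameters()))
            .test(declared, argument);
  }
}
```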

@@ -126,7 +126,7 @@ protected String visitBytes(final Schema schema) {
+ DecimalUtil.scale(schema) + ")";
}

throw new KsqlException("Cannot format bytes type: " + schema);
return "BYTES";
Reviewer comment:

It's still the case that the only bytes type we currently support is Decimals, right? I'm having trouble understanding why this change is necessary (besides the prettier error message in `UdfIndexTest#shouldNotFindArbitraryBytesTypes`).

@agavra (author) replied:

Just the pretty error message - if we can't find a UDF, we shouldn't throw "Cannot format bytes type". The other BYTES type we support is also the way we represent Generics.

@@ -93,7 +99,9 @@ static Schema getSchemaFromType(final Type type, final String name, final String
schema = GenericsUtil.generic(((TypeVariable) type).getName());
} else {
schema = typeToSchema.getOrDefault(type, () -> handleParametrizedType(type)).get();
schema.name(name);
if (schema.name() == null) {
Reviewer comment:

Why is this change necessary? (Why would the schema already have a name at this point?)

@agavra (author) replied:

This is something I feel we should change in the long run, but decimals are defined by their schema name (they all have org.apache.kafka.connect.data.Decimal as their schema name). We shouldn't be using the schema name to define the parameter name, but that's out of scope here.
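For context on why the name assignment is guarded: Connect decimal schemas all carry the logical name org.apache.kafka.connect.data.Decimal, so unconditionally overwriting an existing name with the UDF parameter name would lose the decimal marker. A hypothetical sketch of the guard (`NameGuard` and `resolveName` are illustrative names, not the actual code):

```java
public class NameGuard {
  // Connect's logical type name shared by every decimal schema.
  static final String DECIMAL_NAME = "org.apache.kafka.connect.data.Decimal";

  // Only fall back to the parameter name when no logical name is already set;
  // otherwise keep the existing name so decimals stay recognizable.
  static String resolveName(final String existing, final String paramName) {
    return existing == null ? paramName : existing;
  }
}
```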

Reviewer follow-up:

Ah, my bad. I even followed the Decimal schema builder code through but somehow missed it was setting the name. Thanks for the explanations!

@agavra agavra merged commit 4a2e4b9 into confluentinc:master Aug 15, 2019
@agavra agavra deleted the decimal_udf_in branch August 15, 2019 17:05