-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: custom comparators for array, map and struct #3385
Conversation
@confluentinc It looks like @vpapavas just signed our Contributor License Agreement. 👍 Always at your service, clabot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
final Field fieldA = fieldsA.get(i); | ||
final Field fieldB = fieldsB.get(i); | ||
if (!fieldA.name().equals(fieldB.name()) | ||
|| fieldA.index() != fieldB.index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the order of fields matters 🤔 - it might be better to go through structA and get by name in structB. We can leave that for a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I checked is that in Connect, Field equals compares the index of two fields, hence order matters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vpapavas
When reviewing this PR it would have been useful to have some context on why you're making the change. Is there some bug you're trying to fix? Or some future functionality this is needed for?
Your changes LGTM on the surface and without knowing the context...
} | ||
|
||
private static boolean structEquals(final Schema structA, final Schema structB) { | ||
return structA.fields().isEmpty() | ||
|| structB.fields().isEmpty() | ||
|| Objects.equals(structA.fields(), structB.fields()); | ||
|| compareFieldsOfStructs(structA, structB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not particularly pertinent to your change, but do you know why a two structs are considered equal if one side has no fields?
Maybe this has something to do with the semantics being "are they compatible" and not "are they equal"...?
As above, if that's the case I think we should rename these methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good question. I don't know but I will discuss with Almog today and try to find out why.
@@ -348,24 +349,57 @@ public static boolean areCompatible(final Schema arg1, final Schema arg2) { | |||
} | |||
|
|||
private static boolean mapEquals(final Schema mapA, final Schema mapB) { | |||
return Objects.equals(mapA.keySchema(), mapB.keySchema()) | |||
&& Objects.equals(mapA.valueSchema(), mapB.valueSchema()); | |||
return areCompatible(mapA.keySchema(), mapB.keySchema()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the change in semantics from equals
to compatible
I wonder if we should change the name of these methods? eg. mapCompatible
BTW, do you think this might be related to: #2376??? |
Thank you for your comments @big-andy-coates ! You are right that there was no context, sorry about that. I edited the description. I don't think it is related to #2376 as the code I changed is only called by UDF/UDAF code ( |
b5982d2
to
4ee2b59
Compare
Description
Do deep custom equality for array, map or struct. I.e., use our custom comparator for the nested fields in array, map and struct.
If container types (array, map, struct) contain for instance decimal schema, the UdfIndex code that tries to find the correct udf/udaf per parameter type fails as it checks whether the runtime parameter and signature parameter are equal. I happened upon this then when trying to support decimals for the UDAF sum_list.
I changed the code to check whether schemas are deep compatible.
Testing done
Added extra tests in SchemaUtilTest
Reviewer checklist