-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-14999: [C++] Don't check field name in ListType Equals() #13851
ARROW-14999: [C++] Don't check field name in ListType Equals() #13851
Conversation
cf1e327
to
770a7d1
Compare
if (value_field()->nullable()) { | ||
ss << 'n'; | ||
} else { | ||
ss << 'N'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullability of internal field is now part of the fingerprint.
I think this should be raised in the mailing list - this is a non-trivial change to the spec - afaik internal names are an integral part of the field and must be taken into account in equality. |
@jorgecarleitao I was thinking about doing that; thanks for the nudge. To be clear, this PR makes checking field names optional, and keeps the check on by default in code paths where strict equality (where we also check field metadata) are already on. So I don't think this breaks the spec, but happy to discuss more on the mailing list. https://lists.apache.org/thread/p6y48qznd61zxc78g3930h4nddz7oo4z |
8c709b6
to
ba5ea71
Compare
b80a544
to
6f28382
Compare
6f28382
to
25568ae
Compare
@pitrou @paleolimbot Would you be willing to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few notes! You should take anything I say about C++ with a grain of salt, but I do wonder if the signature Equals(DataType|Field|Schema other, TypeEqualsOptions options)
might be more appropriate than accumulating arguments? I can see how in some future somebody might not care about nullability either and that approach would help avoid even more arguments.
@@ -16,6 +16,7 @@ | |||
# under the License. | |||
|
|||
import decimal | |||
from email.policy import strict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems out of place?
# TODO(ARROW-18204): metadata doesn't matter by default | ||
# other_metadata <- list_of(field("item", int32(), # nolint | ||
# metadata = list(hello="world"))) # nolint | ||
# expect_equal(x, other_metadata) # nolint | ||
# expect_false(x$Equals(other_metadata, check_metadata = TRUE)) # nolint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be more effective as a reprex()
in ARROW-18204 rather than commented-out code here.
# other_metadata <- map_of(int32(), # nolint | ||
# field("value", int32(), metadata = list(hello="world"))) # nolint | ||
# expect_equal(x, other_metadata) # nolint | ||
# expect_false(x$Equals(other_metadata, check_metadata = TRUE)) # nolint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I would put this in the Jira rather than commented out code here.
Do we really want to add a new option for this or can we just reuse |
I implemented it this way in #14847. I was skeptical at first, but it came out rather clean. I just added the field names to the metadata fingerprint. |
Closing in favor of #14847 |
BREAKING CHANGE
Two changes for "internal fields" (fields within ListTypes and MapTypes):
Examples