Default comparison for data classes #710
Comments
It's not clear to me how (or even whether) these issues will affect nominal data classes, so my comments will focus primarily on structural data classes.

I think many (most?) users' intuitive mental model for structs will be that they are an unordered collection of named fields. I think that's already the case in C++ to a large extent, and #561 further encourages this mental model by enabling users to disregard field order when reading or writing initializers. I think that mental model is not only unavoidable, it's useful, and it makes structs a very natural complement for tuple types. I think we should lean into that model, and try to avoid attaching API-visible meaning to the declaration order of struct fields.

Regarding question 1, this model implies that a struct type should be equality-comparable with any struct type that has the same field names and types (in other words, any struct type that it can be initialized from), and it should evaluate the field comparisons in an unspecified order. I'm honestly having a hard time coming up with any drawbacks to supporting heterogeneous equality comparisons that wouldn't apply just as much to heterogeneous initialization (which #561 proposes to support).

One possible concern, particularly for the homogeneous case, is that we'll presumably want the However, I don't think that carries much weight: the cases where that degree of control is useful will likely be very rare, and when they occur, the user would probably be better off using a nominal class type instead, where they can achieve much more precise control over performance by hand-implementing the

Another possible concern is that if the field comparisons have side effects, then changes in the (unspecified) order of evaluation could change the behavior of the program unexpectedly, in ways that could be difficult to understand or diagnose. However, this risk seems quite remote, and to the extent we nonetheless want to address it, we could do so either by pseudo-randomizing the order so that such problems are caught quickly, or by defining a fixed order that does not depend on the declaration order (such as lexicographic by field names).

Regarding question 2, the fields-are-unordered model implies that it is unnecessary, if not outright harmful, to provide ordering comparison operators for structs: it's not meaningful to ask whether one collection of unordered field values is "less than" another, so people are unlikely to need an operator that asks that question in the first place, and are likely to be confused by any usages of that operator that they come across. On the other hand, it is meaningful and useful for a struct to provide some total order that is unspecified but fixed (at least for the life of the program), so that's what we should do instead. That will enable users to do things like perform binary search on struct values without risking the confusion that would come from naming the comparison

Note that this issue goes beyond data classes: |
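As a rough C++ analogy of the "unordered collection of named fields" model described in this comment (not Carbon code), the sketch below pairs fields by name for equality and uses alphabetical-by-name order as one possible fixed order that does not depend on declaration order. The types `PointXY` and `PointYX` and the helpers `FieldsEqual` and `FixedOrderLess` are hypothetical names introduced only for illustration.

```cpp
#include <tuple>

// Hypothetical stand-ins for two struct types with the same field names
// and types but a different declaration order.
struct PointXY { int x; int y; };
struct PointYX { int y; int x; };

// Equality pairs fields by name, so declaration order does not matter.
constexpr bool FieldsEqual(const PointXY& a, const PointYX& b) {
  return a.x == b.x && a.y == b.y;
}

// One possible fixed order that ignores declaration order: compare the
// fields alphabetically by name (x before y), whichever type is involved.
template <typename A, typename B>
constexpr bool FixedOrderLess(const A& a, const B& b) {
  return std::tie(a.x, a.y) < std::tie(b.x, b.y);
}

// PointYX{2, 1} has y == 2 and x == 1, matching PointXY{1, 2}.
static_assert(FieldsEqual(PointXY{1, 2}, PointYX{2, 1}), "same fields by name");
// PointYX{0, 2} has x == 2, so (x=1, y=5) sorts before (x=2, y=0).
static_assert(FixedOrderLess(PointXY{1, 5}, PointYX{0, 2}), "x compared first");
```

Such a fixed order would be enough for sorting or binary search, while giving the operation a name other than `<` avoids suggesting that the order is meaningful.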
I think, for me, the primary question here is: is the field order a salient property of a struct value? Do we consider two structs that have the same field names and the same field values, but in a different order, to be importantly different (as they would be if they had different field names or different values) or only incidentally different (eg, like the difference between a

If the field order is salient, then I think comparisons between structs with different field order should require an explicit conversion to a common type, whether they're equality or relational comparisons.

If field order is non-salient, then I would expect equality comparison to work regardless of field order, and relational comparison to not work regardless of field order (even if the field order is the same). For relational comparisons, if field order is non-salient, we should get the same result regardless of field order, so we'd need to make an arbitrary but consistent choice (such as comparing the fields in alphabetical order by name), but any such choice seems like it would be surprising. I don't think this option would or should imply that

My current leaning is very slightly that field order is salient in this sense, because I think that it's better to avoid non-salient but representable differences, and that this option is more reminiscent of C++ structs. However, I think the alternative also has merit, and in the alternative I do like @geoffromer's idea of producing an unspecified-but-fixed total order for structs with a name that is not |
"Representable" in what sense? |
My general position is that we should allow the operations we can, since they will be convenient and useful for users and will not result in more bugs in user code. On question 1: I think a useful property is that |
I think what I mean is "observable by a program". |
FWIW, I like the idea of "is field order salient" from @zygoloid -- and FWIW, I feel that the order is salient here (and with tuples). |
From a discussion with @zygoloid I feel like we at least have a tentative direction here. Fundamentally, we're both pretty happy with the idea that field order is salient as @zygoloid outlined in his comment -- it is an observable aspect of the program. This isn't really a narrow thing about data classes, but a general principle. The declaration order of fields in a class represents their order in memory, similar for data classes, for tuples, etc. I think we really did try to take to heart the idea of having more "set" semantics as @geoffromer outlined, but I think we're both a bit more comfortable being transparent about the underlying implementation here and how things would be laid out in memory if they are in fact laid out in memory. However, there are common operations that can be especially efficiently defined heterogeneously for data classes:
I think I have reproduced the direction correctly, but if I've missed anything let me know. Also want to check with @KateGregory to see if all of us are actually happy with this direction despite the tradeoff it makes compared to what @geoffromer suggests above. |
After some discussions in the weekly meeting and follow-up on Discord, some tweaks to the above:
Maybe this converges us on a good initial decision? @KateGregory @zygoloid |
From chat, looks like all the leads are happy here. Closing this up and will put it in the queue that needs a proposal. |
Just to clarify, I'd like to write down a bunch of examples so we are clear on which are allowed.

Assignment

We support both reordering fields and implicit conversions during assignment and initialization of aggregates. So this is allowed:
Fields are assigned in the order defined by the left-hand side. So in this example: array element 0 before element 1, and within an array element

Equality comparison

For equality (and inequality) comparison of aggregates, we support both reordering fields and fields with different types as long as comparison is defined between those types. So this is allowed:
Fields are compared in the order defined by the left-hand side. So in this example: array element 0 before element 1, and within an array element

Ordering comparison

Ordering comparison (
but this is not:
Data classes are compared in lexicographical order, with the first field being most significant. In the first example,

Argument passing

(I'm least certain of this section, but this is my reading of what is written.) Implicit conversion between aggregates is allowed to reorder fields and perform implicit conversions on field types, just like assignment. So this is allowed:
Fields are assigned according to the order in the type of the parameter from the function signature, just like an assignment of the argument value to the parameter. |
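The rules in the comment above can be approximated in C++ as a rough analogy (this is not Carbon code, and C++ itself offers none of these operations implicitly). The sketch assumes two hypothetical aggregate types, `FieldsAB` and `FieldsBA`, with the same field names in a different order and one differing field type; the helper names `ToAB`, `EqualByName`, and `LessSameOrder` are likewise invented for illustration.

```cpp
#include <tuple>

// Hypothetical aggregates: same field names, different order, and the
// field b has a different (implicitly convertible) type in each.
struct FieldsAB { int a; long b; };
struct FieldsBA { int b; int a; };

// Conversion matches fields by name and fills them in destination order
// (a, then b), converting int to long along the way.
constexpr FieldsAB ToAB(const FieldsBA& v) { return FieldsAB{v.a, v.b}; }

// Equality allows reordered fields and mixed field types; fields are
// compared in the order of the left-hand side (a, then b).
constexpr bool EqualByName(const FieldsAB& l, const FieldsBA& r) {
  return l.a == r.a && l.b == r.b;
}

// Ordering is lexicographic with the first field most significant, and is
// deliberately provided only when both sides share the same field order.
constexpr bool LessSameOrder(const FieldsAB& l, const FieldsAB& r) {
  return std::tie(l.a, l.b) < std::tie(r.a, r.b);
}

// FieldsBA{7, 3} has b == 7 and a == 3.
static_assert(ToAB(FieldsBA{7, 3}).a == 3, "fields matched by name");
static_assert(EqualByName(FieldsAB{3, 7}, FieldsBA{7, 3}), "reordered equality");
static_assert(LessSameOrder(FieldsAB{1, 9}, FieldsAB{2, 0}), "a is most significant");
```

There is intentionally no `LessSameOrder` overload taking a `FieldsBA`, mirroring the rule that ordering comparison between types with different field orders is rejected.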
All of those examples match what I expected. @zygoloid ? |
Define initialization, assignment, comparison, and implicit conversion between data classes with different field orders and/or types. Implements the decision made in #710 . Co-authored-by: Richard Smith <[email protected]>
Add concrete design for interfaces for comparison. Rename interfaces for arithmetic following current thinking in #1058. Update rules for mixed-type comparisons for data classes following #710. Co-authored-by: Chandler Carruth <[email protected]>
I think #1178 maybe addressed the need for a proposal here? |
I think we agree that data classes (including struct types) should support equality comparisons if all of their field types do. This comparison should at least support values with the same type, including field order.
Question 1: Should this also be supported if the fields are the same, but in a different order?
Question 2: Should data classes support ordering comparisons (<, <=, >, >=) using lexicographical order? Clearly the field orders would have to match in this case.

Context: proposals #561 on basic classes and #702 on comparisons, and Discord discussion starting with https://discord.com/channels/655572317891461132/708431657849585705/872603493242908772
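To illustrate why the field orders would have to match for question 2, here is a small C++ sketch (an analogy, not Carbon code) in which the same two values compare differently depending on which field is treated as most significant. The type `Pair2` and the helpers `LessXFirst` and `LessYFirst` are hypothetical names used only for this example.

```cpp
#include <tuple>

// Hypothetical two-field struct used only for this illustration.
struct Pair2 { int x; int y; };

// Lexicographic order with x as the most significant field.
constexpr bool LessXFirst(const Pair2& l, const Pair2& r) {
  return std::tie(l.x, l.y) < std::tie(r.x, r.y);
}

// Lexicographic order with y as the most significant field, as if the
// fields had been declared in the opposite order.
constexpr bool LessYFirst(const Pair2& l, const Pair2& r) {
  return std::tie(l.y, l.x) < std::tie(r.y, r.x);
}

// The same pair of values orders one way under x-first...
static_assert(LessXFirst(Pair2{1, 5}, Pair2{2, 0}), "1 < 2 decides");
// ...and the other way under y-first.
static_assert(!LessYFirst(Pair2{1, 5}, Pair2{2, 0}), "5 > 0 decides");
```

Because the result depends on which field comes first, a lexicographic comparison between two types that disagree on field order has no single obvious answer.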