Default comparison for data classes #710

josh11b · 2021-08-05T18:41:39Z

I think we agree that data classes (including struct types) should support equality comparisons if all of its field types do. This comparison should at least support values with the same type, including field order.

var s: {.x: i32, .y: i32} = {.x = 3, .y = 4};
Assert(s == {.x = 3, .y = 4});

Question 1: Should this also be supported if the fields are the same, but in a different order?

Assert(s == {.y = 4, .x = 3});

Question 2: Should data classes support ordering comparisons (<, <=, >, >=) using lexicographical order? Clearly the field orders would have to match in this case.

Assert(s < {.x = 5, .y = 2});

Context: proposals #561 on basic classes and #702 on comparisons, and Discord discussion starting with https://discord.com/channels/655572317891461132/708431657849585705/872603493242908772

The text was updated successfully, but these errors were encountered:

geoffromer · 2021-08-05T22:41:15Z

It's not clear to me how (or even whether) these issues will affect nominal data classes, so my comments will focus primarily on structural data classes.

I think many (most?) users' intuitive mental model for structs will be that they are an unordered collection of named fields. I think that's already the case in C++ to a large extent, and #561 further encourages this mental model by enabling users to disregard field order when reading or writing initializers. I think that mental model is not only unavoidable, it's useful, and it makes structs a very natural complement for tuple types. I think we should lean into that model, and try to avoid attaching API-visible meaning to the declaration order of struct fields.

Regarding question 1, this model implies that a struct type should be equality-comparable with any struct type that has the same field names and types (in other words, any struct type that it can be initialized from), and it should evaluate the field comparisons in an unspecified order. I'm honestly having a hard time coming up with any drawbacks to supporting heterogeneous equality comparisons that wouldn't apply just as much to heterogeneous initialization (which #561 proposes to support).

One possible concern, particularly for the homogeneous case, is that we'll presumably want the == operator to short-circuit: as it iterates through the field pairs, it should return false as soon as it finds an unequal pair, rather than perform further comparisons that cannot affect the final result. Since some field comparisons may be more expensive than others, it may sometimes be possible to optimize the average-case performance of the comparison by changing the order in which fields are compared. If we guarantee to perform the comparisons in field declaration order, then users can use the declaration order to control the comparison order; if the comparison order is unspecified, users have no such control.

However, I don't think that carries much weight: the cases where that degree of control is useful will likely be very rare, and when they occur, the user would probably be better off using a nominal class type instead, where they can achieve much more precise control over performance by hand-implementing the == operator.

Another possible concern is that if the field comparisons have side effects, then changes in the (unspecified) order of evaluation could change the behavior of the program unexpectedly, in ways that could be difficult to understand or diagnose. However, this risk seems quite remote, and to the extent we nonetheless want to address it, we could do so either by pseudo-randomizing the order so that such problems are caught quickly, or by defining a fixed order that does not depend on the declaration order (such as lexicographic by field names).

Regarding question 2, the fields-are-unordered model implies that it is unnecessary, if not outright harmful, to provide ordering comparison operators for structs: it's not meaningful to ask whether one collection of unordered field values is "less than" another, so people are unlikely to need an operator that asks that question in the first place, and are likely to be confused by any usages of that operator that they come across. On the other hand, it is meaningful and useful for a struct to provide some total order that is unspecified but fixed (at least for the life of the program), so that's what we should do instead. That will enable users to do things like perform binary search on struct values without risking the confusion that would come from naming the comparison <.

Note that this issue goes beyond data classes: choice types and most other sum types likewise have no natural concept of "less than", but can and should define a fixed but unspecified total order under a different name. So I think we will eventually need a "default total order" API whether or not we support comparison operators for data classes, so I don't think we're incurring much language-design cost if we plan on taking this route for data classes as well. Note that I think the design of that API is out of scope for #561 and #702, because I expect it to be tied to the design for user-defined comparison operators.

zygoloid · 2021-08-05T23:32:17Z

I think, for me, the primary question here is: is the field order a salient property of a struct value? Do we consider two structs that have the same field names and the same field values, but in a different order, to be importantly different (as they would be if they had different field names or different values) or only incidentally different (eg, like the difference between a vector<int> containing {1, 2, 3} whose capacity is 3 and a vector<int> containing {1, 2, 3} whose capacity is 4)?

If the field order is salient, then I think comparisons between structs with different field order should require an explicit conversion to a common type, whether they're equality or relational comparisons. {.a = 1, .b = 2} and {.b = 2, .a = 1} should not compare equal (because they are different values), and should not compare unequal (because that would be surprising), so having them not be comparable at all seems best.

If field order is non-salient, then I would expect equality comparison to work regardless of field order, and relational comparison to not work regardless of field order (even if the field order is the same). For relational comparisons, if field order is non-salient, we should get the same result regardless of field order, so we'd need to make an arbitrary but consistent choice (such as comparing the fields in alphabetical order by name), but any such choice seems like it would be surprising. I don't think this option would or should imply that {a: Int, b: Int} and {b: Int, a: Int} are the same type, but rather that they are two different types that can represent the same set of values (like two different string implementations).

My current leaning is very slightly that field order is salient in this sense, because I think that it's better to avoid non-salient but representable differences, and that this option is more reminiscent of C++ structs. However, I think the alternative also has merit, and in the alternative I do like @geoffromer's idea of producing an unspecified-but-fixed total order for structs with a name that is not <. (There are some considerations from predictable performance goal here: we might not want the comparison to be entirely unspecified. But I think we can find a reasonable approach.)

geoffromer · 2021-08-05T23:57:13Z

My current leaning is very slightly that field order is salient in this sense, because I think that it's better to avoid non-salient but representable differences, and that this option is more reminiscent of C++ structs.

"Representable" in what sense?

josh11b · 2021-08-09T22:52:39Z

My general position is that we should the allow operations we can since they will be convenient/useful for users and not result in more bugs in user code.

On question 1: I think a useful property is that a = b results in a == b, even when a and b have different field orders.

zygoloid · 2021-08-28T00:05:04Z

My current leaning is very slightly that field order is salient in this sense, because I think that it's better to avoid non-salient but representable differences, and that this option is more reminiscent of C++ structs.

"Representable" in what sense?

I think what I mean is "observable by a program".

github-actions · 2021-11-26T01:48:24Z

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the inactive label. The long term label can also be added for issues which are expected to take time.
This issue is labeled inactive because the last activity was over 90 days ago.

chandlerc · 2021-11-29T22:18:16Z

FWIW, I like the idea of "is field order salient" from @zygoloid -- and FWIW, I feel that the order is salient here (and with tuples).

chandlerc · 2021-12-01T12:08:57Z

From a discussion with @zygoloid I feel like we at least have a tentative direction here.

Fundamentally, we're both pretty happy with the idea that field order is salient as @zygoloid outlined in his comment -- it is an observable aspect of the program. This isn't really a narrow thing about data classes, but a general principle. The declaration order of fields in a class represents their order in memory, similar for data classes, for tuples, etc.

I think we really did try to take to heart the idea of having more "set" semantics as @geoffromer outlined, but I think we're both a bit more comfortable being transparent about the underlying implementation here and how things would be laid out in memory if they are in fact laid out in memory.

However, there are common operations that can be especially efficiently defined heterogeneously for data classes:

converting assignment (and initialization) should work between types with a set of fields with the same names and convertible types.
data classes with a set of fields with the same name and comparable types should also be comparable for both equality and less-than.
- However, only equality should allow different ordered fields. For less-than (and other relational comparisons), the order of field names (with comparable types) must match so we can pick a consistent and unsurprising order of comparison (even if not especially meaningful). The order we suggest is declaration order.
explicit cast between data classes with a set of fields with the same names and convertible types, but no implicit cast.
- No implicit cast even if the order matches and the fields are implicit convertible.
  - We would expect tuples and data classes and fixed size arrays to all work similarly. But implicitly converting 10,000 x i8 to 100 x i64 seems likely to be surprising at best, and more likely to indicate a missing overload or other mistake, even though the fields are "in order" and implicitly convertible.
- Maybe in the future, we allow implicit destructuring in some cases, but initially start w/ simple and restrictive model

I think I have reproduced the direction correctly, but if I've missed anything let me know. Also want to check with @KateGregory to see if all of us are actually happy with this direction despite the tradeoff it makes compared to what @geoffromer suggests above.

chandlerc · 2021-12-03T04:32:04Z

After some discussions in the weekly meeting and follow-up on Discord, some tweaks to the above:

Let's allow implicit conversions between all of these style of aggregate types (tuples, arrays, data classes) when the members implicitly convert
- Super important for some cases: (1, 1) converting to (i8, i8).
- To start, keep it very simple and consistent across all of these. Can watch for surprising cases that we need to address w/ more nuanced rule.
- Include different field orders for data classes.
Need to pin down which order is used when doing heterogenous comparison with data classes that have different order of fields
- Consistently use LHS order.
- Match the order used in assignment.

Maybe this converges us on a good initial decision? @KateGregory @zygoloid

chandlerc · 2021-12-07T21:33:19Z

From chat, looks like all the leads are happy here. Closing this up and will put it in the queue that needs a proposal.

josh11b · 2021-12-09T21:08:59Z

Just to clarify, I'd like to write down a bunch of examples so we are clear on which are allowed.

Assignment

We support both reordering fields and implicit conversions during assignment and initialization of aggregates. So this is allowed:

var s: Array({.x: i32, .y: i32}, 2) = ...;
var t: Array({.y: i64, .x: i64}, 2) = s;

Fields are assigned in the order defined by the left-hand side. So in this example: array element 0 before element 1, and within an array element .y before .x.

Equality comparison

For equality (and inequality) comparison of aggregates, we support both reordering fields and fields with different types as long as comparison is defined between those types. So this is allowed:

var s: Array({.x: i32, .y: i32}, 2) = ...;
var t: Array({.y: u32, .x: u32}, 2) = ...;
Assert(s == t);

Fields are compared in the order defined by the left-hand side. So in this example: array element 0 before element 1, and within an array element .y before .x.

Ordering comparison

Ordering comparison (<, <=, etc.) is allowed between aggregates only when the field order is consistent. The types may be different as long as ordering comparison is defined between those types. So this is allowed:

var s: Array({.x: i32, .y: i32}, 2) = ({.x = 1 .y = 2}, {.x = 3, .y = 4});
var t: Array({.x: u32, .y: u32}, 2) = ({.x = 4, .y = 3}, {.x = 2, .y = 1});
Assert(s < t);

but this is not:

var s: {.x: i32, .y: i32} = {.x = 3, .y = 4};
var t: {.y: u32, .x: u32} = {.y = 3, .x = 4};
Assert(s < t);

Data classes are compared in lexicographical order, with the first field being most significant. In the first example, [0].x is compared before [0].y before [1].x before [1].y.

Argument passing

(I'm least certain of this section, but this is my reading of what is written.)

Implicit conversion between aggregates is allowed to reorder fields and perform implicit conversions on field types, just like assignment. So this is allowed:

fn F(t: Array({.y: i64, .x: i64}, 2));
var s: Array({.x: i32, .y: i32}, 2) = ...;
F(s);

Fields are assigned according to the order in the type of the parameter from the function signature, just like an assignment of the argument value to the parameter.

chandlerc · 2021-12-09T21:19:01Z

All of those examples match what I expected. @zygoloid ?

Define initialization, assignment, comparison, and implicit conversion between data classes with different field orders and/or types. Implements the decision made in #710 . Co-authored-by: Richard Smith <[email protected]>

Add concrete design for interfaces for comparison. Rename interfaces for arithmetic following current thinking in #1058. Update rules for mixed-type comparisons for data classes following #710. Co-authored-by: Chandler Carruth <[email protected]>

Define initialization, assignment, comparison, and implicit conversion between data classes with different field orders and/or types. Implements the decision made in #710 . Co-authored-by: Richard Smith <[email protected]>

Add concrete design for interfaces for comparison. Rename interfaces for arithmetic following current thinking in #1058. Update rules for mixed-type comparisons for data classes following #710. Co-authored-by: Chandler Carruth <[email protected]>

jonmeow · 2022-08-11T22:26:29Z

I think #1178 maybe addressed the need for a proposal here?

chandlerc · 2022-08-12T01:32:47Z

I think #1178 maybe addressed the need for a proposal here?

@zygoloid or @josh11b to confirm.

josh11b · 2022-08-12T01:49:03Z

Confirmed, #981 and #1178 addressed this.

josh11b mentioned this issue Aug 5, 2021

Comparison operators #702

Merged

github-actions bot added the inactive Issues and PRs which have been inactive for at least 90 days. label Nov 26, 2021

chandlerc removed the inactive Issues and PRs which have been inactive for at least 90 days. label Nov 29, 2021

chandlerc closed this as completed Dec 7, 2021

josh11b mentioned this issue Dec 10, 2021

Implicit conversions for aggregates #981

Merged

zygoloid mentioned this issue Apr 16, 2022

Rework operator interfaces #1178

Merged

jonmeow added the leads question A question for the leads team label Aug 10, 2022

josh11b mentioned this issue Aug 15, 2022

Create a tuples proposal #2038

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default comparison for data classes #710

Default comparison for data classes #710

josh11b commented Aug 5, 2021 •

edited

Loading

geoffromer commented Aug 5, 2021

zygoloid commented Aug 5, 2021

geoffromer commented Aug 5, 2021

josh11b commented Aug 9, 2021

zygoloid commented Aug 28, 2021

github-actions bot commented Nov 26, 2021

chandlerc commented Nov 29, 2021

chandlerc commented Dec 1, 2021

chandlerc commented Dec 3, 2021

chandlerc commented Dec 7, 2021

josh11b commented Dec 9, 2021 •

edited

Loading

chandlerc commented Dec 9, 2021

jonmeow commented Aug 11, 2022

chandlerc commented Aug 12, 2022

josh11b commented Aug 12, 2022

Default comparison for data classes #710

Default comparison for data classes #710

Comments

josh11b commented Aug 5, 2021 • edited Loading

geoffromer commented Aug 5, 2021

zygoloid commented Aug 5, 2021

geoffromer commented Aug 5, 2021

josh11b commented Aug 9, 2021

zygoloid commented Aug 28, 2021

github-actions bot commented Nov 26, 2021

chandlerc commented Nov 29, 2021

chandlerc commented Dec 1, 2021

chandlerc commented Dec 3, 2021

chandlerc commented Dec 7, 2021

josh11b commented Dec 9, 2021 • edited Loading

Assignment

Equality comparison

Ordering comparison

Argument passing

chandlerc commented Dec 9, 2021

jonmeow commented Aug 11, 2022

chandlerc commented Aug 12, 2022

josh11b commented Aug 12, 2022

josh11b commented Aug 5, 2021 •

edited

Loading

josh11b commented Dec 9, 2021 •

edited

Loading