Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heterogeneous equality #54

Closed
JimLarson opened this issue Feb 28, 2019 · 16 comments · Fixed by #232
Closed

Heterogeneous equality #54

JimLarson opened this issue Feb 28, 2019 · 16 comments · Fixed by #232
Labels

Comments

@JimLarson
Copy link
Contributor

Since [1, "foo"] is a valid list and _==_ is defined on lists, [1, "foo"] == ["foo", 1] should evaluate to false. It would be least surprising if list equality was determined by elementwise equality. Equivalently, [x] == [y] should have the same meaning as x == y. Therefore, _==_ should have signature A x B --> bool.

Similarly, _!=_ should work on heterogeneous types.

Restricting equality to homogeneous types was meant to prevent user errors. After all, heterogeneously 1 == 1u evaluates surprisingly to false. However the type checker can work with a stricter A x A --> bool signature for equality, catching these errors.

We might want to make heterogeneous order operators (_<_ and friends) too. We'll want an ordering across all values for deterministic map comprehensions, so why not expose it to users? The surprising consequences (e.g. if int < uint, then 1u < 2) can again be mitigated by having the type checker work heterogeneously.

@Alfus
Copy link
Contributor

Alfus commented Mar 5, 2019

Isn't the type of any list just 'list'? Which means comparing two lists is still A x A -> bool (where A is list)?

@Alfus
Copy link
Contributor

Alfus commented Mar 5, 2019

[x] == [y] vs x == y is somewhat compelling

@JimLarson
Copy link
Contributor Author

Yes, you're correct, lists are unparameterized at runtime, so list equality typechecks correctly with homogeneous equality. This is entirely an issue of explaining list equality in terms of elementwise equality, and [x] == [y] is just an extreme case of that.

@Alfus
Copy link
Contributor

Alfus commented Mar 6, 2019

It looks like [x] == [y] has the same result as x == y today (awesome). So then my only real issue is that making 1 == 1u -> false today, even if it is an error at type check time, would be really hard to undo. So implementing this is implicitly also deciding the number comparison case as well. In which case it seems like that should be decided explicitly before making this change.

I am actually of the opinion CEL should match json, python, SQL, Firestore, c++, java, etc and allow full comparisons between different number types (while still not allowing implicit conversion or heterogeneous arithmetic).

@Alfus
Copy link
Contributor

Alfus commented Mar 6, 2019

BTW, a deterministic 'total ordering' is a land mine :-). It is much better to leave it undefined, and the 'iteration' order of a map should be not just undefined, but randomized between invocations. Also note that comparisons are pretty much always independent of any 'total ordering' provided because of NaN in floats. "hi" < 1 should always be undefined. You basically end up with 3 classes of operators:

  1. ==/!= - equality, usually works across all types in dynamic languages
  2. <,<=, =>,> - compare operators, only works on 'comparable' types
  3. sort - almost always ends up with strange rules because of NaN

basically, where these overlap, they must be consistent or produce 'undefined'.

You can see this distinction in pretty much every language (for unfortunately good reasons)

@Alfus
Copy link
Contributor

Alfus commented Mar 6, 2019

The axiom of choice is actually a great example of how total ordering is a fundamental issue in human math ;-)

@Alfus
Copy link
Contributor

Alfus commented Mar 7, 2019

Ahh, just remembered the other wrinkle, hash values and map keys.

@Alfus
Copy link
Contributor

Alfus commented Mar 7, 2019

FYI, python ignores float vs int for dict keys:

>>> {1: "hi"}
{1: 'hi'}
>>> {1: "hi", 1.0: "bye"}
{1: 'bye'}
>>> {1.0: "bye"}
{1.0: 'bye'}
>>> 

@Alfus
Copy link
Contributor

Alfus commented Mar 19, 2019

(also note that python 3 explicitly removed global ordering so ["hi] > [1] is now an error.)

@JimLarson
Copy link
Contributor Author

Let's suspend consideration of a trans-type total order, whether exposed as an operator or not. I don't think it's adding clarity to this discussion.

We've recently identified more language features that would be convenient to explain in terms of a heterogeneous equality: the in operator.

Restating for the record: because there's no way for a CEL program to depend on a subexpression being an error, it's safe to convert error behavior to non-error (in the sense that all programs that produce a value will produce the same value), but more problematic to convert one value to another. So if we make heterogeneous equality legal, we shouldn't later switch the behavior of 1 == 1.0.

CEL, as a basic design decision, has decided to avoid automatic conversion between numeric types and mixed-type arithmetic operations. Allowing 1 == 1.0 seems to go against this strategy and complicates equational reasoning: does it make sense to have a == b yet not allow a to be substituted where b is used? Yes, CEL equality would be different than languages which feature automatic numeric conversions, but we also differ in our arithmetic, and IMHO I'd rather have a consistent difference than a smaller difference without the consistency.

In discussions, Tristan raised the issue of JSON conversions: some JSON parsers convert fraction-less numbers to an integer type, and some CEL deployments might want to directly import these imported JSON results without intermediate conversion to a google.protobuf.Value (which would use doubles uniformly for numbers). Ideally, the direct and indirect import of JSON data would behave the same in CEL, suggesting that 1 == 1.0. However, as above you'd also want 1 + 2 == 1.0 + 2. We'd probably just want to advertise possible differences when using direct vs indirect conversion, and authors of CEL expressions could use explicit int() or double() wrappers to get uniform behavior.

@TristonianJones
Copy link
Collaborator

I'm working on a proposal for numeric type coercion and a heterogeneous null comparison which should fix almost all of the core usability issues outside of protobuf equality which is a larger and separate topic. I'll get a bit of language council review on it and then submit the proposal for external consumption here.

@TristonianJones
Copy link
Collaborator

Also affects issue #208

@jsannemo
Copy link
Contributor

In the conformance tests there's several heterogenous comparisons supported at runtime (albeit not at typecheck time):

  test {
    name: "eq_dyn_int_uint"
    expr: "dyn(1) == 1u"
    value: { bool_value: true }
  }

However, the spec states: "Equality and inequality are homogeneous; comparing values of different runtime types results in a runtime error. Thus 2 == 3 is false, but 2 == 2.0 is an error.".

What's the current take on this?

@jsannemo
Copy link
Contributor

And simultaneously the tests contain

  test {
    name: "eq_mixed_types_error"
    description: "A mix of types fails during type checks but can't be captured in the conformance tests yet (See google/cel-go#155). Also, if you disable checks it yields {bool_value: false} where it should also yield an error"
    expr: "1.0 == 1"
    disable_check: true  # need to make it fail in the evaluation phase
    eval_error {
      errors: { message: "no such overload" }
    }
  }

@TristonianJones
Copy link
Collaborator

@jsannemo Hi Johan, with the changes proposed and accepted in https://github.com/google/cel-spec/wiki/proposal-210, as well as a forthcoming approval for changes in protobuf equality at runtime, the stance is as follows:

  • Heterogeneous equality will be supported at runtime
  • Numeric types will have special provisions which permit their comparison as though on a continuous number line
  • Protobuf equality will be close to, but not exactly the same as golang's definition of proto equality, as protobuf.Any messages will be unpacked before comparison, and NaN values will not be comparable.

Most of these changes are in development, but not yet surfaced in the spec or the configuration options.

@jsannemo
Copy link
Contributor

Thanks for the pointer! I had missed the wiki totally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants