Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identical() for strings is not transitive #27681

Closed
Hixie opened this issue Oct 27, 2016 · 8 comments
Closed

identical() for strings is not transitive #27681

Hixie opened this issue Oct 27, 2016 · 8 comments
Labels
area-front-end Use area-front-end for front end / CFE / kernel format related issues. area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. front-end-kernel P2 A bug or feature request we're likely to work on type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@Hixie
Copy link
Contributor

Hixie commented Oct 27, 2016

void main() {
  String ab = 'ab';
  print(identical('a' + 'b', 'ab')); // true, compile-time constant evaluation                                                                                                 
  print(identical('ab',       ab)); // true, compile-time constant de-duplication                                                                                              
  print(identical('a' + 'b',  ab)); // false??                                                                                                                                 
}
@lrhn lrhn added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) labels Oct 28, 2016
@lrhn
Copy link
Member

lrhn commented Oct 28, 2016

I assume this is the VM.

The rules for canonicalization of strings are not trivial.
Basically, string expressions, even compile-time constant ones, are not required to be canonicalized.
However, if both argument expressions of an identical call are compile-time constants, the result is computed at compile-time based on the string contents, not the object identity.

The expression ab is not a compile-time constant expression.
The string literal 'ab' is canonicalized by the VM (it's allowed, but not required, to do that, but if it didn't, things would break everywhere, so we should probably fix that requirement into the spec).
The compile-time constant string expression 'a' + 'b' evaluates to a different string object with the same content.

So, the first test is true because the test is compile-time constant.
The second test is true because the ab variable refers to the same string object as the expression 'ab'.
The third test is false because ab variable refers to a different string object than the compile-time constant 'a' + 'b' evaluated to.

Or, to put it differently: It's not a problem with transitivity - it's the fact that "a"+"b" (which is a compile-time constant) isn't canonicalized unless it's used in a compile-time constant context.
You can get the same result from just:

var x = "a" + "b";
print(identical("a" + "b", x));  // false

Even if "a" + "b" is a compile-time constant expression, it's just not canonicalized, and nothing in the spec says that it has to be.

So, the behavior is allowed by the specification.
I'm marking this as a VM bug for now, so they can consider whether they want to change behavior or not.

I'll check if we have a bug open for changing the language to require canonicalization of compile-time constant string expressions. Otherwise we might want that.

Considering the closing message on #1918, it's probably an oversight that the specification doesn't actually require canonicalization.

@lrhn
Copy link
Member

lrhn commented Nov 1, 2016

There actually does seem to be a bug in the VM:

var r = [];
for (var i = 0; i < r; i++) r.add("a" + "b");
print(identical(r[0], r[1]));

This prints false, and it must print true. The expression "a" + "b" is a compile-time constant expression and the spec says:

A constant expression is an expression whose value can never change, and that can be evaluated entirely at compile time.

This example shows the value of the constant expression "a" + "b" changing.

Arguably, the spec saying:

The predefined Dart function identical() is defined such that identical(c1, c2) iff:
...
• c1 and c2 are constant strings and c1 == c2, OR

does require that the identical function cannot distinguish constant strings (the problem is that "constant string" isn't a formally defined term - being constant is a property of syntax and evaluation time, not of the resulting value).

@eernstg
Copy link
Member

eernstg commented Nov 2, 2016

I think this confirms that the vm ought to consistently detect constant expressions of type String and generate code to return the canonicalized result for every such expression every time, in stead of, say, returning a fresh string each time for "a" + "b" in that for loop. It does require a bottom-up propagated bit in the abstract syntax: for instance, myRuntimeString + "a" + "b" associates to the left, so there's no canonical string "ab" inside that expression, only the two canonical strings "a" and "b". The spec needs a clarification to state unambiguously that all constant expressions of type String must yield a canonicalized result, but I don't think it is reasonable to interpret it differently even with the current wording.

About the invocations of identical: Assuming the strict approach I just described, all three of them should return true. The first one is a constant expression; the second one is a dynamic invocation of identical with two arguments which are both a reference to the canonical string 'ab', and so is the third.

@mhausner
Copy link
Contributor

mhausner commented Nov 7, 2016

See also issue #25024.

Back when Dart was still Dash, I remember discussions that the language shall not require all expressions that can be compile-time constants to be canonicalized, due to the runt-time cost. You don't want to spend the time to canonicalize every "a"+"b" (xyz+"b", "a"+X.c, ...) for the rare case where it's needed.

@lrhn
Copy link
Member

lrhn commented Nov 8, 2016

If we don't canonicalize all compile-time constant string expressions, then we need to make it predictable for users when they are canonicalized and when they are not. Is the VM following a rule or is it more ad-hoc which compile-time string expressions are canonicalized and which are not?
I'm guessing that string literals are always canonicalized, and likely also string interpolations and juxtapositions. I guess that only leaves string + operations.

I wouldn't mind if string constant expression + string constant expression was also always canonicalized. It would at least be a simple rule to explain. On the other hand, I expect the Kernel to do that for you, so it might not be worth it to change the VM just now.

@mhausner
Copy link
Contributor

I suspect this will be obsolete with the new front-end, but it needs to be verified.

@mhausner mhausner removed their assignment Mar 20, 2017
@sjindel-google sjindel-google self-assigned this Oct 17, 2017
@jensjoha jensjoha added the P2 A bug or feature request we're likely to work on label Jan 11, 2018
@sjindel-google sjindel-google removed their assignment Jan 11, 2018
@kmillikin kmillikin added area-front-end Use area-front-end for front end / CFE / kernel format related issues. front-end-kernel and removed front-end-kernel area-kernel labels Sep 19, 2018
@liamappelbe
Copy link
Contributor

I tried to verify that this was fixed, and write a regression test. It seems that the behavior has changed since this bug was opened, but 'a' + 'b' is not being canonicalized:

void main() {
  String ab = 'ab';
  print(identical('a' + 'b', 'ab'));  // false
  print(identical('ab',       ab));   // true
  print(identical('a' + 'b',  ab));   // false

  var x = "a" + "b";
  print(identical("a" + "b", x));     // false

  var r = [];
  for (var i = 0; i < 10; i++) r.add("a" + "b");
  print(identical(r[0], r[1]));       // false
}

Note that the first print has switched from true to false. Is this the intended behavior now?

@eernstg
Copy link
Member

eernstg commented May 4, 2022

We don't specify that every constant expression must be evaluated at compile time, only that an expression must satisfy a very strict set of rules in order to be a constant expression, and a compiler is then allowed to evaluate it at compile time. So it's OK for Dart compilers to postpone the evaluation of 'a' + 'b' to run time.

With specific expressions we have a more strict rule: A <constObjectExpression> must be canonicalized (which is currently only possible when it is also evaluated at compile-time). That's a constant expression whose top-level construct invokes a constant constructor, e.g., const C('a' + 'b'). In that case, and with the following declaration of C, the instance of C must be the same as the instance yielded by evaluating const C('ab'), because canonicalization only relies on the type of object (C) and the values of its instance variables:

class C {
  final String s;
  const C(this.s);
}

void main() => print(identical(const C('a' + 'b'), const C('ab'))); // 'true'.

I'll close this issue because the transitivity discrepancy has been resolved, and because the more general issue about canonicalization of strings is handled in dart-lang/language#985.

@eernstg eernstg closed this as completed May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-front-end Use area-front-end for front end / CFE / kernel format related issues. area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. front-end-kernel P2 A bug or feature request we're likely to work on type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
Development

No branches or pull requests

8 participants