Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#627 broke RegExpBuiltinExec? #777

Closed
anba opened this issue Jan 23, 2017 · 5 comments
Closed

#627 broke RegExpBuiltinExec? #777

anba opened this issue Jan 23, 2017 · 5 comments

Comments

@anba
Copy link
Contributor

anba commented Jan 23, 2017

From https://bugzilla.mozilla.org/show_bug.cgi?id=1317397#c1:

The PR was accepted, but it seems to be erroneous and hence not implementable for us. Consider the following case:

var re = /(.)\1/gu;
re.lastIndex = {
 valueOf() {
   // Recompile without Unicode-flag.
   re.compile(re.source, "g");
   return 0;
 }
};
var r = re.exec("\u{D83D}\u{DCA3}\u{DCA3}");
print(r ? r[0] : null);

With the new semantics, it seems that the expected result value is null. And here's why:

In RegExpBuiltinExec, the [[OriginalFlags]] is retrieved and stored, followed by calling ToLength(Get(R, "lastIndex")). Side-effects in ToLength can recompile the RegExp object! When the [[RegExpMatcher]] tries to find a match at string index 0, a failure is returned, because the recompiled RegExp works on code units and the sequence <U+D83D, U+DCA3> doesn't match "(.)\1". AdvanceStringIndex is called to advance to the next string index, but AdvanceStringIndex uses fullUnicode=true, so it skips over the complete code point U+1F4A3 and returns 2. The matcher also won't be able to find a match at string index 2, and therefore RegExpBuiltinExec returns null.

The obvious band-aid fix is:

@@ -29385,7 +29385,11 @@ THH:mm:ss.sss
             1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*.
             1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*.
             1. If _global_ is *false* and _sticky_ is *false*, let _lastIndex_ be 0.
-            1. Else, let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
+            1. Else,
+              1. Let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
+              1. Let _flags_ be _R_.[[OriginalFlags]].
+              1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*.
+              1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*.
             1. Let _matcher_ be _R_.[[RegExpMatcher]].
             1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*.
             1. Let _matchSucceeded_ be *false*.

But I'm not sure that's the correct way to fix this bug.

(If anyone wonders why global and sticky are also reset, that's to avoid regressing #494, #489.)

@ljharb
Copy link
Member

ljharb commented Jan 23, 2017

@claudepache

@claudepache
Copy link
Contributor

In PR #627, I had incorrectly removed a “useless” read access to the lastIndex property for non-global-and-non-sticky regexps, not noticing that the apparently innocuous ToLength() operation may in fact produce a regexp recompilation!

The proper fix is probably to revert that part of the PR, i.e.:

             1. Assert: _R_ is an initialized RegExp instance.
             1. Assert: Type(_S_) is String.
             1. Let _length_ be the number of code units in _S_.
+            1. Let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
             1. Let _flags_ be _R_.[[OriginalFlags]].
             1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*.
             1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*.
             1. If _global_ is *false* and _sticky_ is *false*, let _lastIndex_ be 0.
-            1. Else, let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
             1. Let _matcher_ be _R_.[[RegExpMatcher]].
             1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*.
             1. Let _matchSucceeded_ be *false*.

@anba
Copy link
Contributor Author

anba commented Jan 23, 2017

In PR #627, I had incorrectly removed a “useless” read access to the lastIndex property for non-global-and-non-sticky regexps, not noticing that the apparently innocuous ToLength() operation may in fact produce a regexp recompilation!

I thought that part was required for #627, but if that's not the case, your fix is certainly the way to go!

@claudepache
Copy link
Contributor

I thought that part was required for #627

It was not. The motivation was Issue #625, and the required part is avoiding unnecessary writing to the lastIndex property.

In ”reasonable” situations, uselessly reading the lastIndex property has zero side-effect. I was just guilty of premature optimisation.

@littledan
Copy link
Member

littledan commented Jan 23, 2017

cc @schuay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants