Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests for duplicate named capture groups #3704

Closed
ptomato opened this issue Oct 27, 2022 · 1 comment
Closed

Tests for duplicate named capture groups #3704

ptomato opened this issue Oct 27, 2022 · 1 comment
Assignees

Comments

@ptomato
Copy link
Contributor

ptomato commented Oct 27, 2022

Duplicate named capture groups reached Stage 3 in July 2022.

Summary

The proposal in a nutshell: (?<name>regexp) is the existing regexp syntax for "named capture group". Prior to this proposal it was forbidden to have two named capturing groups with the same name in one regexp. Now, it is permissible to have capturing groups with the same name, as long as they are in separate alternatives within the regexp. In other words:

  • OK: (?<a>x)|(?<a>y)
  • Not OK: (?<a>x)(?<a>y)

Testing plan

I'll make some tables. The checkmarks (☑) show tests that already exist.

There are syntax changes and behavioural changes to test.

Syntax

The syntax tests are basically:

  • Duplicate named capturing groups in separate alternatives ((?<a>x)|(?<a>y)) should compile correctly
  • Duplicate named capturing groups in the same alternative ((?<a>x)(?<a>y)) should not compile

There are three entry points that compile RegExp syntax:

  • RegExp literal evaluation (at parse time)
  • new RegExp() (at runtime)
  • RegExp.prototype.compile (at runtime)

These tests are pretty simple. The tests for RegExp literals already exist. To me it makes sense to have similar ones for compiling a RegExp from a string. If we test it for new RegExp() it's likely redundant for R.p.compile() but it's a small amount of work.

In table form:

Test RegExp literal new RegExp() RegExp.p.compile()
DNCG in separate alternatives
DNCG in same alternative

Behaviour

Behaviour that should be tested:

  • Basic behaviour of the method to which the RegExp is passed (what does and doesn't it match, and replace if applicable)
  • Backreference to a duplicated named capture group (\k<name> is the existing syntax for backreference to a named capture group) contains the reference to the alternative which actually matched (e.g. (?:(?<a>x)|(?<a>y))\k<a> matches xx, yy, not xy, yx)
  • Backreference to a duplicated named capture group that didn't match, matches unconditionally without consuming input (e.g. ^(?:(?<a>x)|(?<a>y)|z)\k<a>$ matches xx, yy, and z, not zz)
  • Backreference to a named capture group in a different alternative (e.g. (?<a>x)|\k<a> is treated as if the capture group didn't match (i.e., the backreference matches unconditionally)
  • Backreference to a named capture group inside an iteration (e.g., +, {2}, etc.), which didn't match but did match on a previous iteration, is still treated as if it didn't match (e.g. ^(?:(?<a>x)|(?<a>y)|z){2}\k<a>$ matches xz, yz, not xzx, yzy)

The entry points where we could test behaviour of a RegExp with duplicate named capture groups:

  • RegExp.prototype.exec
  • RegExp.prototype.test
  • String.prototype.match
  • String.prototype.replace
  • String.prototype.replaceAll
  • String.prototype.search
  • String.prototype.split
  • String.prototype.matchAll
  • RegExp.prototype[Symbol.match]
  • RegExp.prototype[Symbol.replace]
  • RegExp.prototype[Symbol.search]
  • RegExp.prototype[Symbol.split]
  • RegExp.prototype[Symbol.matchAll]

I'd say that from a perspective of usefulness to engines, it would be diminishing returns to test all of these entry points. After all, an implementation using different regular expression engines for different entry points doesn't seem a likely failure mode. @Ms2ger and I discussed this and we think it makes sense to write tests for the first item (basic functionality) for all non-symbol methods, and pick one RegExp and one String method to test the other four items (RegExp.p.exec and String.p.match).

In table form:

Test R.exec R.test S.match S.replace S.replaceAll S.search S.split S.matchAll @@match @@replace @@search @@split @@matchall
DNCG basic no no no no no
Backref to matched DNCG no no no no no no no no no no no
Backref to unmatched DNCG no no no no no no no no no no no
Backref to NCG in different alternative no no no no no no no no no no no
Backref to DNCG matched in prev iteration no no no no no no no no no no no

Results on match objects

On returned match objects, there are some additional things that should be tested. For each named capture group, there's a property with the same name on the match object's groups object, and if the RegExp had the d flag, on the match object's indices.groups object as well.

For each of these objects (match.groups, match.indices.groups) we should test:

  • Property corresponding to a duplicated named capture group which matched, contains the text/indices which actually matched
  • Property corresponding to a duplicated named capture group which didn't match, is present and has the value undefined
  • Property corresponding to a duplicated named capture group inside an iteration, which didn't match but did match in a previous iteration, is still treated as if it didn't match (property present and undefined)
  • Iteration order of properties corresponding to duplicated named capture groups is in order of first appearance in the RegExp's source text

The entry points that return match objects that can be influenced by RegExps with duplicate named capture groups:

  • RegExp.prototype.exec
  • String.prototype.match
  • String.prototype.matchAll
  • RegExp.prototype[Symbol.match]
  • RegExp.prototype[Symbol.matchAll]

Same here, I think it's sufficient to test this functionality for RegExp.p.exec and String.p.match.

In table form:

Test R.exec S.match S.matchAll @@match @@matchall
Matched DNCG on groups no no no
Unmatched DNCG on groups no no no
DNCG matched in previous iteration on groups no no no
Iteration order of properties on groups no no no
Matched DNCG on indices.groups no no no
Unmatched DNCG on indices.groups no no no
DNCG matched in previous iteration on indices.groups no no no
Iteration order of properties on indices.groups no no no

cc @bakkot

@ptomato ptomato self-assigned this Oct 27, 2022
@ptomato
Copy link
Contributor Author

ptomato commented Oct 27, 2022

@Ms2ger and I split up the work in the tables above.

ptomato added a commit to ptomato/test262 that referenced this issue Oct 28, 2022
Parse-time syntax for RegExp literals is already tested. These two files
test runtime RegExp compilation, with respect to duplicate named capture
groups.

See: tc39#3704
ptomato added a commit to ptomato/test262 that referenced this issue Oct 28, 2022
These tests should cover the full functionality of the .groups object (and
the .indices.groups object, in the case of the /d flag) for RegExp.p.exec
and String.p.match:

- Matched DNCG has a result
- Unmatched DNCG is present and undefined
- DNCG matched in previous iteration but not in current iteration is
  treated as unmatched
- Iteration order of properties corresponds with source order

See: tc39#3704
Ms2ger pushed a commit to ptomato/test262 that referenced this issue Nov 2, 2022
Parse-time syntax for RegExp literals is already tested. These two files
test runtime RegExp compilation, with respect to duplicate named capture
groups.

See: tc39#3704
Ms2ger pushed a commit to ptomato/test262 that referenced this issue Nov 2, 2022
These tests should cover the full functionality of the .groups object (and
the .indices.groups object, in the case of the /d flag) for RegExp.p.exec
and String.p.match:

- Matched DNCG has a result
- Unmatched DNCG is present and undefined
- DNCG matched in previous iteration but not in current iteration is
  treated as unmatched
- Iteration order of properties corresponds with source order

See: tc39#3704
Ms2ger pushed a commit that referenced this issue Nov 2, 2022
Parse-time syntax for RegExp literals is already tested. These two files
test runtime RegExp compilation, with respect to duplicate named capture
groups.

See: #3704
Ms2ger pushed a commit that referenced this issue Nov 2, 2022
These tests should cover the full functionality of the .groups object (and
the .indices.groups object, in the case of the /d flag) for RegExp.p.exec
and String.p.match:

- Matched DNCG has a result
- Unmatched DNCG is present and undefined
- DNCG matched in previous iteration but not in current iteration is
  treated as unmatched
- Iteration order of properties corresponds with source order

See: #3704
Ms2ger added a commit that referenced this issue Nov 3, 2022
Ms2ger added a commit that referenced this issue Nov 3, 2022
Ms2ger added a commit that referenced this issue Nov 3, 2022
Ms2ger added a commit that referenced this issue Nov 3, 2022
Ms2ger added a commit that referenced this issue Nov 4, 2022
Ms2ger added a commit that referenced this issue Nov 4, 2022
@ptomato ptomato closed this as completed in 745f3c0 Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant