Editorial: Extract operation 'ParsePattern' (tc39#1866)
... from common code in IsValidRegularExpressionLiteral and RegExpInitialize.

(This was originally presented as a series of 12 small refactorings.
For more info, see the PR page.)
jmdyck authored and ljharb committed May 26, 2020
1 parent 51289e8 commit d14a282
Showing 1 changed file (spec.html) with 30 additions and 15 deletions.
@@ -12971,12 +12971,13 @@ <h1>Static Semantics: IsValidRegularExpressionLiteral ( _literal_ )</h1>
   <emu-alg>
     1. Assert: _literal_ is a |RegularExpressionLiteral|.
     1. If FlagText of _literal_ contains any code points other than `g`, `i`, `m`, `s`, `u`, or `y`, or if it contains the same code point more than once, return *false*.
-    1. Let _P_ be BodyText of _literal_.
-    1. If FlagText of _literal_ contains `u`, then
-      1. Parse _P_ using the grammars in <emu-xref href="#sec-patterns"></emu-xref>. The goal symbol for the parse is |Pattern[+U, +N]|. If _P_ did not conform to the grammar, if any elements of _P_ were not matched by the parse, or if any Early Error conditions exist, return *false*. Otherwise, return *true*.
-    1. Let _stringValue_ be UTF16Encode(_P_).
-    1. Let _pText_ be the sequence of code points resulting from interpreting each of the 16-bit elements of _stringValue_ as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
-    1. Parse _pText_ using the grammars in <emu-xref href="#sec-patterns"></emu-xref>. The goal symbol for the parse is |Pattern[~U, ~N]|. If the result of parsing contains a |GroupName|, reparse with the goal symbol |Pattern[~U, +N]|. If _pText_ did not conform to the grammar, if any elements of _pText_ were not matched by the parse, or if any Early Error conditions exist, return *false*. Otherwise, return *true*.
+    1. Let _patternText_ be BodyText of _literal_.
+    1. If FlagText of _literal_ contains `u`, let _u_ be *true*; else let _u_ be *false*.
+    1. If _u_ is *false*, then
+      1. Let _stringValue_ be UTF16Encode(_patternText_).
+      1. Set _patternText_ to the sequence of code points resulting from interpreting each of the 16-bit elements of _stringValue_ as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
+    1. Let _parseResult_ be ParsePattern(_patternText_, _u_).
+    1. If _parseResult_ is a Parse Node, return *true*; else return *false*.
   </emu-alg>
 </emu-clause>
 
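The rewritten IsValidRegularExpressionLiteral first sets _u_ from FlagText. When _u_ is *false*, it also replaces BodyText with its UTF-16 code units, each treated as a BMP code point, before calling ParsePattern. The TypeScript sketch below covers only those pre-parse steps. It assumes BodyText and FlagText arrive as JavaScript strings, which are already UTF-16 encoded, and it models _patternText_ as an array of numeric code points. `hasValidFlags`, `utf16Decode`, `codeUnits`, and `patternTextFor` are hypothetical helpers, not spec operations, and the grammar parse itself is omitted.

```typescript
// Hypothetical helpers sketching the pre-parse steps of IsValidRegularExpressionLiteral.
// Assumption: BodyText and FlagText are supplied as JS strings (UTF-16 code units).

// Flags permitted by the 2020 spec text above; later editions also allow "d" and "v".
const ALLOWED_FLAGS = "gimsuy";

// Rejects unknown flags and repeated flags, as in the FlagText step.
function hasValidFlags(flagText: string): boolean {
  const seen: string[] = [];
  for (let i = 0; i < flagText.length; i++) {
    const f = flagText.charAt(i);
    if (ALLOWED_FLAGS.indexOf(f) < 0 || seen.indexOf(f) >= 0) {
      return false;
    }
    seen.push(f);
  }
  return true;
}

// u path: combine surrogate pairs into code points; lone surrogates pass through unchanged.
function utf16Decode(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        out.push((hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000);
        i++; // skip the trailing surrogate just consumed
        continue;
      }
    }
    out.push(hi);
  }
  return out;
}

// Non-u path: each 16-bit code unit becomes one BMP code point; no UTF-16 decoding.
function codeUnits(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i));
  }
  return out;
}

// The _patternText_ that would be handed to ParsePattern(_patternText_, _u_).
function patternTextFor(bodyText: string, flagText: string): number[] {
  const u = flagText.indexOf("u") >= 0;
  return u ? utf16Decode(bodyText) : codeUnits(bodyText);
}

const smiley = "\uD83D\uDE00"; // U+1F600 written as a surrogate pair
const withU = patternTextFor(smiley, "u"); // [128512], the single code point U+1F600
const withoutU = patternTextFor(smiley, ""); // [55357, 56832], i.e. 0xD83D and 0xDE00
```

The two paths differ only in how a supplementary character such as U+1F600 reaches the grammar. Under `u` it arrives as one code point. Without `u` it arrives as two surrogate code points.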
@@ -32488,23 +32489,37 @@ <h1>Runtime Semantics: RegExpInitialize ( _obj_, _pattern_, _flags_ )</h1>
     1. If _flags_ is *undefined*, let _F_ be the empty String.
     1. Else, let _F_ be ? ToString(_flags_).
     1. If _F_ contains any code unit other than *"g"*, *"i"*, *"m"*, *"s"*, *"u"*, or *"y"* or if it contains the same code unit more than once, throw a *SyntaxError* exception.
-    1. If _F_ contains *"u"*, let _BMP_ be *false*; else let _BMP_ be *true*.
-    1. If _BMP_ is *true*, then
-      1. Let _pText_ be the sequence of code points resulting from interpreting each of the 16-bit elements of _P_ as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
-      1. Parse _pText_ using the grammars in <emu-xref href="#sec-patterns"></emu-xref>. The goal symbol for the parse is |Pattern[~U, ~N]|. If the result of parsing contains a |GroupName|, reparse with the goal symbol |Pattern[~U, +N]| and use this result instead. Throw a *SyntaxError* exception if _pText_ did not conform to the grammar, if any elements of _pText_ were not matched by the parse, or if any Early Error conditions exist.
-      1. Let _patternCharacters_ be a List whose elements are the code unit elements of _P_.
+    1. If _F_ contains *"u"*, let _u_ be *true*; else let _u_ be *false*.
+    1. If _u_ is *true*, then
+      1. Let _patternText_ be ! UTF16DecodeString(_P_).
+      1. Let _patternCharacters_ be a List whose elements are the code points of _patternText_.
     1. Else,
-      1. Let _pText_ be ! UTF16DecodeString(_P_).
-      1. Parse _pText_ using the grammars in <emu-xref href="#sec-patterns"></emu-xref>. The goal symbol for the parse is |Pattern[+U, +N]|. Throw a *SyntaxError* exception if _pText_ did not conform to the grammar, if any elements of _pText_ were not matched by the parse, or if any Early Error conditions exist.
-      1. Let _patternCharacters_ be a List whose elements are the code points of _pText_.
+      1. Let _patternText_ be the result of interpreting each of _P_'s 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
+      1. Let _patternCharacters_ be a List whose elements are the code unit elements of _P_.
+    1. Let _parseResult_ be ParsePattern(_patternText_, _u_).
+    1. If _parseResult_ is a non-empty List of *SyntaxError* objects, throw a *SyntaxError* exception.
+    1. Assert: _parseResult_ is a Parse Node for |Pattern|.
     1. Set _obj_.[[OriginalSource]] to _P_.
     1. Set _obj_.[[OriginalFlags]] to _F_.
-    1. Set _obj_.[[RegExpMatcher]] to the Abstract Closure that evaluates the above parse by applying the semantics provided in <emu-xref href="#sec-pattern-semantics"></emu-xref> using _patternCharacters_ as the pattern's List of |SourceCharacter| values and _F_ as the flag parameters.
+    1. Set _obj_.[[RegExpMatcher]] to the Abstract Closure that evaluates _parseResult_ by applying the semantics provided in <emu-xref href="#sec-pattern-semantics"></emu-xref> using _patternCharacters_ as the pattern's List of |SourceCharacter| values and _F_ as the flag parameters.
     1. Perform ? Set(_obj_, *"lastIndex"*, 0, *true*).
     1. Return _obj_.
   </emu-alg>
 </emu-clause>
 
+<emu-clause id="sec-parsepattern" aoid="ParsePattern">
+  <h1>Static Semantics: ParsePattern ( _patternText_, _u_ )</h1>
+  <p>The abstract operation ParsePattern takes arguments _patternText_ (a sequence of Unicode code points) and _u_ (a Boolean). It performs the following steps when called:</p>
+  <emu-alg>
+    1. If _u_ is *true*, then
+      1. Parse _patternText_ using the grammars in <emu-xref href="#sec-patterns"></emu-xref>. The goal symbol for the parse is |Pattern[+U, +N]|.
+    1. Else,
+      1. Parse _patternText_ using the grammars in <emu-xref href="#sec-patterns"></emu-xref>. The goal symbol for the parse is |Pattern[~U, ~N]|. If the result of parsing contains a |GroupName|, reparse with the goal symbol |Pattern[~U, +N]| and use this result instead.
+    1. If _patternText_ did not conform to the grammar, or any elements of _patternText_ were not matched by the parse, or any Early Error conditions exist, return a List of one or more *SyntaxError* objects representing the parsing errors and/or early errors.
+    1. Otherwise, return the Parse Node resulting from the parse.
+  </emu-alg>
+</emu-clause>
+
 <emu-clause id="sec-regexpcreate" aoid="RegExpCreate">
   <h1>Runtime Semantics: RegExpCreate ( _P_, _F_ )</h1>
   <p>The abstract operation RegExpCreate takes arguments _P_ and _F_. It performs the following steps when called:</p>
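ParsePattern gathers the goal-symbol choice that both callers previously spelled out in full. When _u_ is *true*, the goal is Pattern[+U, +N]. Otherwise the goal is Pattern[~U, ~N], with a reparse under Pattern[~U, +N] when the first parse contains a GroupName. The TypeScript sketch below models only that control flow, with the grammar parser passed in as a function. `Goal`, `PatternNode`, `GrammarParser`, `parsePattern`, `isValidLiteral`, and `initializeOrThrow` are all hypothetical stand-ins for the spec's Parse Nodes, its List of SyntaxError objects, and its two callers. Implementing the Pattern grammar is out of scope.

```typescript
// Hypothetical stand-ins for the spec's goal parameters, Parse Node and error List.
interface Goal {
  U: boolean; // +U when true, ~U when false
  N: boolean; // +N when true, ~N when false
}
interface PatternNode {
  kind: "Pattern";
  containsGroupName: boolean;
}
type ParseOutcome = PatternNode | SyntaxError[];

// Assumed parser for Pattern[U, N]: returns a node on success, or a non-empty
// array of SyntaxError objects for grammar mismatches and Early Errors.
type GrammarParser = (patternText: number[], goal: Goal) => ParseOutcome;

function isParseNode(r: ParseOutcome): r is PatternNode {
  return !Array.isArray(r);
}

// Control flow of ParsePattern(_patternText_, _u_).
function parsePattern(patternText: number[], u: boolean, parse: GrammarParser): ParseOutcome {
  if (u) {
    return parse(patternText, { U: true, N: true });
  }
  const first = parse(patternText, { U: false, N: false });
  // Assumption: "contains a GroupName" is checked only when the first parse produced a node.
  if (isParseNode(first) && first.containsGroupName) {
    return parse(patternText, { U: false, N: true }); // use the reparse result instead
  }
  return first;
}

// Caller 1, as in IsValidRegularExpressionLiteral: the outcome becomes a Boolean.
function isValidLiteral(patternText: number[], u: boolean, parse: GrammarParser): boolean {
  return isParseNode(parsePattern(patternText, u, parse));
}

// Caller 2, as in RegExpInitialize: an error List becomes a thrown SyntaxError.
function initializeOrThrow(patternText: number[], u: boolean, parse: GrammarParser): PatternNode {
  const result = parsePattern(patternText, u, parse);
  if (!isParseNode(result)) {
    throw new SyntaxError(result.map((e) => e.message).join("; "));
  }
  return result;
}

// Usage with a hypothetical stub parser that records each goal it is asked for
// and always reports a parse containing a GroupName.
const goalsTried: string[] = [];
const stubParser: GrammarParser = (_text, goal) => {
  goalsTried.push((goal.U ? "+U" : "~U") + "," + (goal.N ? "+N" : "~N"));
  return { kind: "Pattern", containsGroupName: true };
};
const ok = isValidLiteral([0x61], false, stubParser);
// ok is true and goalsTried is ["~U,~N", "~U,+N"]: the GroupName triggered the reparse.
```

Passing the parser in keeps the sketch focused on what the commit factors out. The goal selection and the reparse now live in one place. Each caller decides only whether a failure means *false* or a thrown *SyntaxError*.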

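Because `new RegExp(pattern, flags)` runs RegExpInitialize, the behaviour these steps specify can be observed directly in a JavaScript engine. The checks below are a sketch against the host engine, not a conformance test, and they rest on two assumptions. First, the engine is web-compatible and implements Annex B, which is what makes a stray `\k<a>` legal without `u` in a pattern that has no named groups. Second, the engine follows a later edition than this 2020 text, so it may accept extra flags such as `d` or `v`. `compiles` is a hypothetical helper.

```typescript
// Hypothetical helper: true if the host engine accepts the pattern/flags pair,
// false if RegExp construction throws a SyntaxError.
function compiles(source: string, flags: string): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false;
    }
    throw e;
  }
}

const emoji = "\uD83D\uDE00"; // U+1F600 as two UTF-16 code units

// With u, pattern and input are handled as code points; without u, as code units,
// so "." consumes the whole emoji only under u.
const dotWithU = new RegExp("^.$", "u").test(emoji); // true
const dotWithoutU = new RegExp("^.$", "").test(emoji); // false

// Repeated flags are rejected by the flag check that precedes ParsePattern.
const repeatedFlag = compiles("a", "gg"); // false

// A non-empty error List from ParsePattern surfaces as a thrown SyntaxError.
const unbalanced = compiles("(", ""); // false

// Without u and without a GroupName, Pattern[~U, ~N] (Annex B) treats \k as a literal "k".
const strayK = new RegExp("\\k<a>", "").test("k<a>"); // true
// With u the goal is Pattern[+U, +N], where \k<a> must name an existing group.
const strayKWithU = compiles("\\k<a>", "u"); // false
// A GroupName forces the reparse with +N, so \k<b> must now refer to a real group.
const badBackref = compiles("(?<a>x)\\k<b>", ""); // false
const goodBackref = new RegExp("(?<a>x)\\k<a>", "").test("xx"); // true
```

The `\k` cases show the reparse rule at work. The same escape is a literal under Pattern[~U, ~N] but becomes a checked backreference once a GroupName appears.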