PR feedback and move Input derivation out of CompilePattern
rbuckton committed Jan 24, 2022
1 parent 6a93695 commit 0fe72de
Showing 1 changed file, spec.html, with 20 additions and 18 deletions.
@@ -34923,26 +34923,25 @@ <h1>Notation</h1>
 <h1>Runtime Semantics: CompilePattern</h1>
 <dl class="header">
 <dt>description</dt>
-<dd>It returns an Abstract Closure that takes a String and a non-negative integer and returns a MatchResult.</dd>
+<dd>It returns an Abstract Closure that takes a List of characters and a non-negative integer and returns a MatchResult.</dd>
 </dl>
 <emu-grammar>Pattern :: Disjunction</emu-grammar>
 <emu-alg>
 1. Let _m_ be CompileSubpattern of |Disjunction| with argument ~forward~.
-1. Return a new Abstract Closure with parameters (_str_, _index_) that captures _m_ and performs the following steps when called:
-1. Assert: Type(_str_) is String.
-1. Assert: _index_ is a non-negative integer which is &le; the length of _str_.
-1. If _Unicode_ is *true*, let _Input_ be ! StringToCodePoints(_str_). Otherwise, let _Input_ be a List whose elements are the code units that are the elements of _str_. _Input_ will be used throughout the algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref>. Each element of _Input_ is considered to be a character.
+1. Return a new Abstract Closure with parameters (_input_, _index_) that captures _m_ and performs the following steps when called:
+1. Assert: _input_ is a List of characters.
+1. Assert: _index_ is a non-negative integer which is &le; the number of characters in _input_.
+1. Let _Input_ be _input_. This alias will be used throughout the algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref>.
 1. Let _InputLength_ be the number of characters contained in _Input_. This alias will be used throughout the algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref>.
-1. Let _listIndex_ be the index into _Input_ of the character that was obtained from element _index_ of _str_.
 1. Let _c_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called:
 1. Assert: _y_ is a State.
 1. Return _y_.
 1. Let _cap_ be a List of _NcapturingParens_ *undefined* values, indexed 1 through _NcapturingParens_.
-1. Let _x_ be the State (_listIndex_, _cap_).
+1. Let _x_ be the State (_index_, _cap_).
 1. Return _m_(_x_, _c_).
 </emu-alg>
 <emu-note>
-<p>A Pattern compiles to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref> are designed so that compiling a pattern may throw a *SyntaxError* exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a String cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere such as out-of-memory).</p>
+<p>A Pattern compiles to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a List of characters and an offset within that List to determine whether the pattern would match starting at exactly that offset within the List, and, if it does match, what the values of the capturing parentheses would be. The algorithms in <emu-xref href="#sec-pattern-semantics"></emu-xref> are designed so that compiling a pattern may throw a *SyntaxError* exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a List of characters cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere such as out-of-memory).</p>
 </emu-note>
 </emu-clause>
 
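With this change, the closure returned by CompilePattern is applied to a List of characters that its caller has already derived, so the closure itself no longer consults _Unicode_ or touches a String. The TypeScript sketch below models that shape and is illustrative only. Assumptions: characters are plain numbers (code units or code points), captures are simplified to optional `[start, end]` pairs in a 0-indexed array, and the List is passed to each matcher explicitly rather than through the spec's _Input_ alias. `compilePattern`, `literal`, and `seq` are hypothetical names.

```typescript
// Hypothetical, simplified model of the Pattern semantics types.
type Capture = [start: number, end: number] | undefined;

interface State {
  endIndex: number;
  captures: Capture[]; // 0-indexed here; the spec indexes 1..NcapturingParens
}

type MatchResult = State | "failure";
type Continuation = (y: State) => MatchResult;

// Assumption: the character List is passed explicitly instead of being
// read from the spec's implicit _Input_ alias.
type Matcher = (input: readonly number[], x: State, c: Continuation) => MatchResult;

// Mirrors the revised CompilePattern: the returned closure takes a List of
// characters and an index into that List, not a String.
function compilePattern(m: Matcher, nCapturingParens: number) {
  return (input: readonly number[], index: number): MatchResult => {
    // The spec asserts this; the sketch throws instead.
    if (!Number.isInteger(index) || index < 0 || index > input.length) {
      throw new RangeError("index must be an integer in [0, input.length]");
    }
    // Identity continuation: accept whatever State the matcher reaches.
    const c: Continuation = (y) => y;
    const cap: Capture[] = new Array<Capture>(nCapturingParens).fill(undefined);
    return m(input, { endIndex: index, captures: cap }, c);
  };
}

// Hypothetical matcher for a single character equal to `ch`.
function literal(ch: number): Matcher {
  return (input, x, c) =>
    x.endIndex < input.length && input[x.endIndex] === ch
      ? c({ endIndex: x.endIndex + 1, captures: x.captures })
      : "failure";
}

// Hypothetical matcher for `a` followed by `b`.
function seq(a: Matcher, b: Matcher): Matcher {
  return (input, x, c) => a(input, x, (y) => b(input, y, c));
}

// Usage: compile "ab" and apply it to the code point List of "xab".
const ab = compilePattern(seq(literal(0x61), literal(0x62)), 0);
const chars = Array.from("xab", (ch) => ch.codePointAt(0)!);
const hit = ab(chars, 1); // a State whose endIndex is 3
const miss = ab(chars, 0); // "failure": chars[0] is the code point of "x"
```

Because the closure only ever sees the List, the same compiled matcher works whether the caller built that List from code units or from code points.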
@@ -35985,12 +35984,15 @@ <h1>
 1. Let _matcher_ be _R_.[[RegExpMatcher]].
 1. If _flags_ contains *"u"*, let _fullUnicode_ be *true*; else let _fullUnicode_ be *false*.
 1. Let _matchSucceeded_ be *false*.
+1. If _fullUnicode_ is *true*, let _input_ be ! StringToCodePoints(_S_). Otherwise, let _input_ be a List whose elements are the code units that are the elements of _S_.
+1. NOTE: Each element of _input_ is considered to be a character.
 1. Repeat, while _matchSucceeded_ is *false*,
 1. If _lastIndex_ &gt; _length_, then
 1. If _global_ is *true* or _sticky_ is *true*, then
 1. Perform ? Set(_R_, *"lastIndex"*, *+0*<sub>𝔽</sub>, *true*).
 1. Return *null*.
-1. Let _r_ be _matcher_(_S_, _lastIndex_).
+1. Let _inputIndex_ be the index into _input_ of the character that was obtained from element _lastIndex_ of _S_.
+1. Let _r_ be _matcher_(_input_, _inputIndex_).
 1. If _r_ is ~failure~, then
 1. If _sticky_ is *true*, then
 1. Perform ? Set(_R_, *"lastIndex"*, *+0*<sub>𝔽</sub>, *true*).
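RegExpBuiltinExec now derives _input_ once, before its matching loop, and translates each _lastIndex_ (a code unit offset into _S_) into _inputIndex_ (an offset into _input_) before calling the matcher. A hypothetical TypeScript sketch of both steps follows. It relies on JavaScript string iteration yielding one element per code point, with lone surrogates yielded on their own, which is also how StringToCodePoints treats them. A _lastIndex_ equal to the length of _S_ has no corresponding element of _S_; the sketch assumes it maps to the length of _input_.

```typescript
// Hypothetical helper mirroring the new step in RegExpBuiltinExec: code
// points under the u flag, otherwise UTF-16 code units.
function deriveInput(s: string, fullUnicode: boolean): number[] {
  if (fullUnicode) {
    // String iteration yields code points; lone surrogates come out as-is.
    return Array.from(s, (ch) => ch.codePointAt(0)!);
  }
  const units: number[] = [];
  for (let i = 0; i < s.length; i++) units.push(s.charCodeAt(i));
  return units;
}

// Hypothetical helper: the index into the derived List of the character
// that was obtained from code unit `lastIndex` of `s`.
function toInputIndex(s: string, lastIndex: number, fullUnicode: boolean): number {
  if (!fullUnicode) return lastIndex; // one code unit per character
  let charIndex = 0;
  let unit = 0;
  while (unit < s.length) {
    // codePointAt combines a lead surrogate with a following trail surrogate.
    const width = s.codePointAt(unit)! > 0xffff ? 2 : 1;
    // lastIndex points into this character (possibly at its trail surrogate).
    if (lastIndex < unit + width) return charIndex;
    unit += width;
    charIndex++;
  }
  // Assumption: lastIndex === s.length maps to the end of the List.
  return charIndex;
}

const sample = "a\uD83D\uDE00b"; // "a", U+1F600 as a surrogate pair, "b"
deriveInput(sample, true); // [0x61, 0x1f600, 0x62]
deriveInput(sample, false); // [0x61, 0xd83d, 0xde00, 0x62]
toInputIndex(sample, 2, true); // 1: code unit 2 belongs to U+1F600
```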
@@ -36000,7 +36002,7 @@ <h1>
 1. Assert: _r_ is a State.
 1. Set _matchSucceeded_ to *true*.
 1. Let _e_ be _r_'s _endIndex_ value.
-1. If _fullUnicode_ is *true*, set _e_ to ! GetStringIndex(_S_, _Input_, _e_).
+1. If _fullUnicode_ is *true*, set _e_ to ! GetStringIndex(_S_, _input_, _e_).
 1. If _global_ is *true* or _sticky_ is *true*, then
 1. Perform ? Set(_R_, *"lastIndex"*, 𝔽(_e_), *true*).
 1. Let _n_ be the number of elements in _r_'s _captures_ List. (This is the same value as <emu-xref href="#sec-notation"></emu-xref>'s _NcapturingParens_.)
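Because the matcher now reports positions as indices into _input_, RegExpBuiltinExec converts _e_ back with GetStringIndex before storing it in `lastIndex`. The net effect is visible from script: under the `u` flag the pattern sees code points, yet `lastIndex` still counts UTF-16 code units. This short check uses only standard `RegExp` behavior:

```typescript
const text = "a\uD83D\uDE00b"; // "a", U+1F600 as a surrogate pair, "b"

// With u, the pattern is compiled and run against code points.
const re = new RegExp("\\uD83D\\uDE00", "gu");
const match = re.exec(text);
// The emoji is element 1 of the code point List, but both the match index
// and lastIndex are reported in code units: 1 and 1 + 2 = 3.
console.log(match?.index, re.lastIndex); // 1 3

// Without u, each code unit is a character, so a lone lead surrogate matches.
const reUnits = new RegExp("\\uD83D", "g");
reUnits.exec(text);
console.log(reUnits.lastIndex); // 2

// With u, that lead surrogate is not an element of the List at all.
console.log(new RegExp("\\uD83D", "u").test(text)); // false
```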
@@ -36031,8 +36033,8 @@ <h1>
 1. Let _captureStart_ be _captureI_'s _startIndex_.
 1. Let _captureEnd_ be _captureI_'s _endIndex_.
 1. If _fullUnicode_ is *true*, then
-1. Set _captureStart_ to ! GetStringIndex(_S_, _Input_, _captureStart_).
-1. Set _captureEnd_ to ! GetStringIndex(_S_, _Input_, _captureEnd_).
+1. Set _captureStart_ to ! GetStringIndex(_S_, _input_, _captureStart_).
+1. Set _captureEnd_ to ! GetStringIndex(_S_, _input_, _captureEnd_).
 1. Let _capture_ be the Match Record { [[StartIndex]]: _captureStart_, [[EndIndex]]: _captureEnd_ }.
 1. Let _capturedValue_ be ! GetMatchString(_S_, _capture_).
 1. Append _capture_ to _indices_.
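Capture boundaries go through the same conversion, so the `indices` array produced by the `d` (hasIndices) flag also reports code unit offsets. The check below is a sketch; the `WithIndices` type is a local assumption so that it compiles even where the TypeScript lib in use does not yet declare `indices`.

```typescript
// Local type for the `indices` array added by the `d` (hasIndices) flag,
// declared in case the TypeScript lib in use predates ES2022.
type WithIndices = RegExpExecArray & {
  indices?: Array<[number, number] | undefined>;
};

const subject = "a\uD83D\uDE00b"; // U+1F600 occupies code units 1 and 2
const result = new RegExp("(\\uD83D\\uDE00)(x)?", "du").exec(subject) as WithIndices | null;

// Capture 1 spans List elements [1, 2); GetStringIndex turns that into
// code unit offsets [1, 3).
console.log(result?.indices?.[1]); // [ 1, 3 ]
// A group that did not participate is reported as undefined.
console.log(result?.indices?.[2]); // undefined
```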
@@ -36074,14 +36076,14 @@ <h1>
 <h1>
 GetStringIndex (
 _S_: a String,
-_Input_: a List,
+_Input_: a List of characters derived from _S_,
 _e_: a non-negative integer,
 )
 </h1>
 <dl class="header">
 </dl>
 <emu-alg>
-1. Assert: _Input_ contains the code points of _S_ interpreted as a UTF-16 encoded string.
+1. Assert: _Input_ contains the code points of StringToCodePoints(_S_).
 1. If _S_ is the empty String, return 0.
 1. Let _eUTF_ be the smallest index into _S_ that corresponds to the character at element _e_ of _Input_. If _e_ is greater than or equal to the number of elements in _Input_, then _eUTF_ is the number of code units in _S_.
 1. Return _eUTF_.
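GetStringIndex goes the other way from the _inputIndex_ step: given an index into the code point List, it returns the corresponding code unit offset in _S_. A minimal TypeScript sketch, assuming _Input_ really is StringToCodePoints(_S_) as the new Assert requires; the function name mirrors the spec operation but is otherwise hypothetical.

```typescript
// Hypothetical sketch of GetStringIndex, assuming `input` is the code point
// List of `s`. Maps List element `e` to a UTF-16 code unit offset into `s`.
function getStringIndex(s: string, input: readonly number[], e: number): number {
  if (s.length === 0) return 0;
  let eUTF = 0;
  // Sum the UTF-16 widths of the characters before element e.
  for (let i = 0; i < e && i < input.length; i++) {
    eUTF += input[i] > 0xffff ? 2 : 1; // supplementary code points use 2 units
  }
  // For e >= input.length every character is counted, giving s.length.
  return eUTF;
}

const str = "a\uD83D\uDE00b";
const codePoints = [0x61, 0x1f600, 0x62]; // StringToCodePoints(str)
getStringIndex(str, codePoints, 2); // 3: "a" (1 unit) + U+1F600 (2 units)
```

Summing widths gives the smallest code unit index for each character because a character's first code unit always starts right after the units of all preceding characters.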
@@ -36101,12 +36103,12 @@ <h1>Match Records</h1>
 </tr>
 <tr>
 <td>[[StartIndex]]</td>
-<td>An integer &ge; 0.</td>
+<td>a non-negative integer</td>
 <td>The number of code units from the start of a string at which the match begins (inclusive).</td>
 </tr>
 <tr>
 <td>[[EndIndex]]</td>
-<td>An integer &ge; [[StartIndex]].</td>
+<td>an integer &ge; [[StartIndex]]</td>
 <td>The number of code units from the start of a string at which the match ends (exclusive).</td>
 </tr>
 </table>
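Match Records hold code unit offsets into the original String, which is why capture boundaries are converted with GetStringIndex before a record is built. A minimal TypeScript model of the record and of GetMatchString follows; the names, and the choice to throw `RangeError` where the spec only states types or asserts, are assumptions of this sketch.

```typescript
// Hypothetical model of a Match Record; both fields count UTF-16 code units
// from the start of the string.
interface MatchRecord {
  readonly startIndex: number; // inclusive
  readonly endIndex: number; // exclusive
}

// Enforces the field constraints from the table at run time.
function makeMatchRecord(startIndex: number, endIndex: number): MatchRecord {
  if (!Number.isInteger(startIndex) || startIndex < 0) {
    throw new RangeError("[[StartIndex]] must be a non-negative integer");
  }
  if (!Number.isInteger(endIndex) || endIndex < startIndex) {
    throw new RangeError("[[EndIndex]] must be an integer >= [[StartIndex]]");
  }
  return { startIndex, endIndex };
}

// Sketch of GetMatchString: the code units of `s` covered by `m`.
function getMatchString(s: string, m: MatchRecord): string {
  if (m.endIndex > s.length) throw new RangeError("match extends past the end of s");
  return s.substring(m.startIndex, m.endIndex);
}

// The emoji in "a\uD83D\uDE00b" spans code units [1, 3).
getMatchString("a\uD83D\uDE00b", makeMatchRecord(1, 3)); // "\uD83D\uDE00"
```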
@@ -36149,8 +36151,8 @@ <h1>
 <h1>
 MakeIndicesArray (
 _S_: a String,
-_indices_: a List,
-_groupNames_: a List or *undefined*,
+_indices_: a List, each of whose elements is a Match Record or *undefined*,
+_groupNames_: a List, each of whose elements is a String or *undefined*,
 _hasGroups_: a Boolean,
 )
 </h1>
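MakeIndicesArray turns those records into the array exposed as `indices`, adding a `groups` object when the pattern has named groups. The TypeScript sketch below is hypothetical. It assumes `groupNames[i - 1]` names capture `i` (so the List has one entry fewer than `indices`), and it stands in for the spec's conversion of each record with a plain `[start, end]` pair after a bounds check.

```typescript
interface MatchRecord {
  startIndex: number; // code units, inclusive
  endIndex: number; // code units, exclusive
}
type IndexPair = [number, number];
type Groups = Record<string, IndexPair | undefined>;
type IndicesArray = Array<IndexPair | undefined> & { groups?: Groups };

// Hypothetical sketch of MakeIndicesArray.
function makeIndicesArray(
  s: string,
  indices: ReadonlyArray<MatchRecord | undefined>,
  groupNames: ReadonlyArray<string | undefined>,
  hasGroups: boolean,
): IndicesArray {
  const n = indices.length;
  // Assumption: groupNames[i - 1] names capture i, so it has n - 1 entries.
  if (groupNames.length !== n - 1) {
    throw new RangeError("groupNames must have one entry per capture group");
  }
  const a: IndicesArray = [];
  // A prototype-less object when there are named groups, else undefined;
  // either way the `groups` property is created on the result.
  const groups: Groups | undefined = hasGroups ? Object.create(null) : undefined;
  a.groups = groups;
  for (let i = 0; i < n; i++) {
    const record = indices[i];
    let pair: IndexPair | undefined;
    if (record !== undefined) {
      // Stand-in for the record-to-pair conversion: bounds check, then copy.
      if (record.endIndex > s.length) throw new RangeError("record exceeds s");
      pair = [record.startIndex, record.endIndex];
    }
    a[i] = pair;
    const name = i > 0 ? groupNames[i - 1] : undefined;
    if (name !== undefined && groups !== undefined) groups[name] = pair;
  }
  return a;
}

// Usage: whole match [0, 4), a named group "emoji" at [1, 3), and an
// unnamed group that did not participate.
const out = makeIndicesArray(
  "a\uD83D\uDE00b",
  [{ startIndex: 0, endIndex: 4 }, { startIndex: 1, endIndex: 3 }, undefined],
  ["emoji", undefined],
  true,
);
```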