Handle ambiguous ampersands of arbitrary length (closes #1257)
inikulin committed Jun 1, 2017
1 parent 32dbd7d commit b0e2b27
Showing 1 changed file with 59 additions and 62 deletions.
121 changes: 59 additions & 62 deletions source
@@ -102097,6 +102097,19 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
of the last start tag to have been emitted from this tokenizer, if any. If no start tag has been
emitted from this tokenizer, then no end tag token is appropriate.</p>

<p>A <span data-x="syntax-charref">character reference</span> is said to be <dfn
data-x="charref-in-attribute">consumed as part of an attribute</dfn> if the <var data-x="return
state">return state</var> is either <span>attribute value (double-quoted) state</span>,
<span>attribute value (single-quoted) state</span> or <span>attribute value (unquoted)
state</span>.</p>

<p>When a state says to <dfn>flush code points consumed as a character reference</dfn>, it means
that for each <span>code point</span> in the <var data-x="temporary buffer">temporary
buffer</var> (in the order they were added to the buffer) the user agent must append the code point
from the buffer to the current attribute's value if the character reference was <span
data-x="charref-in-attribute">consumed as part of an attribute</span>, or emit a character token
otherwise.</p>
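
A minimal Python sketch of the two definitions added above, for readers who want to see the flush step in executable form. The `Tokenizer` class, its attribute names, the state-name strings, and the `("character", cp)` token shape are all hypothetical and chosen only for illustration; they are not taken from the spec or from any parser implementation. The buffer is deliberately left untouched after flushing, since the added text does not say to clear it.

```python
# Hypothetical state names; the spec identifies return states by prose name only.
ATTRIBUTE_VALUE_STATES = {
    "attribute value (double-quoted)",
    "attribute value (single-quoted)",
    "attribute value (unquoted)",
}


class Tokenizer:
    """Illustrative stand-in holding only the state the flush step needs."""

    def __init__(self, return_state):
        self.return_state = return_state
        self.temporary_buffer = []         # code points consumed so far
        self.current_attribute_value = []  # value of the attribute being built
        self.emitted = []                  # tokens emitted to the tree builder

    def consumed_as_part_of_attribute(self):
        # True when the return state is one of the three attribute value states.
        return self.return_state in ATTRIBUTE_VALUE_STATES

    def flush_code_points_consumed_as_character_reference(self):
        # Walk the buffer in insertion order and route each code point either
        # to the current attribute's value or out as a character token.
        for code_point in self.temporary_buffer:
            if self.consumed_as_part_of_attribute():
                self.current_attribute_value.append(code_point)
            else:
                self.emitted.append(("character", code_point))
```

With a return state outside the three attribute value states, every buffered code point becomes a character token; with an attribute value return state, the same code points are appended to the attribute's value and nothing is emitted.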

<p id="check-parser-pause-flag">Before each step of the tokenizer, the user agent must first check
the <span>parser pause flag</span>. If it is true, then the tokenizer must abort the processing of
any nested invocations of the tokenizer, yielding control back to the caller.</p>
@@ -104185,26 +104198,17 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

<dl class="switch">

<dt>U+0009 CHARACTER TABULATION (tab)</dt>
<dt>U+000A LINE FEED (LF)</dt>
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
<dt>U+003C LESS-THAN SIGN</dt>
<dt>U+0026 AMPERSAND</dt>
<dt>EOF</dt>

<dd><span>Reconsume</span> in the <span>character reference end state</span>.</dd>
<dt><span data-x="ASCII alphanumeric">ASCII alphanumeric</span></dt>
<dd><span>Reconsume</span> in the <span>named character reference state</span>.</dd>

<dt>U+0023 NUMBER SIGN (#)</dt>

<dd>Append the <span>current input character</span> to the <var data-x="temporary
buffer">temporary buffer</var>. Switch to the <span>numeric character reference
state</span>.</dd>

<dt>Anything else</dt>

<dd><span>Reconsume</span> in the <span>named character reference state</span>.</dd>
<dd><span>Flush code points consumed as a character reference</span>. <span>Reconsume</span> in
the <var data-x="return state">return state</var>.</dd>

</dl>

@@ -104221,13 +104225,12 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<dt>If there is a match</dt>

<dd>
<p>If the character reference was consumed as part of an attribute (<var data-x="return
state">return state</var> is either <span>attribute value (double-quoted) state</span>,
<span>attribute value (single-quoted) state</span> or <span>attribute value (unquoted)
state</span>), and the last character matched is not a U+003B SEMICOLON character (;), and the
<span>next input character</span> is either a U+003D EQUALS SIGN character (=) or an
<span>ASCII alphanumeric</span>, then, for historical reasons, switch to the <span>character
reference end state</span>.</p>
<p>If the character reference was <span data-x="charref-in-attribute">consumed as part of an
attribute</span>, and the last character matched is not a U+003B SEMICOLON character (;), and
the <span>next input character</span> is either a U+003D EQUALS SIGN character (=) or an
<span>ASCII alphanumeric</span>, then, for historical reasons, <span>flush code points consumed
as a character reference</span> and switch to the <var data-x="return state">return state</var>.
</p>
<!-- "=" added because of https://www.w3.org/Bugs/Public/show_bug.cgi?id=9207#c5 -->

<p>Otherwise:</p>
@@ -104242,21 +104245,19 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
Append one or two characters corresponding to the character reference name (as given by the
second column of the <span>named character references</span> table) to the <var
data-x="temporary buffer">temporary buffer</var>.</p></li>

<li><span>Flush code points consumed as a character reference</span>. Switch to the <var
data-x="return state">return state</var>.</li>
</ol>
</dd>

<dt>Otherwise</dt>

<dd>If the <var data-x="temporary buffer">temporary buffer</var> consists of a U+0026 AMPERSAND
character (&amp;) followed by a sequence of one or more <span data-x="ASCII alphanumeric">ASCII
alphanumerics</span> and a U+003B SEMICOLON character (;), then this is an <span
data-x="parse-error-unknown-named-character-reference">unknown-named-character-reference</span>
<span>parse error</span>.</dd>
<dd><span>Flush code points consumed as a character reference</span>. Switch to the
<span>ambiguous ampersand state</span>.</dd>

</dl>

<p>Switch to the <span>character reference end state</span>.</p>

<div class="example">

<p>If the markup contains (not in an attribute) the string <code data-x="">I'm &amp;notit; I
@@ -104272,6 +104273,29 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
</div>


<h5><dfn>Ambiguous ampersand state</dfn></h5>

<p>Consume the <span>next input character</span>:</p>

<dl class="switch">

<dt><span data-x="ASCII alphanumeric">ASCII alphanumeric</span></dt>
<dd>If the character reference was <span data-x="charref-in-attribute">consumed as part of an
attribute</span>, then append the <span>current input character</span> to the current
attribute's value. Otherwise, emit the <span>current input character</span> as a character
token.</dd>

<dt>U+003B SEMICOLON (;)</dt>
<dd>This is an <span
data-x="parse-error-unknown-named-character-reference">unknown-named-character-reference</span>
<span>parse error</span>. <span>Reconsume</span> in the <var data-x="return state">return
state</var>.</dd>

<dt>Anything else</dt>
<dd><span>Reconsume</span> in the <var data-x="return state">return state</var>.</dd>

</dl>
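
The new state can be sketched in Python as a loop over the input. This is an illustration under stated assumptions, not the spec's or any parser's implementation: the function name `ambiguous_ampersand`, its signature, the state-name strings, and the returned tuple are all hypothetical. It assumes the preceding named character reference state has already flushed the buffered code points and that `pos` points at the next unconsumed character.

```python
import string

# ASCII alphanumerics: ASCII digits plus ASCII upper- and lowercase letters.
ASCII_ALPHANUMERIC = set(string.ascii_letters + string.digits)

# Hypothetical state names; the spec identifies return states by prose name only.
ATTRIBUTE_VALUE_STATES = {
    "attribute value (double-quoted)",
    "attribute value (single-quoted)",
    "attribute value (unquoted)",
}


def ambiguous_ampersand(text, pos, return_state):
    """Run the ambiguous ampersand state starting at text[pos].

    Returns (consumed, errors, new_pos, in_attribute). `consumed` holds the
    alphanumerics that would be appended to the attribute value (when
    in_attribute is True) or emitted as character tokens (otherwise).
    """
    in_attribute = return_state in ATTRIBUTE_VALUE_STATES
    consumed = []
    errors = []

    # ASCII alphanumeric: pass the character through and stay in this state.
    while pos < len(text) and text[pos] in ASCII_ALPHANUMERIC:
        consumed.append(text[pos])
        pos += 1

    # U+003B SEMICOLON: report the parse error. The semicolon is reconsumed
    # in the return state, so pos is not advanced past it.
    if pos < len(text) and text[pos] == ";":
        errors.append("unknown-named-character-reference")

    # Anything else (including end of input): reconsume in the return state.
    return "".join(consumed), errors, pos, in_attribute
```

Because the state loops on alphanumerics one character at a time, a run of any length is handled without a bound on the temporary buffer, which is the point of the commit title. The semicolon itself is left for the return state to process as ordinary input.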

<h5><dfn>Numeric character reference state</dfn></h5>

<p>Set the <dfn><var data-x="character reference code">character reference code</var></dfn> to
@@ -104305,8 +104329,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<dt>Anything else</dt>
<dd>This is an <span
data-x="parse-error-absence-of-digits-in-numeric-character-reference">absence-of-digits-in-numeric-character-reference</span>
<span>parse error</span>. <span>Reconsume</span> in the <span>character reference end
state</span>.</dd>
<span>parse error</span>. <span>Flush code points consumed as a character reference</span>.
<span>Reconsume</span> in the <var data-x="return state">return state</var>.</dd>

</dl>

@@ -104323,8 +104347,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<dt>Anything else</dt>
<dd>This is an <span
data-x="parse-error-absence-of-digits-in-numeric-character-reference">absence-of-digits-in-numeric-character-reference</span>
<span>parse error</span>. <span>Reconsume</span> in the <span>character reference end
state</span>.</dd>
<span>parse error</span>. <span>Flush code points consumed as a character reference</span>.
<span>Reconsume</span> in the <var data-x="return state">return state</var>.</dd>

</dl>

@@ -104416,10 +104440,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<li><p>If the number is 0x0D<!-- CR is not allowed --> or a
<span data-x="control">control</span>, but not <span>ASCII whitespace</span>, then this is a
<span data-x="parse-error-control-character-reference">control-character-reference</span>
<span>parse error</span>.</p>

<p>If the number is one of the numbers in the first column of the following table, then find the
row with that number in the first column, and set the <var
<span>parse error</span>. If the number is one of the numbers in the first column of the
following table, then find the row with that number in the first column, and set the <var
data-x="character reference code">character reference code</var> to the number in the second
column of that row.</p>
<!-- these are Unicode C1 control characters -->
@@ -104466,33 +104488,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

<p>Set the <var data-x="temporary buffer">temporary buffer</var> to the empty string. Append a
code point equal to the <var data-x="character reference code">character reference code</var> to
the <var data-x="temporary buffer">temporary buffer</var>. Switch to the <span>character reference
end state</span>.</p>


<h5><dfn>Character reference end state</dfn></h5>

<p>Consume the <span>next input character</span>.</p>

<p>Check the <var data-x="return state">return state</var>:</p>

<dl class="switch">

<dt><span>Attribute value (double-quoted) state</span></dt>
<dt><span>Attribute value (single-quoted) state</span></dt>
<dt><span>Attribute value (unquoted) state</span></dt>

<dd>Append each character in the <var data-x="temporary buffer">temporary buffer</var> (in the
order they were added to the buffer) to the current attribute's value.</dd>

<dt>Anything else</dt>

<dd>For each of the characters in the <var data-x="temporary buffer">temporary buffer</var> (in
the order they were added to the buffer), emit the character as a character token.</dd>

</dl>

<p><span>Reconsume</span> in the <var data-x="return state">return state</var>.</p>
the <var data-x="temporary buffer">temporary buffer</var>. <span>Flush code points consumed as a
character reference</span>. Switch to the <var data-x="return state">return state</var>.</p>

</div>
