From ee198945e388bd3298b73fb842fa0175d5b6a0f9 Mon Sep 17 00:00:00 2001
From: Ivan Nikulin
Date: Fri, 23 Jun 2017 20:00:15 +0100
Subject: [PATCH] Handle ambiguous ampersands of arbitrary length

Closes #1257.
---
 source | 130 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 63 insertions(+), 67 deletions(-)

diff --git a/source b/source
index 53dd62e3789..b3f8e1b1f56 100644
--- a/source
+++ b/source
@@ -101822,6 +101822,19 @@ dictionary StorageEventInit : EventInit {
 of the last start tag to have been emitted from this tokenizer, if any. If no start tag has been emitted from this tokenizer, then no end tag token is appropriate.

+

A character reference is said to be consumed as part of an attribute if the return state is either attribute value (double-quoted) state, + attribute value (single-quoted) state or attribute value (unquoted) + state.

+ +

When a state says to flush code points consumed as a character reference, it means + that for each code point in the temporary + buffer (in the order they were added to the buffer) the user agent must append the code point + from the buffer to the current attribute's value if the character reference was consumed as part of an attribute, or emit the code point as a + character token otherwise.

+

Before each step of the tokenizer, the user agent must first check the parser pause flag. If it is true, then the tokenizer must abort the processing of any nested invocations of the tokenizer, yielding control back to the caller.
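
The two definitions added in this hunk ("consumed as part of an attribute" and "flush code points consumed as a character reference") can be sketched in Python as follows. This is only an illustration: the `Tokenizer` class, its field names, and the state-name strings are hypothetical and are not part of the patch.

```python
from dataclasses import dataclass, field

# Hypothetical string labels for the three attribute value states named in the patch.
ATTRIBUTE_VALUE_STATES = {
    "attribute value (double-quoted)",
    "attribute value (single-quoted)",
    "attribute value (unquoted)",
}


@dataclass
class Tokenizer:
    """Illustrative container for the pieces of tokenizer state the definitions use."""
    return_state: str = "data"
    temporary_buffer: list = field(default_factory=list)
    current_attribute_value: str = ""
    emitted: list = field(default_factory=list)  # character tokens, in order

    def consumed_as_part_of_attribute(self) -> bool:
        # True when the return state is one of the three attribute value states.
        return self.return_state in ATTRIBUTE_VALUE_STATES

    def flush_code_points(self) -> None:
        # Walk the temporary buffer in insertion order and route each code point
        # either to the current attribute's value or out as a character token.
        for code_point in self.temporary_buffer:
            if self.consumed_as_part_of_attribute():
                self.current_attribute_value += code_point
            else:
                self.emitted.append(code_point)
```

The sketch does not clear the temporary buffer after flushing, because the definition in the patch does not say to do so.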

@@ -103903,33 +103916,23 @@ dictionary StorageEventInit : EventInit {
Character reference state
-

Set the temporary buffer to the empty string. Append a - U+0026 AMPERSAND (&) character to the temporary buffer. -

Consume the next input character:

-
U+0009 CHARACTER TABULATION (tab)
-
U+000A LINE FEED (LF)
-
U+000C FORM FEED (FF)
- -
U+0020 SPACE
-
U+003C LESS-THAN SIGN
-
U+0026 AMPERSAND
-
EOF
- -
Reconsume in the character reference end state.
+
ASCII alphanumeric
+

Set the temporary buffer to the empty string. Append + a U+0026 AMPERSAND (&) character to the temporary + buffer. Reconsume in the named character reference state.

U+0023 NUMBER SIGN (#)
- -
Append the current input character to the temporary buffer. Switch to the numeric character reference +

Set the temporary buffer to the empty string. Append + a U+0026 AMPERSAND (&) character and the current input character to the temporary buffer. Switch to the numeric character reference state.

Anything else
- -
Reconsume in the named character reference state.
+
Reconsume in the return state.
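
As a rough illustration of the rewritten character reference state in this hunk, the three branches can be written as a small dispatch function. The function name, the tuple it returns, and the use of `None` for EOF are assumptions made for this sketch only.

```python
import string

ASCII_ALPHANUMERIC = set(string.ascii_letters + string.digits)


def character_reference_state(next_char):
    """Return (temporary_buffer, next_state, reconsume) for one input character.

    next_char is a one-character string, or None to stand for EOF.
    temporary_buffer is None when the branch does not touch the buffer.
    """
    if next_char is not None and next_char in ASCII_ALPHANUMERIC:
        # Buffer is reset to "&"; the alphanumeric is reconsumed by the next state.
        return "&", "named character reference", True
    if next_char == "#":
        # Buffer is reset to "&" plus the current input character; no reconsume.
        return "&#", "numeric character reference", False
    # Anything else (including EOF): as written in this hunk, go back to the
    # return state and reconsume, leaving the temporary buffer untouched.
    return None, "return state", True
```

Compared with the removed text, the explicit list of tab, LF, FF, space, `<`, `&` and EOF collapses into the single "anything else" branch.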
@@ -103946,13 +103949,12 @@ dictionary StorageEventInit : EventInit {
If there is a match
-

If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, - attribute value (single-quoted) state or attribute value (unquoted) - state), and the last character matched is not a U+003B SEMICOLON character (;), and the - next input character is either a U+003D EQUALS SIGN character (=) or an - ASCII alphanumeric, then, for historical reasons, switch to the character - reference end state.

+

If the character reference was consumed as part of an + attribute, and the last character matched is not a U+003B SEMICOLON character (;), and + the next input character is either a U+003D EQUALS SIGN character (=) or an + ASCII alphanumeric, then, for historical reasons, flush code points consumed + as a character reference and switch to the return state. +

Otherwise:

@@ -103967,21 +103969,19 @@ dictionary StorageEventInit : EventInit {
 Append one or two characters corresponding to the character reference name (as given by the second column of the named character references table) to the temporary buffer.

+ +
  • Flush code points consumed as a character reference. Switch to the return state.
  • Otherwise
    -
    If the temporary buffer consists of a U+0026 AMPERSAND - character (&) followed by a sequence of one or more ASCII - alphanumerics and a U+003B SEMICOLON character (;), then this is an unknown-named-character-reference - parse error.
    +
    Flush code points consumed as a character reference. Switch to the + ambiguous ampersand state.
    -

    Switch to the character reference end state.

    -

    If the markup contains (not in an attribute) the string I'm &notit; I
    @@ -103997,6 +103997,29 @@ dictionary StorageEventInit : EventInit {

    +
    Ambiguous ampersand state
    + +

    Consume the next input character:

    + +
    + +
    ASCII alphanumeric
    +
    If the character reference was consumed as part of an + attribute, then append the current input character to the current + attribute's value. Otherwise, emit the current input character as a character + token.
    + +
    U+003B SEMICOLON (;)
    +
    This is an unknown-named-character-reference + parse error. Reconsume in the return + state. + +
    Anything else
    +
    Reconsume in the return state.
    + +
    +
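
    The new ambiguous ampersand state can be sketched as a loop over the input. This is a simplified model under stated assumptions: the function name, its parameters, and the returned tuple are hypothetical, and end of input is treated as "anything else".

```python
import string

ASCII_ALPHANUMERIC = set(string.ascii_letters + string.digits)


def ambiguous_ampersand_state(text, pos, in_attribute):
    """Model the ambiguous ampersand state starting at text[pos].

    Returns (attribute_value, character_tokens, parse_errors, new_pos), where
    new_pos is the index the return state would reconsume from.
    """
    attribute_value = ""
    character_tokens = []
    parse_errors = []
    while pos < len(text):
        ch = text[pos]
        if ch in ASCII_ALPHANUMERIC:
            # Alphanumerics pass through one at a time, so a run of any
            # length is handled without buffering.
            if in_attribute:
                attribute_value += ch
            else:
                character_tokens.append(ch)
            pos += 1
            continue
        if ch == ";":
            # The semicolon is reported but not consumed here; the return
            # state reconsumes it.
            parse_errors.append("unknown-named-character-reference")
        break  # ";" and anything else: reconsume in the return state
    return attribute_value, character_tokens, parse_errors, pos
```

    For example, `ambiguous_ampersand_state("foo;bar", 0, False)` passes `f`, `o`, `o` through as character tokens, records one parse error, and leaves the position at the semicolon.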
    Numeric character reference state

    Set the character reference code to
    @@ -104030,8 +104053,8 @@ dictionary StorageEventInit : EventInit {

    Anything else
    This is an absence-of-digits-in-numeric-character-reference
    - parse error. Reconsume in the character reference end
    - state.
    + parse error. Flush code points consumed as a character reference.
    + Reconsume in the return state.
    @@ -104048,8 +104071,8 @@ dictionary StorageEventInit : EventInit {
    Anything else
    This is an absence-of-digits-in-numeric-character-reference
    - parse error. Reconsume in the character reference end
    - state.
    + parse error. Flush code points consumed as a character reference.
    + Reconsume in the return state.
    @@ -104141,10 +104164,8 @@ dictionary StorageEventInit : EventInit {
  • If the number is 0x0D or a control, but not ASCII whitespace, then this is a control-character-reference - parse error.

    - -

    If the number is one of the numbers in the first column of the following table, then find the - row with that number in the first column, and set the parse error. If the number is one of the numbers in the first column of the + following table, then find the row with that number in the first column, and set the character reference code to the number in the second column of that row.

    @@ -104191,33 +104212,8 @@ dictionary StorageEventInit : EventInit {

    Set the temporary buffer to the empty string. Append a code point equal to the character reference code to - the temporary buffer. Switch to the character reference - end state.

    - - -
    Character reference end state
    - -

    Consume the next input character.

    - -

    Check the return state:

    - -
    - -
    Attribute value (double-quoted) state
    -
    Attribute value (single-quoted) state
    -
    Attribute value (unquoted) state
    - -
    Append each character in the temporary buffer (in the - order they were added to the buffer) to the current attribute's value.
    - -
    Anything else
    - -
    For each of the characters in the temporary buffer (in - the order they were added to the buffer), emit the character as a character token.
    - -
    - -

    Reconsume in the return state.

    + the temporary buffer. Flush code points consumed as a + character reference. Switch to the return state.