From 92a152c861dbe460d472e9f989ad41036be6a2d9 Mon Sep 17 00:00:00 2001
From: Simon Pieters
Date: Tue, 14 Sep 2021 12:56:50 +0200
Subject: [PATCH] Define speculative HTML parsing

Fixes #5624.
---
 source | 271 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 249 insertions(+), 22 deletions(-)

diff --git a/source b/source
index 1dfdcb28b5b..7f29ee05f28 100644
--- a/source
+++ b/source
@@ -110297,6 +110297,21 @@ dictionary StorageEventInit : EventInit {
 particular intended parent, the UA must run the following steps:

+ 1. If the active speculative HTML parser is not null, then return the result of creating a speculative mock element given given namespace, the tag name of the given token, and the attributes of the given token.

+ 2. Otherwise, optionally create a speculative mock element given given namespace, the tag name of the given token, and the attributes of the given token.

+    The result is not used. This step allows for a speculative fetch to be initiated from non-speculative parsing. The fetch is still speculative at this point, because, for example, by the time the element is inserted, intended parent might have been removed from the document.

  3. Let document be intended parent's node document.

  4. Let local name be the tag name of the token.
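As a rough illustration of the two new steps, here is a minimal Python sketch. It is not browser code: `Parser`, `MockElement`, `create_element_for_token`, the `create_real_element` callback, and the `mock_for_prefetch` flag (standing in for "optionally") are all hypothetical names, and attributes are simplified to `(name, value)` pairs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

Attr = Tuple[str, str]  # simplified (name, value) attribute


@dataclass
class MockElement:
    # Hypothetical stand-in for a "speculative mock element".
    namespace: str
    local_name: str
    attributes: List[Attr]
    children: list = field(default_factory=list)


@dataclass
class Parser:
    # Hypothetical parser model: only the field these steps consult.
    active_speculative_parser: Optional[Any] = None  # initially null


def create_element_for_token(
    parser: Parser,
    namespace: str,
    tag_name: str,
    attributes: List[Attr],
    create_real_element: Callable[[str, str, List[Attr]], Any],
    mock_for_prefetch: bool = True,
) -> Any:
    # Step 1: while a speculative parser is active, return a mock element.
    if parser.active_speculative_parser is not None:
        return MockElement(namespace, tag_name, list(attributes))
    # Step 2 ("optionally"): build a throwaway mock. In a real engine its
    # creation may start a speculative fetch; that is not modeled here.
    if mock_for_prefetch:
        MockElement(namespace, tag_name, list(attributes))
    # Step 3 onward: the normal element-creation path, delegated.
    return create_real_element(namespace, tag_name, attributes)
```

Because the mock created in step 2 is discarded, its only observable effect in a real implementation would be the speculative fetch it may trigger, which this sketch leaves out.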

@@ -111030,20 +111045,27 @@ document.body.appendChild(text);

 Acknowledge the token's self-closing flag, if it is set.

- If the element has a charset attribute, and getting an encoding from its value results in an encoding, and the confidence is currently tentative, then change the encoding to the resulting encoding.

+ If the active speculative HTML parser is null, then:

+   1. If the element has a charset attribute, and getting an encoding from its value results in an encoding, and the confidence is currently tentative, then change the encoding to the resulting encoding.

+   2. Otherwise, if the element has an http-equiv attribute whose value is an ASCII case-insensitive match for the string "Content-Type", and the element has a content attribute, and applying the algorithm for extracting a character encoding from a meta element to that attribute's value returns an encoding, and the confidence is currently tentative, then change the encoding to the extracted encoding.

- Otherwise, if the element has an http-equiv attribute whose value is an ASCII case-insensitive match for the string "Content-Type", and the element has a content attribute, and applying the algorithm for extracting a character encoding from a meta element to that attribute's value returns an encoding, and the confidence is currently tentative, then change the encoding to the extracted encoding.

+ The speculative HTML parser doesn't speculatively apply character encoding declarations, in order to reduce implementation complexity.
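The effect of the new guard can be sketched as follows. This assumes a hypothetical `meta_charset_change` helper and a deliberately tiny label table (the Encoding Standard defines many more labels); only the `charset` branch is shown, and the `http-equiv` branch sits behind the same check.

```python
from typing import Optional

# Hypothetical, tiny subset of the Encoding Standard's label table.
ENCODING_LABELS = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
}


def get_an_encoding(label: str) -> Optional[str]:
    # Strip leading/trailing ASCII whitespace, lowercase, then look up.
    return ENCODING_LABELS.get(label.strip("\t\n\f\r ").lower())


def meta_charset_change(speculative_parser_active: bool,
                        confidence: str,
                        charset_attr: Optional[str]) -> Optional[str]:
    """Return the encoding to switch to, or None if nothing changes."""
    if speculative_parser_active:
        return None  # the speculative parser never applies declarations
    if charset_attr is None or confidence != "tentative":
        return None
    return get_an_encoding(charset_attr)
```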

    A start tag whose tag name is "title"
@@ -112525,8 +112547,8 @@ document.body.appendChild(text);
 An end tag whose tag name is "script"

- If the JavaScript execution context stack is empty, perform a microtask checkpoint.

+ If the active speculative HTML parser is null and the JavaScript execution context stack is empty, then perform a microtask checkpoint.

 Let script be the current node (which will be a script element).

@@ -112541,10 +112563,11 @@ document.body.appendChild(text);

 Increment the parser's script nesting level by one.

- Prepare the script. This might cause some script to execute, which might cause new characters to be inserted into the tokenizer, and might cause the tokenizer to output more tokens, resulting in a reentrant invocation of the parser.

+ If the active speculative HTML parser is null, then prepare the script. This might cause some script to execute, which might cause new characters to be inserted into the tokenizer, and might cause the tokenizer to output more tokens, resulting in a reentrant invocation of the parser.

 Decrement the parser's script nesting level by one. If the parser's script nesting level is zero, then set the parser pause flag to false.
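A minimal sketch of how the two guarded steps in the "script" end tag handling behave, assuming a hypothetical `TreeBuilderState` that records microtask checkpoints in a log, and a `prepare_script` callback in place of the real "prepare the script" algorithm:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class TreeBuilderState:
    # Hypothetical subset of parser state touched by the "script" end tag.
    active_speculative_parser: Optional[Any] = None
    script_nesting_level: int = 0
    pause_flag: bool = False
    log: List[str] = field(default_factory=list)


def script_end_tag(state: TreeBuilderState,
                   js_stack_empty: bool,
                   prepare_script: Callable[[], None]) -> None:
    speculative = state.active_speculative_parser is not None
    # Microtask checkpoints only happen during normal parsing.
    if not speculative and js_stack_empty:
        state.log.append("microtask checkpoint")
    state.script_nesting_level += 1
    # Preparing (and possibly running) the script is skipped speculatively.
    if not speculative:
        prepare_script()
    state.script_nesting_level -= 1
    if state.script_nesting_level == 0:
        state.pause_flag = False
```

In speculative mode the nesting level and pause flag are still maintained, so the tree builder's bookkeeping stays the same even though no script runs.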

@@ -112580,6 +112603,9 @@ document.body.appendChild(text);
 Let the script be the pending parsing-blocking script. There is no longer a pending parsing-blocking script.

+ Start the speculative HTML parser for this instance of the HTML parser.

 Block the tokenizer for this instance of the HTML parser, such that the event loop will not run tasks that invoke the Document.

+ Stop the speculative HTML parser for this instance of the HTML parser.

 Unblock the tokenizer for this instance of the HTML parser, such that tasks that invoke the tokenizer can again be run.
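One way to keep the start and stop steps paired around the wait for the pending parsing-blocking script is a Python context manager. This is a sketch under assumptions: `HTMLParser`, `speculating_while_blocked`, and the `start`/`stop` callbacks are hypothetical, and the `try`/`finally` (stopping even if the wait fails) is a design choice of the sketch rather than something the steps above require.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Callable, Iterator, Optional


@dataclass
class HTMLParser:
    # Hypothetical model of the normal parser's relevant state.
    active_speculative_parser: Optional[Any] = None
    tokenizer_blocked: bool = False


@contextmanager
def speculating_while_blocked(parser: HTMLParser,
                              start: Callable[[HTMLParser], None],
                              stop: Callable[[HTMLParser], None]) -> Iterator[None]:
    start(parser)                    # start the speculative HTML parser
    parser.tokenizer_blocked = True  # block the tokenizer
    try:
        yield                        # caller waits for the blocking script
    finally:
        stop(parser)                 # stop the speculative HTML parser
        parser.tokenizer_blocked = False  # unblock the tokenizer
```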

@@ -114077,9 +114106,9 @@ document.body.appendChild(text);

 Increment the parser's script nesting level by one. Set the parser pause flag to true.

- Process the SVG script element according to the SVG rules, if the user agent supports SVG.

+ If the active speculative HTML parser is null and the user agent supports SVG, then process the SVG script element according to the SVG rules.

 Even if this causes new characters to be inserted into the tokenizer, the parser will not be executed reentrantly, since the

@@ -114137,6 +114166,9 @@ document.body.appendChild(text);

+ If the active speculative HTML parser is not null, then stop the speculative HTML parser and return.

 Set the insertion point to undefined.

 Update the current document readiness to "

 Throw away any pending content in the input stream, and discard any future content that would have been added to it.

+ Stop the speculative HTML parser for this HTML parser.

 Update the current document readiness to "interactive".

@@ -114286,6 +114320,196 @@ document.body.appendChild(text);

+ Speculative HTML parsing

+ User agents may implement an optimization, as described in this section, to speculatively fetch resources that are declared in the HTML markup while the HTML parser is waiting for a pending parsing-blocking script to be fetched and executed, or during normal parsing, at the time an element is created for a token. While this optimization is not defined in precise detail, there are some rules to consider for interoperability.

+ Each HTML parser can have an active speculative HTML parser. It is initially null.

+ The speculative HTML parser must act like the normal HTML parser (e.g., the tree builder rules apply), with some exceptions:

+ • The state of the normal HTML parser and the document itself must not be affected.

+   For example, the next input character or the stack of open elements for the normal HTML parser is not affected by the speculative HTML parser.

+ • Bytes pushed into the HTML parser's input byte stream must also be pushed into the speculative HTML parser's input byte stream. Bytes read from the streams must be independent.

+ • The result of the speculative parsing is primarily a series of speculative fetches. Which kinds of resources to speculatively fetch is implementation-defined, but user agents must not speculatively fetch resources that would not be fetched with the normal HTML parser, under the assumption that the script that is blocking the HTML parser does nothing.

+   It is possible that the same markup is seen multiple times, first by the speculative HTML parser and then by the normal HTML parser. It is expected that duplicated fetches will be prevented by caching rules, which are not yet fully specified.
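The byte-stream rule can be pictured as a tee with two read cursors. In this sketch, `InputByteStream` and `TeeingParser` are hypothetical names; both streams start empty for simplicity, every pushed chunk goes to both, and each `read` advances only its own stream.

```python
class InputByteStream:
    """Hypothetical byte stream with its own read position."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._pos = 0

    def push(self, chunk: bytes) -> None:
        self._data += chunk

    def read(self, n: int) -> bytes:
        # Return up to n unread bytes and advance this stream only.
        out = bytes(self._data[self._pos:self._pos + n])
        self._pos += len(out)
        return out


class TeeingParser:
    """Pushes every chunk into both streams; reads never interfere."""

    def __init__(self) -> None:
        self.normal = InputByteStream()
        self.speculative = InputByteStream()

    def push(self, chunk: bytes) -> None:
        self.normal.push(chunk)
        self.speculative.push(chunk)
```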

+ A speculative fetch for a speculative mock element element must follow these rules:

+ Should some of these things be applied to the document "for real", even though they are found speculatively?

+ • If the speculative HTML parser encounters one of the following elements, then act as if that element is processed for the purpose of its effect on subsequent speculative fetches.

+     • A base element.
+     • A meta element whose http-equiv attribute is in the Content security policy state.
+     • A meta element whose name attribute is an ASCII case-insensitive match for "referrer".
+     • A meta element whose name attribute is an ASCII case-insensitive match for "viewport". (This can affect whether a media query list matches the environment.)

+ • Let url be the URL that element would fetch if it were processed normally. If there is no such URL or if it is the empty string, then do nothing. Otherwise, if url is already in the list of speculative fetch URLs, then do nothing. Otherwise, fetch url as if the element were processed normally, and add url to the list of speculative fetch URLs.

+ Each Document has a list of speculative fetch URLs, which is a list of URLs, initially empty.
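The fetch rules above (deduplication through the list of speculative fetch URLs, and a base element affecting later fetches) can be sketched like this. Assumptions: `SpeculativeFetcher`, `MockElement`, and the `URL_ATTRIBUTE` table are hypothetical (which resources to fetch is implementation-defined), `urllib.parse.urljoin` stands in for the URL parser, and only the first base element with an `href` is honored, matching how a document's base URL is normally chosen.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


@dataclass
class MockElement:
    local_name: str
    attributes: Dict[str, str]


# Hypothetical: which attribute holds the fetchable URL for each element.
URL_ATTRIBUTE = {"img": "src", "script": "src", "link": "href"}


@dataclass
class SpeculativeFetcher:
    document_url: str
    fetch: Callable[[str], None]
    base_url: Optional[str] = None
    speculative_fetch_urls: List[str] = field(default_factory=list)

    def see(self, el: MockElement) -> None:
        # The first <base href> affects all later speculative fetches.
        if el.local_name == "base" and "href" in el.attributes:
            if self.base_url is None:
                self.base_url = urljoin(self.document_url, el.attributes["href"])
            return
        attr = URL_ATTRIBUTE.get(el.local_name)
        raw = el.attributes.get(attr) if attr else None
        if not raw:  # no such URL, or the empty string: do nothing
            return
        url = urljoin(self.base_url or self.document_url, raw)
        if url in self.speculative_fetch_urls:  # already fetched
            return
        self.fetch(url)
        self.speculative_fetch_urls.append(url)
```

Here the list lives on the fetcher; in the text above it belongs to the Document, which matters once several speculative parsers run over the same document.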

+ To start the speculative HTML parser for an instance of an HTML parser parser:

+ 1. Optionally, return.

+    This step allows user agents to opt out of speculative HTML parsing.

+ 2. If parser's active speculative HTML parser is not null, then stop the speculative HTML parser for parser.

+    This can happen when document.write() writes another parser-blocking script. For simplicity, this specification always restarts speculative parsing, but user agents can implement a more efficient strategy, so long as the end result is equivalent.

+ 3. Let speculativeParser be a new speculative HTML parser, with the same state as parser.

+ 4. Let speculativeDoc be a new isomorphic representation of parser's Document, where all elements are instead speculative mock elements. Let speculativeParser parse into speculativeDoc.

+ 5. Set parser's active speculative HTML parser to speculativeParser.

+ 6. In parallel, run speculativeParser until it is stopped or until it reaches the end of its input stream.

+ To stop the speculative HTML parser for an instance of an HTML parser parser:

+ 1. Let speculativeParser be parser's active speculative HTML parser.

+ 2. If speculativeParser is null, then return.

+ 3. Throw away any pending content in speculativeParser's input stream, and discard any future content that would have been added to it.

+ 4. Set parser's active speculative HTML parser to null.
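The start and stop algorithms can be sketched together, since starting may first need to stop an existing speculative parser. `HTMLParser`, `SpeculativeParser`, `start_speculative_parser`, `stop_speculative_parser`, the `opt_out` flag (modeling "Optionally, return"), and the `run_in_parallel` callback are hypothetical; parser state is reduced to a string of pending input, and building speculativeDoc is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeculativeParser:
    # Hypothetical: a private copy of the normal parser's pending input.
    pending_input: str
    stopped: bool = False  # True means future content is discarded too


@dataclass
class HTMLParser:
    pending_input: str = ""
    active_speculative_parser: Optional[SpeculativeParser] = None


def stop_speculative_parser(parser: HTMLParser) -> None:
    speculative = parser.active_speculative_parser  # step 1
    if speculative is None:                         # step 2
        return
    speculative.pending_input = ""                  # step 3: throw away input
    speculative.stopped = True
    parser.active_speculative_parser = None         # step 4


def start_speculative_parser(parser: HTMLParser,
                             run_in_parallel: Callable[[SpeculativeParser], None],
                             opt_out: bool = False) -> None:
    if opt_out:                                     # step 1
        return
    if parser.active_speculative_parser is not None:
        stop_speculative_parser(parser)             # step 2: always restart
    # Step 3: a new parser with the same state (here, a copy of the input).
    speculative = SpeculativeParser(pending_input=parser.pending_input)
    # Step 4 (speculativeDoc of mock elements) is not modeled.
    parser.active_speculative_parser = speculative  # step 5
    run_in_parallel(speculative)                    # step 6
```

Passing the parallel runner in as a callback keeps the sketch deterministic; a real implementation would hand the speculative parser to another thread or task queue.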

+ The speculative HTML parser will create speculative mock elements instead of normal elements. DOM operations that the tree builder normally does on elements are expected to work appropriately on speculative mock elements.

+ A speculative mock element is a struct with the following items:

+ • A string namespace, corresponding to an element's namespace.
+ • A string local name, corresponding to an element's local name.
+ • A list attribute list, corresponding to an element's attribute list.
+ • A list children, corresponding to an element's children.

+ To create a speculative mock element given a namespace, tagName, and attributes:

+ 1. Let element be a new speculative mock element.
+ 2. Set element's namespace to namespace.
+ 3. Set element's local name to tagName.
+ 4. Set element's attribute list to attributes.
+ 5. Set element's children to a new empty list.
+ 6. Optionally, perform a speculative fetch for element.
+ 7. Return element.
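A speculative mock element maps naturally onto a small record type. The sketch below uses a Python dataclass; `SpeculativeMockElement`, `create_speculative_mock_element`, and the optional `speculative_fetch` hook (standing in for "Optionally, perform a speculative fetch") are hypothetical names, and attributes are simplified to `(name, value)` pairs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Attribute = Tuple[str, str]  # simplified (name, value) pair


@dataclass
class SpeculativeMockElement:
    namespace: str
    local_name: str
    attribute_list: List[Attribute] = field(default_factory=list)
    children: list = field(default_factory=list)


def create_speculative_mock_element(
    namespace: str,
    tag_name: str,
    attributes: List[Attribute],
    speculative_fetch: Optional[Callable[[SpeculativeMockElement], None]] = None,
) -> SpeculativeMockElement:
    element = SpeculativeMockElement(
        namespace=namespace,
        local_name=tag_name,
        attribute_list=attributes,
        children=[],  # a new empty list for every element
    )
    # "Optionally, perform a speculative fetch": modeled as an optional hook.
    if speculative_fetch is not None:
        speculative_fetch(element)
    return element
```

A tree-builder operation such as appending a child then works on the mock by manipulating `children`, which is the kind of DOM operation the text expects to work on mock elements.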

+ When the tree builder says to insert an element into a template element's template contents, if that template element is a speculative mock element, then do nothing instead. URLs found speculatively inside template elements might themselves be templates (for example, containing placeholders to be filled in by script), and must not be speculatively fetched.


      Coercing an HTML DOM into an infoset

@@ -125478,6 +125702,9 @@ INSERT INTERFACES HERE
 [CSSCOLORADJUST]
 CSS Color Adjustment Module, E. Etemad, R. Atanassov, R. Lillesveen, T. Atkins. W3C.
+ [CSSDEVICEADAPT]
+ CSS Device Adaptation, F. Rivoal, M. Rakow. W3C.
 [CSSDISPLAY]
 CSS Display, T. Atkins, E. Etemad. W3C.