Spec for innerHTML setter with non-HTML context element doesn't seem to match browsers #1376

bzbarsky · 2016-06-02T15:06:05Z

Consider this testcase:

<script>
  var div = document.createElementNS("foo", "div");
  div.innerHTML = "<body><p></p></body>";
  document.documentElement.appendChild(document.createTextNode(new XMLSerializer().serializeToString(div)));

</script>

this shows <div xmlns="foo"><p xmlns="http://www.w3.org/1999/xhtml"></p></div> in Safari, Chrome, Firefox. In Edge it shows <div xmlns="foo" />. No idea where the Edge behavior comes from... especially given that this testcase, which per spec should have identical output:

  <script>
    var div = document.createElementNS("foo", "div");
    var r = document.createRange();
    r.selectNodeContents(div);
    var frag = r.createContextualFragment("<body><p></p></body>");
    div.appendChild(frag);
    document.documentElement.appendChild(document.createTextNode(new XMLSerializer().serializeToString(div)));
  </script>

shows <div xmlns="foo"><p xmlns="http://www.w3.org/1999/xhtml"></p></div> in Edge (and Firefox; it throws in Chrome/Safari).

What does the spec say it should output? We start at https://w3c.github.io/DOM-Parsing/#widl-Element-innerHTML and set the context element to the foo|div. Then we invoke https://w3c.github.io/DOM-Parsing/#dfn-concept-parse-fragment. The node document is HTML, so we invoke https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments. The tokenization state is set to "data state" (step 4). We let root be an html|html and put it on the stack of open elements (step 7). The context element is not a template, so step 8 is irrelevant. In step 9 we create a start tag named "div" but don't really do anything with it so far. In step 10, we find that the stack of open elements has only one thing (the html|html|) and hence use our context node to set the insertion mode. In this case that puts us into the "in body" insertion mode in step 16 of https://html.spec.whatwg.org/multipage/syntax.html#reset-the-insertion-mode-appropriately.

Now we run the parser.

When the "body" start tag token is output, we look at https://html.spec.whatwg.org/multipage/syntax.html#tree-construction-dispatcher and see that the adjusted current node is the foo|div (because the stack of open elements only has the html|html on it and the foo|div is our context element). This adjusted current node is not in the HTML namespace, not a MathML or HTML integration point, so we use the "in foreign content" rules. That lands us at https://html.spec.whatwg.org/multipage/syntax.html#parsing-main-inforeign and we match the "A start tag whose tag name is one of .... body" case. This says:

If the parser was originally created for the HTML fragment parsing algorithm, then act as described in the "any other start tag" entry below. (fragment case)

So we go to "Any other start tag". The adjusted current node is not MathML or SVG, so we end up doing "Insert a foreign element for the token, in the same namespace as the adjusted current node." This should create a foo|body and insert it, as far as I can tell. Clearly this is NOT what any of the browsers actually do.

Am I just missing something? @hsivonen, @annevk, help!

// cc @nox

The text was updated successfully, but these errors were encountered:

ArkadiuszMichalski · 2016-06-02T15:22:02Z

@bzbarsky can you correct second testcase because now we don't see them?

bzbarsky · 2016-06-02T15:49:49Z

@ArkadiuszMichalski Oops, fixed.

ArkadiuszMichalski · 2016-06-02T15:55:50Z

But why this happend: "Now we run the parser. When the "body" start tag token is output,..."? We get "in body" insertion mode and this mode for "body" make "Parse error" and ignore token, or am I wrong?

bzbarsky · 2016-06-02T16:15:55Z

We get "in body" insertion mode

No, because https://html.spec.whatwg.org/multipage/syntax.html#tree-construction-dispatcher says to ignore the current insertion mode and just use the rules at https://html.spec.whatwg.org/multipage/syntax.html#parsing-main-inforeign instead.

ArkadiuszMichalski · 2016-06-02T16:43:37Z

Hmm but before we go to tree construction tokenizer should not output some tag?
"When a token is emitted, it must immediately be handled by the tree construction stage. The tree construction stage can affect the state of the tokenization stage, and can insert additional characters into the stream. (For example, the script element can result in scripts executing and using the dynamic markup insertion APIs to insert characters into the stream being tokenized.)"
And when tokenizer work then it respect insertion mode:
"The exact behaviour of certain states depends on the insertion mode and the stack of open elements....The output of the tokenization step is a series of zero or more of the following tokens..."
Basically I wondering why you analize <body> here, not just <p>.

Update: never mind, I read spec more precisely and now its clear.

bzbarsky · 2016-06-02T17:23:34Z

Hmm but before we go to tree construction tokenizer should not output some tag?

The tokenizer outputs tokens, not tags.

ArkadiuszMichalski · 2016-06-02T21:29:10Z

And don't forget about Element.insertAdjacentHTML() which per spec (in that case) also should have identical output:

  <script>
    var div = document.createElementNS("foo", "div");
    div.insertAdjacentHTML("afterbegin", "<body><p></p></body>");
    document.documentElement.appendChild(document.createTextNode(new XMLSerializer().serializeToString(div)));
  </script>

Firefox/Chrome output <div xmlns="foo"><p xmlns="http://www.w3.org/1999/xhtml"></p></div>
IE11/Edge/Presto throw (don't support insertAdjacentHTML in that element).

Looks like Gecko in all cases change context to html|body if context has other namespace than HTML, SVG or MathML but other engines are less consistent.

zcorpan · 2016-06-07T11:23:04Z

Looking at http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4256 it appears that WebKit and Chromium still parse innerHTML into the HTML namespace when the context is an SVG element. Gecko's implementation is different from the spec, seemingly as described in bug 7855.

Without further investigation, I think it would make sense to align the spec with Gecko here. In particular having the HTML parser be able to create elements in other namespaces than HTML/SVG/MathML seems like asking for trouble. cc @hsivonen

nox mentioned this issue Jun 3, 2016

Investigate the three remaining failures in createContextualFragment.html servo/servo#11577

Open

zcorpan added the topic: parser label Jun 7, 2016

annevk mentioned this issue Aug 30, 2018

Creating a parser for a fragment #3945

Open

jasonwilliams mentioned this issue Nov 5, 2018

innerHTML on <style> elements causes inconsistent results across browsers webcompat/web-bugs#20910

Closed

TRowbotham mentioned this issue Mar 11, 2021

HTML parser test failures after #6993 #6439

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spec for innerHTML setter with non-HTML context element doesn't seem to match browsers #1376

Spec for innerHTML setter with non-HTML context element doesn't seem to match browsers #1376

bzbarsky commented Jun 2, 2016 •

edited

Loading

ArkadiuszMichalski commented Jun 2, 2016

bzbarsky commented Jun 2, 2016

ArkadiuszMichalski commented Jun 2, 2016

bzbarsky commented Jun 2, 2016

ArkadiuszMichalski commented Jun 2, 2016 •

edited

Loading

bzbarsky commented Jun 2, 2016

ArkadiuszMichalski commented Jun 2, 2016 •

edited

Loading

zcorpan commented Jun 7, 2016

Spec for innerHTML setter with non-HTML context element doesn't seem to match browsers #1376

Spec for innerHTML setter with non-HTML context element doesn't seem to match browsers #1376

Comments

bzbarsky commented Jun 2, 2016 • edited Loading

ArkadiuszMichalski commented Jun 2, 2016

bzbarsky commented Jun 2, 2016

ArkadiuszMichalski commented Jun 2, 2016

bzbarsky commented Jun 2, 2016

ArkadiuszMichalski commented Jun 2, 2016 • edited Loading

bzbarsky commented Jun 2, 2016

ArkadiuszMichalski commented Jun 2, 2016 • edited Loading

zcorpan commented Jun 7, 2016

bzbarsky commented Jun 2, 2016 •

edited

Loading

ArkadiuszMichalski commented Jun 2, 2016 •

edited

Loading

ArkadiuszMichalski commented Jun 2, 2016 •

edited

Loading