Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec for innerHTML setter with non-HTML context element doesn't seem to match browsers #1376

Open
bzbarsky opened this issue Jun 2, 2016 · 8 comments

Comments

@bzbarsky
Copy link
Contributor

bzbarsky commented Jun 2, 2016

Consider this testcase:

<script>
  var div = document.createElementNS("foo", "div");
  div.innerHTML = "<body><p></p></body>";
  document.documentElement.appendChild(document.createTextNode(new XMLSerializer().serializeToString(div)));
</script>

this shows <div xmlns="foo"><p xmlns="http://www.w3.org/1999/xhtml"></p></div> in Safari, Chrome, Firefox. In Edge it shows <div xmlns="foo" />. No idea where the Edge behavior comes from... especially given that this testcase, which per spec should have identical output:

  <script>
    var div = document.createElementNS("foo", "div");
    var r = document.createRange();
    r.selectNodeContents(div);
    var frag = r.createContextualFragment("<body><p></p></body>");
    div.appendChild(frag);
    document.documentElement.appendChild(document.createTextNode(new XMLSerializer().serializeToString(div)));
  </script>

shows <div xmlns="foo"><p xmlns="http://www.w3.org/1999/xhtml"></p></div> in Edge (and Firefox; it throws in Chrome/Safari).

What does the spec say it should output? We start at https://w3c.github.io/DOM-Parsing/#widl-Element-innerHTML and set the context element to the foo|div. Then we invoke https://w3c.github.io/DOM-Parsing/#dfn-concept-parse-fragment. The node document is HTML, so we invoke https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments. The tokenization state is set to "data state" (step 4). We let root be an html|html and put it on the stack of open elements (step 7). The context element is not a template, so step 8 is irrelevant. In step 9 we create a start tag named "div" but don't really do anything with it so far. In step 10, we find that the stack of open elements has only one thing (the html|html|) and hence use our context node to set the insertion mode. In this case that puts us into the "in body" insertion mode in step 16 of https://html.spec.whatwg.org/multipage/syntax.html#reset-the-insertion-mode-appropriately.

Now we run the parser.

When the "body" start tag token is output, we look at https://html.spec.whatwg.org/multipage/syntax.html#tree-construction-dispatcher and see that the adjusted current node is the foo|div (because the stack of open elements only has the html|html on it and the foo|div is our context element). This adjusted current node is not in the HTML namespace, not a MathML or HTML integration point, so we use the "in foreign content" rules. That lands us at https://html.spec.whatwg.org/multipage/syntax.html#parsing-main-inforeign and we match the "A start tag whose tag name is one of .... body" case. This says:

If the parser was originally created for the HTML fragment parsing algorithm, then act as described in the "any other start tag" entry below. (fragment case)

So we go to "Any other start tag". The adjusted current node is not MathML or SVG, so we end up doing "Insert a foreign element for the token, in the same namespace as the adjusted current node." This should create a foo|body and insert it, as far as I can tell. Clearly this is NOT what any of the browsers actually do.

Am I just missing something? @hsivonen, @annevk, help!

// cc @nox

@ArkadiuszMichalski
Copy link

@bzbarsky can you correct second testcase because now we don't see them?

@bzbarsky
Copy link
Contributor Author

bzbarsky commented Jun 2, 2016

@ArkadiuszMichalski Oops, fixed.

@ArkadiuszMichalski
Copy link

But why this happend: "Now we run the parser. When the "body" start tag token is output,..."? We get "in body" insertion mode and this mode for "body" make "Parse error" and ignore token, or am I wrong?

@bzbarsky
Copy link
Contributor Author

bzbarsky commented Jun 2, 2016

We get "in body" insertion mode

No, because https://html.spec.whatwg.org/multipage/syntax.html#tree-construction-dispatcher says to ignore the current insertion mode and just use the rules at https://html.spec.whatwg.org/multipage/syntax.html#parsing-main-inforeign instead.

@ArkadiuszMichalski
Copy link

ArkadiuszMichalski commented Jun 2, 2016

Hmm but before we go to tree construction tokenizer should not output some tag?
"When a token is emitted, it must immediately be handled by the tree construction stage. The tree construction stage can affect the state of the tokenization stage, and can insert additional characters into the stream. (For example, the script element can result in scripts executing and using the dynamic markup insertion APIs to insert characters into the stream being tokenized.)"
And when tokenizer work then it respect insertion mode:
"The exact behaviour of certain states depends on the insertion mode and the stack of open elements....The output of the tokenization step is a series of zero or more of the following tokens..."
Basically I wondering why you analize <body> here, not just <p>.

Update: never mind, I read spec more precisely and now its clear.

@bzbarsky
Copy link
Contributor Author

bzbarsky commented Jun 2, 2016

Hmm but before we go to tree construction tokenizer should not output some tag?

The tokenizer outputs tokens, not tags.

@ArkadiuszMichalski
Copy link

ArkadiuszMichalski commented Jun 2, 2016

And don't forget about Element.insertAdjacentHTML() which per spec (in that case) also should have identical output:

  <script>
    var div = document.createElementNS("foo", "div");
    div.insertAdjacentHTML("afterbegin", "<body><p></p></body>");
    document.documentElement.appendChild(document.createTextNode(new XMLSerializer().serializeToString(div)));
  </script>

Firefox/Chrome output <div xmlns="foo"><p xmlns="http://www.w3.org/1999/xhtml"></p></div>
IE11/Edge/Presto throw (don't support insertAdjacentHTML in that element).

Looks like Gecko in all cases change context to html|body if context has other namespace than HTML, SVG or MathML but other engines are less consistent.

@zcorpan
Copy link
Member

zcorpan commented Jun 7, 2016

Related:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=7855
https://www.w3.org/Bugs/Public/show_bug.cgi?id=17924

Looking at http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4256 it appears that WebKit and Chromium still parse innerHTML into the HTML namespace when the context is an SVG element. Gecko's implementation is different from the spec, seemingly as described in bug 7855.

Without further investigation, I think it would make sense to align the spec with Gecko here. In particular having the HTML parser be able to create elements in other namespaces than HTML/SVG/MathML seems like asking for trouble. cc @hsivonen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants