HTML parser test failures after #6993 #6439

TRowbotham · 2021-03-03T21:11:07Z

I'm seeing 2 test failures with the recent change from #6399, in particular the change to use the "in body" insertion mode directly instead of reprocessing with the current insertion mode. In both cases[1][2], switching to the "in body" insertion mode loses foster parenting, so the `<p>baz` part is inserted into the table rather than outside it. Switching back to the previous behavior, where reprocessing the token uses the current insertion mode, seems to resolve the failures.

[1] https://github.com/html5lib/html5lib-tests/blob/master/tree-construction/tests9.dat#L270
[2] https://github.com/html5lib/html5lib-tests/blob/master/tree-construction/tests10.dat#L236

annevk · 2021-03-04T09:57:11Z

cc @whatwg/html-parser

zcorpan · 2021-03-04T12:07:38Z

Thanks @TRowbotham!

The rationale for changing the "reprocess the token" part was this:

> However, the spec doesn't say to change insertion mode, or use the rules for "in body", only to reprocess the token. Won't this just be reprocessed in "in foreign content" again, at least if nothing was popped? Would things work if it said "Process the token using the rules for the "in body" insertion mode."?

#5117 (comment)

So, evidently, things wouldn't work with "process the token using the rules for". What about my original question?

Given this case:

<svg id=svg></svg>
<script>
svg.innerHTML = '<p>';
</script>

What happens if the spec says to reprocess the token? Does it reprocess the token still in the "in foreign content" mode, and again say to reprocess the token (i.e. get stuck in an infinite loop)?

If yes, maybe it can be fixed by setting a flag before reprocessing the token, and checking for that flag in the tree construction dispatcher (and resetting the flag there also)?
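To make the question concrete, here is a minimal sketch of the loop and of the flag idea. It is a toy model, not the spec's algorithm: `dispatch_breakout_tag`, `skip_foreign`, and `max_passes` are hypothetical names, elements are reduced to `(namespace, name)` pairs, and integration points are ignored. It assumes that "reprocess the token" means going back through the tree construction dispatcher.

```python
HTML_NS, SVG_NS = "html", "svg"


def adjusted_current_node(stack, context):
    # Fragment case: with only the root on the stack, the context element counts.
    if context is not None and len(stack) == 1:
        return context
    return stack[-1]


def dispatch_breakout_tag(stack, context, use_flag, max_passes=5):
    """Dispatch one breakout start tag (such as <p>) and return the rule sets visited.

    max_passes caps the looping variant so the sketch terminates.
    """
    visited = []
    skip_foreign = False  # hypothetical flag from the proposal above
    for _ in range(max_passes):
        # The dispatcher checks the flag and resets it in the same step.
        flagged, skip_foreign = skip_foreign, False
        namespace, _name = adjusted_current_node(stack, context)
        if flagged or namespace == HTML_NS:
            visited.append("html")
            return visited
        visited.append("foreign")
        # Foreign content rule: pop until the current node is an HTML element
        # (integration points are left out of this sketch).
        while stack[-1][0] != HTML_NS:
            stack.pop()
        skip_foreign = use_flag
        # "Reprocess the token": go back through the dispatcher.
    return visited


# svg.innerHTML = '<p>': the stack holds only the root html element.
fragment_stack = [(HTML_NS, "html")]
svg_context = (SVG_NS, "svg")
looping = dispatch_breakout_tag(list(fragment_stack), svg_context, use_flag=False)
flagged = dispatch_breakout_tag(list(fragment_stack), svg_context, use_flag=True)
```

In this model nothing is popped in the fragment case, so without the flag every pass lands in the foreign content rules again until the cap is hit. With the flag, the second pass goes to the HTML rules.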

TRowbotham · 2021-03-04T13:57:52Z

Yes, the given case causes the parser to get stuck in an infinite loop if it just says reprocess the token.

It seems I slightly misunderstood the meaning of "reprocess the token", but perhaps it works in our favor. I understood it as "reprocess the token using the current insertion mode", and this appears to yield the desired result: the given case does not result in an infinite loop, and the 2 test failures I was seeing go away.

Is there anything wrong with changing the spec to read "reprocess the token using the current insertion mode" instead of "Process the token using the rules for the "in body" insertion mode."? If the intent is to break out of foreign content to whatever the last non-foreign element was, then using whatever the last insertion mode was kinda makes sense.

zcorpan · 2021-03-04T20:17:33Z

> I understood it as "reprocess the token using the current insertion mode"

I can't confidently say whether or not this is the right interpretation. This should probably be made clearer. 🙂

> Is there anything wrong with changing the spec to read "reprocess the token using the current insertion mode" instead of "Process the token using the rules for the "in body" insertion mode."?

As long as it does what we need and is unambiguous, it's good. I'd like to hear what others think, in particular @hsivonen

zcorpan · 2021-03-05T10:56:53Z

Is there any difference between:

+    <p>Reprocess the token according to the rules given in the section corresponding to the current
+    <span>insertion mode</span> in HTML content.</p>

and

+    <p>Process the token according to the rules given in the section corresponding to the current
+    <span>insertion mode</span> in HTML content.</p>

(i.e., "Reprocess the token" vs "Process the token")

The clause in "Any other end tag" in foreign content says "Process the token ...".

The regression was introduced in f690ad9 (PR #6399). For the case `<table><math><p>foo`, the `<p>` token would not have foster parenting enabled, so it was inserted into the table. Fixes #6439.
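The difference between the two wordings can be sketched for this case. The following is a simplified model under stated assumptions, not the spec algorithm: `Node`, `insert_html_element`, and `parent_of_p` are hypothetical names, only the "in table" mode is modelled, and the tree is built by hand up to the point where the `<p>` start tag arrives.

```python
class Node:
    def __init__(self, name, ns="html"):
        self.name, self.ns = name, ns
        self.parent, self.children = None, []

    def insert(self, child, before=None):
        index = len(self.children) if before is None else self.children.index(before)
        self.children.insert(index, child)
        child.parent = self
        return child


TABLE_CONTEXT = {"table", "tbody", "tfoot", "thead", "tr"}


def insert_html_element(stack, name, foster_parenting):
    target = stack[-1]
    if foster_parenting and target.name in TABLE_CONTEXT:
        # Foster parent: place the node in the table's parent, just before the table.
        table = next(n for n in reversed(stack) if n.name == "table")
        return table.parent.insert(Node(name), before=table)
    return target.insert(Node(name))


def parent_of_p(use_current_insertion_mode):
    """Model `<table><math><p>foo` up to the <p> start tag; return the name of <p>'s parent."""
    html, body, table = Node("html"), Node("body"), Node("table")
    math = Node("math", ns="mathml")
    html.insert(body)
    body.insert(table)
    body.insert(math, before=table)  # <math> was itself foster parented
    stack = [html, body, table, math]
    insertion_mode = "in table"

    # Foreign content rule for <p>: pop until the current node is an HTML element.
    while stack[-1].ns != "html":
        stack.pop()

    # "in table" handles unknown start tags by running the "in body" rules
    # with foster parenting enabled; going to "in body" directly skips that.
    foster = use_current_insertion_mode and insertion_mode == "in table"
    return insert_html_element(stack, "p", foster).parent.name
```

In this model, `parent_of_p(True)` puts the `<p>` in `body` (foster parented, before the table), while `parent_of_p(False)` puts it inside `table`, which is the regression described above.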

zcorpan · 2021-03-05T11:05:19Z

PR: #6455

hsivonen · 2021-03-10T13:52:00Z

This is the sort of thing that's hard to review without writing the code. I'll try to get to the code soon.

TRowbotham · 2021-03-11T19:59:32Z

PR #6455 also appears to resolve #1376

TRowbotham referenced this issue Mar 3, 2021

HTML parser: make breaking out of foreign content apply in innerHTML

f690ad9

This is intended to match Chromium and WebKit. Fixes #5117.

annevk added the topic: parser label Mar 4, 2021

annevk changed the title ~~Test failures after #6993~~ HTML parser test failures after #6993 Mar 4, 2021

annevk assigned zcorpan Mar 4, 2021

zcorpan mentioned this issue Mar 5, 2021

HTML parser: fix a regression with foster parenting in foreign content #6455

Merged

3 tasks

whatwg deleted a comment May 7, 2021

annevk mentioned this issue Jul 19, 2021

Foster parenting doesn't happen for HTML elements in foreign content #6808

Closed

domenic closed this as completed in #6455 Jul 20, 2021

untitaker added a commit to untitaker/html5ever that referenced this issue Jul 18, 2023

fix foreign-fragment.dat-65

dcc293c

corresponding changes in HTML spec are: whatwg/html@f690ad9 and follow-up discussion at whatwg/html#6439
