Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML parser test failures after #6993 #6439

Closed
TRowbotham opened this issue Mar 3, 2021 · 8 comments · Fixed by #6455
Closed

HTML parser test failures after #6993 #6439

TRowbotham opened this issue Mar 3, 2021 · 8 comments · Fixed by #6455
Assignees

Comments

@TRowbotham
Copy link
Contributor

I'm seeing 2 test failures with the recent change from #6399; in particular, the change to use the "in body insertion mode" directly, instead of reprocessing with the current insertion mode. In both cases[1][2], we lose out on foster parenting with the switch to using the "in body insertion mode", which causes the <p>baz part to be inserted into the table rather than outside the table. Switching back to previous behavior where reprocessing the token uses the current insertion mode seems to resolve the failures.

[1] https://github.com/html5lib/html5lib-tests/blob/master/tree-construction/tests9.dat#L270
[2] https://github.com/html5lib/html5lib-tests/blob/master/tree-construction/tests10.dat#L236

TRowbotham referenced this issue Mar 3, 2021
This is intended to match Chromium and WebKit.

Fixes #5117.
@annevk
Copy link
Member

annevk commented Mar 4, 2021

cc @whatwg/html-parser

@annevk annevk changed the title Test failures after #6993 HTML parser test failures after #6993 Mar 4, 2021
@zcorpan
Copy link
Member

zcorpan commented Mar 4, 2021

Thanks @TRowbotham!

The rationale for changing the reprocess the token part was this:

However, the spec doesn't say to change insertion mode, or use the rules for "in body", only to reprocess the token. Won't this just be reprocessed in "in foreign content" again, at least if nothing was popped? Would things work if it said "Process the token using the rules for the "in body" insertion mode."?

#5117 (comment)

So, evidently, things wouldn't work with "process the token using the rules for". What about my original question?

Given this case:

<svg id=svg></svg>
<script>
svg.innerHTML = '<p>';
</script>

What happens if the spec says to reprocess the token? Does it reprocess the token still in the "in foreign content" mode, and again say to reprocess the token (i.e. get stuck in an infinite loop)?

If yes, maybe it can be fixed by setting a flag before reprocessing the token, and checking for that flag in the tree construction dispatcher (and resetting the flag there also)?

@TRowbotham
Copy link
Contributor Author

Yes, the given case causes the parser to get stuck in an infinite loop if it just says reprocess the token.

It seems I slightly misunderstood the meaning of reprocess the token, but perhaps it works in our favor. I understood it as "reprocess the token using the current insertion mode", and this seemingly appears to yield the desired result; the given case does not result in an infinite loop and the 2 test failures I was seeing go away.

Is there anything wrong with changing the spec to read "reprocess the token using the current insertion mode" instead of "Process the token using the rules for the "in body" insertion mode."? If the intent is to break out of foreign content to whatever the last non-foreign element was, then using whatever the last insertion mode was kinda makes sense.

@zcorpan
Copy link
Member

zcorpan commented Mar 4, 2021

I understood it as "reprocess the token using the current insertion mode"

I can't confidently say whether or not this is the right interpretation. This should probably be made clearer. 🙂

Is there anything wrong with changing the spec to read "reprocess the token using the current insertion mode" instead of "Process the token using the rules for the "in body" insertion mode."?

As long as it does what we need and is unambiguous, it's good. I'd like to hear what others think, in particular @hsivonen

@zcorpan
Copy link
Member

zcorpan commented Mar 5, 2021

Is there any difference between:

+    <p>Reprocess the token according to the rules given in the section corresponding to the current
+    <span>insertion mode</span> in HTML content.</p>

and

+    <p>Process the token according to the rules given in the section corresponding to the current
+    <span>insertion mode</span> in HTML content.</p>

(i.e., "Reprocess the token" vs "Process the token")

The clause in "Any other end tag" in foreign content says "Process the token ...".

zcorpan added a commit that referenced this issue Mar 5, 2021
The regression was introduced in f690ad9
(PR #6399)

For this case:
```
<table><math><p>foo
```
the `<p>` token would not have foster parenting enabled, thus inserting it
into the table.

Fixes #6439.
@zcorpan
Copy link
Member

zcorpan commented Mar 5, 2021

PR: #6455

@hsivonen
Copy link
Member

This is the sort of thing that's hard to review without writing the code. I'll try to get to the code soon.

@TRowbotham
Copy link
Contributor Author

PR #6455 also appears to resolve #1376

@whatwg whatwg deleted a comment May 7, 2021
domenic pushed a commit that referenced this issue Jul 20, 2021
The regression was introduced in f690ad9 (#6399).

For the case:

<table><math><p>foo

the <p> token would not have foster parenting enabled, thus inserting it into the table.

Fixes #6439.
mfreed7 pushed a commit to mfreed7/html that referenced this issue Jun 3, 2022
The regression was introduced in f690ad9 (whatwg#6399).

For the case:

<table><math><p>foo

the <p> token would not have foster parenting enabled, thus inserting it into the table.

Fixes whatwg#6439.
untitaker added a commit to untitaker/html5ever that referenced this issue Jul 18, 2023
corresponding changes in HTML spec are: whatwg/html@f690ad9

and follow-up discussion at whatwg/html#6439
github-merge-queue bot pushed a commit to servo/html5ever that referenced this issue Aug 11, 2023
* wip on updating html5lib-tests

* fix up parse error parsing

* add better debug output

* wip

* wip

* wip

* wip

* adjust all switches to BogusComment (according to html5gum)

* wip

* wip

* wip

* wip

* wip

* wip

* wip (test3 done)

* fix test1

* wip on entities.test

* get rid of addnl_allowed in charref tokenizer

* remove bogusname???

* fix escapeFlag.test: End tag surrounded by bogus comment in RCDATA or RAWTEXT (in state RawData(Rawtext))

* update html5lib tests

* Revert "remove bogusname???"

This reverts commit 575b077.

* wip restore bogusname

* more bugfixes

* Revert "wip restore bogusname"

This reverts commit eb28165.

* fix a bug when peeking characters in BeforeAttributeValue

* make eat() pre-process input characters

input where it matters (JSON-escaped):

"<!DOCTYPE0\r\nPUBLIC'"

* update charref states

* add regression tests, skip broken test

* fix hang

* fix bug where ignore_lf was not reset during unconsuming

* fix webkit02.dat-26 test

* fix wbekit02.dat-22

* fix ack self-closing

* fix tests26.dat-19

* fix foreign-fragment.dat-65

corresponding changes in HTML spec are: whatwg/html@f690ad9

and follow-up discussion at whatwg/html#6439

* fix search-element.dat-0

* fix search-element.dat-1

* fix bug in charref tokenizer wrt newline normalization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants