Fix document.body not including content from DOMParser parsed <body> #108

MadLittleMods · 2022-02-09T20:05:04Z

Fix document.body not including content from DOMParser parsed <body>

Before:

new DOMParser().parseFromString(`<html><body><div>asdf</div></body></html>`, "text/html").body.outerHTML
// '<body></body>'

After:

new DOMParser().parseFromString(`<html><body><div>asdf</div></body></html>`, "text/html").body.outerHTML
// '<body><div>asdf</div></body>'

MadLittleMods · 2022-02-09T20:09:03Z

test/html/document.js

@@ -48,6 +48,11 @@ assert(document.all[1], document.querySelector('head'));
 assert(document.all[2], document.querySelector('title'));
 assert(document.all[3], document.querySelector('body'));

+document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html');
+assert(document.body.tagName, 'BODY');


Making sure document.body isn't null

MadLittleMods · 2022-02-09T20:09:22Z

test/html/document.js

@@ -48,6 +48,11 @@ assert(document.all[1], document.querySelector('head'));
 assert(document.all[2], document.querySelector('title'));
 assert(document.all[3], document.querySelector('body'));

+document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html');
+assert(document.body.tagName, 'BODY');
+assert(document.body, document.querySelector('body'));


Making sure document.body resolves to the same Node as document.querySelector('body')

MadLittleMods · 2022-02-09T20:09:53Z

test/html/document.js

+document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html');
+assert(document.body.tagName, 'BODY');
+assert(document.body, document.querySelector('body'));
+assert(document.body.textContent, 'foo');


Making sure the document.body includes the content we parsed.

MadLittleMods · 2022-02-09T20:15:20Z

esm/shared/parse-from-string.js

+          // Accessing the document.body property will create the <body>
+          node = document.body;
+          create = false;
+        }


The HTML spec says "The body element of a document is the first of the html element's children that is either a body element or a frameset element, or null if there is no such element." -- https://html.spec.whatwg.org/multipage/dom.html#dom-document-body-dev

But in terms of multiple <body> and how it behaves, the browser seems to resolve to the same Node.

new DOMParser().parseFromString('<html><body>foo</body><body>bar</body></html>', "text/html").documentElement.outerHTML // '<html><head></head><body>foobar</body></html>'

For reference, linkedom and the browser both behave the same here ✅

still it's no goal of this project to care about these shenanigans ... but good to know, thanks

Understood. In this case, it's the simplest way to support and turns out to behave identically 👌

WebReflection · 2022-02-09T20:25:23Z

Fix #106 - ... is the right commit message, so it relates to the issue, thanks!

MadLittleMods · 2022-02-09T20:27:18Z

@WebReflection Updated ✅

WebReflection · 2022-02-09T20:35:47Z

Build failing. This MR is ignoring SVG and XML constraints, plus it's not considering that the current logic works already for document.body so it's kinda patching it in the wrong place.

The body getter is likely the place to look for a change, not the parse-from-string one, as that file has zero responsibility on how that document or its getters should react .. and bear in mind document.head is another one to eventually fix in the original issue.

…parsed <body> Fix WebReflection#106 Before: ```js new DOMParser().parseFromString(`<html><body><div>asdf</div></body></html>`, "text/html").body.outerHTML // '<body></body>' ``` After: ```js new DOMParser().parseFromString(`<html><body><div>asdf</div></body></html>`, "text/html").body.outerHTML // '<body><div>asdf</div></body>' ```

MadLittleMods · 2022-02-09T21:13:47Z

cjs/shared/parse-from-string.js

@@ -59,7 +59,12 @@ const parseFromString = (document, isHTML, markupLanguage) => {
    onopentag(name, attributes) {


Build failing.

Build fixed.

MadLittleMods · 2022-02-09T21:13:59Z

cjs/shared/parse-from-string.js

@@ -59,7 +59,12 @@ const parseFromString = (document, isHTML, markupLanguage) => {
    onopentag(name, attributes) {


This MR is ignoring SVG and XML constraints

I'm not sure what you mean (can you explain this more obviously)? The logic was under the isHTML flag and doesn't apply to SVG and XML as far as I know.

MadLittleMods · 2022-02-09T23:43:50Z

worker.js

@@ -16692,7 +16692,7 @@ class HTMLDocument extends Document$1 {
  get head() {
    const {documentElement} = this;
    let {firstElementChild} = documentElement;
-    if (!firstElementChild) {
+    if (!firstElementChild || firstElementChild.tagName !== 'HEAD') {


[...] plus it's not considering that the current logic works already for document.body so it's kinda patching it in the wrong place.

The body getter is likely the place to look for a change, not the parse-from-string one, as that file has zero responsibility on how that document or its getters should react .. and bear in mind document.head is another one to eventually fix in the original issue.

Is this more what you mean?

Previously, the code would always assume the that first child was a <head> element but in the case of <!DOCTYPE html><html><body><div>foo</div></body></html>, it's a <body>. This means that when the body getter calls the head getter, it returned the existing <body> and created another <body> for itself after the supposed <head>.

The new code makes sure that the first child is a <head> element otherwise creates one.

This current assumptions about node order here has the downside that if someone defines stuff out of order, it will have 2 <head> when you try to access document.head

// notice <head> after <body> const document = new DOMParser().parseFromString(`<!DOCTYPE html><html><body></body><head><title>"hello"</title></head></html>`, "text/html").head.outerHTML // '<head></head>'

We could fix this up better in parse-from-string if we enforce <head> as the first child and <body> after when documents are initially parsed and created. But I'd go back to the document.body creation pattern like I had previously to keep things simple.

WebReflection · 2022-02-11T11:04:08Z

Fixed with latest.

MadLittleMods · 2022-02-11T21:22:17Z

Thanks for incorporating the changes @WebReflection 😀

Looks pretty similar: 535c5bc

…rowser and linkedom (for SSR) Context: - WebReflection/linkedom#106 - WebReflection/linkedom#108

…er and linkedom (for SSR) Before: ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` ❌ new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body></body>' ``` After (consistent matching output): ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' ``` --- `linkedom` goal is to be close to the current DOM standard, but [not too close](https://github.com/WebReflection/linkedom#faq). Focused on the streamlined cases for server-side rendering (SSR). Here is some context around getting `DOMParser` to interpret things better. The conclusion was to only support the explicit standard cases with a `<html><body></body></html>` specified instead of adding the magic HTML document creation and massaging that the browser does. - WebReflection/linkedom#106 - WebReflection/linkedom#108

…dom`](https://github.com/WebReflection/linkedom) (SSR). - [`linkedom`](https://github.com/WebReflection/linkedom) is being used https://github.com/matrix-org/matrix-public-archive to server-side render (SSR) Hydrogen (`hydrogen-view-sdk`) - This is being fixed by using a explicit HTML wrapper boilerplate with `DOMParser` to get a matching result in the browser and `linkedom`. Currently `parseHTML` is only used for HTML content bodies in events. Events with replies have content bodies that look like `<mx-reply>Hello</mx-reply> What's up` so they're parsed as HTML to strip out the `<mx-reply>` part. Before | After --- | --- ![](https://user-images.githubusercontent.com/558581/153692011-2f0e7114-fcb4-481f-b217-49f461b1740a.png) | ![](https://user-images.githubusercontent.com/558581/153692016-52582fdb-abd9-439d-9dce-3f04da6959db.png) Before: ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` ❌ new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body></body>' ``` After (consistent matching output): ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' ``` `linkedom` goal is to be close to the current DOM standard, but [not too close](https://github.com/WebReflection/linkedom#faq). Focused on the streamlined cases for server-side rendering (SSR). Here is some context around getting `DOMParser` to interpret things better. The conclusion was to only support the explicit standard cases with a `<html><body></body></html>` specified instead of adding the magic HTML document creation and massaging that the browser does. - WebReflection/linkedom#106 - WebReflection/linkedom#108 --- Part of #653 to support server-side rendering Hydrogen for the [`matrix-public-archive`](https://github.com/matrix-org/matrix-public-archive) project.

MadLittleMods mentioned this pull request Feb 9, 2022

Draft: Scratch changes for matrix-public-archive element-hq/hydrogen-web#653

Draft

MadLittleMods commented Feb 9, 2022

View reviewed changes

MadLittleMods force-pushed the 106-fix-parsed-document-body-not-including-children branch from 068a70d to 8edc4e3 Compare February 9, 2022 20:26

MadLittleMods force-pushed the 106-fix-parsed-document-body-not-including-children branch from 8edc4e3 to 176bec3 Compare February 9, 2022 21:13

Make sure the first element is actually a <head>

7fb0a46

MadLittleMods commented Feb 9, 2022

View reviewed changes

Remove extra console.log

bb97c96

MadLittleMods requested a review from WebReflection February 9, 2022 23:47

WebReflection closed this Feb 11, 2022

MadLittleMods added a commit to element-hq/hydrogen-web that referenced this pull request Feb 12, 2022

Use explicit HTML document boilerplate to get consistent results in b…

48825ea

…rowser and linkedom (for SSR) Context: - WebReflection/linkedom#106 - WebReflection/linkedom#108

MadLittleMods mentioned this pull request Feb 12, 2022

Fix missing reply text when message body is parsed as HTML in linkedom (SSR) element-hq/hydrogen-web#666

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix document.body not including content from DOMParser parsed <body> #108

Fix document.body not including content from DOMParser parsed <body> #108

MadLittleMods commented Feb 9, 2022

MadLittleMods Feb 9, 2022

MadLittleMods Feb 9, 2022

MadLittleMods Feb 9, 2022

MadLittleMods Feb 9, 2022

WebReflection Feb 9, 2022

MadLittleMods Feb 9, 2022 •

edited

Loading

WebReflection commented Feb 9, 2022

MadLittleMods commented Feb 9, 2022

WebReflection commented Feb 9, 2022

MadLittleMods Feb 9, 2022

MadLittleMods Feb 9, 2022 •

edited

Loading

MadLittleMods Feb 9, 2022

WebReflection commented Feb 11, 2022

MadLittleMods commented Feb 11, 2022

		@@ -59,7 +59,12 @@ const parseFromString = (document, isHTML, markupLanguage) => {
		onopentag(name, attributes) {

Fix document.body not including content from DOMParser parsed <body> #108

Fix document.body not including content from DOMParser parsed <body> #108

Conversation

MadLittleMods commented Feb 9, 2022

MadLittleMods Feb 9, 2022

Choose a reason for hiding this comment

MadLittleMods Feb 9, 2022

Choose a reason for hiding this comment

MadLittleMods Feb 9, 2022

Choose a reason for hiding this comment

MadLittleMods Feb 9, 2022

Choose a reason for hiding this comment

WebReflection Feb 9, 2022

Choose a reason for hiding this comment

MadLittleMods Feb 9, 2022 • edited Loading

Choose a reason for hiding this comment

WebReflection commented Feb 9, 2022

MadLittleMods commented Feb 9, 2022

WebReflection commented Feb 9, 2022

MadLittleMods Feb 9, 2022

Choose a reason for hiding this comment

MadLittleMods Feb 9, 2022 • edited Loading

Choose a reason for hiding this comment

MadLittleMods Feb 9, 2022

Choose a reason for hiding this comment

WebReflection commented Feb 11, 2022

MadLittleMods commented Feb 11, 2022

MadLittleMods Feb 9, 2022 •

edited

Loading

MadLittleMods Feb 9, 2022 •

edited

Loading