-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix document.body not including content from DOMParser parsed <body> #108
Fix document.body not including content from DOMParser parsed <body> #108
Conversation
@@ -48,6 +48,11 @@ assert(document.all[1], document.querySelector('head')); | |||
assert(document.all[2], document.querySelector('title')); | |||
assert(document.all[3], document.querySelector('body')); | |||
|
|||
document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html'); | |||
assert(document.body.tagName, 'BODY'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making sure document.body
isn't null
@@ -48,6 +48,11 @@ assert(document.all[1], document.querySelector('head')); | |||
assert(document.all[2], document.querySelector('title')); | |||
assert(document.all[3], document.querySelector('body')); | |||
|
|||
document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html'); | |||
assert(document.body.tagName, 'BODY'); | |||
assert(document.body, document.querySelector('body')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making sure document.body
resolves to the same Node
as document.querySelector('body')
document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html'); | ||
assert(document.body.tagName, 'BODY'); | ||
assert(document.body, document.querySelector('body')); | ||
assert(document.body.textContent, 'foo'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making sure the document.body
includes the content we parsed.
esm/shared/parse-from-string.js
Outdated
// Accessing the document.body property will create the <body> | ||
node = document.body; | ||
create = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HTML spec says "The body element of a document is the first of the html
element's children that is either a body
element or a frameset
element, or null if there is no such element." -- https://html.spec.whatwg.org/multipage/dom.html#dom-document-body-dev
But in terms of multiple <body>
and how it behaves, the browser seems to resolve to the same Node
.
new DOMParser().parseFromString('<html><body>foo</body><body>bar</body></html>', "text/html").documentElement.outerHTML
// '<html><head></head><body>foobar</body></html>'
For reference, linkedom
and the browser both behave the same here ✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still it's no goal of this project to care about these shenanigans ... but good to know, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. In this case, it's the simplest way to support and turns out to behave identically 👌
|
068a70d
to
8edc4e3
Compare
@WebReflection Updated ✅ |
Build failing. This MR is ignoring SVG and XML constraints, plus it's not considering that the current logic works already for The |
…parsed <body> Fix WebReflection#106 Before: ```js new DOMParser().parseFromString(`<html><body><div>asdf</div></body></html>`, "text/html").body.outerHTML // '<body></body>' ``` After: ```js new DOMParser().parseFromString(`<html><body><div>asdf</div></body></html>`, "text/html").body.outerHTML // '<body><div>asdf</div></body>' ```
8edc4e3
to
176bec3
Compare
cjs/shared/parse-from-string.js
Outdated
@@ -59,7 +59,12 @@ const parseFromString = (document, isHTML, markupLanguage) => { | |||
onopentag(name, attributes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build failing.
Build fixed.
cjs/shared/parse-from-string.js
Outdated
@@ -59,7 +59,12 @@ const parseFromString = (document, isHTML, markupLanguage) => { | |||
onopentag(name, attributes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This MR is ignoring SVG and XML constraints
I'm not sure what you mean (can you explain this more obviously)? The logic was under the isHTML
flag and doesn't apply to SVG and XML as far as I know.
@@ -16692,7 +16692,7 @@ class HTMLDocument extends Document$1 { | |||
get head() { | |||
const {documentElement} = this; | |||
let {firstElementChild} = documentElement; | |||
if (!firstElementChild) { | |||
if (!firstElementChild || firstElementChild.tagName !== 'HEAD') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[...] plus it's not considering that the current logic works already for document.body so it's kinda patching it in the wrong place.
The body getter is likely the place to look for a change, not the parse-from-string one, as that file has zero responsibility on how that document or its getters should react .. and bear in mind document.head is another one to eventually fix in the original issue.
Is this more what you mean?
Previously, the code would always assume the that first child was a <head>
element but in the case of <!DOCTYPE html><html><body><div>foo</div></body></html>
, it's a <body>
. This means that when the body
getter calls the head
getter, it returned the existing <body>
and created another <body>
for itself after the supposed <head>
.
The new code makes sure that the first child is a <head>
element otherwise creates one.
This current assumptions about node order here has the downside that if someone defines stuff out of order, it will have 2 <head>
when you try to access document.head
// notice <head> after <body>
const document = new DOMParser().parseFromString(`<!DOCTYPE html><html><body></body><head><title>"hello"</title></head></html>`, "text/html").head.outerHTML
// '<head></head>'
We could fix this up better in parse-from-string
if we enforce <head>
as the first child and <body>
after when documents are initially parsed and created. But I'd go back to the document.body
creation pattern like I had previously to keep things simple.
Fixed with latest. |
Thanks for incorporating the changes @WebReflection 😀 Looks pretty similar: 535c5bc |
…rowser and linkedom (for SSR) Context: - WebReflection/linkedom#106 - WebReflection/linkedom#108
…er and linkedom (for SSR) Before: ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` ❌ new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body></body>' ``` After (consistent matching output): ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' ``` --- `linkedom` goal is to be close to the current DOM standard, but [not too close](https://github.com/WebReflection/linkedom#faq). Focused on the streamlined cases for server-side rendering (SSR). Here is some context around getting `DOMParser` to interpret things better. The conclusion was to only support the explicit standard cases with a `<html><body></body></html>` specified instead of adding the magic HTML document creation and massaging that the browser does. - WebReflection/linkedom#106 - WebReflection/linkedom#108
…dom`](https://github.com/WebReflection/linkedom) (SSR). - [`linkedom`](https://github.com/WebReflection/linkedom) is being used https://github.com/matrix-org/matrix-public-archive to server-side render (SSR) Hydrogen (`hydrogen-view-sdk`) - This is being fixed by using a explicit HTML wrapper boilerplate with `DOMParser` to get a matching result in the browser and `linkedom`. Currently `parseHTML` is only used for HTML content bodies in events. Events with replies have content bodies that look like `<mx-reply>Hello</mx-reply> What's up` so they're parsed as HTML to strip out the `<mx-reply>` part. Before | After --- | --- ![](https://user-images.githubusercontent.com/558581/153692011-2f0e7114-fcb4-481f-b217-49f461b1740a.png) | ![](https://user-images.githubusercontent.com/558581/153692016-52582fdb-abd9-439d-9dce-3f04da6959db.png) Before: ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` ❌ new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body></body>' ``` After (consistent matching output): ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' ``` `linkedom` goal is to be close to the current DOM standard, but [not too close](https://github.com/WebReflection/linkedom#faq). Focused on the streamlined cases for server-side rendering (SSR). Here is some context around getting `DOMParser` to interpret things better. The conclusion was to only support the explicit standard cases with a `<html><body></body></html>` specified instead of adding the magic HTML document creation and massaging that the browser does. - WebReflection/linkedom#106 - WebReflection/linkedom#108 --- Part of #653 to support server-side rendering Hydrogen for the [`matrix-public-archive`](https://github.com/matrix-org/matrix-public-archive) project.
Fix
document.body
not including content fromDOMParser
parsed<body>
Fix #106
Before:
After: