Attribute DOM representation and parsing is inconsistent #4275

Closed
gijsk opened this issue Jan 7, 2019 · 2 comments

gijsk commented Jan 7, 2019

(Filed as a result of mozilla/readability#392; I'm not 100% sure whether this should be considered a DOM issue or an HTML parser issue, so feel free to move as appropriate.)

STR:

  1. Open https://opinion.udn.com/opinion/story/10124/3561413 in a recent version of Chrome or Firefox.
  2. In the browser's devtools console, run something like this:
     console.log(Array.from(document.querySelectorAll("table")).map(t => t.outerHTML))

At the moment, the DOM includes 2 or 3 tables with an attribute whose name is "0", as the console log shows.

The original markup of the page as inspected via "View Source", at time of writing, looked something like:

<table width="90% border="0>

Note the opening quote before 90% and 'closing' quote after border=.

Obviously the markup's intent is to have a table with 2 attributes, width="90%" and border="0". But both browsers parse the trailing 0 as an attribute with name "0" and the empty string as its value. I assume this parsing is prescribed by the spec, but I haven't tried to look for the specifics there.
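
To illustrate why the stray quotes produce an attribute named "0", here is a deliberately simplified sketch of the attribute-related tokenizer states. It is not the spec algorithm or any browser's implementation: the function name parseStartTagAttributes is hypothetical, and character references, null characters, and parse-error reporting are all ignored.

```javascript
// Simplified sketch (an assumption, not the spec text): split a start tag
// into attributes roughly the way an HTML tokenizer would.
function parseStartTagAttributes(tag) {
  const isSpace = (c) => c === " " || c === "\t" || c === "\n" || c === "\f" || c === "\r";
  const attrs = [];
  let i = 1; // skip "<"

  // Skip the tag name.
  while (i < tag.length && !isSpace(tag[i]) && tag[i] !== "/" && tag[i] !== ">") i++;

  while (i < tag.length && tag[i] !== ">") {
    // Before attribute name: whitespace and "/" are skipped.
    if (isSpace(tag[i]) || tag[i] === "/") { i++; continue; }

    // Attribute name: the first character is always consumed, whatever it is.
    let name = tag[i++].toLowerCase();
    while (i < tag.length && !isSpace(tag[i]) && !"=/>".includes(tag[i])) {
      name += tag[i++].toLowerCase();
    }

    // After attribute name: optional whitespace, then an optional "=".
    while (i < tag.length && isSpace(tag[i])) i++;
    let value = "";
    if (tag[i] === "=") {
      i++;
      while (i < tag.length && isSpace(tag[i])) i++;
      if (tag[i] === '"' || tag[i] === "'") {
        // Quoted value: everything up to the matching quote.
        const quote = tag[i++];
        while (i < tag.length && tag[i] !== quote) value += tag[i++];
        i++; // step past the closing quote
      } else {
        // Unquoted value: everything up to whitespace or ">".
        while (i < tag.length && !isSpace(tag[i]) && tag[i] !== ">") value += tag[i++];
      }
    }

    // Later duplicates of an attribute name are dropped.
    if (!attrs.some((a) => a.name === name)) attrs.push({ name, value });
  }
  return attrs;
}

const parsed = parseStartTagAttributes('<table width="90% border="0>');
console.log(parsed);
// Two attributes: width with value "90% border=", and "0" with value "".
```

The key step is that after the closing quote of a quoted value, the next character simply starts a new attribute name, so the 0 that follows border=" becomes a name of its own.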

The problem arises when rote DOM manipulation reads through element.attributes and tries to set the same attributes on a new element. Element.setAttribute throws an InvalidCharacterError because, as noted in https://dom.spec.whatwg.org/#dom-element-setattribute , 0 "does not match the Name production in XML" (see https://www.w3.org/TR/xml/#NT-Name ).
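
For reference, the check that "0" fails can be approximated outside the browser. The sketch below transcribes the Name production from the XML spec linked above into a regular expression; the helper name matchesXmlName is hypothetical, and this is only an illustration of the production, not how any browser implements setAttribute.

```javascript
// NameStartChar and NameChar from the XML Name production, as character-class bodies.
const nameStartChar =
  ":A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}" +
  "\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}" +
  "\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";
const nameChar = nameStartChar + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";

// Name ::= NameStartChar (NameChar)*
const xmlName = new RegExp(`^[${nameStartChar}][${nameChar}]*$`, "u");

// Hypothetical helper: true if the string matches the Name production.
function matchesXmlName(name) {
  return xmlName.test(name);
}

console.log(matchesXmlName("width")); // true
console.log(matchesXmlName("0"));     // false: digits are NameChar but not NameStartChar
console.log(matchesXmlName("."));     // false, for the same reason
```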

Scripts can currently work around this issue (in reasonably complete DOM implementations) by using element.attributes.setNamedItem(otherElement.attributes[i].cloneNode()), though this isn't very elegant.
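
A minimal sketch of that workaround, assuming a hypothetical helper named copyAttributes: it tries setAttribute first and falls back to setNamedItem with a cloned Attr only when the name is rejected. It relies only on the DOM calls mentioned above (attributes, setAttribute, setNamedItem, cloneNode).

```javascript
// Hypothetical helper: copy every attribute from source to target,
// including attributes whose names setAttribute refuses.
function copyAttributes(source, target) {
  for (const attr of Array.from(source.attributes)) {
    try {
      target.setAttribute(attr.name, attr.value);
    } catch (e) {
      // Only handle the name-validation failure; rethrow anything else.
      if (!e || e.name !== "InvalidCharacterError") throw e;
      // Attach a clone of the Attr node directly, bypassing setAttribute.
      target.attributes.setNamedItem(attr.cloneNode());
    }
  }
}
```

In a page, this might be called as copyAttributes(oldTable, document.createElement("table")), where oldTable is one of the affected tables. Trying setAttribute first keeps the common path simple and only uses the less elegant fallback for names like "0".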

I think the inconsistency here is unfortunate. I would argue for one of the following improvements:

  • parsing an HTML document should validate attributes the same way the DOM spec says to validate them (cf. https://dom.spec.whatwg.org/#validate and https://dom.spec.whatwg.org/#dom-element-setattribute ). If that is too problematic for backwards compatibility reasons (i.e. where document authors apparently intend for the element to have an attribute with a name such as "1" or "."), the parser should only do so when it is parsing questionable markup such as the above.
  • setAttribute DOM API validation should be relaxed to the same standard that HTML parsing uses. If that is not possible for backwards compatibility reasons, it should be relaxed for documents with text/html content types and/or HTML (rather than XHTML/XML-based) parsing models.

gijsk changed the title from "Attribute DOM representation and parsing is confusing" to "Attribute DOM representation and parsing is inconsistent" Jan 7, 2019

annevk (Member) commented Jan 7, 2019

See also whatwg/dom#449. To summarize, this problem is known, but resolving it requires a lot of careful compatibility testing that nobody seems to be willing to invest in.

domenic (Member) commented Jan 7, 2019

Yep. Let's discuss over there.
