x/net/html: void element <link> has child nodes #10535
Comments
Similar crash for |
Similar crash for `" |
the same for area element |
A "correct" parse according to the HTML5 spec can produce a parse tree that is not well-formed. But what should we do about it in Render? My initial thought would be that if a void element has child nodes, we should just render it as if it weren't a void element. Of course that is garbage, but it is in keeping with the principle of garbage in, garbage out. |
If an input is parsed, then I would expect it to be serialized into its original form. The next HTML5-compliant parser in the chain should be able to parse it successfully again, right? |
That sounds good, but it is very far from the case with HTML5. HTML5 takes Postel's principle to the extreme: any random string of data can be parsed as an HTML5 document. (It probably won't validate, but it will parse.) There are places in the spec where parsers are allowed to return errors, but they also specify what the result should be if they don't. In all those cases, we chose to have the html package continue parsing instead of returning an error. Having the rendered output exactly the same as the input is actually quite rare, because of things like capitalization and attribute quoting. |
Well, at least the output should be parseable by the net/html parser again, and maybe by some subset of other parsers.
|
I agree that returning an error from Render is probably not what we want to do in this case. And we should do what we reasonably can to make invalid trees render sensibly. (Actually I suspect that this tree is valid; does SVG allow children for those elements?) But we don't want to fill the Render function with special-case heuristics that attempt to reconstruct the markup underlying invalid parse trees. For your use case #2, it would be very desirable for Render to never return an error unless the destination Writer returns one. That would make its operation parallel to Parse, which accepts all possible inputs. @nigeltao, what do you think? Should we eliminate errors from Render, and just make it produce the least-wrong output practical? |
I think this is WAI. Render's doc comment already says, "Rendering is done on a 'best effort' basis". The HTML5 parsing algorithm (https://html.spec.whatwg.org/multipage/syntax.html#tree-construction) is an enormous nest of special cases. It prints at 130 pages. I don't think that it guarantees to be idempotent, or even self-consistent (e.g. "void elements" are listed in a separate document at http://www.w3.org/TR/html5/syntax.html#void-elements), and I can't see an obvious proof that it is either. Given that, the Rendering algorithm is naive, and doesn't promise to render everything you can parse. In any case, the error in the original post is indeed accurate: the void element has child nodes, and it shouldn't. You could argue that Render could return a special RenderingError, separate from I/O errors, that callers are free to ignore. But in general, the space of "semi-invalid documents" is ill-defined, and as I said earlier, it's not clear that the parser algorithm guarantees to produce a "valid document", so I don't think Render should never return errors (other than I/O errors). You could argue, then, that the parser shouldn't have produced a tree that was like that, but the parser is what it is, specified by a 130-page spec that's already too complicated. It would be nice if HTML5 was based on sensible, consistent foundations, but it isn't. |
As I think about this more, I think the real issue is that we are applying the HTML void element list to SVG elements. I'm pretty sure that the input element that has children is an svg:input (though input doesn't mean anything that I know of in that namespace), not a regular HTML input. So maybe we should check the namespace before we check the void element list, since the concept of void elements doesn't apply to foreign content. |
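A rough sketch of that check, under stated assumptions: the `node` type is a hypothetical stand-in that keeps only the `Data` and `Namespace` fields, with `Namespace` empty for ordinary HTML elements and set to a value such as "svg" or "math" for foreign content, and the void element list is illustrative rather than copied from the package.

```go
package main

// node is a hypothetical, pared-down stand-in for a parsed element.
type node struct {
	Data      string // tag name, e.g. "input"
	Namespace string // "" for HTML; e.g. "svg" or "math" for foreign content
}

// voidElements is an illustrative list of HTML void elements.
var voidElements = map[string]bool{
	"area": true, "base": true, "br": true, "col": true, "embed": true,
	"hr": true, "img": true, "input": true, "link": true, "meta": true,
	"source": true, "track": true, "wbr": true,
}

// isVoid reports whether n should be treated as a void element. The
// namespace is checked first, so a foreign element that merely shares a
// name with an HTML void element (an "input" inside SVG, say) is not
// treated as void and may have children.
func isVoid(n *node) bool {
	return n.Namespace == "" && voidElements[n.Data]
}
```

A renderer that consulted `isVoid` instead of the bare name lookup would only raise the "void element has child nodes" error for elements in the HTML namespace.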
The following program crashes:
Render must be able to handle Parse output. Otherwise, Parse must not accept the input as valid.
On commit 6f62f426de90c0ed6a55207b51476115fcb17237.