Parser failure on unclosed <head> inside unclosed <html> #75
Comments
i guess you're confusing this with in other words, you're looking for a fault-tolerant html parser as used by web browsers
This isn't malformed HTML; it's listed explicitly in the spec:
The end tags for
wow, never knew that. thanks for the links
Small note, but this isn't 100% fixed; you can also implicitly close a <!DOCTYPE><html><head><body> The specific wording of "ASCII whitespace" and "comment" is used to detail the way that content is inferred to be in either the head or body if the end (or start!) tags are missing. Basically, there are two:
By these rules, you can explicitly omit the head and body altogether and it'll interpret what is what based upon where the tags are usually located, but you can also choose to simply omit the I included the
I don't think that tree-sitter needs to explicitly sort the tags into a head and body (it's fine with other elements inside
Also, to add a bit more context: the spec has a more specific explanation of the algorithm for parsing documents that goes over the way these two elements are parsed: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inhtml
Hello, I ran into this issue with difftastic and managed to trace it back to this parser. After narrowing down the HTML that was failing to parse, I managed this:
Essentially, instead of being labelled as an implicitly closed element inside an implicitly closed element, it's labelled as an error with two start tags.
This feels undesirable, considering how a missing </head>, </body>, or </html> will cause the entire document (or half of it) to be enclosed in an error node, which breaks the parsing of the individual parts.
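For readers unfamiliar with omitted end tags, here is a rough Python sketch of the behaviour being asked for: a missing end tag is treated as an implied close rather than an error. It is not how tree-sitter-html works and not the spec's full algorithm. The `ImpliedEndTagParser` class, the `IMPLICITLY_CLOSES` table, the partial `VOID_ELEMENTS` set, and the sample input are all made up for illustration; only `html.parser.HTMLParser` from the standard library is real.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified rule table: a start tag on the left
# implicitly closes an open element named on the right.
IMPLICITLY_CLOSES = {"body": {"head"}}

# Partial list of void elements (they never get an end tag), for the demo only.
VOID_ELEMENTS = {"meta", "link", "base", "br", "hr", "img", "input"}


class ImpliedEndTagParser(HTMLParser):
    """Records start/end events and marks end tags that had to be inferred."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.events = []  # (kind, tag) tuples in document order

    def _close(self, implied):
        tag = self.stack.pop()
        self.events.append(("implied_end" if implied else "end", tag))

    def handle_starttag(self, tag, attrs):
        for closable in IMPLICITLY_CLOSES.get(tag, ()):
            if closable in self.stack:
                # Close everything up to and including the implied element.
                while self.stack[-1] != closable:
                    self._close(implied=True)
                self._close(implied=True)
        self.events.append(("start", tag))
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignored in this sketch
        while self.stack[-1] != tag:
            self._close(implied=True)
        self._close(implied=False)

    def close(self):
        super().close()
        # At end of input, every element still open is closed implicitly.
        while self.stack:
            self._close(implied=True)


def parse_events(source):
    parser = ImpliedEndTagParser()
    parser.feed(source)
    parser.close()
    return parser.events


# Hypothetical input: <head>, <body>, and <html> are never closed explicitly.
sample = "<!DOCTYPE html><html><head><title>Demo</title><body><p>Hello</p>"
for kind, tag in parse_events(sample):
    print(kind, tag)
```

With this input the unclosed head, body, and html elements each end up with an `implied_end` event instead of being reported as a failure, which is roughly the shape of tree one would hope for here.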