Incorrect parsing of tag name #277

Closed
noway opened this issue May 20, 2024 · 0 comments
noway commented May 20, 2024

node-html-parser currently uses the following regex pattern to parse tag names:

https://github.com/taoqf/node-html-parser/blob/v6.1.14/src/nodes/html.ts#L924-L925

This is incorrect: the pattern only covers custom element names, but a tag name can belong to any element, not just a custom one. The part of the spec that defines how a tag name is parsed is here: https://html.spec.whatwg.org/multipage/parsing.html#tag-name-state
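
For illustration, here is a rough sketch of what the spec's "tag open" and "tag name" states amount to for a start tag. `readTagName` is a hypothetical helper written for this issue, not part of node-html-parser or any other library, and it deliberately ignores attributes, end tags and EOF error handling. It assumes newlines in the input are already normalized.

```javascript
// Hypothetical helper (not a node-html-parser API): reads the name of a
// start tag beginning at `pos`, loosely following the WHATWG tokenizer.
function readTagName(html, pos = 0) {
  // Tag open state: '<' must be followed by an ASCII letter to start a tag.
  if (html[pos] !== '<' || !/[A-Za-z]/.test(html[pos + 1] || '')) {
    return null;
  }
  let name = '';
  for (let i = pos + 1; i < html.length; i++) {
    const ch = html[i];
    // Tag name state: tab, LF, FF, space, '/' or '>' ends the name.
    if (ch === '\t' || ch === '\n' || ch === '\f' || ch === ' ' || ch === '/' || ch === '>') {
      break;
    }
    if (ch === '\0') {
      // NULL is replaced with U+FFFD REPLACEMENT CHARACTER.
      name += '\uFFFD';
    } else if (ch >= 'A' && ch <= 'Z') {
      // ASCII upper alpha is lowercased.
      name += ch.toLowerCase();
    } else {
      // Anything else, including '@', is appended as-is.
      name += ch;
    }
  }
  return name;
}

console.log(readTagName('<h@1>'));
```

The point of the sketch is that, once the first character is an ASCII letter, the name is not restricted to a character whitelist: only whitespace, `/` and `>` terminate it.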

Test case:

const parse = require('parse5').parse
const Parser = require('htmlparser2').Parser
const { parse: parseNhp } = require('node-html-parser')

const root2 = parse('<h@1>')
console.log('parse5:', root2.childNodes[0].childNodes[1].childNodes[0].nodeName)

const parser = new Parser({
  onopentag(name) {
    console.log('htmlparser2:', name)
  }
})
parser.write('<h@1>')
parser.end()

const root = parseNhp('<h@1>')
console.log('node-html-parser:', root.childNodes[0].rawTagName)

Output:

parse5: h@1
htmlparser2: h@1
node-html-parser:

HTML:

<h@1>

Chrome:
[screenshot]

Firefox:
[screenshot]

As shown above, the tag name h@1 is parsed correctly by parse5, htmlparser2, Chrome and Firefox, but not by node-html-parser.


As for whether markup containing h@1 is 'broken' or 'malformed' HTML: it is not. Although h@1 is not permitted by any content model, it is permitted inside elements whose content model is 'nothing', such as template.

The following code:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>test</title>
  </head>
  <body>
  <template>
    <h@1>Smile!</h@1>
  </template>
  </body>
</html>

passes the HTML5 validator:
[screenshot]

taoqf added the bug label on Jun 18, 2024
taoqf closed this as completed in 432a3e7 on Dec 26, 2024