Incorrect parsing of tag name #277

noway · 2024-05-20T05:05:06Z

node-html-parser currently uses the following regex pattern to parse tag names:

https://github.com/taoqf/node-html-parser/blob/v6.1.14/src/nodes/html.ts#L924-L925

This is incorrect: the pattern only accepts names that are valid for custom elements, but a tag name can belong to any element. The relevant part of the spec for parsing a tag name is the tag name state: https://html.spec.whatwg.org/multipage/parsing.html#tag-name-state
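To illustrate what the spec's tokenizer does, here is a minimal sketch of the "tag open" and "tag name" states for a start tag. `readTagName` is a hypothetical helper written for this issue, not part of node-html-parser or any other library, and it only extracts the name: attributes, end tags, and end-of-input errors are not handled.

```javascript
// Hypothetical helper: a simplified reading of the WHATWG "tag open" and
// "tag name" tokenizer states. Returns the tag name, or null if the input
// at `pos` does not start a tag.
function readTagName(input, pos = 0) {
  // Tag open state: "<" followed by an ASCII letter starts a start tag.
  if (input[pos] !== '<' || !/[A-Za-z]/.test(input[pos + 1] ?? '')) {
    return null;
  }
  let name = '';
  for (let i = pos + 1; i < input.length; i++) {
    const ch = input[i];
    // Tag name state: tab, LF, FF, space, "/" and ">" end the name.
    if (ch === '\t' || ch === '\n' || ch === '\f' || ch === ' ' || ch === '/' || ch === '>') {
      break;
    }
    if (ch === '\0') {
      name += '\uFFFD'; // NULL is replaced with U+FFFD
    } else if (ch >= 'A' && ch <= 'Z') {
      name += ch.toLowerCase(); // ASCII upper alpha is lowercased
    } else {
      name += ch; // anything else, including "@", is appended as-is
    }
  }
  return name;
}

console.log(readTagName('<h@1>'));
```

Under these rules only the first character has to be an ASCII letter; every later character up to whitespace, `/` or `>` becomes part of the name, which is why `h@1` is accepted.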

Test case:

const parse = require('parse5').parse
const Parser = require('htmlparser2').Parser
const { parse: parseNhp } = require('node-html-parser')

// parse5: document -> <html> -> <body> -> first child
const root2 = parse('<h@1>')
console.log('parse5:', root2.childNodes[0].childNodes[1].childNodes[0].nodeName)

// htmlparser2: log the name of every opening tag
const parser = new Parser({
  onopentag(name) {
    console.log('htmlparser2:', name)
  }
})
parser.write('<h@1>')
parser.end()

// node-html-parser: first child of the parsed root
const root = parseNhp('<h@1>')
console.log('node-html-parser:', root.childNodes[0].rawTagName)

Output:

parse5: h@1
htmlparser2: h@1
node-html-parser:

HTML:

<h@1>

Chrome: (screenshot not preserved)

Firefox: (screenshot not preserved)

As you can see above, the h@1 tag name is parsed correctly by parse5, htmlparser2, Chrome and Firefox, but not by node-html-parser.

As for whether code containing h@1 is 'broken' or 'malformed' HTML: it is not. Although h@1 is not permitted by any content model, it is permitted inside elements with the 'nothing' content model.

The following code:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>test</title>
  </head>
  <body>
  <template>
    <h@1>Smile!</h@1>
  </template>
  </body>
</html>

passes the HTML5 validator.


taoqf added the bug label Jun 18, 2024

taoqf closed this as completed in 432a3e7 Dec 26, 2024
