Decode attribute content differently from text node content #255

mikesamuel · 2022-03-28T17:32:12Z

As described in issue #254, `&para` is a complete character
reference when decoding text node content, but not when
decoding attribute content. Treating it as complete in attributes
causes problems for URL attribute values like

/test?param1=foo&param2=bar

As shown via JS test code in that issue, inside an attribute a
small set of next characters prevents a character reference name
match from being considered complete.

This commit:

- modifies the decode functions to take an extra parameter
  `boolean inAttribute`, and modifies the Trie traversal
  loops to not store a longest match so far based on that
  parameter and some next character tests
- modifies the HTML lexer to pass that attribute appropriately
- for backwards compat, leaves the old APIs in place but `@deprecated`
- adds unit tests for the decode functions
- adds a unit test for the specific input from the issue
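
The attribute-mode rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the class name `CharRefDecodeSketch`, the tiny name table, and the linear prefix scan (standing in for the Trie traversal) are all assumptions made for the example. The blocking characters used here (`=` and ASCII alphanumerics) are the ones the HTML parsing specification names for attribute values; the exact set tested in the commit may differ.

```java
import java.util.Map;

/** Hypothetical sketch of attribute-aware named character reference decoding. */
public final class CharRefDecodeSketch {
  // Illustrative subset only; the real HTML table has thousands of entries.
  // Keys with and without the trailing semicolon are listed separately.
  private static final Map<String, String> NAMES = Map.of(
      "amp", "&", "amp;", "&",
      "lt", "<", "lt;", "<",
      "gt", ">", "gt;", ">",
      "para", "\u00B6", "para;", "\u00B6");

  // Longest key in NAMES ("para;").
  private static final int MAX_NAME_LENGTH = 5;

  public static String decode(String s, boolean inAttribute) {
    StringBuilder out = new StringBuilder(s.length());
    int i = 0;
    while (i < s.length()) {
      char c = s.charAt(i);
      if (c != '&') {
        out.append(c);
        i++;
        continue;
      }
      String best = null; // longest match stored so far
      int bestEnd = -1;   // index just past that match
      int limit = Math.min(s.length(), i + 1 + MAX_NAME_LENGTH);
      for (int end = i + 2; end <= limit; end++) {
        String value = NAMES.get(s.substring(i + 1, end));
        if (value == null) {
          continue;
        }
        boolean endsWithSemicolon = s.charAt(end - 1) == ';';
        // In attributes, do not store a semicolon-less match that is
        // followed by '=' or an ASCII alphanumeric character.
        if (inAttribute && !endsWithSemicolon
            && end < s.length() && blocksMatch(s.charAt(end))) {
          continue;
        }
        best = value;
        bestEnd = end;
      }
      if (best != null) {
        out.append(best);
        i = bestEnd;
      } else {
        out.append('&'); // no usable match: keep the ampersand literally
        i++;
      }
    }
    return out.toString();
  }

  private static boolean blocksMatch(char next) {
    return next == '='
        || (next >= '0' && next <= '9')
        || (next >= 'a' && next <= 'z')
        || (next >= 'A' && next <= 'Z');
  }
}
```

With this sketch, `decode("/test?param1=foo&param2=bar", true)` leaves the URL unchanged, while the same call with `inAttribute` set to `false` replaces `&para` with the pilcrow character.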

This change should make us more conformant with observed
browser behaviour, so it is not expected to cause compatibility
problems for existing users.

Fixes #254

mikesamuel merged commit 5372c74 into main Jun 8, 2022
