Merge pull request #656 from gkumar9891/allow-tagged-html
added option disallowedTagsMode: 'completelyDiscard'
BoDonkey authored Mar 14, 2024
2 parents d2925db + 31aebae commit e410f6e
Showing 4 changed files with 67 additions and 5 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,8 @@

- Documentation update regarding minimum supported TypeScript version.

- Added the `disallowedTagsMode: 'completelyDiscard'` option, which discards the text content of disallowed tags along with the tags themselves.

## 2.12.1 (2024-02-22)

- Do not parse sourcemaps in `post-css`. This fixes a vulnerability in which information about the existence or non-existence of files on a server could be disclosed via properly crafted HTML input when the `style` attribute is allowed by the configuration. Thanks to the [Snyk Security team](https://snyk.io/) for the disclosure and to [Dylan Armstrong](https://dylan.is/) for the fix.
33 changes: 32 additions & 1 deletion README.md
@@ -245,6 +245,8 @@ allowedAttributes: {}

If you set `disallowedTagsMode` to `discard` (the default), disallowed tags are discarded. Any text content or subtags are still included, depending on whether the individual subtags are allowed.

If you set `disallowedTagsMode` to `completelyDiscard`, disallowed tags and any text they directly contain are discarded. Subtags are still included, as long as those individual subtags are allowed.

If you set `disallowedTagsMode` to `escape`, the disallowed tags are escaped rather than discarded. Any text or subtags are handled normally.

If you set `disallowedTagsMode` to `recursiveEscape`, the disallowed tags are escaped rather than discarded, and the same treatment is applied to all subtags, whether otherwise allowed or not.
@@ -705,7 +707,36 @@ disallowedTagsMode: 'escape'

This will transform `<disallowed>content</disallowed>` to `&lt;disallowed&gt;content&lt;/disallowed&gt;`

Valid values are: `'discard'` (default), `'escape'` (escape the tag) and `'recursiveEscape'` (to escape the tag and all its content).
Valid values are: `'discard'` (default), `'completelyDiscard'` (discard the tag and its text content), `'escape'` (escape the tag) and `'recursiveEscape'` (escape the tag and all its content).
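
As a minimal sketch, a mode string can be checked against this list before it is passed along as an option. The helper name `assertValidDisallowedTagsMode` and the constant are hypothetical and are not part of sanitize-html; they only encode the four values named in this section.

```js
// Hypothetical helper, NOT part of sanitize-html: it only encodes the
// four documented values of `disallowedTagsMode`.
const VALID_DISALLOWED_TAGS_MODES = [
  'discard',
  'completelyDiscard',
  'escape',
  'recursiveEscape'
];

// Returns the mode unchanged if it is valid; an omitted mode falls back
// to the documented default, 'discard'. Anything else throws.
function assertValidDisallowedTagsMode(mode = 'discard') {
  if (!VALID_DISALLOWED_TAGS_MODES.includes(mode)) {
    throw new TypeError(`Unknown disallowedTagsMode: ${String(mode)}`);
  }
  return mode;
}
```

A check like this catches typos such as `'completlyDiscard'` early, instead of relying on how an unrecognized string happens to be handled.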

#### Discard disallowed tags but keep their inner content

If you set `disallowedTagsMode` to `discard`, disallowed tags are discarded, but their inner content is kept.

```js
disallowedTagsMode: 'discard'
```
This will transform `<disallowed>content</disallowed>` to `content`

#### Discard entire content of a disallowed tag

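If you set `disallowedTagsMode` to `completelyDiscard`, disallowed tags and any text they directly contain are discarded. Subtags are still included, as long as those individual subtags are allowed. Text whose immediate parent is not an allowed tag, including top-level text, is discarded as well.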

```js
disallowedTagsMode: 'completelyDiscard'
```

This will transform `<disallowed>content <allowed>content</allowed> </disallowed>` to `<allowed>content</allowed>`
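
The difference between `discard` and `completelyDiscard` can be sketched with a toy model. This is not the sanitize-html implementation: the function `renderDiscardModes` and the pre-parsed token format are assumptions made so that the example needs no HTML parser, and text is copied through without entity escaping.

```js
// Toy model, NOT the sanitize-html implementation. Tokens are pre-parsed:
// { type: 'open' | 'close', name } or { type: 'text', value }.
function renderDiscardModes(tokens, allowedTags, mode) {
  const openTags = []; // names of the tags that are currently open
  let out = '';
  for (const token of tokens) {
    if (token.type === 'open') {
      openTags.push(token.name);
      // Disallowed tags are dropped in both modes.
      if (allowedTags.includes(token.name)) out += `<${token.name}>`;
    } else if (token.type === 'close') {
      openTags.pop();
      if (allowedTags.includes(token.name)) out += `</${token.name}>`;
    } else {
      // Text belongs to the innermost open tag (none at top level).
      const parent = openTags[openTags.length - 1];
      const parentAllowed = parent !== undefined && allowedTags.includes(parent);
      // 'completelyDiscard' also drops text whose parent is not allowed.
      if (mode === 'completelyDiscard' && !parentAllowed) continue;
      out += token.value;
    }
  }
  return out;
}

// <disallowed>content <allowed>content</allowed> </disallowed>
const discardExampleTokens = [
  { type: 'open', name: 'disallowed' },
  { type: 'text', value: 'content ' },
  { type: 'open', name: 'allowed' },
  { type: 'text', value: 'content' },
  { type: 'close', name: 'allowed' },
  { type: 'text', value: ' ' },
  { type: 'close', name: 'disallowed' }
];

const keptText = renderDiscardModes(discardExampleTokens, [ 'allowed' ], 'discard');
const droppedText = renderDiscardModes(discardExampleTokens, [ 'allowed' ], 'completelyDiscard');
```

In this model `keptText` still carries the text that sat directly inside `<disallowed>`, while `droppedText` is just `<allowed>content</allowed>`.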

#### Escape the disallowed tag and all its children, including allowed tags

If you set `disallowedTagsMode` to `recursiveEscape`, the disallowed tag and all of its children are escaped, even children that would otherwise be allowed.

```js
disallowedTagsMode: 'recursiveEscape'
```

This will transform `<disallowed>hello<p>world</p></disallowed>` to `&lt;disallowed&gt;hello&lt;p&gt;world&lt;/p&gt;&lt;/disallowed&gt;`
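
To contrast `escape` with `recursiveEscape`, here is a toy model over pre-parsed tokens. It is a sketch, not the sanitize-html implementation: `renderEscapeModes`, `escapeMarkup` and the token format are assumptions for illustration, and text is copied through unchanged rather than entity-escaped.

```js
// Toy model, NOT the sanitize-html implementation. Tokens are pre-parsed:
// { type: 'open' | 'close', name } or { type: 'text', value }.
function escapeMarkup(markup) {
  return markup.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function renderEscapeModes(tokens, allowedTags, mode) {
  const recursive = mode === 'recursiveEscape';
  let escapedDepth = 0; // open tags being escaped; used only when recursive
  let out = '';
  for (const token of tokens) {
    if (token.type === 'text') {
      out += token.value;
      continue;
    }
    const allowed = allowedTags.includes(token.name);
    // A tag is escaped if it is disallowed, or, in 'recursiveEscape',
    // if it sits anywhere inside a tag that is already being escaped.
    const escaped = !allowed || (recursive && escapedDepth > 0);
    if (escaped && recursive) {
      escapedDepth += token.type === 'open' ? 1 : -1;
    }
    const markup = token.type === 'open' ? `<${token.name}>` : `</${token.name}>`;
    out += escaped ? escapeMarkup(markup) : markup;
  }
  return out;
}

// <disallowed>hello<p>world</p></disallowed>
const escapeExampleTokens = [
  { type: 'open', name: 'disallowed' },
  { type: 'text', value: 'hello' },
  { type: 'open', name: 'p' },
  { type: 'text', value: 'world' },
  { type: 'close', name: 'p' },
  { type: 'close', name: 'disallowed' }
];

const escapedOnly = renderEscapeModes(escapeExampleTokens, [ 'p' ], 'escape');
const escapedAll = renderEscapeModes(escapeExampleTokens, [ 'p' ], 'recursiveEscape');
```

In this model `escapedOnly` keeps `<p>world</p>` as real markup inside the escaped outer tag, while `escapedAll` escapes the `<p>` tags too.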

### Ignore style attribute contents

10 changes: 6 additions & 4 deletions index.js
@@ -262,7 +262,7 @@ function sanitizeHtml(html, options, _recursing) {
if (!tagAllowed(name) || (options.disallowedTagsMode === 'recursiveEscape' && !isEmptyObject(skipMap)) || (options.nestingLimit != null && depth >= options.nestingLimit)) {
skip = true;
skipMap[depth] = true;
if (options.disallowedTagsMode === 'discard') {
if (options.disallowedTagsMode === 'discard' || options.disallowedTagsMode === 'completelyDiscard') {
if (nonTextTagsArray.indexOf(name) !== -1) {
skipText = true;
skipTextDepth = 1;
@@ -272,7 +272,7 @@ function sanitizeHtml(html, options, _recursing) {
}
depth++;
if (skip) {
if (options.disallowedTagsMode === 'discard') {
if (options.disallowedTagsMode === 'discard' || options.disallowedTagsMode === 'completelyDiscard') {
// We want the contents but not this tag
return;
}
@@ -511,7 +511,9 @@ function sanitizeHtml(html, options, _recursing) {
text = lastFrame.innerText !== undefined ? lastFrame.innerText : text;
}

if (options.disallowedTagsMode === 'discard' && ((tag === 'script') || (tag === 'style'))) {
if (options.disallowedTagsMode === 'completelyDiscard' && !tagAllowed(tag)) {
text = '';
} else if ((options.disallowedTagsMode === 'discard' || options.disallowedTagsMode === 'completelyDiscard') && ((tag === 'script') || (tag === 'style'))) {
// htmlparser2 gives us these as-is. Escaping them ruins the content. Allowing
// script tags is, by definition, game over for XSS protection, so if that's
// your concern, don't allow them. The same is essentially true for style tags
@@ -559,7 +561,7 @@ function sanitizeHtml(html, options, _recursing) {
const skip = skipMap[depth];
if (skip) {
delete skipMap[depth];
if (options.disallowedTagsMode === 'discard') {
if (options.disallowedTagsMode === 'discard' || options.disallowedTagsMode === 'completelyDiscard') {
frame.updateParentNodeText();
return;
}
27 changes: 27 additions & 0 deletions test/test.js
@@ -1667,5 +1667,32 @@ describe('sanitizeHtml', function() {
}
}), '<a style="background-image:url(&quot;/*# sourceMappingURL=../index.js */&quot;)"></a>');
});
it('should completely remove disallowed tags with nested content', () => {
const inputHtml = '<div>Some Text<p>Allowed content</p><script>var x = "Disallowed script";</script><span>More allowed content</span> Another Text</div>';
const expectedOutput = '<p>Allowed content</p><span>More allowed content</span>';
const sanitizedHtml = sanitizeHtml(inputHtml, {
allowedTags: [ 'p', 'span' ],
disallowedTagsMode: 'completelyDiscard'
});
assert.equal(sanitizedHtml, expectedOutput);
});
it('should remove top level tag\'s content', () => {
const inputHtml = 'Some Text<p>paragraph content</p> content';
const expectedOutput = '<p>paragraph content</p>';
const sanitizedHtml = sanitizeHtml(inputHtml, {
allowedTags: [ 'p' ],
disallowedTagsMode: 'completelyDiscard'
});
assert.equal(sanitizedHtml, expectedOutput);
});
it('should completely remove disallowed tag with unclosed tag', () => {
const inputHtml = '<div>Some Text<p>paragraph content</p>some text';
const expectedOutput = '<p>paragraph content</p>';
const sanitizedHtml = sanitizeHtml(inputHtml, {
allowedTags: [ 'p' ],
disallowedTagsMode: 'completelyDiscard'
});

assert.equal(sanitizedHtml, expectedOutput);
});
});
