Skip to content

Commit

Permalink
Always remove <noscript> elements
Browse files Browse the repository at this point in the history
...even if `noscript` is in the allowlist.

A `<noscript>` element's content is parsed differently in browsers
depending on whether or not scripting is enabled. Since Nokogiri doesn't
support scripting, it always parses `<noscript>` elements as if
scripting is disabled. This results in edge cases where it's not
possible to reliably sanitize the contents of a `<noscript>` element
because Nokogiri can't fully replicate the parsing behavior of a
scripting-enabled browser. The safest thing to do is to simply remove
all `<noscript>` elements.

Fixes GHSA-fw3g-2h3j-qmm7
  • Loading branch information
rgrove committed Jan 27, 2023
1 parent b4ee521 commit ec14265
Show file tree
Hide file tree
Showing 5 changed files with 73 additions and 5 deletions.
27 changes: 27 additions & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,33 @@

### Bug Fixes

* Sanitize now always removes `<noscript>` elements and their contents, even
when `noscript` is in the allowlist.

This fixes a sanitization bypass that could occur when `noscript` was allowed
by a custom allowlist. In this scenario, carefully crafted input could sneak
arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
scripting) attack.

Sanitize's default configs don't allow `<noscript>` elements and are not
vulnerable. This issue only affects users who are using a custom config that
adds `noscript` to the element allowlist.

The root cause of this issue is that HTML parsing rules treat the contents of
a `<noscript>` element differently depending on whether scripting is enabled
in the user agent. Nokogiri doesn't support scripting so it follows the
"scripting disabled" rules, but a web browser with scripting enabled will
follow the "scripting enabled" rules. This means that Sanitize can't reliably
make the contents of a `<noscript>` element safe for scripting enabled
browsers, so the safest thing to do is to remove the element and its contents
entirely.

See the following security advisory for additional details:
[GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)

Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
(@leeN) for reporting this issue.

* Fixed an edge case in which the contents of an "unescaped text" element (such
as `<noembed>` or `<xmp>`) were not properly escaped if that element was
allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
Expand Down
14 changes: 10 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,10 @@ properties, @ rules, and URL protocols in elements or attributes containing CSS.
Any HTML or CSS that you don't explicitly allow will be removed.

Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
exactly the same way modern browsers do, and [Crass][crass], which parses CSS
exactly the same way modern browsers do. As long as your allowlist config only
allows safe markup and CSS, even the most malformed or malicious input will be
transformed into safe output.
the same way modern browsers do, and [Crass][crass], which parses CSS the same
way modern browsers do. As long as your allowlist config only allows safe markup
and CSS, even the most malformed or malicious input will be transformed into
safe output.

[![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
[![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
Expand Down Expand Up @@ -427,6 +427,12 @@ elements not in this array will be removed.
>
> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
> **Note**
>
> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
>
> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
#### :parser_options (Hash)

[Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
Expand Down
10 changes: 10 additions & 0 deletions lib/sanitize/transformers/clean_element.rb
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,16 @@ def call(env)

node['content'] = node['content'].gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8')
end

# A `<noscript>` element's content is parsed differently in browsers
# depending on whether or not scripting is enabled. Since Nokogiri doesn't
# support scripting, it always parses `<noscript>` elements as if scripting
# is disabled. This results in edge cases where it's not possible to
# reliably sanitize the contents of a `<noscript>` element because Nokogiri
# can't fully replicate the parsing behavior of a scripting-enabled browser.
# The safest thing to do is to simply remove all `<noscript>` elements.
when 'noscript'
node.unlink
end
end

Expand Down
7 changes: 7 additions & 0 deletions test/test_clean_element.rb
Original file line number Diff line number Diff line change
Expand Up @@ -541,5 +541,12 @@
)).must_equal "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\"></head><body>Howdy!</body></html>"
end

it 'always removes `<noscript>` elements even if `noscript` is in the allowlist' do
assert_equal(
'',
Sanitize.fragment('<noscript>foo</noscript>', elements: ['noscript'])
)
end

end
end
20 changes: 19 additions & 1 deletion test/test_malicious_html.rb
Original file line number Diff line number Diff line change
Expand Up @@ -244,7 +244,6 @@
unescaped_content_elements = %w[
noembed
noframes
noscript
plaintext
script
xmp
Expand All @@ -255,6 +254,7 @@
]

removed_elements = %w[
noscript
style
]

Expand Down Expand Up @@ -318,4 +318,22 @@
end
end
end

describe 'sanitization bypass by exploiting scripting-disabled <noscript> behavior' do
before do
@s = Sanitize.new(
Sanitize::Config.merge(
Sanitize::Config::RELAXED,
elements: Sanitize::Config::RELAXED[:elements] + ['noscript']
)
)
end

it 'is prevented by removing `<noscript>` elements regardless of the allowlist' do
assert_equal(
'',
@s.fragment(%[<noscript><div id='</noscript>&lt;img src=x onerror=alert(1)&gt; '>])
)
end
end
end

0 comments on commit ec14265

Please sign in to comment.