Normalize element names. #97

Merged
merged 5 commits into from
Aug 24, 2021
92 changes: 70 additions & 22 deletions index.bs
@@ -292,8 +292,8 @@ handle additional, application-specific use cases.

* The <dfn constructor for=Sanitizer lt="Sanitizer(config)">
<code>new Sanitizer(<var>config</var>)</code></dfn> constructor steps
are to create a new Sanitizer instance, and to retain a copy of |config|
as its [=configuration object=].
are to run the [=create a sanitizer=] algorithm steps with |config| as
parameter.
* The <dfn method for=Sanitizer><code>sanitize(<var>input</var>)</code></dfn>
method steps are to return the result of running the [=sanitize=]
algorithm on |input|,
@@ -467,9 +467,6 @@ is used as the configuration object.
};
</pre>

Note: Element names are expected to be ascii lowercase and those that don't
conform will be lowercased.

: allowElements
:: The <dfn>element allow list</dfn> is a sequence of strings naming the
elements that the sanitizer should retain in the input.
@@ -500,6 +497,10 @@ Note: `allowElements` creates a sanitizer that defaults to dropping elements,
elements. Using both types is possible, but is probably of little practical
use. The same applies to `allowAttributes` and `dropAttributes`.

Note: Element names are normalized following the rules of the HTML Parser.
This means element names are usually lowercase, except for a small number
of mixed-case element names in non-HTML namespaces (SVG, MathML).

<div class="example">
```js
const sample = "Some text <b><i>with</i></b> <blink>tags</blink>.";
@@ -563,6 +564,9 @@ A sanitizer's configuration can be queried using the
// (For illustration purposes only. There are better ways of implementing
// object equality in JavaScript.)
JSON.stringify(Sanitizer.getDefaultConfiguration()) == JSON.stringify(new Sanitizer().getConfiguration()); // true

// Element names are normalized.
new Sanitizer({allowElements: ["EM", "sPAn"]}).getConfiguration().allowElements // ["em", "span"]
```
</div>

@@ -571,8 +575,8 @@ A sanitizer's configuration can be queried using the
An <dfn>attribute match list</dfn> is a map of attribute names to element names,
where the special name "*" stands for all elements. A given |attribute|
belonging to an |element| matches an [=attribute match list=], if the
attribute's local name is a key in the match list, and element's local name
or `"*"` are found in the attribute's value list.
attribute's [=Attr/local name=] is a key in the match list, and the element's
[=Element/local name=] or `"*"` is found in the attribute's value list.
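The match test above can be sketched as follows. This is a minimal illustration, not part of the specification: `matchesAttributeMatchList` is a hypothetical name, and the match list is assumed to be a plain object mapping attribute local names to arrays of element local names, mirroring the `AttributeMatchList` record.

```js
// Hypothetical helper: does an attribute on a given element match the list?
// Assumes matchList is a plain object: { attrLocalName: [elementLocalName, ...] }.
function matchesAttributeMatchList(matchList, attrLocalName, elementLocalName) {
  // Only the list's own keys count; inherited properties such as
  // "constructor" must not be treated as entries.
  if (!Object.prototype.hasOwnProperty.call(matchList, attrLocalName)) {
    return false;
  }
  const elements = matchList[attrLocalName];
  // "*" in the value list stands for all elements.
  return elements.includes(elementLocalName) || elements.includes("*");
}

// `title` is allowed on every element, `href` only on <a>.
const list = { title: ["*"], href: ["a"] };
matchesAttributeMatchList(list, "href", "a");     // true
matchesAttributeMatchList(list, "href", "img");   // false
matchesAttributeMatchList(list, "title", "span"); // true
```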

<pre class="idl">
typedef record&lt;DOMString, sequence&lt;DOMString>> AttributeMatchList;
@@ -604,6 +608,42 @@ Examples for attributes and attribute match lists:

## API Implementation ## {#api-algorithms}

<div algorithm="create a sanitizer">
To <dfn>create a Sanitizer</dfn> with an optional |config| parameter, run
these steps:
1. Let |sanitizer| be a newly created Sanitizer instance.
1. Create a copy of |config|.
1. Normalize all element names in |config|'s copy by running the
[=normalize element name=] algorithm on each of them.
1. Return |sanitizer|, with |config|'s copy as its [=configuration object=].
</div>

Note: The configuration object contains element names in the
[=element allow list=], [=element block list=], and [=element drop list=], and
in the mapped values in the [=attribute allow list=] and [=attribute drop list=].
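The [=create a sanitizer=] steps can be sketched in JavaScript as below. This is a hypothetical sketch: `createSanitizer` and `normalizeElementName` are illustrative names, and the mapping of the lists above onto the dictionary keys `allowElements`, `blockElements`, `dropElements`, `allowAttributes`, and `dropAttributes` is an assumption. Only element names are normalized; attribute names (the keys of the match lists) are left alone, as the note describes.

```js
// Current normalization step: ASCII lowercase (only A-Z change).
function normalizeElementName(name) {
  return name.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Assumed dictionary keys holding element names.
const ELEMENT_LIST_KEYS = ["allowElements", "blockElements", "dropElements"];
const ATTRIBUTE_LIST_KEYS = ["allowAttributes", "dropAttributes"];

function createSanitizer(config = {}) {
  // Copy the config so later changes to the caller's object have no effect.
  // A JSON round trip suffices for strings, booleans, arrays and plain objects.
  const copy = JSON.parse(JSON.stringify(config));
  for (const key of ELEMENT_LIST_KEYS) {
    if (Array.isArray(copy[key])) {
      copy[key] = copy[key].map(normalizeElementName);
    }
  }
  for (const key of ATTRIBUTE_LIST_KEYS) {
    const matchList = copy[key];
    if (matchList && typeof matchList === "object") {
      for (const attr of Object.keys(matchList)) {
        // The mapped values are element names; "*" passes through unchanged.
        matchList[attr] = matchList[attr].map(normalizeElementName);
      }
    }
  }
  return { configuration: copy };
}

const original = { allowElements: ["EM", "sPAn"] };
const sanitizer = createSanitizer(original);
sanitizer.configuration.allowElements; // ["em", "span"]
original.allowElements;                // ["EM", "sPAn"], untouched
```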

<div algorithm="normalize element name">
To <dfn>normalize element name</dfn> |name|, run these steps:
1. Convert |name| to [=ASCII lowercase=].
1. Return |name|.

<div class="issue">
This method will not work for SVG and/or MathML elements, which are not
currently supported. When they are, replace the steps above with:

1. Convert |name| to [=ASCII lowercase=].
1. Let |prefix| be the empty string.
1. If |name| contains a ":" (U+003A), then split |name| on it, set |prefix|
to the part before the colon, and set |name| to the part after it.
1. If |prefix| is either "svg" or "math", then adjust |name| as described
in the "any other start tag" branch of the section
[The rules for parsing tokens in foreign content](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inforeign)
of the HTML parsing spec.
1. Return |name|.
</div>

</div>
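A minimal sketch of both variants: the current ASCII-lowercase step, and the replacement proposed in the issue. The function names are hypothetical. The proposed variant assumes a `prefix:name` input syntax and includes only a few rows of the HTML parser's SVG tag-name adjustment table; the parser adjusts no MathML tag names (only MathML attributes), so a "math" prefix needs no mapping.

```js
// Current step: ASCII lowercase. String.prototype.toLowerCase alone would
// also lowercase non-ASCII letters, so only A-Z are replaced.
function normalizeElementName(name) {
  return name.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Partial excerpt of the SVG table from the "any other start tag" branch.
const SVG_TAG_ADJUSTMENTS = new Map([
  ["clippath", "clipPath"],
  ["foreignobject", "foreignObject"],
  ["lineargradient", "linearGradient"],
  ["radialgradient", "radialGradient"],
  ["textpath", "textPath"],
]);

// Sketch of the proposal in the issue above.
function normalizeForeignElementName(input) {
  let name = normalizeElementName(input);
  let prefix = "";
  const colon = name.indexOf(":");
  if (colon !== -1) {
    prefix = name.slice(0, colon);
    name = name.slice(colon + 1);
  }
  if (prefix === "svg") {
    name = SVG_TAG_ADJUSTMENTS.get(name) ?? name;
  }
  // prefix === "math": MathML tag names are not adjusted by the parser.
  return name;
}

normalizeForeignElementName("SVG:clipPath"); // "clipPath"
normalizeForeignElementName("math:MI");      // "mi"
```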

<div algorithm="sanitize">
To <dfn>sanitize</dfn> a given |input| of type `Document or DocumentFragment`
run these steps:
@@ -760,7 +800,7 @@ To <dfn>handle funky elements</dfn> on a given |element|, run these steps:
1. Remove the `formaction` attribute from |element|.
</div>

## The Effective Configuration ## {#configuration-algorithms}
### The Effective Configuration ### {#configuration}

A Sanitizer is potentially complex, so we will define a helper
construct, the *effective configuration*. This is mostly a specification
@@ -813,10 +853,11 @@ Before describing how an effective configuration is derived, we need a
helper definition:

<div algorithm="element kind">
The <dfn>element kind</dfn> of an |element| is one of `regular`, `unknown`, or `custom`. Let <var ignore>element kind</var> be:
- `custom`, if |element|'s tag name is a [=valid custom element name=],
The <dfn>element kind</dfn> of an |element| is one of `regular`, `unknown`,
or `custom`. Let <var ignore>element kind</var> be:
- `custom`, if |element|'s [=Element/local name=] is a [=valid custom element name=],
- `unknown`, if |element| is not in the [[HTML]] namespace or if |element|'s
tag name denotes an unknown element &mdash; that is, if the
[=Element/local name=] denotes an unknown element &mdash; that is, if the
[=element interface=] the [[HTML]] specification assigns to it would
be {{HTMLUnknownElement}},
- `regular`, otherwise.
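The [=element kind=] classification can be sketched as below, checking the conditions in the order listed. This is an illustration under assumptions: `elementKind` and `isValidCustomElementName` are hypothetical names, and `knownHtmlNames` is an assumed stand-in for "names the HTML specification does not map to {{HTMLUnknownElement}}", since a real implementation has that table built in.

```js
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Hyphenated names the HTML spec reserves; they are never custom elements.
const RESERVED_NAMES = new Set([
  "annotation-xml", "color-profile", "font-face", "font-face-src",
  "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
]);

// Every character must be a PCENChar from the HTML spec's grammar.
const PCEN_CHARS =
  /^[-.0-9_a-z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\u{10000}-\u{EFFFF}]*$/u;

function isValidCustomElementName(name) {
  return /^[a-z]/.test(name) &&   // starts with an ASCII lower alpha
    name.includes("-") &&         // contains at least one hyphen
    PCEN_CHARS.test(name) &&      // only PCENChar code points
    !RESERVED_NAMES.has(name);    // not a reserved name
}

function elementKind(namespaceURI, localName, knownHtmlNames) {
  if (isValidCustomElementName(localName)) return "custom";
  if (namespaceURI !== HTML_NAMESPACE || !knownHtmlNames.has(localName)) {
    return "unknown";
  }
  return "regular";
}
```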
@@ -840,7 +881,7 @@ given a [=configuration object=] |config|, run these steps:
1. If |element|'s [=element kind=] is `custom` and if |config|'s
[=allow custom elements option=] is unset or set to anything other
than `true`: Return `drop`.
1. Let |name| be |element|'s tag name.
1. Let |name| be |element|'s [=Element/local name=].
1. If |name| is in |config|'s [=element drop list=]: Return `drop`.
1. If |name| is in |config|'s [=element block list=]: Return `block`.
1. If |config| has a non-empty [=element allow list=] and |name| is not
@@ -856,18 +897,24 @@ To <dfn>determine the effective configuration for an attribute</dfn> |attr|,
attached to an element |element|, and given a [=configuration object=] |config|,
run these steps:

1. if |config|'s [=attribute drop list=] contains |attr|'s local
name as key, and the associated value contains either |element|'s tag
name or the string `"*"`: Return `drop`.
1. If |config|'s [=attribute drop list=] contains |attr|'s [=Attr/local name=]
as a key, and the associated value contains either |element|'s
[=Element/local name=] or the string `"*"`: Return `drop`.
1. If |config| has a non-empty [=attribute allow list=] and it does not
contain |attr|'s local name, or |attr|'s associated value
contains neither |element|'s tag name nor the string `"*"`:
Return `drop`.
contain |attr|'s [=Attr/local name=], or
|attr|'s associated value contains neither
|element|'s [=Element/local name=] nor the string `"*"`: Return `drop`.
1. If |config| does not have a non-empty [=attribute allow list=] and
[=default configuration=]'s [=attribute allow list=] does not contain
|attr|'s local name, or |attr|'s associated value contains
neither |element|'s tag name nor the string `"*"`: Return `drop`.
|attr|'s [=Attr/local name=], or |attr|'s associated value contains
neither |element|'s [=Element/local name=] nor the string `"*"`:
Return `drop`.
1. Return `keep`.

Note: Element names in the Sanitizer configuration are normalized by the same
step the HTML Parser applies to elements' [=Element/local names=]. Thus, the
comparison is effectively case-insensitive.
</div>
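The attribute steps above can be sketched as follows. The names are hypothetical, and the configuration shape is an assumption: `allowAttributes` and `dropAttributes` are plain objects mapping attribute local names to arrays of already-normalized element local names, and `defaultConfig` stands in for the [=default configuration=].

```js
// Does attrName on elementName match the (possibly absent) match list?
function inMatchList(list, attrName, elementName) {
  if (list == null || !Object.prototype.hasOwnProperty.call(list, attrName)) {
    return false;
  }
  const elements = list[attrName];
  return elements.includes(elementName) || elements.includes("*");
}

function isNonEmpty(list) {
  return list != null && Object.keys(list).length > 0;
}

function effectiveAttributeConfig(config, defaultConfig, attrName, elementName) {
  // Step 1: a match in the drop list always drops.
  if (inMatchList(config.dropAttributes, attrName, elementName)) return "drop";
  if (isNonEmpty(config.allowAttributes)) {
    // Step 2: a non-empty allow list must match.
    if (!inMatchList(config.allowAttributes, attrName, elementName)) return "drop";
  } else if (!inMatchList(defaultConfig.allowAttributes, attrName, elementName)) {
    // Step 3: otherwise, the default configuration's allow list must match.
    return "drop";
  }
  return "keep";
}
```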

## Baseline and Defaults ## {#defaults}
@@ -879,8 +926,9 @@ Issue: The sanitizer baseline and defaults need to be carefully vetted, and
<div algorithm="determine the baseline configuration for an element">
To <dfn>determine the baseline configuration for an element</dfn>
|element|, run these steps:
1. if |element|'s [=element kind=] is `regular` and if |element|'s tag name
is not in the [=baseline element allow list=]: Return `drop`.
1. If |element|'s [=element kind=] is `regular` and |element|'s
[=Element/local name=] is not in the [=baseline element allow list=]:
Return `drop`.
1. Return `keep`.
</div>
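For illustration, the baseline check reduces to a single condition. In this sketch, `baselineConfigForElement` is a hypothetical name, and `baselineAllowList` is an assumed stand-in for the [=baseline element allow list=], whose contents are not reproduced here. As the steps state, only `regular` elements are checked against the list.

```js
// kind is the element kind: "regular", "unknown", or "custom".
function baselineConfigForElement(kind, localName, baselineAllowList) {
  if (kind === "regular" && !baselineAllowList.has(localName)) return "drop";
  return "keep";
}
```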
