From 3d07b093846104ec4ac0fb528524c111a8b293a8 Mon Sep 17 00:00:00 2001 From: Daniel Vogelheim Date: Thu, 27 May 2021 16:32:54 +0200 Subject: [PATCH 1/5] Draft. --- index.bs | 42 +++++++++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 11 deletions(-) diff --git a/index.bs b/index.bs index 1212901..045bdda 100644 --- a/index.bs +++ b/index.bs @@ -467,9 +467,6 @@ is used as the configuration object. }; -Note: Element names are expected to be ascii lowercase and those that don't - conform will be lowercased. - : allowElements :: The element allow list is a sequence of strings with elements that the sanitizer should retain in the input. @@ -563,6 +560,9 @@ A sanitizer's configuration can be queried using the // (For illustration purposes only. There are better ways of implementing // object equality in JavaScript.) JSON.stringify(Sanitizer.getDefaultConfiguration()) == JSON.stringify(new Sanitizer().getConfiguration()); // true + + // Element names are normalized. + new Sanitizer({allowElements: ["EM", "sPAn"]}).config().allowElements // ["em", "span"] ``` @@ -760,7 +760,25 @@ To handle funky elements on a given |element|, run these steps: 1. Remove the `formaction` attribute from |element|. -## The Effective Configuration ## {#configuration-algorithms} +
+To query the sanitizer config of a given sanitizer instance, +run these steps: + 1. Let |sanitizer| be the current Sanitizer. + 2. Let |config| be |sanitizer|'s [=configuration object=], or the + [=default configuration=] if no [=configuration object=] was given. + 3. Let |result| be a newly constructed SanitizerOptions dictionary. + 4. For any non-empty member of |config| whose key is declared in + SanitizerOptions, copy the value to |result|. + 5. All element names in |result| should beconverted to lower case. + Element names are contained in: + - The values in the [=element allow list=], [=element block list=], and + [=element drop list=]. + - The mapped values in the [=attribute allow list=] and + [=attribute drop list=]. + 6. Return |result|. +
+ +### The Effective Configuration ### {#configuration} A Sanitizer is potentially complex, so we will define a helper construct, the *effective configuration*. This is mostly a specification @@ -813,7 +831,8 @@ Before describing how an effective configuration is derived, we need a helper definition:
-The element kind of an |element| is one of `regular`, `unknown`, or `custom`. Let element kind be: +The element kind of an |element| is one of `regular`, `unknown`, +or `custom`. Let element kind be: - `custom`, if |element|'s tag name is a [=valid custom element name=], - `unknown`, if |element| is not in the [[HTML]] namespace or if |element|'s tag name denotes an unknown element — that is, if the @@ -840,7 +859,7 @@ given a [=configuration object=] |config|, run these steps: 1. If |element|'s [=element kind=] is `custom` and if |config|'s [=allow custom elements option=] is unset or set to anything other than `true`: Return `drop`. - 1. Let |name| be |element|'s tag name. + 1. Let |name| be the |element|'s tag name in lower case. 1. If |name| is in |config|'s [=element drop list=]: Return `drop`. 1. If |name| is in |config|'s [=element block list=]: Return `block`. 1. If |config| has a non-empty [=element allow list=] and |name| is not @@ -858,15 +877,16 @@ run these steps: 1. if |config|'s [=attribute drop list=] contains |attr|'s local name as key, and the associated value contains either |element|'s tag - name or the string `"*"`: Return `drop`. + name (with case-insensitve comparison) or the string `"*"`: Return `drop`. 1. If |config| has a non-empty [=attribute allow list=] and it does not - contain |attr|'s local name, or |attr|'s associated value - contains neither |element|'s tag name nor the string `"*"`: - Return `drop`. + contain |attr|'s local name, or |attr|'s associated value contains neither + |element|'s tag name (with case-insensitive comparison) nor the string + `"*"`: Return `drop`. 1. if |config| does not have a non-empty [=attribute allow list=] and [=default configuration=]'s [=attribute allow list=] does not contain |attr|'s local name, or |attr|'s associated value contains - neither |element|'s tag name nor the string `"*"`: Return `drop`. 
+ neither |element|'s tag name (with case-insensitive comparison) nor the + string `"*"`: Return `drop`. 1. Return `keep`.
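Reviewer sketch (not part of the patch): the attribute steps in the hunk above can be read as the small JavaScript function below. This is a minimal illustration under stated assumptions, not the spec's implementation. The function names `matchesList` and `attributeDecision`, and the `defaultAllow` parameter that stands in for the default configuration's attribute allow list, are hypothetical. Only `allowAttributes` and `dropAttributes` are names taken from the spec text.

```js
// Hypothetical helper: true if `list` (attribute name -> element names)
// has `attrName` as a key and its value names the element or "*".
// Element names are compared case-insensitively, as this patch describes.
function matchesList(list, attrName, elementName) {
  const elements = list?.[attrName];
  if (!Array.isArray(elements)) return false;
  const wanted = elementName.toLowerCase();
  return elements.some((e) => e === "*" || e.toLowerCase() === wanted);
}

// Hypothetical sketch of the attribute decision: "drop" or "keep".
// `defaultAllow` is an assumed stand-in for the default attribute allow list.
function attributeDecision(attrName, elementName, config, defaultAllow) {
  // Step 1: an explicit drop entry always wins.
  if (matchesList(config.dropAttributes, attrName, elementName)) return "drop";

  // Steps 2 and 3: use the configured allow list if it is non-empty,
  // otherwise fall back to the default allow list.
  const allow = config.allowAttributes;
  const hasAllow = Boolean(allow) && Object.keys(allow).length > 0;
  const effectiveAllow = hasAllow ? allow : defaultAllow;
  if (!matchesList(effectiveAllow, attrName, elementName)) return "drop";

  // Step 4: nothing objected.
  return "keep";
}
```

Folding steps 2 and 3 into one lookup is a reading choice: both steps apply the same test and differ only in which allow list they consult.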
From 538be6b71aade80bca58e569e0c2e347b19cf541 Mon Sep 17 00:00:00 2001 From: Daniel Vogelheim Date: Thu, 27 May 2021 16:41:44 +0200 Subject: [PATCH 2/5] Minor fixes. --- index.bs | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/index.bs b/index.bs index 045bdda..fbab7bc 100644 --- a/index.bs +++ b/index.bs @@ -769,15 +769,14 @@ run these steps: 3. Let |result| be a newly constructed SanitizerOptions dictionary. 4. For any non-empty member of |config| whose key is declared in SanitizerOptions, copy the value to |result|. - 5. All element names in |result| should beconverted to lower case. - Element names are contained in: - - The values in the [=element allow list=], [=element block list=], and - [=element drop list=]. - - The mapped values in the [=attribute allow list=] and - [=attribute drop list=]. + 5. All element names in |result| should be converted to lower case. 6. Return |result|. +Note: The configuration object contains element names in the + [=element allow list=], [=element block list=], and [=element drop list=], and + in the mapped values in the [=attribute allow list=] and [=attribute drop list=]. + ### The Effective Configuration ### {#configuration} A Sanitizer is potentially complex, so we will define a helper From be7a637ebc06063470a6fe686dc69015c55dda5a Mon Sep 17 00:00:00 2001 From: Daniel Vogelheim Date: Mon, 31 May 2021 15:44:36 +0200 Subject: [PATCH 3/5] Review feedback. --- index.bs | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/index.bs b/index.bs index fbab7bc..4ff7e91 100644 --- a/index.bs +++ b/index.bs @@ -292,8 +292,8 @@ handle additional, application-specific use cases. * The new Sanitizer(config) constructor steps - are to create a new Sanitizer instance, and to retain a copy of |config| - as its [=configuration object=]. + are to run the steps [=create a sanitizer=] algorithm with |config| as + parameter. 
* The sanitize(input) method steps are to return the result of running the [=sanitize=] algorithm on |input|, @@ -604,6 +604,19 @@ Examples for attributes and attribute match lists: ## API Implementation ## {#api-algorithms} +
+To create a Sanitizer with an optional |config| parameter, run +these steps: + 1. Let |sanitizer| be a newly created Sanitizer instance. + 1. Create a copy of |config|. + 1. Convert all element names in |config|'s copy to ASCII lower case. + 1. Return |sanitizer|, with |config|'s copy as its [=configuration object=]. +
+ +Note: The configuration object contains element names in the + [=element allow list=], [=element block list=], and [=element drop list=], and + in the mapped values in the [=attribute allow list=] and [=attribute drop list=]. +
To sanitize a given |input| of type `Document or DocumentFragment` run these steps: @@ -764,19 +777,14 @@ To handle funky elements on a given |element|, run these steps: To query the sanitizer config of a given sanitizer instance, run these steps: 1. Let |sanitizer| be the current Sanitizer. - 2. Let |config| be |sanitizer|'s [=configuration object=], or the + 1. Let |config| be |sanitizer|'s [=configuration object=], or the [=default configuration=] if no [=configuration object=] was given. - 3. Let |result| be a newly constructed SanitizerOptions dictionary. - 4. For any non-empty member of |config| whose key is declared in + 1. Let |result| be a newly constructed SanitizerOptions dictionary. + 1. For any non-empty member of |config| whose key is declared in SanitizerOptions, copy the value to |result|. - 5. All element names in |result| should be converted to lower case. - 6. Return |result|. + 1. Return |result|.
-Note: The configuration object contains element names in the - [=element allow list=], [=element block list=], and [=element drop list=], and - in the mapped values in the [=attribute allow list=] and [=attribute drop list=]. - ### The Effective Configuration ### {#configuration} A Sanitizer is potentially complex, so we will define a helper From 5818b0d8c4ba3dccc760f71ce5dceb51087a9bfe Mon Sep 17 00:00:00 2001 From: Daniel Vogelheim Date: Mon, 5 Jul 2021 16:20:44 +0200 Subject: [PATCH 4/5] Change lowercase to normalize. --- index.bs | 69 +++++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 51 insertions(+), 18 deletions(-) diff --git a/index.bs b/index.bs index 4ff7e91..2c89021 100644 --- a/index.bs +++ b/index.bs @@ -292,7 +292,7 @@ handle additional, application-specific use cases. * The new Sanitizer(config) constructor steps - are to run the steps [=create a sanitizer=] algorithm with |config| as + are to run the [=create a sanitizer=] algorithm steps with |config| as parameter. * The sanitize(input) method steps are to return the result of running the [=sanitize=] @@ -497,6 +497,10 @@ Note: `allowElements` creates a sanitizer that defaults to dropping elements, elements. Using both types is possible, but is probably of little practical use. The same applies to `allowAttributes` and `dropAttributes`. +Note: Element names are normalized, following the rules of the HTML Parser. + This means elements are usually lowercase, except for a small-ish number + of mixed case element names in non-HTML namespaces (SVG, MathML). +
```js const sample = "Some text with tags."; @@ -571,8 +575,8 @@ A sanitizer's configuration can be queried using the An attribute match list is a map of attribute names to element names, where the special name "*" stands for all elements. A given |attribute| belonging to an |element| matches an [=attribute match list=], if the -attribute's local name is a key in the match list, and element's local name -or `"*"` are found in the attribute's value list. +attribute's [=Attr/local name=] is a key in the match list, and element's +[=Element/local name=] or `"*"` are found in the attribute's value list.
   typedef record<DOMString, sequence<DOMString>> AttributeMatchList;
@@ -609,7 +613,8 @@ To create a Sanitizer with an optional |config| parameter, run
 these steps:
   1. Let |sanitizer| be a newly created Sanitizer instance.
   1. Create a copy of |config|.
-  1. Convert all element names in |config|'s copy to ASCII lower case.
+  1. Normalize all element names in |config|'s copy by running the
+     [=normalize element name=] algorithm on each of them.
   1. Return |sanitizer|, with |config|'s copy as its [=configuration object=].
 
@@ -617,6 +622,28 @@ Note: The configuration object contains element names in the [=element allow list=], [=element block list=], and [=element drop list=], and in the mapped values in the [=attribute allow list=] and [=attribute drop list=]. +
+To normalize element name |name|, run these steps: + 1. Convert |name| to [=ASCII lowercase=]. + 1. Return |name|. + +
+This method will not work for SVG and/or MathML elements, which are not + currently supported. When they are, replace the steps above with: + + 1. Convert |name| to [=ASCII lowercase=]. + 1. Let |prefix| be the empty string. + 1. If |name| contains a ":" (U+003A), then split the string on it and + set |prefix| to the part before, and update |name| with the part after. + 1. If |prefix| is either "svg" or "math", then adjust the name as described + in the "any other start tag" branch of the + [The rules for parsing tokens in foreign content](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inforeign) + subchapter in the HTML parsing spec. + 1. Return |name|. +
+ +
+
To sanitize a given |input| of type `Document or DocumentFragment` run these steps: @@ -840,9 +867,9 @@ helper definition:
The element kind of an |element| is one of `regular`, `unknown`, or `custom`. Let element kind be: - `custom`, if |element|'s tag name is a [=valid custom element name=], + - `custom`, if |element|'s [=Element/local name=] is a [=valid custom element name=], - `unknown`, if |element| is not in the [[HTML]] namespace or if |element|'s - tag name denotes an unknown element — that is, if the + [=Element/local name=] denotes an unknown element — that is, if the [=element interface=] the [[HTML]] specification assigns to it would be {{HTMLUnknownElement}}, - `regular`, otherwise. @@ -866,7 +893,7 @@ given a [=configuration object=] |config|, run these steps: 1. If |element|'s [=element kind=] is `custom` and if |config|'s [=allow custom elements option=] is unset or set to anything other than `true`: Return `drop`. - 1. Let |name| be the |element|'s tag name in lower case. + 1. Let |name| be the |element|'s [=Element/local name=]. 1. If |name| is in |config|'s [=element drop list=]: Return `drop`. 1. If |name| is in |config|'s [=element block list=]: Return `block`. 1. If |config| has a non-empty [=element allow list=] and |name| is not @@ -882,19 +909,24 @@ To determine the effective configuration for an attribute |attr|, attached to an element |element|, and given a [=configuration object=] |config|, run these steps: - 1. if |config|'s [=attribute drop list=] contains |attr|'s local - name as key, and the associated value contains either |element|'s tag - name (with case-insensitve comparison) or the string `"*"`: Return `drop`. + 1. if |config|'s [=attribute drop list=] contains |attr|'s [=Attr/local name=] + as key, and the associated value contains either |element|'s + [=Element/local name=] or the string `"*"`: Return `drop`. 1. 
If |config| has a non-empty [=attribute allow list=] and it does not - contain |attr|'s local name, or |attr|'s associated value contains neither - |element|'s tag name (with case-insensitive comparison) nor the string - `"*"`: Return `drop`. + contain |attr|'s [=Attr/local name=], or + |attr|'s associated value contains neither + |element|'s [=Element/local name=] nor the string `"*"`: Return `drop`. 1. if |config| does not have a non-empty [=attribute allow list=] and [=default configuration=]'s [=attribute allow list=] does not contain - |attr|'s local name, or |attr|'s associated value contains - neither |element|'s tag name (with case-insensitive comparison) nor the - string `"*"`: Return `drop`. + |attr|'s [=Attr/local name=], or |attr|'s associated value contains + neither |element|'s [=Element/local name=] nor the string `"*"`: + Return `drop`. 1. Return `keep`. + +Note: The element names in the Sanitizer configuration are normalized according + to the normalization step in the HTML Parser, just like elements' + [=Element/local names=] are. Thus, the comparison is effectively case + insensitive.
## Baseline and Defaults ## {#defaults} @@ -906,8 +938,9 @@ Issue: The sanitizer baseline and defaults need to be carefully vetted, and
To determine the baseline configuration for an element |element|, run these steps: - 1. if |element|'s [=element kind=] is `regular` and if |element|'s tag name - is not in the [=baseline element allow list=]: Return `drop`. + 1. if |element|'s [=element kind=] is `regular` and if |element|'s + [=Element/local name=] is not in the [=baseline element allow list=]: + Return `drop`. 1. Return `keep`.
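Reviewer sketch (not part of the patch): the "create a sanitizer" and "normalize element name" steps introduced in this patch could look roughly like the JavaScript below. It is a sketch under assumptions. `asciiLowercase` and `normalizeConfig` are hypothetical names, and the dictionary keys `blockElements` and `dropElements` are assumed by analogy with `allowElements`, `allowAttributes` and `dropAttributes`. Only the HTML-namespace case is covered, matching the algorithm as currently written.

```js
// ASCII lowercase: only A-Z are changed, other characters are left alone.
// (String.prototype.toLowerCase would also rewrite some non-ASCII letters.)
function asciiLowercase(name) {
  return name.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Assumed key names for the lists that hold element names.
const ELEMENT_LISTS = ["allowElements", "blockElements", "dropElements"];
const ATTRIBUTE_LISTS = ["allowAttributes", "dropAttributes"];

// Hypothetical sketch: copy the config, then normalize every element name.
// Attribute names (the keys of the attribute lists) are left untouched.
function normalizeConfig(config = {}) {
  const copy = { ...config };
  for (const key of ELEMENT_LISTS) {
    if (Array.isArray(config[key])) {
      copy[key] = config[key].map(asciiLowercase);
    }
  }
  for (const key of ATTRIBUTE_LISTS) {
    if (config[key] && typeof config[key] === "object") {
      copy[key] = Object.fromEntries(
        Object.entries(config[key]).map(([attr, elements]) => [
          attr,
          elements.map(asciiLowercase),
        ])
      );
    }
  }
  return copy;
}

// Mirrors the example added in patch 1/5.
console.log(normalizeConfig({ allowElements: ["EM", "sPAn"] }).allowElements);
```

Normalizing once at construction time, rather than on every query or comparison, is what lets the later matching steps compare names directly.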
From 213533a2ba899621eb7f0c5226315cb93539698c Mon Sep 17 00:00:00 2001 From: Daniel Vogelheim Date: Tue, 24 Aug 2021 11:42:46 +0200 Subject: [PATCH 5/5] Post-Rebase fixup. --- index.bs | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/index.bs b/index.bs index 2c89021..cd2dd24 100644 --- a/index.bs +++ b/index.bs @@ -800,18 +800,6 @@ To handle funky elements on a given |element|, run these steps: 1. Remove the `formaction` attribute from |element|.
-
-To query the sanitizer config of a given sanitizer instance, -run these steps: - 1. Let |sanitizer| be the current Sanitizer. - 1. Let |config| be |sanitizer|'s [=configuration object=], or the - [=default configuration=] if no [=configuration object=] was given. - 1. Let |result| be a newly constructed SanitizerOptions dictionary. - 1. For any non-empty member of |config| whose key is declared in - SanitizerOptions, copy the value to |result|. - 1. Return |result|. -
- ### The Effective Configuration ### {#configuration} A Sanitizer is potentially complex, so we will define a helper